Improve the heuristics, in PartialEvaluator._buildSimpleFontToUnicode, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655)
#11186
6564eda to 392fedb
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received; current queue size: 0. Live output at http://54.67.70.0:8877/4f989b589ee0132/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received; current queue size: 0. Live output at http://54.215.176.217:8877/af92d9b6878c880/output.txt
From: Bot.io (Linux m4): Success. Total script time: 17.65 mins. Full output at http://54.67.70.0:8877/4f989b589ee0132/output.txt
From: Bot.io (Windows): Failed. Total script time: 26.13 mins. Full output at http://54.215.176.217:8877/af92d9b6878c880/output.txt; image differences available at http://54.215.176.217:8877/af92d9b6878c880/reftest-analyzer.html#web=eq.log
392fedb to 773bab9
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received; current queue size: 0. Live output at http://54.67.70.0:8877/b04b984af1a9320/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received; current queue size: 0. Live output at http://54.215.176.217:8877/834a1d343d0f2b9/output.txt
From: Bot.io (Linux m4): Failed. Total script time: 17.69 mins. Full output at http://54.67.70.0:8877/b04b984af1a9320/output.txt
From: Bot.io (Windows): Failed. Total script time: 26.24 mins. Full output at http://54.215.176.217:8877/834a1d343d0f2b9/output.txt; image differences available at http://54.215.176.217:8877/834a1d343d0f2b9/reftest-analyzer.html#web=eq.log
773bab9 to f5be2d6
/botio test
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received; current queue size: 0. Live output at http://54.215.176.217:8877/4dbb497776e5de6/output.txt
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received; current queue size: 0. Live output at http://54.67.70.0:8877/bce2859f211d4d2/output.txt
From: Bot.io (Linux m4): Success. Total script time: 17.70 mins. Full output at http://54.67.70.0:8877/bce2859f211d4d2/output.txt
From: Bot.io (Windows): Failed. Total script time: 26.26 mins. Full output at http://54.215.176.217:8877/4dbb497776e5de6/output.txt; image differences available at http://54.215.176.217:8877/4dbb497776e5de6/reftest-analyzer.html#web=eq.log
/botio-linux preview
From: Bot.io (Linux m4): Received. Command cmd_preview from @timvandermeij received; current queue size: 0. Live output at http://54.67.70.0:8877/11f13c362876d09/output.txt
From: Bot.io (Linux m4): Success. Total script time: 1.67 mins. Published. Full output at http://54.67.70.0:8877/11f13c362876d09/output.txt
Thank you; it keeps amazing me how many variations of non-compliant PDF files exist...
/botio makeref
From: Bot.io (Windows): Received. Command cmd_makeref from @timvandermeij received; current queue size: 1. Live output at http://54.215.176.217:8877/37bddcd6b97f043/output.txt
From: Bot.io (Linux m4): Received. Command cmd_makeref from @timvandermeij received; current queue size: 0. Live output at http://54.67.70.0:8877/dfce98395e7b6fc/output.txt
From: Bot.io (Linux m4): Success. Total script time: 16.16 mins. Full output at http://54.67.70.0:8877/dfce98395e7b6fc/output.txt
From: Bot.io (Windows): Success. Total script time: 24.05 mins. Full output at http://54.215.176.217:8877/37bddcd6b97f043/output.txt
Thank you for fixing this.
Please note: I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt the readability of the code.
However, it occurred to me that we're essentially trying to add a heuristic on top of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.
In the PDF file from the issue, the Encoding contains glyphNames of the Cdd format, which our existing heuristics treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix so that text selection works.
Fixes #9655.
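The detection described above can be sketched roughly as follows. This is a simplified, hypothetical TypeScript illustration, not the actual PartialEvaluator._buildSimpleFontToUnicode code: the function name buildSimpleToUnicode, the forceHex flag, the regular expressions and the Map-based result are all assumptions made for this example, and the other glyph-name heuristics are left out. Every Cdd{d}/cdd{d} name is first read as base 10; as soon as one name contains a non-decimal hex digit, the whole Encoding is re-parsed as base 16.

```typescript
// Hypothetical, simplified sketch of the idea described in this PR; it is NOT
// the actual pdf.js implementation. `encoding[charCode]` is the glyph name
// that the font's Encoding assigns to `charCode` (undefined if none).

// Glyph names of the Cdd{d}/cdd{d} format; the group captures the "dd{d}" part.
const CDD_NAME = /^[Cc]([0-9A-Fa-f]{2,3})$/;
const DECIMAL_DIGITS = /^[0-9]+$/;

function buildSimpleToUnicode(
  encoding: ReadonlyArray<string | undefined>,
  forceHex = false,
): Map<number, string> {
  const toUnicode = new Map<number, string>();
  for (let charCode = 0; charCode < encoding.length; charCode++) {
    const glyphName = encoding[charCode];
    const match = glyphName === undefined ? null : CDD_NAME.exec(glyphName);
    if (match === null) {
      continue; // Other glyph-name heuristics are omitted from this sketch.
    }
    const digits = match[1];
    if (forceHex) {
      // Second pass: every Cdd{d} name is read as a base 16 value.
      toUnicode.set(charCode, String.fromCharCode(parseInt(digits, 16)));
    } else if (DECIMAL_DIGITS.test(digits)) {
      // Existing heuristic: treat the digits as a base 10 value.
      toUnicode.set(charCode, String.fromCharCode(parseInt(digits, 10)));
    } else {
      // A name such as "C1F" cannot be base 10, so assume the whole Encoding
      // uses base 16 and start over (re-reading e.g. "C41" as 0x41, not 41).
      return buildSimpleToUnicode(encoding, true);
    }
  }
  return toUnicode;
}

// Usage: a sparse Encoding where only "C1F" reveals the base 16 format.
const encoding: (string | undefined)[] = [];
encoding[65] = "C41";
encoding[31] = "C1F";
console.log(buildSimpleToUnicode(encoding).get(65)); // prints A (0x41 = 65), rather than ")" (41)
```

Restarting the whole pass, instead of patching entries that were already built, does redundant work, but it keeps the extra heuristic self-contained, in line with the "heuristic on top of another heuristic" reasoning. Note that this sketch would also switch to base 16 for an unrelated short name that merely looks like hex, such as "Cab".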