Improve the heuristics, in PartialEvaluator._buildSimpleFontToUnicode, for glyphNames of the Cdd{d}/cdd{d} format (issue 9655) #11186

Merged
merged 1 commit into from
Oct 6, 2019

Conversation

Snuffleupagus
Collaborator

Please note: I've been thinking about possible ways of addressing this issue for a while now, but all of the solutions I came up with became too complicated and thus hurt readability of the code.
However, it occurred to me that we're essentially trying to add a heuristic on top of another heuristic, and that it shouldn't matter how efficient the code is as long as it works.

In the PDF file in the issue the Encoding contains glyphNames of the Cdd format, which our existing heuristics will treat as base 10 values. However, in this particular file they actually contain base 16 values, which we thus attempt to detect and fix such that text-selection works.
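
To illustrate the idea described above, here is a minimal sketch, not the actual patch. It assumes a standalone helper (the name `parseCddCodes`, the `forceHex` flag and the `-1` placeholder are all hypothetical and do not come from pdf.js) that first reads the digits of `Cdd{d}`/`cdd{d}` glyphNames as base 10 and, as soon as one name contains a hex-only digit, re-parses the whole Encoding as base 16:

```javascript
// Hypothetical helper for illustration only; not the pdf.js implementation.
// Matches glyphNames such as "C65", "c120" or "C4A" (two or three digits).
const CDD_NAME = /^[Cc][0-9A-Fa-f]{2,3}$/;

function parseCddCodes(glyphNames, forceHex = false) {
  const codes = [];
  for (const name of glyphNames) {
    if (!CDD_NAME.test(name)) {
      codes.push(-1); // Not of the Cdd{d}/cdd{d} form; left unresolved here.
      continue;
    }
    const digits = name.substring(1);
    if (forceHex) {
      codes.push(parseInt(digits, 16));
      continue;
    }
    if (!/^\d+$/.test(digits)) {
      // A hex-only digit (A-F) was found, so the base 10 assumption is wrong.
      // Re-parse the *entire* list as base 16 so that all codes agree.
      return parseCddCodes(glyphNames, /* forceHex = */ true);
    }
    codes.push(parseInt(digits, 10));
  }
  return codes;
}
```

With this sketch, `parseCddCodes(["C65", "C66"])` stays in base 10 and yields `[65, 66]`, whereas `parseCddCodes(["C41", "C4A"])` switches to base 16 for every entry and yields `[65, 74]`. Restarting from scratch is deliberately simple rather than efficient, in line with the note above about stacking one heuristic on top of another.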

Fixes #9655.

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/4f989b589ee0132/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/af92d9b6878c880/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/4f989b589ee0132/output.txt

Total script time: 17.65 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/af92d9b6878c880/output.txt

Total script time: 26.13 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/af92d9b6878c880/reftest-analyzer.html#web=eq.log

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/b04b984af1a9320/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/834a1d343d0f2b9/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/b04b984af1a9320/output.txt

Total script time: 17.69 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: Passed

@pdfjsbot

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/834a1d343d0f2b9/output.txt

Total script time: 26.24 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/834a1d343d0f2b9/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/4dbb497776e5de6/output.txt

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/bce2859f211d4d2/output.txt

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/bce2859f211d4d2/output.txt

Total script time: 17.70 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/4dbb497776e5de6/output.txt

Total script time: 26.26 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/4dbb497776e5de6/reftest-analyzer.html#web=eq.log

@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/11f13c362876d09/output.txt

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/11f13c362876d09/output.txt

Total script time: 1.67 mins

Published

@timvandermeij timvandermeij merged commit cead77e into mozilla:master Oct 6, 2019
@timvandermeij
Contributor

timvandermeij commented Oct 6, 2019

Thank you; it keeps amazing me how many variations of non-compliant PDF files exist...

@timvandermeij
Contributor

/botio makeref

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/37bddcd6b97f043/output.txt

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/dfce98395e7b6fc/output.txt

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/dfce98395e7b6fc/output.txt

Total script time: 16.16 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot

pdfjsbot commented Oct 6, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/37bddcd6b97f043/output.txt

Total script time: 24.05 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the issue-9655 branch October 6, 2019 19:38
@kuyeduwu

kuyeduwu commented Oct 7, 2019

thank you for fixing this

Development

Successfully merging this pull request may close these issues.

Content messed up in the textLayer div