Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-factor searching for incomplete objects in XRef.indexObjects (issue 15803) #15854

Merged
merged 1 commit into from
Dec 24, 2022

Conversation

Snuffleupagus
Copy link
Collaborator

When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.

Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for multiple potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the RegExp.lastIndex property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility

Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.

…sue 15803)

When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.

Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility

Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
@Snuffleupagus Snuffleupagus linked an issue Dec 19, 2022 that may be closed by this pull request
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/b73f886c5fd725f/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/e4b9e021f810d2e/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/b73f886c5fd725f/output.txt

Total script time: 25.63 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 10
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/b73f886c5fd725f/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/e4b9e021f810d2e/output.txt

Total script time: 32.12 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 10

Image differences available at: http://54.193.163.58:8877/e4b9e021f810d2e/reftest-analyzer.html#web=eq.log

Copy link
Contributor

@calixteman calixteman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you.

@Snuffleupagus Snuffleupagus merged commit 8aed0c3 into mozilla:master Dec 24, 2022
@Snuffleupagus
Copy link
Collaborator Author

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/7bbefea88400115/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.193.163.58:8877/86ce761771a336c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/7bbefea88400115/output.txt

Total script time: 21.39 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/86ce761771a336c/output.txt

Total script time: 23.02 mins

  • Lint: Passed
  • Make references: FAILED

@Snuffleupagus
Copy link
Collaborator Author

/botio-windows makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/0fdfdcf24433c71/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/0fdfdcf24433c71/output.txt

Total script time: 23.47 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Example PDF not working
3 participants