-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-factor searching for incomplete objects in XRef.indexObjects
(issue 15803)
#15854
Conversation
…sue 15803) When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work. Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for *multiple* potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the `RegExp.lastIndex` property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibility Note that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/b73f886c5fd725f/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/e4b9e021f810d2e/output.txt |
From: Bot.io (Linux m4)FailedFull output at http://54.241.84.105:8877/b73f886c5fd725f/output.txt Total script time: 25.63 mins
Image differences available at: http://54.241.84.105:8877/b73f886c5fd725f/reftest-analyzer.html#web=eq.log |
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/e4b9e021f810d2e/output.txt Total script time: 32.12 mins
Image differences available at: http://54.193.163.58:8877/e4b9e021f810d2e/reftest-analyzer.html#web=eq.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you.
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.241.84.105:8877/7bbefea88400115/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 1 Live output at: http://54.193.163.58:8877/86ce761771a336c/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.241.84.105:8877/7bbefea88400115/output.txt Total script time: 21.39 mins
|
From: Bot.io (Windows)FailedFull output at http://54.193.163.58:8877/86ce761771a336c/output.txt Total script time: 23.02 mins
|
/botio-windows makeref |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.193.163.58:8877/0fdfdcf24433c71/output.txt |
From: Bot.io (Windows)SuccessFull output at http://54.193.163.58:8877/0fdfdcf24433c71/output.txt Total script time: 23.47 mins
|
When trying to find incomplete objects, i.e. those missing the "endobj"-string at the end, there's unfortunately a number of possible operators that we need to check for. Otherwise we could miss e.g. the "trailer" at the end of a corrupt PDF document, which is why the referenced document didn't work.
Currently we do all searching on the "raw" bytes of the PDF document, for efficiency, however this doesn't really work when we need to check for multiple potential command-strings. To keep the complexity manageable we'll instead use regular expressions here, but we can at least avoid creating lots of substrings thanks to the
RegExp.lastIndex
property; which is well supported across browsers according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex#browser_compatibilityNote that this repeated regular expression usage could perhaps be slightly less efficient than the old code, however this method is only invoked for corrupt PDF documents.