Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124) #12028
0a7884b to 28d2ada
/botio test

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/6e710de98eb75dd/output.txt

From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/8ed659faf6b747c/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.67.70.0:8877/8ed659faf6b747c/output.txt
Total script time: 25.65 mins
Image differences available at: http://54.67.70.0:8877/8ed659faf6b747c/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows)
Failed
Full output at http://54.215.176.217:8877/6e710de98eb75dd/output.txt
Total script time: 30.18 mins
Image differences available at: http://54.215.176.217:8877/6e710de98eb75dd/reftest-analyzer.html#web=eq.log
/botio-linux preview

From: Bot.io (Linux m4)
Received
Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/1f6a83d5215c7e5/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/1f6a83d5215c7e5/output.txt
Total script time: 3.41 mins
Published
Looks good! /botio makeref

From: Bot.io (Windows)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 1
Live output at: http://54.215.176.217:8877/93e3b06355616ab/output.txt

From: Bot.io (Linux m4)
Received
Command cmd_makeref from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/7782167f5927502/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.67.70.0:8877/7782167f5927502/output.txt
Total script time: 24.00 mins

From: Bot.io (Windows)
Success
Full output at http://54.215.176.217:8877/93e3b06355616ab/output.txt
Total script time: 28.18 mins
Shouldn't the end of the image data be found by looking at the /W, /H, /BPC, /CS and /F entries and at the image data itself? For example, DCT image data ends at the EOI marker. For filters where the size of the data is not known from the data itself, it should be clear from the width, height, bits per component and color space how many bytes are needed. Only after that should we search for EI, and such a "heuristic" should be used only as a fallback in case pdf.js does not support the given filter.
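
To make the suggestion above concrete, here is a minimal sketch of the two checks it describes. Both helpers (`expectedImageBytes` and `findDctEnd`) are hypothetical names for illustration and are not taken from pdf.js. The first assumes unfiltered sample data, where each row is padded to a whole number of bytes; the second naively takes the first `0xFF 0xD9` pair as the EOI marker, which could be wrong if the JPEG embeds a thumbnail.

```javascript
// Hypothetical helper (not pdf.js API): number of bytes of unfiltered
// sample data, assuming each image row is padded to a byte boundary.
function expectedImageBytes({ width, height, bitsPerComponent, numComponents }) {
  const bitsPerRow = width * numComponents * bitsPerComponent;
  const bytesPerRow = Math.ceil(bitsPerRow / 8);
  return bytesPerRow * height;
}

// Hypothetical helper (not pdf.js API): returns the index just past the
// first JPEG EOI marker (0xFF 0xD9) at or after `start`, or -1 if absent.
// Naive: an embedded thumbnail could contain an earlier EOI marker.
function findDctEnd(bytes, start = 0) {
  for (let i = start; i + 1 < bytes.length; i++) {
    if (bytes[i] === 0xff && bytes[i + 1] === 0xd9) {
      return i + 2;
    }
  }
  return -1;
}
```

With a length known this way, the "EI" search would only need to start after the computed end of the data, so "EI" bytes inside the image could not be mistaken for the operator.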
This should reduce the possibility of accidentally truncating some inline images, while not causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch every single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.
Please note: The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.
Fixes #11124
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause unacceptable performance regressions in e.g. issue #2618.
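
For readers unfamiliar with the problem, the following is a rough sketch of the general kind of "EI" heuristic discussed here. It is not the code from this patch: the function name `findInlineImageEnd`, the `lookahead` parameter and the specific plausibility test (the bytes after a candidate "EI" must look like plain content-stream text) are assumptions chosen for illustration.

```javascript
// PDF white-space characters: NUL, HT, LF, FF, CR, SP.
const isWhiteSpace = (b) =>
  b === 0x00 || b === 0x09 || b === 0x0a || b === 0x0c || b === 0x0d || b === 0x20;

// Bytes that plausibly belong to content-stream text after the image.
const isTextByte = (b) =>
  b === 0x09 || b === 0x0a || b === 0x0d || (b >= 0x20 && b <= 0x7e);

// Hypothetical sketch: returns the index of the "E" in the first plausible
// "EI" operator at or after `start`, or -1 if none is found.
function findInlineImageEnd(bytes, start = 0, lookahead = 10) {
  const E = 0x45, I = 0x49;
  for (let i = start + 1; i + 1 < bytes.length; i++) {
    if (bytes[i] !== E || bytes[i + 1] !== I) continue;
    // The operator must be delimited by white-space on both sides.
    if (!isWhiteSpace(bytes[i - 1])) continue;
    const after = i + 2;
    if (after < bytes.length && !isWhiteSpace(bytes[after])) continue;
    // Reject candidates that are followed by binary-looking data, since
    // those are likely "EI" bytes inside the image itself.
    let plausible = true;
    const end = Math.min(after + lookahead, bytes.length);
    for (let j = after; j < end; j++) {
      if (!isTextByte(bytes[j])) {
        plausible = false;
        break;
      }
    }
    if (plausible) return i;
  }
  return -1;
}
```

The bounded lookahead keeps the extra cost per candidate small, which matters for the performance concern raised in footnote [1]; it also means the check can still be fooled by image data that happens to contain "EI" followed by text-like bytes.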