Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124) #12028

Merged
merged 1 commit into from
Jun 26, 2020

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Jun 26, 2020

This should reduce the possibility of accidentally truncating some inline images, while not causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch every single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

Please note: The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

Fixes #11124


[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause unacceptable performance regressions in e.g. issue #2618.

…tual image data (issue 11124)

This should reduce the possibility of accidentally truncating some inline images, while *not* causing the "EI" detection to become significantly slower.[1]
There's obviously a possibility that these added checks are not sufficient to catch *every* single case of "EI" sequences within the actual inline image data, but without specific test-cases I decided against over-engineering the solution here.

*Please note:* The interpolation issues are somewhat orthogonal to the main issue here, which is the truncated image, and it's already tracked elsewhere.

---
[1] I've looked at the issue a few times, and this is the first approach that I was able to come up with that didn't cause *unacceptable* performance regressions in e.g. issue 2618.
@Snuffleupagus Snuffleupagus changed the title Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11142) Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124) Jun 26, 2020
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/6e710de98eb75dd/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/8ed659faf6b747c/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/8ed659faf6b747c/output.txt

Total script time: 25.65 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/8ed659faf6b747c/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/6e710de98eb75dd/output.txt

Total script time: 30.18 mins

  • Font tests: Passed
  • Unit tests: FAILED
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/6e710de98eb75dd/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/1f6a83d5215c7e5/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/1f6a83d5215c7e5/output.txt

Total script time: 3.41 mins

Published

@timvandermeij timvandermeij merged commit d8d6a98 into mozilla:master Jun 26, 2020
@timvandermeij
Copy link
Contributor

Looks good!

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 1

Live output at: http://54.215.176.217:8877/93e3b06355616ab/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/7782167f5927502/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/7782167f5927502/output.txt

Total script time: 24.00 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/93e3b06355616ab/output.txt

Total script time: 28.18 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the issue-11124 branch June 27, 2020 08:11
@misos1
Copy link

misos1 commented Jul 5, 2020

Should not be the end of image data found by looking at /W, /H, /BPC, /CS and /F entries and at image data itself? For example DCT image data ends at EOI marker. For filters where size of data is not known from data itself it should be clear from width, height, bpc and color space how many bytes are needed. And only after that search for EI and only use such "heuristic" as fallback in case pdf.js does not support given filter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reading inline image data is incorrectly implemented [BUG]
4 participants