Use the *full* inline image as the cacheKey in Parser.makeInlineImage (bug 1799927) #15679

Merged: 3 commits, Nov 10, 2022

@Snuffleupagus Snuffleupagus commented Nov 10, 2022

*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case, that causes some of the "letters", which are actually inline images, to be rendered incorrectly.
Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey, which rules out any future collisions. While this will (naturally) lead to slightly higher peak memory usage, that usage is limited to the current `Parser`-instance and is therefore not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
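
To illustrate the collision problem described above, here is a minimal sketch. It uses a textbook Adler-32, which is an assumption and not necessarily identical to the project's `computeAdler32`, and a hypothetical `bytesToKey` helper, since the exact stringification used by the patch is not shown in this conversation.

```javascript
// Textbook Adler-32 (assumption: may differ in detail from `computeAdler32`).
function adler32(bytes) {
  const MOD = 65521;
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % MOD;
    b = (b + a) % MOD;
  }
  // `>>> 0` keeps the combined 32-bit value unsigned.
  return ((b << 16) | a) >>> 0;
}

// Hypothetical helper: one character per byte, so two different byte
// sequences can never produce the same key.
function bytesToKey(bytes) {
  let key = "";
  for (const byte of bytes) {
    key += String.fromCharCode(byte);
  }
  return key;
}

// Two different three-byte arrays with the same byte sum and the same
// position-weighted sum, which is all that Adler-32 looks at.
const first = new Uint8Array([1, 0, 1]);
const second = new Uint8Array([0, 2, 0]);

console.log(adler32(first) === adler32(second)); // true: the hashes collide
console.log(bytesToKey(first) === bytesToKey(second)); // false: the keys differ
```

The string key costs memory proportional to the image data, which matches the "slightly higher peak memory usage" trade-off mentioned above.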

@Snuffleupagus Snuffleupagus force-pushed the bug-1799927-2 branch 3 times, most recently from 6ad431d to fc778a7 on November 10, 2022 16:18
…e` (bug 1799927)

Avoiding `Stream`-instances for already cached inline images helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618.
Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).
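
The "check the cache before constructing anything" idea can be sketched roughly as follows. All names here (`FakeStream`, `InlineImageCache`, `getOrCreate`) are hypothetical stand-ins for illustration and are not the actual `Parser` code.

```javascript
// Counts how many stream objects get constructed (for demonstration only).
let streamsCreated = 0;

// Hypothetical stand-in for an object that is costly to create per image.
class FakeStream {
  constructor(bytes) {
    streamsCreated++;
    this.bytes = bytes;
  }
}

// Hypothetical per-parser cache, keyed by the full image data as a string.
class InlineImageCache {
  constructor() {
    this.images = new Map();
  }

  getOrCreate(bytes) {
    let key = "";
    for (const byte of bytes) {
      key += String.fromCharCode(byte);
    }
    const cached = this.images.get(key);
    if (cached !== undefined) {
      return cached; // Cache hit: no new stream is constructed.
    }
    const image = { stream: new FakeStream(bytes) };
    this.images.set(key, image);
    return image;
  }
}

const cache = new InlineImageCache();
const imageA = cache.getOrCreate(new Uint8Array([1, 0, 1]));
const imageB = cache.getOrCreate(new Uint8Array([1, 0, 1])); // same data
const imageC = cache.getOrCreate(new Uint8Array([0, 2, 0])); // different data

console.log(imageA === imageB); // true
console.log(streamsCreated); // 2
```

Because the cache lives on the instance, it is released together with it, in line with the non-persistent memory usage described in the pull request.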
….hexdigest`

These variables are left over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).
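
For context, the difference between manual padding and `String.prototype.padStart` looks roughly like this. The function names are hypothetical and the actual `hexdigest` code is not shown in this conversation, so treat this as a sketch of the general technique only.

```javascript
// Older style: pad a 32-bit value to 8 hex digits with an explicit loop.
function toHex32Manual(value) {
  let hex = (value >>> 0).toString(16);
  while (hex.length < 8) {
    hex = "0" + hex;
  }
  return hex;
}

// Modern style: `padStart` does the padding, no helper variables needed.
function toHex32(value) {
  return (value >>> 0).toString(16).padStart(8, "0");
}

// Assumed shape of a 64-bit digest: two 32-bit halves, high part first.
function hexdigest64(high, low) {
  return toHex32(high) + toHex32(low);
}

console.log(toHex32Manual(255) === toHex32(255)); // true
console.log(hexdigest64(0x1, 0xff)); // "00000001000000ff"
```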
@mozilla mozilla deleted a comment from pdfjsbot Nov 10, 2022
@pdfjsbot
From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/537525d0937dfa1/output.txt

@pdfjsbot
From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/162a68c0d8b87c7/output.txt

@pdfjsbot
From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/537525d0937dfa1/output.txt

Total script time: 25.38 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 7
  different first/second rendering: 1

Image differences available at: http://54.241.84.105:8877/537525d0937dfa1/reftest-analyzer.html#web=eq.log

@pdfjsbot
From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/162a68c0d8b87c7/output.txt

Total script time: 32.56 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 1

Image differences available at: http://54.193.163.58:8877/162a68c0d8b87c7/reftest-analyzer.html#web=eq.log

@calixteman calixteman left a comment

LGTM.
Thank you.

@Snuffleupagus Snuffleupagus merged commit 595711b into mozilla:master Nov 10, 2022
@Snuffleupagus

/botio makeref

@pdfjsbot
From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/74cdfaf50391fdc/output.txt

@pdfjsbot
From: Bot.io (Windows)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 1

Live output at: http://54.193.163.58:8877/6aab8bc241aa125/output.txt

@pdfjsbot
From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/74cdfaf50391fdc/output.txt

Total script time: 21.47 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/6aab8bc241aa125/output.txt

Total script time: 25.53 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus Snuffleupagus deleted the bug-1799927-2 branch November 10, 2022 22:24