Use the *full* inline image as the cacheKey in Parser.makeInlineImage
(bug 1799927)
#15679
Force-pushed from 6ad431d to fc778a7
…e` (bug 1799927)

*Please note:* This only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly.

Rather than switching to another hashing algorithm, e.g. the `MurmurHash3_64` class, we simply cache using a stringified version of the inline image data as the cacheKey to prevent any future collisions. While this will (naturally) lead to slightly higher peak memory usage, it'll however be limited to the current `Parser`-instance which means that it's not persistent.

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.
This helps improve performance for some PDF documents with a huge number of inline images, e.g. the PDF document from issue 2618. Given that we no longer create `Stream`-instances unconditionally, we also don't need `Dict`-instances for cached inline images (since we only access the filter).
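
To make the collision concrete, here is a minimal sketch rather than the code from this patch. It assumes a textbook Adler-32 (as defined in RFC 1950) as a stand-in for `computeAdler32`, and the names `adler32`, `bytesToKey` and `InlineImageCache` are hypothetical. It shows two different three-byte arrays that share a checksum, and a per-instance cache keyed on the stringified bytes that only does the expensive work on a cache miss.

```javascript
// Textbook Adler-32 (RFC 1950); a stand-in, not the project's actual helper.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

// Hypothetical key builder: a length prefix plus one character per byte.
// Spreading into String.fromCharCode is only suitable for short arrays.
function bytesToKey(bytes) {
  return bytes.length + "_" + String.fromCharCode(...bytes);
}

// Hypothetical per-instance cache. `decode` stands in for the expensive work
// (building stream objects, decoding pixels) that is skipped on a cache hit.
class InlineImageCache {
  constructor(decode) {
    this.decode = decode;
    this.entries = new Map();
    this.decodeCount = 0;
  }

  get(bytes) {
    const key = bytesToKey(bytes);
    let entry = this.entries.get(key);
    if (entry === undefined) {
      this.decodeCount++;
      entry = this.decode(bytes);
      this.entries.set(key, entry);
    }
    return entry;
  }
}

// Two distinct "images" with the same byte sum and the same weighted sum.
const imageA = new Uint8Array([1, 0, 1]);
const imageB = new Uint8Array([0, 2, 0]);

const hashesCollide = adler32(imageA) === adler32(imageB); // true
const keysCollide = bytesToKey(imageA) === bytesToKey(imageB); // false

const cache = new InlineImageCache(bytes => ({ pixels: Array.from(bytes) }));
cache.get(imageA);
cache.get(imageB);
cache.get(new Uint8Array([1, 0, 1])); // same bytes as imageA: cache hit
```

With a checksum as the key, `imageB` would be served the entry cached for `imageA`, which matches the "wrong letter" symptom described above. With the full data as the key the two entries stay separate, at the cost of holding each key string in memory for the lifetime of the cache.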
….hexdigest`

These variables are left over from the initial implementation, back when `String.prototype.padStart` didn't exist and we thus had to pad manually (using a loop).
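
For illustration only, the difference between the two padding styles can be sketched as follows. The function names and the 8-digit width are assumptions made for this example, not taken from the patch.

```javascript
// Manual zero-padding with a loop, as was needed before `padStart` existed.
function toHex8Loop(value) {
  let hex = (value >>> 0).toString(16);
  while (hex.length < 8) {
    hex = "0" + hex;
  }
  return hex;
}

// The same result using String.prototype.padStart, with no helper variables.
function toHex8(value) {
  return (value >>> 0).toString(16).padStart(8, "0");
}

const sample = 0x1a2b;
const viaLoop = toHex8Loop(sample); // "00001a2b"
const viaPadStart = toHex8(sample); // "00001a2b"
```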
Force-pushed from fc778a7 to e8ec6af
From: Bot.io (Linux m4)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/537525d0937dfa1/output.txt

From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.193.163.58:8877/162a68c0d8b87c7/output.txt

From: Bot.io (Linux m4)
Failed
Full output at http://54.241.84.105:8877/537525d0937dfa1/output.txt
Total script time: 25.38 mins
Image differences available at: http://54.241.84.105:8877/537525d0937dfa1/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows)
Failed
Full output at http://54.193.163.58:8877/162a68c0d8b87c7/output.txt
Total script time: 32.56 mins
Image differences available at: http://54.193.163.58:8877/162a68c0d8b87c7/reftest-analyzer.html#web=eq.log
LGTM.
Thank you.
/botio makeref

From: Bot.io (Linux m4)
Received
Command cmd_makeref from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.241.84.105:8877/74cdfaf50391fdc/output.txt

From: Bot.io (Windows)
Received
Command cmd_makeref from @Snuffleupagus received. Current queue size: 1
Live output at: http://54.193.163.58:8877/6aab8bc241aa125/output.txt

From: Bot.io (Linux m4)
Success
Full output at http://54.241.84.105:8877/74cdfaf50391fdc/output.txt
Total script time: 21.47 mins

From: Bot.io (Windows)
Success
Full output at http://54.193.163.58:8877/6aab8bc241aa125/output.txt
Total script time: 25.53 mins