Update `ChunkedStream.makeSubStream` to actually check if (some) data exists when the `length` parameter is undefined #10696
… exists when the `length` parameter is undefined

Note how `XRef.fetchUncompressed`, which is used *a lot* for most PDF documents, is calling the `makeSubStream` method without providing a `length` argument.

In practice this results in the `makeSubStream` method, on the `ChunkedStream` instance, calling the `ensureRange` method with `NaN` as the end position, thus resulting in no data being requested despite it possibly being necessary.

This may be quite bad, since in this particular case it will lead to a new `ChunkedStream` being created *and* also a new `Parser`/`Lexer` instance. Given that it's quite possible that even the very first `Parser.getObj` call could throw `MissingDataException`, this could thus lead to wasted time/resources (since re-parsing is necessary once the data finally arrives).

You obviously need to be very careful to not have `ChunkedStream.makeSubStream` accidentally requesting the *entire* file, hence its `this.end` property is of no use here, but it should be possible to at least check that the `start` of the data is present before any potentially expensive parsing occurs.
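To illustrate why a `NaN` end position silently disables the range check, here is a minimal sketch. It is a simplified model and not the actual PDF.js source: `TinyChunkedStream`, `MissingDataError`, and `throwsMissingData` are hypothetical names invented for this example, and the chunk bookkeeping is assumed to work roughly like this.

```javascript
// Hypothetical, simplified model of a chunked stream (not the PDF.js code).
class MissingDataError extends Error {
  constructor(begin, end) {
    super(`Missing data [${begin}, ${end})`);
    this.begin = begin;
    this.end = end;
  }
}

class TinyChunkedStream {
  constructor(chunkSize, loadedChunks) {
    this.chunkSize = chunkSize;
    this.loadedChunks = new Set(loadedChunks);
  }

  // Throws unless every chunk overlapping [begin, end) is loaded.
  ensureRange(begin, end) {
    if (begin >= end) {
      return;
    }
    const beginChunk = Math.floor(begin / this.chunkSize);
    const endChunk = Math.floor((end - 1) / this.chunkSize) + 1;
    // With end = NaN, endChunk is NaN and `chunk < NaN` is always false,
    // so the loop body never runs and nothing is reported as missing.
    for (let chunk = beginChunk; chunk < endChunk; ++chunk) {
      if (!this.loadedChunks.has(chunk)) {
        throw new MissingDataError(begin, end);
      }
    }
  }

  // Throws unless the chunk containing `pos` is loaded.
  ensureByte(pos) {
    if (!this.loadedChunks.has(Math.floor(pos / this.chunkSize))) {
      throw new MissingDataError(pos, pos + 1);
    }
  }
}

// Returns true if `fn` throws MissingDataError, false if it returns normally.
function throwsMissingData(fn) {
  try {
    fn();
    return false;
  } catch (e) {
    if (e instanceof MissingDataError) {
      return true;
    }
    throw e;
  }
}

const stream = new TinyChunkedStream(4, [0]); // only bytes 0..3 are loaded
const start = 8;
let length; // left undefined, as when the caller omits the argument

const rangeDetected = throwsMissingData(() =>
  stream.ensureRange(start, start + length) // start + undefined is NaN
);
const byteDetected = throwsMissingData(() => stream.ensureByte(start));

console.log(rangeDetected, byteDetected); // false true
```

In this model the range check never notices that byte 8 is missing, while a single-byte check on `start` does, which is the behaviour the description argues for.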
From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/a5831309000cd1e/output.txt

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/3ddeaa2f625efd8/output.txt

From: Bot.io (Linux m4) - Failed
Full output at http://54.67.70.0:8877/a5831309000cd1e/output.txt
Total script time: 18.22 mins

From: Bot.io (Windows) - Success
Full output at http://54.215.176.217:8877/3ddeaa2f625efd8/output.txt
Total script time: 25.72 mins
This is *similar* to the existing check used in `ChunkedStream.ensureRange`.
From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/a8ec9ca6b14ee07/output.txt

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/916b59049e53914/output.txt

From: Bot.io (Linux m4) - Success
Full output at http://54.67.70.0:8877/a8ec9ca6b14ee07/output.txt
Total script time: 18.05 mins

From: Bot.io (Windows) - Failed
Full output at http://54.215.176.217:8877/916b59049e53914/output.txt
Total script time: 25.67 mins
Image differences available at: http://54.215.176.217:8877/916b59049e53914/reftest-analyzer.html#web=eq.log

From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/1baf4504b136638/output.txt

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/1bb16f98e064373/output.txt

From: Bot.io (Linux m4) - Success
Full output at http://54.67.70.0:8877/1baf4504b136638/output.txt
Total script time: 18.09 mins

From: Bot.io (Windows) - Failed
Full output at http://54.215.176.217:8877/1bb16f98e064373/output.txt
Total script time: 25.63 mins
Image differences available at: http://54.215.176.217:8877/1bb16f98e064373/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/dec17c1dbfd8e0b/output.txt

From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/df4b9669ae64cf9/output.txt

From: Bot.io (Linux m4) - Success
Full output at http://54.67.70.0:8877/df4b9669ae64cf9/output.txt
Total script time: 18.11 mins

From: Bot.io (Windows) - Success
Full output at http://54.215.176.217:8877/dec17c1dbfd8e0b/output.txt
Total script time: 25.72 mins
Looking at the |
@@ -98,6 +98,10 @@ class ChunkedStream {
  }

  ensureByte(pos) {
    if (pos < this.progressiveDataLength) {
I assume `pos` is 0-indexed, so the `<` check will work. However, why are we using `<=` then in `ensureRange` below? I think the `end` is also a 0-indexed position, so is that wrong below?
The way that I've understood `MissingDataException`, and related code, is that `begin` is inclusive while `end` is exclusive, which should thus explain things; note the formatting used in
Lines 32 to 35 in 47f208d
function MissingDataException(begin, end) {
  this.begin = begin;
  this.end = end;
  this.message = `Missing data [${begin}, ${end})`;
To be absolutely sure though, I suppose that you might want @yurydelendik to weigh in here!
@yurydelendik Sorry to bother you, but any chance that you have time to comment here?
Also, given that an `ensureByte(pos)` call should essentially be equal to an `ensureRange(pos, pos + 1)` call, that would mean an `if (pos + 1 <= this.progressiveDataLength) {` check in the latter case, which is how I arrived at the condition under discussion here.
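The equivalence argued above can be checked mechanically for integer positions. The sketch below models only the `progressiveDataLength` comparison, not the rest of either method; `byteAvailable` and `rangeAvailable` are hypothetical helper names used for illustration, with `end` treated as exclusive.

```javascript
// Hypothetical helpers modelling only the progressive-data comparisons.

// Single-byte check: position `pos` (0-indexed) is covered when pos < length.
function byteAvailable(pos, progressiveDataLength) {
  return pos < progressiveDataLength;
}

// Range check over [begin, end), where `end` is exclusive.
function rangeAvailable(begin, end, progressiveDataLength) {
  return end <= progressiveDataLength;
}

// Compare both checks for every small integer position and data length.
let mismatches = 0;
for (let length = 0; length <= 8; length++) {
  for (let pos = 0; pos <= 8; pos++) {
    if (byteAvailable(pos, length) !== rangeAvailable(pos, pos + 1, length)) {
      mismatches++;
    }
  }
}

console.log(mismatches); // 0
```

For integers, `pos < n` and `pos + 1 <= n` are the same condition, so the strict comparison for a single byte agrees with the non-strict comparison against an exclusive `end`.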
Thank you!