Add local caching of "simple" Graphics State (ExtGState) data in PartialEvaluator.{getOperatorList, getTextContent}
(issue 2813)
#12087
From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/de2b47a8d00ce69/output.txt

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/5e608eacd2ef0bd/output.txt

From: Bot.io (Linux m4) - Failed
Full output at http://54.67.70.0:8877/de2b47a8d00ce69/output.txt
Total script time: 26.66 mins
Image differences available at: http://54.67.70.0:8877/de2b47a8d00ce69/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows) - Failed
Full output at http://54.215.176.217:8877/5e608eacd2ef0bd/output.txt
Total script time: 31.29 mins
Image differences available at: http://54.215.176.217:8877/5e608eacd2ef0bd/reftest-analyzer.html#web=eq.log
6ea824e to 3bc5932
f2fdc0e to 7d131e6
From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/83903137cb25f06/output.txt

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/30816e7aa97ed50/output.txt

From: Bot.io (Linux m4) - Failed
Full output at http://54.67.70.0:8877/83903137cb25f06/output.txt
Total script time: 26.65 mins
Image differences available at: http://54.67.70.0:8877/83903137cb25f06/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows) - Failed
Full output at http://54.215.176.217:8877/30816e7aa97ed50/output.txt
Total script time: 30.57 mins
Image differences available at: http://54.215.176.217:8877/30816e7aa97ed50/reftest-analyzer.html#web=eq.log

/botio-linux preview

From: Bot.io (Linux m4) - Received
Command cmd_preview from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/b82ead3d1bc23ec/output.txt

From: Bot.io (Linux m4) - Success
Full output at http://54.67.70.0:8877/b82ead3d1bc23ec/output.txt
Total script time: 3.36 mins
Published
7d131e6 to e02b693
From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/637a5dd89842027/output.txt

From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/90d6977baa9680d/output.txt

From: Bot.io (Linux m4) - Failed
Full output at http://54.67.70.0:8877/90d6977baa9680d/output.txt
Total script time: 26.63 mins
Image differences available at: http://54.67.70.0:8877/90d6977baa9680d/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows) - Failed
Full output at http://54.215.176.217:8877/637a5dd89842027/output.txt
Total script time: 29.76 mins
Image differences available at: http://54.215.176.217:8877/637a5dd89842027/reftest-analyzer.html#web=eq.log
…tialEvaluator.getOperatorList` (issue 2813)

This patch will help pathological cases the most, with issue 2813 being a particularly problematic example. While there are only *four* `/ExtGState` resources, there are a total of `29062` `setGState` operators. Even though parsing a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resources, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resources, we thus improve performance by:
- Not having to fetch/validate/parse the same `/ExtGState` data over and over.
- Handling repeated `setGState` operators *synchronously* during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

---

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue 2813 is *a lot* slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a *reduction* of `~10 %` in the rendering time of the PDF document in issue 2813.
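
The caching idea described above can be sketched roughly as follows. This is only an illustration, not the actual patch: `parseExtGState`, `isSimpleGState`, `buildOperatorList`, and the plain-object resource shapes are all hypothetical stand-ins, and "simple" is assumed here to mean "has no `SMask` entry".

```javascript
// Hypothetical stand-ins for illustration only; none of these names or
// data shapes are taken from the pdf.js source.
let parseCount = 0;

function parseExtGState(dict) {
  parseCount++;
  // Pretend "parsing" turns the dictionary into [key, value] pairs.
  return Object.entries(dict);
}

function isSimpleGState(dict) {
  // Assumption: a graphics state with an SMask entry is not "simple".
  return !("SMask" in dict);
}

function buildOperatorList(operators, extGStates) {
  // Local cache: lives only for the duration of this one call.
  const localGStateCache = new Map();
  const opList = [];

  for (const { fn, name } of operators) {
    if (fn !== "setGState") {
      opList.push({ fn });
      continue;
    }
    const cached = localGStateCache.get(name);
    if (cached) {
      // Cache hit: no fetch/validate/parse needed for a repeated name.
      opList.push({ fn, args: cached });
      continue;
    }
    const dict = extGStates[name];
    const parsed = parseExtGState(dict);
    if (isSimpleGState(dict)) {
      localGStateCache.set(name, parsed);
    }
    opList.push({ fn, args: parsed });
  }
  return opList;
}

// Four resources and 29062 setGState operators, mirroring the numbers above.
const extGStates = {
  GS0: { LW: 1 },
  GS1: { CA: 0.5 },
  GS2: { ca: 0.25 },
  GS3: { LW: 2, CA: 1 },
};
const gStateNames = Object.keys(extGStates);
const operators = Array.from({ length: 29062 }, (_, i) => ({
  fn: "setGState",
  name: gStateNames[i % gStateNames.length],
}));

const opList = buildOperatorList(operators, extGStates);
console.log(opList.length, parseCount); // 29062 4
```

In this sketch each of the four resources is parsed once, and every later `setGState` with the same name is served from the `Map`.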
…tialEvaluator.getTextContent`

It turns out that `getTextContent` suffers from *similar* problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be *parsed* by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though *most* of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.
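
One way to picture the text-content case is a per-call record of which graphics states are known to contain no Font entry, so that repeated references to them can be skipped without another fetch. The sketch below rests on that assumption; `fetchExtGState`, `collectFontChanges`, and the resource shapes are hypothetical and are not taken from the pdf.js source.

```javascript
// Hypothetical stand-ins for illustration only.
let fetchCount = 0;

function fetchExtGState(resources, name) {
  fetchCount++;
  return resources[name];
}

function collectFontChanges(operators, resources) {
  // Names of graphics states already known to carry no Font entry.
  const nonFontGStates = new Set();
  const fontChanges = [];

  for (const { fn, name } of operators) {
    if (fn !== "setGState" || nonFontGStates.has(name)) {
      continue; // Irrelevant to text content, or already known to be.
    }
    const dict = fetchExtGState(resources, name);
    if (dict && "Font" in dict) {
      // Only font-carrying states affect the text state.
      fontChanges.push(dict.Font);
    } else {
      nonFontGStates.add(name);
    }
  }
  return fontChanges;
}

const textResources = {
  GS0: { LW: 1 }, // purely graphical state
  GS1: { Font: ["F1", 12] }, // affects text
};
const textOperators = ["GS0", "GS1", "GS0", "GS1", "GS0"].map(name => ({
  fn: "setGState",
  name,
}));

const fontChanges = collectFontChanges(textOperators, textResources);
console.log(fontChanges.length, fetchCount); // 2 3
```

Here `GS0` is fetched once and then skipped, while the font-carrying `GS1` is still looked up on each use; only the non-font states are remembered in this sketch.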
e02b693 to 981ff41
From: Bot.io (Linux m4) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.67.70.0:8877/5cb559f3e609ef5/output.txt

From: Bot.io (Windows) - Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/39fc5ad6e7d7c5a/output.txt

From: Bot.io (Linux m4) - Failed
Full output at http://54.67.70.0:8877/5cb559f3e609ef5/output.txt
Total script time: 26.55 mins
Image differences available at: http://54.67.70.0:8877/5cb559f3e609ef5/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows) - Failed
Full output at http://54.215.176.217:8877/39fc5ad6e7d7c5a/output.txt
Total script time: 30.33 mins
Image differences available at: http://54.215.176.217:8877/39fc5ad6e7d7c5a/reftest-analyzer.html#web=eq.log
Since this method calls `Dict.get` to fetch data, there could thus be `Error`s thrown in corrupt PDF documents when attempting to resolve an indirect object. To ensure that this won't ever become a problem, we change the method to be `async` such that a rejected Promise would be returned and general OperatorList parsing won't break.
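
The effect of marking a method `async` can be seen in isolation with a small sketch. The names `getFromDict`, `parseSync`, and `parseAsync` are hypothetical and only stand in for a lookup that may throw on corrupt data; they are not the method this commit changes.

```javascript
// Hypothetical stand-in for a dictionary lookup that throws on corrupt data.
function getFromDict(dict, key) {
  if (!(key in dict)) {
    throw new Error(`Missing entry: ${key}`);
  }
  return dict[key];
}

// Synchronous version: the Error propagates straight into the caller.
function parseSync(dict) {
  return getFromDict(dict, "Type");
}

// `async` version: the same Error becomes a rejected Promise instead.
async function parseAsync(dict) {
  return getFromDict(dict, "Type");
}

let syncThrew = false;
try {
  parseSync({});
} catch {
  syncThrew = true;
}

let asyncThrewSynchronously = false;
let pending;
try {
  pending = parseAsync({});
} catch {
  asyncThrewSynchronously = true;
}

// The failure is observed through the Promise, not through try/catch.
const outcome = pending.then(
  () => "resolved",
  reason => `rejected: ${reason.message}`
);
outcome.then(msg => console.log(syncThrew, asyncThrewSynchronously, msg));
// true false rejected: Missing entry: Type
```

A caller that already chains on the returned Promise can then handle the rejection in one place, rather than guarding the call itself against a synchronous throw.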
/botio-linux preview

From: Bot.io (Linux m4) - Received
Command cmd_preview from @timvandermeij received. Current queue size: 0
Live output at: http://54.67.70.0:8877/18ab602b53471b8/output.txt

From: Bot.io (Linux m4) - Success
Full output at http://54.67.70.0:8877/18ab602b53471b8/output.txt
Total script time: 3.29 mins
Published

Thank you for looking into this!
Add local caching of "simple" Graphics State (ExtGState) data in `PartialEvaluator.getOperatorList` (issue 2813)

This patch will help pathological cases the most, with issue #2813 ("Extrememly slow rendering on this pdf") being a particularly problematic example. While there are only four `/ExtGState` resources, there are a total of `29062` `setGState` operators. Even though parsing a single `/ExtGState` resource is quite fast, having to re-parse them thousands of times does add up quite significantly.

For simplicity we'll only cache "simple" `/ExtGState` resources, since e.g. the general `SMask` case cannot be easily cached (without re-factoring other code, which may have undesirable effects on general parsing).

By caching "simple" `/ExtGState` resources, we thus improve performance by:
- Not having to fetch/validate/parse the same `/ExtGState` data over and over.
- Handling repeated `setGState` operators synchronously during the `OperatorList` building, instead of having to defer to the event-loop/microtask-queue since the `/ExtGState` parsing is done asynchronously.

Obviously I had intended to include (standard) benchmark results with this patch, but for reasons I don't understand the test run-time (even with `master`) of the document in issue #2813 is a lot slower than in the development viewer (making normal benchmarking infeasible).
However, testing this manually in the development viewer (using `pdfBug=Stats`) shows a reduction of `~10 %` in the rendering time of the PDF document in issue #2813.

Add local caching of non-font Graphics State (ExtGState) data in `PartialEvaluator.getTextContent`

It turns out that `getTextContent` suffers from similar problems with repeated GStates as `getOperatorList`; please see the previous patch.

While only `/ExtGState` resources containing Fonts will actually be parsed by `PartialEvaluator.getTextContent`, we're still forced to fetch/validate repeated `/ExtGState` resources even though most of them won't affect the textContent (since they mostly contain purely graphical state).

With these changes we also no longer need to immediately reset the current text-state when encountering a `setGState` operator, which may thus improve text-selection in some cases.