-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 7613, Parse AcroForm Dictionary #8625
Conversation
First of all, thank you for working on this! The core of the change looks good, but I'm also noticing a lot of other changes that do not appear to be related to the appearances issue. One of those changes is making the factory asynchronous; why is that necessary/helpful? Another one is the In short, it looks like this patch could be simplified a lot, which will also make reviewing considerably easier. The core of the change appears to be https://github.com/mozilla/pdf.js/pull/8625/files#diff-18ca06e2bd5be4c9132b51f78388b8f0R697; let's keep it mostly at that and avoid other changes that are not directly related to the appearance issue. |
Hi! Thank you very much for the code review, I really appreciate it. Let me explain my vision of the problem and the general approach. The annotation field font information is stored as /DA field like '/Helv 0 Tf 0 g', currently it's saved in the defaultAnnotation property as a string and therefore should be parsed further. I definitely didn't want to write a mini-parser of any kind and wanted to re-utilize an existing PartialEvaluator. But it required the following changes
As for the xref - I feel it can be simplified, but it's only way I found to save parsed font info for the annotation layer factory. |
Thank you for the clarification; it helps to understand your choices. The re-use strategy is good and it is definitely what we need to do here. I'll check if we can simplify the code, e.g., I have a feeling that the worker task may be solved differently to require less code, but I'm not too familiar with that part of the code base myself as well. If others are reading along, any pointers are appreciated :-) |
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/3152ba1194fa817/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/3152ba1194fa817/output.txt Total script time: 2.36 mins Published |
…ton types, onblur effects for controls with the same id
Hi! Is there any plans to review and potentially merge these changes? They're used in the site https://www.smartformsondemand.org/ and it seems working fine in production. |
I think there is some interesting code in this pull request, but at the moment I'm lacking the time to properly go over the changes. Just keep the pull request open so it remains in our review queue and we can look at it. If possible, can we split this functionality into a commit per feature? It looks like a lot of functionality is in here (obtaining the AcroForm dictionary, parsing it, styling the various components) and it's much easier to review and test if there is one commit per functional change. Usually this is done by splitting it up into smaller pull requests, for example a first one that only contains the functionality of obtaining the AcroForm dictionary. After that, a pull request for parsing it can be made, et cetera. |
Ok, sounds like a plan. It'll take some time to slice it and test every sub-commit, but it seems reasonable. So let me confirm. I'll start submitting these changes in smaller pull requests and you review and merge the incoming request. Then next part can be created unless we get all functionality added. |
Yes, let's do it one pull request at a time. Make every pull request contain one piece of complete functionality including tests (unit and/or reference tests) and keep it as simple as possible. This will make it much easier to verify and review and easier to get the functionality merged. Thank you! |
Hi @dmitryskey, As far as I can tell, 1) we split out initial logic and landed in #9822 , but 2) we haven't ported the rest, and 3) we still want these changes. Is that accurate? Do you mind if I try to rescue the rest of this PR? I started unbitrotting, but thought I should check before continuing further. |
Hi @escapewindow, Yes - we split this PR into two parts - make annotation layer parsing async (which was landed as #9822) and further parsing of the annotations in order to extract font info etc. The main problem with the rest of the code was that I wasn't able to use the same worker and had to create a new one for the additional parsing. It works perfectly fine in my own branch which I use for the very specific PDF "US I9", but in general we have to eliminate this part because of performance concerns. As far as I remember there was another PR where author parsed annotation info with regex, but for the arbitrary PDF it might have even bigger performance issues. Feel free to refactor this part, the most critical part s getting rid of a separate AnnotationWorkerTask. I'd rather start with converting this PR to ES6 first like it was done in other requests, and then try to improve the approach in general. Feel free to ask any questions and thank you for taking care about this PR, once merged it'll make my branch for I-9 a way easier to maintain. Also I'd rather recommend to use the master branch from https://github.com/dmitryskey/pdf.js, this PR is outdated and I'd rather to create a new one from this repository |
#7613, Parse AcroForm dictionary and 'DR' appearance and set annotation wizard font properties.
It's related to the section:
Appearances
There are two commits, first is dealing with fix itself, second updates unit tests