Issue 7613, Parse AcroForm Dictionary #8625

dmitryskey · 2017-07-06T16:35:45Z

#7613, Parse AcroForm dictionary and 'DR' appearance and set annotation wizard font properties.

It's related to the section:

Appearances

It seems like the current font code for text widgets is dead as fontRefName is never set...
Parse AcroForm dictionary

There are two commits: the first deals with the fix itself, and the second updates the unit tests.

timvandermeij · 2017-07-14T11:58:28Z

First of all, thank you for working on this!

The core of the change looks good, but I'm also noticing a lot of other changes that do not appear to be related to the appearances issue. One of those changes is making the factory asynchronous; why is that necessary/helpful? Another one is the xref check in the factory; why do we need that (xref should always be there)? Finally, why do we need the entire AnnotationWorkerTask?

In short, it looks like this patch could be simplified a lot, which will also make reviewing considerably easier. The core of the change appears to be https://github.com/mozilla/pdf.js/pull/8625/files#diff-18ca06e2bd5be4c9132b51f78388b8f0R697; let's keep it mostly at that and avoid other changes that are not directly related to the appearance issue.

dmitryskey · 2017-07-14T13:40:52Z

Hi!

Thank you very much for the code review; I really appreciate it. Let me explain my view of the problem and the general approach. The annotation field's font information is stored in the /DA entry as a string like '/Helv 0 Tf 0 g'. Currently it is saved as a raw string in the defaultAppearance property and therefore needs further parsing. I definitely didn't want to write a mini-parser of any kind and wanted to reuse the existing PartialEvaluator. But that required the following changes:

It works asynchronously, so I had to change the way the annotation factory works. As a result, the unit tests were modified as well.
The parser needs a worker. I tried to minimize changes and didn't want to change the API of the annotations property (document.js:319), so I preferred to create an internal worker mock-up. This could be an area for improvement; I'm just not that familiar with the PDF.js code base. It may be better to change this property into a function with a worker parameter, get rid of the internal mock-up, and just use the standard worker.

As for the xref: I feel it can be simplified, but it's the only way I found to save the parsed font info for the annotation layer factory.
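
For readers unfamiliar with /DA strings, the following is a minimal sketch of what '/Helv 0 Tf 0 g' encodes. It is a hand-written tokenizer for illustration only, which is exactly what this patch avoids by reusing PartialEvaluator. The function name sketchParseDA and the shape of the returned object are hypothetical and not taken from the patch. It assumes a simple, whitespace-separated DA string and handles only the Tf, g, and rg operators.

```javascript
// Illustrative sketch only; NOT the approach taken in this pull request.
// Assumes a simple DA string with whitespace-separated tokens.
// `sketchParseDA` and its return shape are hypothetical names.
function sketchParseDA(da) {
  const result = { fontName: null, fontSize: null, fillColor: null };
  let operands = [];
  for (const token of da.trim().split(/\s+/)) {
    if (token === "Tf" && operands.length >= 2) {
      // Tf takes a font resource name and a size; a size of 0 means
      // the viewer should auto-size the text.
      const [name, size] = operands.slice(-2);
      result.fontName = name.replace(/^\//, "");
      result.fontSize = Number(size);
      operands = [];
    } else if (token === "g" && operands.length >= 1) {
      // g sets a gray fill level between 0 (black) and 1 (white).
      const gray = Number(operands[operands.length - 1]);
      result.fillColor = { space: "gray", values: [gray] };
      operands = [];
    } else if (token === "rg" && operands.length >= 3) {
      // rg sets an RGB fill color with three components in [0, 1].
      result.fillColor = { space: "rgb", values: operands.slice(-3).map(Number) };
      operands = [];
    } else {
      // Anything else is treated as an operand for the next operator.
      operands.push(token);
    }
  }
  return result;
}

const parsed = sketchParseDA("/Helv 0 Tf 0 g");
console.log(parsed);
```

The font name (Helv here) is only a key; the actual font has to be looked up in the 'DR' resource dictionary of the AcroForm, which is why the AcroForm dictionary needs to be parsed as well. A real implementation would also have to cope with escaped names, comments, and other operators, which is a good reason to reuse the existing evaluator instead.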

timvandermeij · 2017-07-14T14:27:26Z

Thank you for the clarification; it helps to understand your choices. The re-use strategy is good and it is definitely what we need to do here. I'll check if we can simplify the code. For example, I have a feeling that the worker task could be solved differently with less code, but I'm not too familiar with that part of the code base myself either. If others are reading along, any pointers are appreciated :-)

timvandermeij · 2017-07-15T21:45:45Z

/botio-linux preview

pdfjsbot · 2017-07-15T21:45:46Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3152ba1194fa817/output.txt

pdfjsbot · 2017-07-15T21:48:08Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/3152ba1194fa817/output.txt

Total script time: 2.36 mins

Published

dmitryskey · 2018-01-24T15:45:21Z

Hi!

Are there any plans to review and potentially merge these changes? They're used on the site https://www.smartformsondemand.org/ and seem to be working fine in production.

timvandermeij · 2018-01-24T22:10:42Z

I think there is some interesting code in this pull request, but at the moment I'm lacking the time to properly go over the changes. Just keep the pull request open so it remains in our review queue and we can look at it. If possible, can we split this functionality into a commit per feature? It looks like a lot of functionality is in here (obtaining the AcroForm dictionary, parsing it, styling the various components) and it's much easier to review and test if there is one commit per functional change. Usually this is done by splitting it up into smaller pull requests, for example a first one that only contains the functionality of obtaining the AcroForm dictionary. After that, a pull request for parsing it can be made, et cetera.

dmitryskey · 2018-01-24T22:58:06Z

OK, sounds like a plan. It'll take some time to slice it up and test every sub-commit, but it seems reasonable.

So let me confirm: I'll start submitting these changes as smaller pull requests, and you review and merge each incoming request. Then the next part can be created, until all the functionality has been added.

timvandermeij · 2018-01-25T21:16:23Z

Yes, let's do it one pull request at a time. Make every pull request contain one piece of complete functionality including tests (unit and/or reference tests) and keep it as simple as possible. This will make it much easier to verify and review and easier to get the functionality merged. Thank you!

escapewindow · 2020-07-18T01:43:08Z

Hi @dmitryskey,

As far as I can tell, 1) we split out the initial logic and landed it in #9822, but 2) we haven't ported the rest, and 3) we still want these changes. Is that accurate?

Do you mind if I try to rescue the rest of this PR? I started unbitrotting, but thought I should check before continuing further.

dmitryskey · 2020-07-18T13:12:23Z

Hi @escapewindow,

Yes, we split this PR into two parts: making the annotation layer parsing async (which landed as #9822), and further parsing of the annotations in order to extract font info, etc.

The main problem with the rest of the code was that I wasn't able to use the same worker and had to create a new one for the additional parsing. It works perfectly fine in my own branch, which I use for the very specific PDF "US I9", but in general we have to eliminate this part because of performance concerns.

As far as I remember, there was another PR where the author parsed the annotation info with a regex, but for an arbitrary PDF that might have even bigger performance issues.

Feel free to refactor this part; the most critical part is getting rid of the separate AnnotationWorkerTask. I'd start by converting this PR to ES6 first, as was done in other pull requests, and then try to improve the approach in general.

Feel free to ask any questions, and thank you for taking care of this PR. Once merged, it'll make my branch for the I-9 way easier to maintain. Also, I'd recommend using the master branch from https://github.com/dmitryskey/pdf.js; this PR is outdated and I'd rather create a new one from that repository.

root and others added 5 commits July 5, 2017 15:08

mozilla#7613, Parse AcroForm dictionary and set annotation wizard font properties

8fc65e0

mozilla#7613, Parse AcroForm dictionary and set annotation wizard font properties

935b60c

Merge remote-tracking branch 'upstream/master'

3d1562c

Merge with master

a0db250

Merge with master

836988d

timvandermeij added annotations 4-work-in-progress labels Jul 14, 2017

dmitryskey added 17 commits July 26, 2017 17:53

Merge with the master branch

589a3c1

Fix annotation fonts initialization for pages after the first one

09f5aec

Merge remote-tracking branch 'upstream/master'

b806d5e

Merge remote-tracking branch 'upstream/master'

8f179f5

Additional validation of parameters

92cf274

Merge remote-tracking branch 'upstream/master'

0c169fe

Merge with master

6b7cd74

Merge with upstream/master

bcc9665

Merge with upstream/master

58200df

merge with upstream/master

94a422a

Merge remote-tracking branch 'upstream/master'

94380e0

Merge remote-tracking branch 'upstream/master'

f80e808

Merge remote-tracking branch 'upstream/master'

64c966d

Merge remote-tracking branch 'upstream/master'

7069267

Merge remote-tracking branch 'upstream/master'

a75993f

Merge remote-tracking branch 'upstream/master'

20849aa

Merge remote-tracking branch 'upstream/master'

3ef1098

dmitryskey added 14 commits November 5, 2017 15:58

Merge remote-tracking branch 'upstream/master'

0212ff3

Merge remote-tracking branch 'upstream/master'

710db0a

Merge with upstream/master

8e976a0

Add auto-sizing, more accurate DDL implementation, checkbox/radio button types, onblur effects for controls with the same id

e3ae959

Merge with upstream/master

037a25c

Merge remote-tracking branch 'upstream/master'

9f3173a

Change header

78dff6f

Merge with upstream/master

759029d

Code refactoring

264a82c

Merge remote-tracking branch 'upstream/master'

3a0a056

Merge remote-tracking branch 'upstream/master'

c5ac86e

Merge with upstream

1cf339a

Add keyboard search to DDL

edadcb3

Merge remote-tracking branch 'upstream/master'

797f842

Adjust DDL scrolling

24ef8e6

dmitryskey mentioned this pull request Jan 29, 2018

[api-minor] Refactor annotation rendering #9417

Closed

dmitryskey closed this Feb 2, 2018

timvandermeij removed the 4-work-in-progress label Feb 2, 2018

escapewindow mentioned this pull request Jul 29, 2020

WIP - Parse acroform dictionary #12139

Closed

11 tasks

escapewindow mentioned this pull request Aug 13, 2020

wip - add support for defaultAppearance fonts in text areas #12207

Closed
