Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take the /CIDToGIDMap into account when getting the glyph mapping for CFF fonts (issue 15559) #15563

Merged
merged 1 commit into from
Oct 14, 2022

Conversation

Snuffleupagus
Copy link
Collaborator

Please note: I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).

@Snuffleupagus Snuffleupagus linked an issue Oct 11, 2022 that may be closed by this pull request
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/1f6c4666e962756/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/2f7cca3840a6ca4/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/2f7cca3840a6ca4/output.txt

Total script time: 25.34 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 6
  different first/second rendering: 2

Image differences available at: http://54.241.84.105:8877/2f7cca3840a6ca4/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/1f6c4666e962756/output.txt

Total script time: 29.42 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 2

Image differences available at: http://54.193.163.58:8877/1f6c4666e962756/reftest-analyzer.html#web=eq.log


if (invCidToGidMap && invCidToGidMap[charCode] !== undefined) {
charCode = invCidToGidMap[charCode];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know anything about that stuff, so I'm trying understand.
The function getGlyphMapping is getting a map CharCode=>GID.
The cmap is a CID<=>CharCode map and invCidToGid map is finally a GID=>CID map so if I understand correctly, the charCode we get at l. 75 from the cmap would be a GID and the resulting charCode at l. 78 would be a CID... sorry it doesn't make sense for me.

Copy link
Collaborator Author

@Snuffleupagus Snuffleupagus Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #15563 (comment), I don't really understand this well enough to know how it should work.
However this patch is the only way that I could come up with that actually fixes the issue, since trying to apply the /CIDToGIDMap elsewhere in this function doesn't actually fix the issue :-(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have (25573 is the char code for the first ideograph just after FutureABC L3):

  • charsets: 25573 (gid) => 40307 (cid)
  • cidToGidMap: 25573 (cid) => 40307 (gid)
  • charsets: 17688 (gid) => 25573 (cid)

If I remove the /CidToGidMap entry for the font in the pdf then the rendering is incorrect in either Chrome and Acrobat so this map really matters.
A possibility is that the gid in charsets is a detail of implementation and the exposed id is finally the cid from charsets which could be a kind of gid, hence cidToGidMap would map from a cid in the pdf to the kind of gid in the font.
I'm loosing myself when I read the above sentence...
Anyway, your patch could make sense.
I'm not sure if we should apply invCidToGid to cid or to charCode, wdyt ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's interesting to read pdium code:
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fpdfapi/font/cpdf_cidfont.cpp#689

and the value from CIDToGIDMap is returned here: https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fpdfapi/font/cpdf_cidfont.cpp#824
and it seems to be the gid and the only way to get the correct final glyph id is to reverse the value from charsets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should really add some comments to explain why we're doing that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm loosing myself when I read the above sentence...

Yeah, all of this is really difficult to get your head around. Looking through both the PDF and the CFF specifications doesn't really help either.
According to the relevant section in the PDF spec it even sounds like these fonts shouldn't be using a /CIDToGIDMap map (but it wouldn't be the first time that the PDF spec is incomplete/incorrect), since the relevant fonts are actually CFF and not TrueType (excerpt below):

This entry may appear only in a Type 2 CIDFont whose associated
TrueType font program is embedded in the PDF file.

I'm not sure if we should apply invCidToGid to cid or to charCode, wdyt ?

Based on quick testing, changing that doesn't seem to work unfortunately.

And we should really add some comments to explain why we're doing that.

I'm really struggling to explain what's happening here, do you have any suggestions as to its wording?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really struggling to explain what's happening here, do you have any suggestions as to its wording?

If only I was able to....
Maybe just copy/paste the example I put in #15563 (comment)
and something like "it seems that the GID in CIDToGIDMap corresponds to the CID in the CFF font".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to add a brief comment, and hopefully anyone curious about further details can reference this issue via blame (adding details from a specific font and glyph to the code doesn't really feel all that much clearer, at least to me).

… CFF fonts (issue 15559)

*Please note:* I don't really know what I'm doing here, however the patch appears to fix the referenced issue when comparing the rendering with Adobe Reader (with the caveat that I don't speak the language in question).
@Snuffleupagus
Copy link
Collaborator Author

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/99518b1f4a06daa/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/99518b1f4a06daa/output.txt

Total script time: 2.09 mins

Published

Copy link
Contributor

@calixteman calixteman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fixing a bug and there are no regressions, so it's likely not that bad.

@Snuffleupagus Snuffleupagus merged commit 15d4d80 into mozilla:master Oct 14, 2022
@Snuffleupagus
Copy link
Collaborator Author

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/e9ead3d2ee0b9f7/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/e9ead3d2ee0b9f7/output.txt

Total script time: 22.07 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@Snuffleupagus
Copy link
Collaborator Author

@calixteman It looks like the Windows bot has stopped responding altogether, can you please take a look?

@Snuffleupagus Snuffleupagus deleted the issue-15559 branch October 15, 2022 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

file rendering error
3 participants