Support UTF-16 little-endian strings in the stringToPDFString helper function (bug 1593902) #11307

Merged

Snuffleupagus
Collaborator

@Snuffleupagus Snuffleupagus commented Nov 5, 2019

The bug report seems to suggest that we don't support UTF-16 strings with a BOM (byte order mark), which we actually do, as evidenced by both the code and a unit-test.
The issue at play here is rather that we previously only supported the big-endian UTF-16 BOM, and the Title string in the PDF document is using a little-endian UTF-16 BOM instead.

Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1593902

Edit: The PDF spec only mentions big-endian UTF-16 as supported (in addition to PDFDocEncoding, of course); see https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G6.1957385. Hence this is actually a case where a PDF generator is (yet again) creating corrupt/invalid data.
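
For readers unfamiliar with the BOM check described above, here is a minimal sketch of the idea. It is not the actual PDF.js code: `decodeTextString` is a hypothetical name, the input is assumed to be a "binary string" holding one byte per character, and the PDFDocEncoding fallback is deliberately left out.

```javascript
// Sketch only: `decodeTextString` is a hypothetical helper, not the PDF.js function.
// Assumes `str` is a binary string where each character holds one byte (0x00-0xFF).
function decodeTextString(str) {
  let littleEndian;
  if (str[0] === "\xFE" && str[1] === "\xFF") {
    littleEndian = false; // UTF-16BE BOM, the only UTF-16 form the PDF spec mentions
  } else if (str[0] === "\xFF" && str[1] === "\xFE") {
    littleEndian = true; // UTF-16LE BOM, emitted by some non-conforming generators
  } else {
    // No BOM: a real implementation would map each byte through PDFDocEncoding.
    // This sketch simply returns the input unchanged.
    return str;
  }
  const chars = [];
  // Skip the 2-byte BOM, then combine each byte pair into one UTF-16 code unit.
  // A trailing unpaired byte, if any, is ignored.
  for (let i = 2; i + 1 < str.length; i += 2) {
    const a = str.charCodeAt(i);
    const b = str.charCodeAt(i + 1);
    chars.push(String.fromCharCode(littleEndian ? (b << 8) | a : (a << 8) | b));
  }
  return chars.join("");
}
```

The only difference between the two branches is which byte of each pair is treated as the high byte, which is why supporting the little-endian BOM is a small change once the big-endian path exists.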

@Snuffleupagus
Collaborator Author

/botio test

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/55be147199747ae/output.txt

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/3e33c69275b4164/output.txt

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/3e33c69275b4164/output.txt

Total script time: 18.65 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/55be147199747ae/output.txt

Total script time: 26.61 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@Snuffleupagus Snuffleupagus force-pushed the stringToPDFString-little-endian branch from 6c78b5d to 80342e2 Compare November 5, 2019 11:43
@Snuffleupagus
Collaborator Author

Snuffleupagus commented Nov 5, 2019

Forgot to extend one of the existing unit-tests...

/botio unittest

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Invalid

Command not implemented: unit.

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Windows)


Invalid

Command not implemented: unit.

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/b102711ac88ee97/output.txt

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/3dbfc51b236a1fd/output.txt

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/b102711ac88ee97/output.txt

Total script time: 2.66 mins

  • Unit Tests: Passed

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/3dbfc51b236a1fd/output.txt

Total script time: 5.28 mins

  • Unit Tests: Passed

@Snuffleupagus
Collaborator Author

/botio-linux preview

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/ed28aaa58caba3d/output.txt

@pdfjsbot

pdfjsbot commented Nov 5, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/ed28aaa58caba3d/output.txt

Total script time: 1.69 mins

Published

@timvandermeij timvandermeij merged commit 4e0b020 into mozilla:master Nov 5, 2019
@timvandermeij
Contributor

Nice find!

@Snuffleupagus Snuffleupagus deleted the stringToPDFString-little-endian branch November 6, 2019 08:23