Replace pdf.ttf with sharp2.ttf, keep name the same #220

Merged 1 commit on Feb 12, 2016

Conversation

jbarlow83

As discussed at length in issue #182, the existing pdf.ttf causes difficulties
for certain PDF viewers, in part because the old file had zero advance width.

With testing, sharp2.ttf seems to be the best available compromise, although
it's not perfect and causes some visual difficulties in Evince. It does
seem to fix Kindle and OS X Preview.
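
To check the advance-width point on a given font file, here is a minimal sketch that reads the horizontal advance widths directly from a TrueType file using only the Python standard library. It assumes a single-font `.ttf` (not a `.ttc` collection) that contains `hhea` and `hmtx` tables; the function name `advance_widths` is hypothetical and not part of Tesseract.

```python
import struct


def advance_widths(font_bytes):
    """Return the advanceWidth of every longHorMetric entry in a TrueType font.

    Assumes a single-font sfnt file (not a .ttc collection) that contains
    both an 'hhea' and an 'hmtx' table.
    """
    # Offset table: sfntVersion (4 bytes), then numTables (uint16) at byte 4.
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]

    # Table records start at byte 12; each is tag, checksum, offset, length.
    tables = {}
    for i in range(num_tables):
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i
        )
        tables[tag] = (offset, length)

    hhea_offset, _ = tables[b"hhea"]
    hmtx_offset, _ = tables[b"hmtx"]

    # numberOfHMetrics is the last uint16 of 'hhea', at byte 34 of the table.
    num_h_metrics = struct.unpack_from(">H", font_bytes, hhea_offset + 34)[0]

    # Each longHorMetric is 4 bytes: uint16 advanceWidth, int16 leftSideBearing.
    return [
        struct.unpack_from(">H", font_bytes, hmtx_offset + 4 * i)[0]
        for i in range(num_h_metrics)
    ]
```

Calling `advance_widths(open(path, "rb").read())` on the old and new font files should show whether every entry is zero, which is the condition described above for the old pdf.ttf.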

@amitdo
Collaborator

amitdo commented Feb 12, 2016

Did someone test that there is no regression in pdf output with Adobe Acrobat
and Chromium?

@jbarlow83
Author

I have checked: Acrobat XI, Google Chrome PDF Viewer, OS X Preview, Safari PDF viewer; all on El Capitan.

@jbarlow83
Author

See #182 for other tests people did with sharp2.ttf

@jbreiden
Contributor

Chromium and Adobe Reader on Linux are fine. I have reports that Ghostscript and friends are okay. I should probably double check Android right now.

@jbreiden
Contributor

Latest stock Android (Marshmallow) is fine.

@amitdo
Collaborator

amitdo commented Feb 12, 2016

Okay, but I think you should also test this with an image that has more than two words...

zdenop added a commit that referenced this pull request Feb 12, 2016
Replace pdf.ttf with sharp2.ttf, keep name the same
@zdenop zdenop merged commit 4393d04 into tesseract-ocr:master Feb 12, 2016
@rossj

rossj commented Feb 12, 2016

Using sample-1.pdf from above in the latest pdf.js results in the 2nd word highlighting properly, while the 1st word's highlight is offset. I don't think this is a regression (and it might actually be an improvement) as there were highlighting-offset issues using the previous pdf.ttf with pdf.js. Related to mozilla/pdf.js#6863.

[Image: screenshot, 2016-02-12 at 8:10:39 am]

@jbreiden
Contributor

I've been tracking the Firefox offset problem in mozilla/pdf.js#6509 and it looks like it has gotten a little worse with this change. (I take that back; your screenshot is worse but mine looks the same. Maybe you are using a different zoom level in Firefox or something.)

zvezdochiot pushed a commit to ImageProcessing-ElectronicPublications/tesseract that referenced this pull request Mar 28, 2021
Replace pdf.ttf with sharp2.ttf, keep name the same