Dockerfile.tess4 won't work due to 16.10 no longer being supported by ppa:alex-p/tesseract-ocr #191

Closed
hernick-qc opened this issue Oct 6, 2017 · 4 comments

Comments

@hernick-qc

The current version of Dockerfile.tess4 is based on Ubuntu 16.10, but ppa:alex-p/tesseract-ocr no longer offers tesseract-4 builds for 16.10. However, the PPA now supports 17.04, and simply changing the Dockerfile's base image line from FROM ubuntu:16.10 to FROM ubuntu:17.04 works great for me.
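
For anyone who wants to script the base image change described above, here is a minimal sketch. The function name `bump_base_image` is made up for illustration, and it assumes the Dockerfile contains a line reading exactly `FROM ubuntu:16.10` with no trailing whitespace; if the line looks different, the file is left unchanged.

```shell
# Rewrite the base image line of a Dockerfile from ubuntu:16.10 to ubuntu:17.04.
# bump_base_image is a hypothetical helper, not part of OCRmyPDF.
bump_base_image() {
    dockerfile=$1
    # Write to a temporary file and move it into place, which avoids the
    # differences between GNU and BSD `sed -i`.
    sed 's/^FROM ubuntu:16\.10$/FROM ubuntu:17.04/' "$dockerfile" > "$dockerfile.tmp" &&
        mv "$dockerfile.tmp" "$dockerfile"
}
```

It would be called as `bump_base_image Dockerfile.tess4` from the directory that holds the Dockerfile. Editing the one line by hand does the same thing.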

I'm using OCRmyPDF-tess4 to automatically OCR all documents scanned on our OSA Sharp MFP with PaperCut MF. It works great, the users love it, and the quality of the results is better than with tess3. The only downside is that it takes nearly half an hour to OCR a 100-page document on a Xeon E3-1240 V2.

@jbarlow83
Collaborator

jbarlow83 commented Oct 8, 2017 via email

jbarlow83 pushed a commit that referenced this issue Oct 8, 2017
Due to 16.10 PPAs no longer being generated by alex-p
@jbarlow83
Collaborator

jbarlow83 commented Oct 9, 2017

My fix to this issue is blocked by tesseract-ocr/tesseract#1167

That is, because of this segfault, ocrmypdf's test suite will not pass, and so the Docker images will not be generated (and wouldn't work anyway).

@jbarlow83
Collaborator

The issue above also describes the workaround, which is to replace the installed Tesseract 4's tessdata/eng.traineddata with the version from tesseract-ocr/tessdata/eng.traineddata, and to do the same for any other language of interest.
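
The file replacement described above could be sketched roughly as follows. This is an illustration only: `replace_traineddata` is a hypothetical helper, it assumes you already have a local checkout of tesseract-ocr/tessdata, and the location of the installed tessdata directory depends on how Tesseract 4 was packaged, so both directories are passed in as arguments rather than guessed.

```shell
# Replace installed traineddata files with copies from a local checkout of
# tesseract-ocr/tessdata. Both directory arguments are assumptions: pass the
# paths that apply to your system.
replace_traineddata() {
    src_dir=$1    # local checkout of tesseract-ocr/tessdata
    dest_dir=$2   # tessdata directory of the installed Tesseract 4
    shift 2
    for lang in "$@"; do
        src="$src_dir/$lang.traineddata"
        dest="$dest_dir/$lang.traineddata"
        if [ ! -f "$src" ]; then
            echo "missing $src" >&2
            return 1
        fi
        # Keep the packaged file so the change can be undone.
        if [ -f "$dest" ]; then
            cp "$dest" "$dest.bak"
        fi
        cp "$src" "$dest"
    done
}
```

A call might look like `replace_traineddata ./tessdata "$TESSDATA_DIR" eng fra`, where `TESSDATA_DIR` is a placeholder for wherever your install keeps its language files. The `.bak` copies are a convenience added here, not part of the workaround as described in the linked issue.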

@jbarlow83
Collaborator

Workaround added to v5.4.1
