Dockerfile.tess4 won't work due to 16.10 no longer being supported by ppa:alex-p/tesseract-ocr #191
Comments
I’ll make the change.
There’s not much I can do about the time to OCR. However you may want to
ensure that the Docker container runs with access to all CPUs since some
configurations only let it see one.
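As a quick way to check how many CPUs a process inside the container is actually allowed to use, a minimal Python sketch along these lines may help. It assumes a Linux container, and `visible_cpus` is just an illustrative name, not part of OCRmyPDF.

```python
import os


def visible_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    try:
        # Linux only: honours CPU affinity/cpuset restrictions,
        # such as those set with `docker run --cpuset-cpus`.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"CPUs visible to this process: {visible_cpus()}")
```

Running this inside the container and on the host and comparing the two numbers should show whether the container is being restricted. Note that it only reflects CPU pinning; a time-based quota such as `docker run --cpus` would not show up here.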
…On Fri, Oct 6, 2017 at 11:57 hernick-qc ***@***.***> wrote:
The current version of Dockerfile.tess4 is based on Ubuntu 16.10, but the
ppa:alex-p/tesseract-ocr no longer offers tesseract-4 builds for 16.10.
However, 17.04 is now supported by the PPA, and simply changing the
Dockerfile FROM ubuntu:16.10 to ubuntu:17.04 works great for me.
I'm using OCRmyPDF-tess4 to automatically OCR all documents scanned on our
OSA Sharp MFP with PaperCut MF and it works great, the users love it, and
the quality of the results is better than with tess3. Only downside, it
takes nearly half an hour to OCR a 100 page document on a Xeon E3-1240 V2.
Due to 16.10 PPAs no longer being generated by alex-p
My fix to this issue is blocked by tesseract-ocr/tesseract#1167. Because of that segfault, ocrmypdf's test suite will not pass, so the Docker images will not be generated (and wouldn't work anyway).
The issue above also describes the workaround, which is to replace the installed Tesseract 4's
Workaround added to v5.4.1.
The current version of Dockerfile.tess4 is based on Ubuntu 16.10, but the ppa:alex-p/tesseract-ocr no longer offers tesseract-4 builds for 16.10. However, 17.04 is now supported by the PPA, and simply changing the Dockerfile FROM ubuntu:16.10 to ubuntu:17.04 works great for me.

I'm using OCRmyPDF-tess4 to automatically OCR all documents scanned on our OSA Sharp MFP with PaperCut MF and it works great, the users love it, and the quality of the results is better than with tess3. Only downside, it takes nearly half an hour to OCR a 100 page document on a Xeon E3-1240 V2.
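For anyone who wants to apply the base-image change locally before a fixed release is available, here is a rough Python sketch of the one-line edit described above. The helper name `swap_base_image` and the sample Dockerfile text are hypothetical and only for illustration; the real Dockerfile.tess4 contains more than this.

```python
import re


def swap_base_image(dockerfile_text, old="ubuntu:16.10", new="ubuntu:17.04"):
    """Replace the base image on FROM lines that reference exactly `old`."""
    # Match "FROM <old>" at the start of a line, followed by whitespace
    # or end of line, so that other tags are left untouched.
    pattern = re.compile(
        r"^(FROM\s+)" + re.escape(old) + r"(\s|$)", re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), dockerfile_text)


# Hypothetical, shortened Dockerfile content used only as a demonstration.
example = "FROM ubuntu:16.10\nRUN apt-get update\n"
print(swap_base_image(example))
```

Editing the FROM line by hand works just as well; the sketch is only useful if the change needs to be scripted.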