Dockerfile.tess4 won't work due to 16.10 no longer being supported by ppa:alex-p/tesseract-ocr #191

hernick-qc · 2017-10-06T17:57:33Z

The current version of Dockerfile.tess4 is based on Ubuntu 16.10, but ppa:alex-p/tesseract-ocr no longer offers tesseract-4 builds for 16.10. However, 17.04 is now supported by the PPA, and simply changing the Dockerfile's FROM line from ubuntu:16.10 to ubuntu:17.04 works great for me.
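For anyone who wants to script the base-image change rather than edit the file by hand, here is a minimal sketch. It assumes the Dockerfile has a line of the form `FROM ubuntu:16.10` with nothing else on it; the helper name `bump_base_image` is made up for illustration and is not part of OCRmyPDF.

```python
import re


def bump_base_image(dockerfile_text, old="ubuntu:16.10", new="ubuntu:17.04"):
    """Return dockerfile_text with `FROM <old>` lines rewritten to `FROM <new>`.

    Hypothetical helper for illustration. Only lines that consist of a FROM
    instruction naming exactly `old` are changed; everything else is kept.
    """
    pattern = re.compile(
        r"^([ \t]*FROM[ \t]+)" + re.escape(old) + r"([ \t]*)$",
        flags=re.MULTILINE,
    )
    # Keep the original indentation and trailing whitespace, swap only the image.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), dockerfile_text)
```

The function works on the file's text, so it could be applied by reading Dockerfile.tess4, passing the contents through it, and writing the result back.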

I'm using OCRmyPDF-tess4 to automatically OCR all documents scanned on our OSA Sharp MFP with PaperCut MF, and it works great: the users love it, and the quality of the results is better than with tess3. The only downside is that it takes nearly half an hour to OCR a 100-page document on a Xeon E3-1240 V2.

jbarlow83 · 2017-10-08T01:53:14Z

I’ll make the change. There’s not much I can do about the time to OCR. However, you may want to ensure that the Docker container runs with access to all CPUs, since some configurations only let it see one.
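One way to check how many CPUs a process inside the container can actually use is sketched below. This is an illustration, not something OCRmyPDF ships; the function name `visible_cpus` is hypothetical. On Linux it reads the scheduler affinity mask, which as far as I know reflects cpuset restrictions (for example `docker run --cpuset-cpus`) but not CPU-time quotas.

```python
import os


def visible_cpus():
    """Number of CPUs this process may be scheduled on (at least 1)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask lists the CPUs this process is allowed to use.
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the total CPU count reported by the OS.
    return os.cpu_count() or 1
```

Calling `visible_cpus()` once inside the container and once on the host, then comparing the two numbers, should show whether the container is being restricted.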


jbarlow83 · 2017-10-09T22:22:52Z

My fix to this issue is blocked by tesseract-ocr/tesseract#1167

That is, because of this segfault, ocrmypdf's test suite will not pass, so the Docker images will not be generated (and wouldn't work anyway).

jbarlow83 · 2017-10-11T22:40:08Z

The issue above also describes the workaround, which is to replace the installed Tesseract 4's tessdata/eng.traineddata with the version from tesseract-ocr/tessdata/eng.traineddata, and to do the same for any other language of interest.
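A rough sketch of that file swap follows. It assumes the replacement file has already been downloaded from the tesseract-ocr/tessdata repository and that you know where your Tesseract 4 installation keeps its tessdata directory (this varies by system). The helper name `replace_traineddata` and the `.bak` backup convention are my own choices for illustration, not part of the v5.4.1 workaround.

```python
import shutil
from pathlib import Path


def replace_traineddata(replacement, tessdata_dir, lang="eng"):
    """Overwrite <tessdata_dir>/<lang>.traineddata with the file `replacement`.

    Hypothetical helper. An existing file is first copied to
    <lang>.traineddata.bak. Returns the path of the installed file.
    """
    replacement = Path(replacement)
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    if not replacement.is_file():
        raise FileNotFoundError(replacement)
    if target.exists():
        # Keep the packaged version around in case the swap needs undoing.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(replacement, target)
    return target
```

It would be called once per language, for example `replace_traineddata("downloads/eng.traineddata", tessdata_dir)`, where both paths are placeholders to fill in for your own system.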

jbarlow83 · 2017-10-13T19:20:24Z

Workaround added to v5.4.1

jbarlow83 pushed a commit that referenced this issue Oct 8, 2017

Use Ubuntu 17.04 instead of 16.10 for Docker image (issue #191)

aed9814

Due to 16.10 PPAs no longer being generated by alex-p

jbarlow83 closed this as completed Oct 13, 2017
