[Bug]: crashes with tesseract 5.4.0 #1328
Comments
This is a Tesseract issue; we will need to wait for them to resolve it.
Since the issue is with Tesseract itself, downgrading is the only option at the moment.
The bug is in the legacy engine.
It's not the only option, unless you want Tesseract to use the legacy engine. You can bypass this bug by using a model from the
@amitdo ocrmypdf uses orientation and script detection (osd.traineddata), which currently exists only as a legacy-engine model, even in tessdata_fast. Your workaround will help people who want tesseract 5.4.0 working for OCR without any feature that requires page orientation detection, but it is not a full solution. For maintainers looking for a full solution that passes the test suite, ocrmypdf with tesseract 5.4.0 is unfortunately not workable and will have to wait for 5.4.1.
Yeah, I forgot about OSD.
I just updated tesseract to version 5.4.1-1 on Arch Linux and the problem is gone.
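As a rough sketch of the non-OSD workaround discussed above, the snippet below builds a plain tesseract command line that asks for the LSTM engine only (`--oem 1`), so recognition does not go through the legacy engine. The function name and file names are hypothetical, and this is not how OCRmyPDF invokes Tesseract; as noted in the thread, it does nothing for OSD (`--psm 0`), which still needs the legacy osd.traineddata.

```python
def lstm_only_command(image_path: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build a tesseract argv that requests the LSTM (neural net) engine only.

    Hypothetical helper. '--oem 1' selects LSTM-only recognition, which avoids
    the legacy engine. It does not make orientation/script detection (--psm 0)
    work, because osd.traineddata only ships a legacy-engine model.
    """
    # tesseract usage: tesseract <image> <outputbase> [options...]
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", "1"]


if __name__ == "__main__":
    # Print the command instead of running it; "page.png" is a placeholder.
    print(" ".join(lstm_only_command("page.png", "page")))
```

Running the printed command by hand on a real image only covers plain OCR; any OCRmyPDF feature that relies on page orientation detection would still hit the 5.4.0 bug.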
Fixed for me as well... Arch Linux with ocrmypdf 16.3.1-1 + tesseract 5.4.1-1
In 16.4.0 we refuse to use tesseract 5.4.0. Tesseract 5.4.1 works with any ocrmypdf version.
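A version gate like the one described for 16.4.0 could look roughly like the sketch below. This is not OCRmyPDF's actual code: the blocklist, function names, and error message are hypothetical, and it assumes the first line of `tesseract --version` looks like `tesseract 5.4.1` (some builds print a `v` prefix, which the regex also accepts).

```python
import re

# Hypothetical blocklist mirroring this thread: 5.4.0 crashes, 5.4.1 is fine.
BLOCKED_TESSERACT_VERSIONS = {(5, 4, 0)}


def parse_tesseract_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized tesseract version output: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def check_tesseract_version(text: str) -> tuple[int, int, int]:
    """Raise RuntimeError for a blocklisted version, else return the parsed version."""
    version = parse_tesseract_version(text)
    if version in BLOCKED_TESSERACT_VERSIONS:
        dotted = ".".join(str(part) for part in version)
        raise RuntimeError(f"tesseract {dotted} is known to crash; please use another version")
    return version
```

Blocking an exact version tuple (rather than a range) keeps 5.3.x and 5.4.1+ usable, which matches the behavior reported in the thread.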
What were you trying to do?
With tesseract 5.4.0 (released 2 days ago), ocrmypdf crashes with SubprocessOutputError. I tried multiple PDFs; after downgrading to tesseract 5.3.4 everything is fine again.
Where are you installing/running from?
PyPI (pip, poetry, pipx, etc.); see https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=ocrmypdf
OCRmyPDF version
16.3.1
What operating system are you working on?
Linux
Operating system details and version
Arch Linux, kernel 6.9.3-arch1-1
Simple sanity checks
Relevant log output