Tesseract 4.0 hangs when processing a particular image #2288
Comments
|
The problem also exists with latest code. This might be another example for issue #2196. |
Tesseract hangs in an endless loop here:
Issue #2196 has a different stack, so it looks like we have two issues with images causing an endless loop in the layout detection. |
Yes, the endless loop is a problem; that is the reason I am keeping the issue open. |
@amitdo: I do not think the main issue is Tesseract's binarization method. It works well in most cases (see e.g. 2264), but not in all. I expect that if we replace it with something else, we will get similar reports with other kinds of images. Anyway, a patch for automatically selecting the best binarization algorithm is welcome ;-) And of course the infinite loop in Tesseract should be fixed too. |
Automatic selection would be great, but a first step could be to offer some binarization algorithms, so the user has a choice (command line option or config parameter). |
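Until such an option exists, one way to experiment is to binarize the image yourself and pass the result to Tesseract. The following is only a minimal sketch of a global Otsu threshold in pure Python, operating on a flat list of 8-bit gray values. The function names `otsu_threshold` and `binarize` are hypothetical, and this is not the code Tesseract or Leptonica uses internally.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray values in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # lower class still empty
        if w1 == 0:
            break           # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """Map every pixel to 0 (<= t) or 255 (> t)."""
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold is the simplest choice; images with uneven lighting usually need a local method such as Sauvola, which this sketch does not cover.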
I'm facing this issue too. Are there any updates or workarounds that I can try, including what @stweil suggested? |
Same here. Ubuntu 18.04, tesseract 4.0.0-beta.1. |
(Details here) |
This makes Tesseract 4.1 available, which fixes some things like infinite processing loops on some documents: tesseract-ocr/tesseract#2288 (comment). Some dependencies had to be bumped to be compatible with the new Alpine libraries.
@lewislun, was this issue solved for your case with version 4.1.1 or the current code in the master branch? |
I am still seeing this on 4.1.1 with PNG files. |
@jcrogel: without an image that can help to find the problem, your comment is useless. |
@saikalyan9981 Works fine with current code from repo. Time taken is different based on the traineddata file being used.
|
@Shreeshrii Thanks a lot, I'll use v5.0.0. I think the issue is with v4.1.1 |
I just ran tesseract 4.1.1 (output of tesseract --version) on the above image without any issues. |
With the code from #3418, the processing ends after 7 seconds, when Sauvola binarization is used, but the output is garbage. |
Environment
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Current Behavior:
hangs when running the following command:
tesseract failed-image.jpeg output.txt
output message:
Tesseract neither stops nor gives any message after that.
Other images work fine; I only have trouble processing this particular image.
I have found that the image looks weird after being processed by Tesseract (or Leptonica?); I don't know if it is related.
failed-image.jpeg: https://drive.google.com/open?id=1HsgCbtuNpgf_XxzjkekXU9-uuiWDsV0H
tessinput.tif: https://drive.google.com/open?id=1sE8Nn5rykSWPT6PMF3nFSonPMT9y-H61
Expected Behavior:
Tesseract should either give an error message or finish OCR on the image, even if the image quality is bad.
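As a stopgap for anyone hitting the hang in a batch job, the caller can enforce a time limit so that one bad image does not block the rest. This is a minimal sketch using Python's standard `subprocess` module; the helper name `run_with_timeout` and the status strings are hypothetical, and it only bounds the run time without fixing the underlying loop.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, output).

    status is "ok" (exit code 0), "failed" (non-zero exit code)
    or "timeout" (the process was killed after timeout_s seconds).
    """
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

For the command from this report, the call would be `run_with_timeout(["tesseract", "failed-image.jpeg", "output.txt"], 60)`, where the 60-second limit is an arbitrary choice; a `"timeout"` status then marks the image for manual inspection.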