LSTM: Words dropped during Devanagari recognition #664
Comments
Another sample, where the whole first line is skipped in addition to the missing words: forbes1849devscript.txt
Edit: TIFF file converted to PNG for uploading.
Is it related to #633 (comment)?
It seems some words are being recognized as 'blanks'. See the following from the debug info, produced while processing the image shown in #664 (comment):
and
Closing this and linking to issue #681.
Using the image linked above: https://cloud.githubusercontent.com/assets/5095331/22055988/c65e0f96-dd83-11e6-9f06-bea70dd85be6.png
The correct words are
and
Text/words are dropped during Devanagari recognition with the --oem 1 option.
It seems to be related to line segmentation / box creation, because the same words are also skipped in the box file created by running tesseract with the 'makebox' config file.
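For anyone trying to reproduce this, the two runs described above can be sketched roughly as follows. This is only an illustration: the image name, output base, and the `san` language code are assumptions, and the report does not say whether the box run also passed --oem 1. The helper `tesseract_commands` is hypothetical and only builds the command lines; it does not execute them.

```python
import shlex


def tesseract_commands(image, outputbase, lang="san"):
    """Return the two command lines as argument lists (nothing is executed).

    `image`, `outputbase`, and `lang` are placeholders, not values
    confirmed by the report.
    """
    # Recognition run: --oem 1 selects the LSTM engine.
    ocr = ["tesseract", image, outputbase, "--oem", "1", "-l", lang]
    # Box-file run: the 'makebox' config file writes <outputbase>.box.
    box = ["tesseract", image, outputbase, "-l", lang, "makebox"]
    return ocr, box


ocr_cmd, box_cmd = tesseract_commands("arabic-deva1.png", "arabic-deva1")
print(shlex.join(ocr_cmd))
print(shlex.join(box_cmd))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run`, once the placeholders are replaced with real paths.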
Please see the attached files:
image being OCRed,
image showing the box file skipping the words,
ground-truth file,
OCRed text, and
OCR evaluation report.
arabic-deva1.txt
arabic-deva1-san.txt
arabic-deva1-san_report.html.txt
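As a rough way to list dropped words without a full evaluation tool, the ground-truth and OCRed text can be compared word by word. The sketch below uses Python's standard `difflib`. It is not the tool that produced the attached report, `dropped_words` is a hypothetical helper, and the sample strings are placeholders rather than text from the attachments.

```python
from difflib import SequenceMatcher


def dropped_words(ground_truth, ocr_text):
    """Return ground-truth words that have no counterpart in the OCR output.

    Only pure deletions are reported; words that were misrecognized
    (difflib 'replace' spans) are not counted as dropped.
    """
    gt = ground_truth.split()
    ocr = ocr_text.split()
    matcher = SequenceMatcher(a=gt, b=ocr, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":
            dropped.extend(gt[i1:i2])
    return dropped


# Placeholder text; real input would be read from the ground-truth
# and OCR output files.
print(dropped_words("one two three four five", "one three five"))
```

Splitting on whitespace works for Devanagari as well, since the words in these files are space-separated.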