LSTM: Words dropped during Devanagari recognition #664

Shreeshrii · 2017-01-18T08:13:24Z

Words are dropped during Devanagari recognition with the --oem 1 option.

This seems to be related to line segmentation / box creation, because the same words are also skipped in the box file that tesseract creates when run with the 'makebox' config file.

Please see the attached files:

image being OCRed
image showing the box file skipping the words
ground-truth file
OCRed text
OCR evaluation report

arabic-deva1.txt

arabic-deva1-san.txt

arabic-deva1-san_report.html.txt
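
To list exactly which words went missing, the ground-truth file and the OCRed text can be compared word by word. The following is a minimal sketch using Python's difflib; the function name and the short placeholder strings are illustrative assumptions and are not taken from the attached files.

```python
import difflib

def dropped_words(ground_truth, ocr_text):
    """Return ground-truth words that have no counterpart in the OCR output."""
    gt_words = ground_truth.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(a=gt_words, b=ocr_words, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" means the words exist in the ground truth but not in the OCR text.
        # "replace" (misrecognized words) is deliberately not counted as dropped.
        if tag == "delete":
            dropped.extend(gt_words[i1:i2])
    return dropped

# Placeholder strings; in practice read the ground-truth and OCR text files instead.
print(dropped_words("w1 w2 w3 w4 w5", "w1 w2 w4 w5"))
```

Misrecognized words show up as "replace" opcodes, so this only reports words that were skipped outright.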

Shreeshrii · 2017-01-19T06:21:38Z

Another sample, where the whole first line is skipped in addition to individual missing words.

forbes1849devscript.txt
forbes1849devscript-tif1-hin.txt

image
ground truth file
OCRed text with -l hin

Edit: the tif file was converted to png for uploading.

Shreeshrii · 2017-01-26T09:56:39Z

Is this related to #633 (comment)?

Shreeshrii · 2017-01-26T10:38:25Z

It seems some words are being recognized as 'blanks'. See the following debug output, produced while processing the image shown in #664 (comment):

Processing word with lang hin at:Bounding box=(236,2830)->(1276,2924)
Trying word using lang hin, oem 1
Best choice: accepted=1, adaptable=0, done=1 : Lang result :       : R=50, C=-1, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
pos	NORM	NORM	NORM	NORM	NORM
str	 	 	 	 	 
state:	1 	1 	1 	1 	1 
C	-1.000	-1.000	-1.000	-1.000	-1.000

and

Processing word with lang hin at:Bounding box=(234,2248)->(1969,2326)
Trying word using lang hin, oem 1
Best choice: accepted=1, adaptable=0, done=1 : Lang result : मम : R=0.947715, C=-1.37049, F=1, Perm=8, xht=[0,3.40282e+38], ambig=0
pos	NORM	NORM
str	म	म
state:	1 	1 
C	-0.086	-0.089
Best choice: accepted=1, adaptable=0, done=1 : Lang result :              : R=120, C=-1, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
pos	NORM	NORM	NORM	NORM	NORM	NORM	NORM	NORM	NORM	NORM	NORM	NORM
str	 	 	 	 	 	 	 	 	 	 	 	 
state:	1 	1 	1 	1 	1 	1 	1 	1 	1 	1 	1 	1 
C	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000
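
Blank results like these can be located automatically in a long debug log. The sketch below assumes the log keeps the "Bounding box=" and "Lang result :" layout shown above; the function name and the abridged sample lines are illustrative only.

```python
import re

BOX_RE = re.compile(r"Bounding box=\((\d+),(\d+)\)->\((\d+),(\d+)\)")
RESULT_RE = re.compile(r"Lang result :(.*?): R=")

def blank_word_boxes(log_text):
    """Return bounding boxes whose debug entry contains an all-whitespace result."""
    boxes = []
    current_box = None
    for line in log_text.splitlines():
        box = BOX_RE.search(line)
        if box:
            current_box = tuple(int(v) for v in box.groups())
            continue
        result = RESULT_RE.search(line)
        if result and current_box is not None:
            # A result made only of spaces is what the log shows for a dropped word.
            if result.group(1).strip() == "" and current_box not in boxes:
                boxes.append(current_box)
    return boxes

# Abridged copies of the log lines quoted above (trailing fields omitted).
SAMPLE_LOG = """\
Processing word with lang hin at:Bounding box=(236,2830)->(1276,2924)
Best choice: accepted=1, adaptable=0, done=1 : Lang result :       : R=50, C=-1, F=1, Perm=2
Processing word with lang hin at:Bounding box=(234,2248)->(1969,2326)
Best choice: accepted=1, adaptable=0, done=1 : Lang result : मम : R=0.947715, C=-1.37049, F=1, Perm=8
Best choice: accepted=1, adaptable=0, done=1 : Lang result :              : R=120, C=-1, F=1, Perm=2
"""

for box in blank_word_boxes(SAMPLE_LOG):
    print("blank result inside box", box)
```

The reported boxes can then be checked against the image to see which words were lost.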

Shreeshrii · 2017-05-11T11:14:24Z

Closing this and linking to issue #681.

Shreeshrii · 2017-05-16T09:49:30Z

Using the image linked above (https://cloud.githubusercontent.com/assets/5095331/22055988/c65e0f96-dd83-11e6-9f06-bea70dd85be6.png):

Best choice certainty=-3.09489, space=-0.195364, scaled=-21.6642, final=-21.6642
 : शूण्वन्तु : R=13.0464, C=-3.09489, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
pos     NORM    NORM    NORM    NORM    NORM
str     शू      ण्      व       न्      तु
state:  1       1       1       1       1
C       -3.095  -0.260  -0.219  -0.298  -0.195
Deleting word with certainty -21.6642
 : शूण्वन्तु : R=13.0464, C=-21.6642, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
pos     NORM    NORM    NORM    NORM    NORM
str     शू      ण्      व       न्      तु
state:  1       1       1       1       1
C       -3.095  -0.260  -0.219  -0.298  -0.195
Best choice certainty=-1.30628, space=-0.195364, scaled=-9.14397, final=-9.14397
 : क्रषय: : R=5.64425, C=-1.30628, F=1, Perm=2, xht=[0,3.40282e+38], ambig=0
pos     NORM    NORM    NORM    NORM    NORM
str     क्      र       ष       य       :
state:  1       1       1       1       1
C       -1.306  -0.262  -0.229  -0.195  -0.204
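
In this log the first word is followed by a "Deleting word with certainty" line and the second is not. In both entries the scaled value is about 7 times the certainty (-3.09489 × 7 ≈ -21.6642 and -1.30628 × 7 ≈ -9.14397). That factor is read off these numbers only, and the log does not show the threshold used for deletion. A minimal sketch for pulling these values out of a log, with hypothetical function and field names:

```python
import re

BEST_RE = re.compile(
    r"Best choice certainty=(-?[\d.]+), space=(-?[\d.]+), "
    r"scaled=(-?[\d.]+), final=(-?[\d.]+)"
)
WORD_RE = re.compile(r"^\s*: (.*?) : R=")
DELETED_RE = re.compile(r"Deleting word with certainty (-?[\d.]+)")

def summarize(log_text):
    """Collect certainty, scaled value, text and deletion status per word."""
    words = []
    current = None
    for line in log_text.splitlines():
        best = BEST_RE.search(line)
        if best:
            current = {
                "certainty": float(best.group(1)),
                "scaled": float(best.group(3)),
                "text": None,
                "deleted": False,
            }
            words.append(current)
            continue
        if current is None:
            continue
        word = WORD_RE.match(line)
        if word and current["text"] is None:
            current["text"] = word.group(1)
        elif DELETED_RE.search(line):
            current["deleted"] = True
    return words

# Abridged copies of the log lines quoted above.
SAMPLE_LOG = """\
Best choice certainty=-3.09489, space=-0.195364, scaled=-21.6642, final=-21.6642
 : शूण्वन्तु : R=13.0464, C=-3.09489, F=1, Perm=2
Deleting word with certainty -21.6642
 : शूण्वन्तु : R=13.0464, C=-21.6642, F=1, Perm=2
Best choice certainty=-1.30628, space=-0.195364, scaled=-9.14397, final=-9.14397
 : क्रषय: : R=5.64425, C=-1.30628, F=1, Perm=2
"""

for entry in summarize(SAMPLE_LOG):
    ratio = entry["scaled"] / entry["certainty"]
    print(entry["text"], "deleted" if entry["deleted"] else "kept", round(ratio, 3))
```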

The correct words, with their component characters, are:

शृण्वन्तु 
श ृ ण् व न् तु

and

ऋषयः 
ऋ ष य ः
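
The difference between the recognized and the correct words is easier to see at the code point level. The sketch below uses Python's unicodedata module; the helper names are illustrative, and the strings are copied from the log and the corrections above.

```python
import unicodedata

def describe(word):
    """Return (code point, Unicode name) for every character in word."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in word]

def first_difference(a, b):
    """Index of the first differing character, or None if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))

pairs = [
    ("शूण्वन्तु", "शृण्वन्तु"),  # recognized, correct
    ("क्रषय:", "ऋषयः"),
]

for recognized, correct in pairs:
    print("recognized:", describe(recognized))
    print("correct:   ", describe(correct))
    print("first difference at index", first_difference(recognized, correct))
```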

Shreeshrii mentioned this issue Jan 26, 2017

LSTM: Words dropped during recognition #681

Closed

Shreeshrii closed this as completed May 11, 2017
