Slower Performance in Latest Tesseract #1171

ibr123 · 2017-10-17T14:43:39Z

Hi,

I have installed Tesseract 4.00.00dev-690-g1b0379c with Leptonica 1.74.4, and it works fine for recognition, but I have noticed that it is slower than before (compared with the Tesseract build from 5 months ago, with Leptonica 1.74.1).

Previously a run took around 4 or 5 seconds, but lately it takes almost double that. The command I'm using is the normal Tesseract recognition command: **tesseract image results -l lang --tessdata-dir ./tessdata --oem 1**. Am I missing something, or is there a parameter I should add after the updates to Tesseract or Leptonica? Or is there any other way to improve speed (for both the single-threaded and multi-threaded cases)?

Thank you

amitdo · 2017-10-17T15:14:11Z

> Slower Performance in Latest Tesseract

It's not clear if you're comparing a newer 4.00 to older 4.00 or 4.00 to 3.05.

amitdo · 2017-10-17T15:18:15Z

Also, do you use the newest traineddata for 4.0?

Shreeshrii · 2017-10-18T02:27:59Z

Use the traineddata files from the tessdata_fast repository for faster recognition.


amitdo · 2017-10-18T03:46:46Z

> Or is there any other way to improve speed (for both the single-threaded and multi-threaded cases)?

If you use multi-threading try disabling OpenMP.
OMP_THREAD_LIMIT=1 tesseract in.png out --oem 1


ibr123 · 2017-10-18T06:06:00Z

@amitdo Actually, I'm comparing the latest build (4.00.00dev-690-g1b0379c with Leptonica 1.74.4) with an older one (4.00.00dev-549-g2b854e3 with Leptonica 1.74.1).

@Shreeshrii "tessdata_fast" is new to me. I'm already using the official traineddata, but I don't know about this one. Can you please give me the link to it? Also, I have already created a tuned LSTM model; can I combine it with the new tessdata_fast as well?

Thank you both

stweil · 2017-10-18T06:20:42Z

The latest traineddata files are at https://github.com/tesseract-ocr/tessdata_best and https://github.com/tesseract-ocr/tessdata_fast. But if you want to compare the performance of an older Tesseract 4.00 with the latest version, you will have to use the same traineddata for both, usually from https://github.com/tesseract-ocr/tessdata. I'd disable multithreading for the test (set environment variable OMP_THREAD_LIMIT=1).
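
The comparison described above could be scripted roughly as follows. This is only a sketch, not an official benchmark: the helper names (`build_command`, `single_thread_env`, `time_run`, `compare_builds`) are made up for illustration, and it assumes two Tesseract builds are installed side by side at paths you supply. It runs the same image through each build with the same `--tessdata-dir` and with `OMP_THREAD_LIMIT=1`, so that only the binary differs between runs.

```python
import os
import subprocess
import time


def build_command(tesseract_bin, image, out_base, lang, tessdata_dir):
    """Build the recognition command from this thread for a given binary."""
    return [
        tesseract_bin, image, out_base,
        "-l", lang,
        "--tessdata-dir", tessdata_dir,
        "--oem", "1",
    ]


def single_thread_env(base_env=None):
    """Copy the environment and cap OpenMP at one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def time_run(cmd, env, repeats=3):
    """Run cmd `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_builds(binaries, image, lang, tessdata_dir):
    """Time each binary on the same image with the same traineddata."""
    env = single_thread_env()
    results = {}
    for label, path in binaries.items():
        # Each build writes to its own output base so results can be diffed.
        cmd = build_command(path, image, "out_" + label, lang, tessdata_dir)
        results[label] = time_run(cmd, env)
    return results
```

For example, `compare_builds({"old": "/opt/tess-549/bin/tesseract", "new": "/opt/tess-690/bin/tesseract"}, "in.png", "eng", "./tessdata")` would return a dict of seconds per build; the install paths and language here are placeholders. Taking the fastest of several runs is one way to reduce noise from disk caching; averaging would also be reasonable.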

Shreeshrii · 2017-10-18T09:02:21Z

Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line

If you have the data for your finetuning, you can create the 'faster' integer type of traineddata by using convert_to_int with stop_training.
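
As a rough illustration of combining convert_to_int with stop_training, the sketch below only assembles an `lstmtraining` command line; it does not run it. The flag names follow the linked wiki page and should be confirmed against `lstmtraining --help` for your build, since this is development code. The function names and all file paths are hypothetical.

```python
import shlex


def convert_to_int_command(checkpoint, base_traineddata, output_traineddata):
    """Assemble an lstmtraining call that writes an integer traineddata file.

    checkpoint: a checkpoint produced by your own fine-tuning run
    base_traineddata: the traineddata the fine-tuning started from
    output_traineddata: where the converted model should be written
    """
    return [
        "lstmtraining",
        "--stop_training",
        "--convert_to_int",
        "--continue_from", checkpoint,
        "--traineddata", base_traineddata,
        "--model_output", output_traineddata,
    ]


def as_shell(cmd):
    """Render the argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

Calling `as_shell(convert_to_int_command("base_checkpoint", "eng.traineddata", "eng_int.traineddata"))` gives a single line you can inspect before running it yourself; the three file names are placeholders for your own files.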

ibr123 · 2017-10-18T11:48:21Z

@Shreeshrii So I assume that if I fine-tuned an LSTM model (made with the older version's tools), it won't combine with the new traineddata (for example, a traineddata from https://github.com/tesseract-ocr/tessdata_best)?
Also, by "data for your fine tuning", do you mean the following?

And are the steps in the link you shared meant to improve accuracy, recognition speed, or both?

@stweil Is the difference between "tessdata_best" and "tessdata_fast" accuracy vs. speed? Meaning "tessdata_fast" will be faster in recognition but won't be as accurate as "tessdata_best"?

Thanks for the answers

stweil · 2017-10-18T13:43:48Z

> Is the difference between "tessdata_best" and "tessdata_fast" accuracy vs. speed? Meaning "tessdata_fast" will be faster in recognition but won't be as accurate as "tessdata_best"?

tessdata_fast is faster than tessdata_best, yes.
tessdata_best is generally better, but not always. I also noticed cases where tessdata_fast is better. And there are even cases where the old Tesseract gives the best recognition rates of all current tessdata.

Shreeshrii · 2017-10-18T14:01:39Z

For training, you have to start with tessdata_best models. You can create your traineddata in the faster integer format. You will have to test with your language and data.


ibr123 · 2017-10-18T14:03:48Z

If I want to fine-tune using the "lstmtraining" tool with the latest Tesseract (4.00.00dev-690-g1b0379c), can I use .lstmf files (which are generated by tesstrain.sh) that were created by an older Tesseract version, such as 4.00.00dev-549-g2b854e3?
In other words, are .lstmf files compatible between Tesseract versions?

Shreeshrii · 2017-10-18T14:09:30Z

You can give it a try. There have been significant changes that break compatibility between commits, since this is development code in alpha stage. If you get an error, you will have to recreate the lstmf files.


Shreeshrii · 2017-10-18T14:10:45Z

I do not know about the specific commit numbers you refer to. You may want to check the github history of commits.


ibr123 · 2017-10-18T14:14:44Z

thanks

ibr123 closed this as completed Oct 18, 2017

zdenop mentioned this issue Oct 30, 2018

Enabling openmp leads to 10x performance regression on mingw-w64 #2035

Closed

amitdo added the performance label May 14, 2020

FriedRiceWithEggs mentioned this issue Nov 28, 2022

Gosseract is much slower than Pytesseract otiai10/gosseract#250

Closed
