Slower Performance in Latest Tesseract #1171
It's not clear whether you're comparing a newer 4.00 to an older 4.00, or 4.00 to 3.05.
Also, do you use the newest traineddata for 4.0?
Use traineddata files from the tessdata_fast repository for faster
recognition.
ShreeDevi
> On Tue, Oct 17, 2017 at 9:00 PM, Amit D. wrote:
> Also, do you use the newest traineddata?
If you use multi-threading, try disabling OpenMP.
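A minimal sketch of trying this without rebuilding, assuming an OpenMP-enabled Tesseract 4 binary: OpenMP's standard `OMP_THREAD_LIMIT` environment variable caps how many threads the runtime may use, so setting it to `1` makes recognition single-threaded. The helper names are hypothetical; only the variable itself comes from the OpenMP specification.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def single_thread_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with OpenMP capped at one thread."""
    env = dict(os.environ if base is None else base)
    # OMP_THREAD_LIMIT is a standard OpenMP variable read at runtime, so an
    # OpenMP-enabled tesseract binary needs no rebuild to honour it.
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def run_single_threaded(argv: Sequence[str]) -> int:
    """Run a command (e.g. a tesseract invocation) and return its exit code."""
    return subprocess.run(list(argv), env=single_thread_env()).returncode
```

For example, `run_single_threaded(["tesseract", "image.png", "results", "-l", "eng", "--oem", "1"])` (hypothetical image and language) is equivalent to prefixing the shell command with `OMP_THREAD_LIMIT=1`.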
@amitdo Actually, I'm comparing the latest version (4.00.00dev-690-g1b0379c with Leptonica 1.74.4) with the older version (4.00.00dev-549-g2b854e3 with Leptonica 1.74.1). @Shreeshrii "tessdata_fast" is news to me. I'm already using the official traineddata, but I didn't know about this one; can you please give me the link to it? Also, I have already created a tuned LSTM; can I combine it with the new tessdata_fast as well? Thank you both.
The latest traineddata files are at https://github.com/tesseract-ocr/tessdata_best and https://github.com/tesseract-ocr/tessdata_fast. But if you want to compare the performance of an older Tesseract 4.00 with the latest version, you will have to use the same traineddata for both, usually from https://github.com/tesseract-ocr/tessdata. I'd disable multithreading for the test (set environment variable
If you have the data for your fine-tuning, you can create the 'faster' integer type of traineddata by using
@Shreeshrii So I assume that if I fine-tuned an LSTM file (made with the older version's tools), it won't combine with the new traineddata (for example, a traineddata file from https://github.com/tesseract-ocr/tessdata_best)? @stweil Is the difference between "tessdata_best" and "tessdata_fast" accuracy vs. speed? That is, will "tessdata_fast" be faster at detection but not as accurate as "tessdata_best"? Thanks for the answers.
For training, you have to start with the tessdata_best models. You can create
your traineddata in the faster integer format.
You will have to test with your language and data.
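One way to do that testing, sketched here as an assumption rather than anything the thread prescribes, is to OCR the same images with each model, compare every output with hand-checked ground truth, and compute the character error rate (CER): the Levenshtein edit distance divided by the length of the reference text. The function names and the model labels are hypothetical.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("ground-truth text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def compare_models(reference: str, outputs: Dict[str, str]) -> Dict[str, float]:
    """CER per model output, e.g. outputs={'best': ..., 'fast': ...}."""
    return {name: character_error_rate(reference, text)
            for name, text in outputs.items()}
```

Lower CER is better; putting these numbers next to the timings for each model shows whether the speed gain of the fast models is worth the accuracy cost for a given language.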
> On 18-Oct-2017 7:28 PM, "ibr123" wrote:
> if i wanted to fine tune using the tool
If I want to fine-tune using the "lstmtraining" tool with the latest Tesseract (4.00.00dev-690-g1b0379c), can I use .lstmf files (generated by tesstrain.sh) that were created by an older Tesseract version, such as 4.00.00dev-549-g2b854e3?
You can give it a try. There have been significant changes that break
compatibility between commits, since this is development code in the alpha stage.
If you get an error, you will have to recreate the lstmf files.
> On 18-Oct-2017 7:34 PM, "ibr123" wrote:
> if i wanted to fine tune using the tool "lstmtraining" while i'm using the
> latest Tesseract: (4.00.00dev-690-g1b0379c) can i use .lstmf files (which
> are generated by tesstrain.sh) file that are created by older Tesseract
> version, such as (4.00.00dev-549-g2b854e3) ?
> meaning are lstmf files compatible between tesseract versions?
I do not know about the specific commit numbers you refer to. You may want
to check the GitHub commit history.
> On 18-Oct-2017 7:39 PM, "ShreeDevi Kumar" wrote:
> You can give it a try. There have been significant changes, that break
> compatibility between commits since this is development code in alpha stage.
> If you get an error, you will have to recreate the lstmf files.
Thanks.
Hi,
I have installed Tesseract 4.00.00dev-690-g1b0379c with Leptonica 1.74.4, and it's working fine for detection and everything else, but I have noticed that the performance is slower than before (compared with the Tesseract from 5 months ago with Leptonica 1.74.1).
In the past, the time was around 4 or 5 seconds, but lately it's almost double. The command I'm using is the normal tesseract detection command: **tesseract image results -l lang --tessdata-dir ./tessdata --oem 1**. Am I missing something, or is there some parameter I should add after the updates to Tesseract or Leptonica? Is there any other way to improve the speed, for both the single-thread and multi-thread cases?
Thank you
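For a like-for-like timing of two builds, as suggested in the thread (the same traineddata for both builds, multithreading disabled), a small harness can run each binary several times and report the median wall-clock time. This is a sketch under assumptions: the binary paths, image and language are hypothetical placeholders, `stdout` as the output base makes tesseract print the text instead of writing a file, and the caller supplies the environment (for example, one that limits threads).

```python
import os
import statistics
import subprocess
import time
from typing import Dict, List, Mapping, Optional, Sequence


def build_argv(binary: str, image: str, lang: str, tessdata_dir: str) -> List[str]:
    """Identical options for every build; 'stdout' prints the OCR text."""
    return [binary, image, "stdout", "-l", lang,
            "--tessdata-dir", tessdata_dir, "--oem", "1"]


def median_runtime(argv: Sequence[str], env: Mapping[str, str], runs: int = 5) -> float:
    """Run a command `runs` times; return the median wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(list(argv), env=dict(env), check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    # The median is less affected than the mean by one slow, cold-cache run.
    return statistics.median(timings)


def compare_builds(old_bin: str, new_bin: str, image: str, lang: str,
                   tessdata_dir: str, env: Optional[Mapping[str, str]] = None,
                   runs: int = 5) -> Dict[str, float]:
    """Time two tesseract binaries on one image with one shared tessdata dir."""
    run_env = dict(os.environ if env is None else env)
    return {
        label: median_runtime(build_argv(binary, image, lang, tessdata_dir), run_env, runs)
        for label, binary in (("old", old_bin), ("new", new_bin))
    }
```

For example, `compare_builds("/opt/tess-old/bin/tesseract", "/usr/local/bin/tesseract", "page.png", "eng", "./tessdata")` (all hypothetical paths) returns the median seconds for each build. Pointing both builds at one tessdata directory keeps the model out of the comparison, so any remaining difference comes from the code.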