failure training #323

Closed
ccampisano opened this issue Dec 16, 2022 · 19 comments

@ccampisano

Hi,
I was able to build tesseract from git and run the tesstrain script, but the latter failed like this:

corrado@debian:~/tesstrain$ make training MODEL_NAME=cdi
set -x; \
tesseract "data/cdi-ground-truth/12-174.png" data/cdi-ground-truth/12-174 --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/12-174.png data/cdi-ground-truth/12-174 --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/06-corrado.png" -t "data/cdi-ground-truth/06-corrado.gt.txt" > "data/cdi-ground-truth/06-corrado.box"
set -x; \
tesseract "data/cdi-ground-truth/06-corrado.png" data/cdi-ground-truth/06-corrado --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/06-corrado.png data/cdi-ground-truth/06-corrado --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/04-santa-marinella.png" -t "data/cdi-ground-truth/04-santa-marinella.gt.txt" > "data/cdi-ground-truth/04-santa-marinella.box"
set -x; \
tesseract "data/cdi-ground-truth/04-santa-marinella.png" data/cdi-ground-truth/04-santa-marinella --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/04-santa-marinella.png data/cdi-ground-truth/04-santa-marinella --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/14-emissione.png" -t "data/cdi-ground-truth/14-emissione.gt.txt" > "data/cdi-ground-truth/14-emissione.box"
set -x; \
tesseract "data/cdi-ground-truth/14-emissione.png" data/cdi-ground-truth/14-emissione --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/14-emissione.png data/cdi-ground-truth/14-emissione --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/08-ca.png" -t "data/cdi-ground-truth/08-ca.gt.txt" > "data/cdi-ground-truth/08-ca.box"
set -x; \
tesseract "data/cdi-ground-truth/08-ca.png" data/cdi-ground-truth/08-ca --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/08-ca.png data/cdi-ground-truth/08-ca --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/10-fd.png" -t "data/cdi-ground-truth/10-fd.gt.txt" > "data/cdi-ground-truth/10-fd.box"
set -x; \
tesseract "data/cdi-ground-truth/10-fd.png" data/cdi-ground-truth/10-fd --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/10-fd.png data/cdi-ground-truth/10-fd --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/13-ita.png" -t "data/cdi-ground-truth/13-ita.gt.txt" > "data/cdi-ground-truth/13-ita.box"
set -x; \
tesseract "data/cdi-ground-truth/13-ita.png" data/cdi-ground-truth/13-ita --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/13-ita.png data/cdi-ground-truth/13-ita --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/11-m.png" -t "data/cdi-ground-truth/11-m.gt.txt" > "data/cdi-ground-truth/11-m.box"
set -x; \
tesseract "data/cdi-ground-truth/11-m.png" data/cdi-ground-truth/11-m --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/11-m.png data/cdi-ground-truth/11-m --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/15-scadenza.png" -t "data/cdi-ground-truth/15-scadenza.gt.txt" > "data/cdi-ground-truth/15-scadenza.box"
set -x; \
tesseract "data/cdi-ground-truth/15-scadenza.png" data/cdi-ground-truth/15-scadenza --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/15-scadenza.png data/cdi-ground-truth/15-scadenza --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/09-63452.png" -t "data/cdi-ground-truth/09-63452.gt.txt" > "data/cdi-ground-truth/09-63452.box"
set -x; \
tesseract "data/cdi-ground-truth/09-63452.png" data/cdi-ground-truth/09-63452 --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/09-63452.png data/cdi-ground-truth/09-63452 --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/01-repubblica-italiana.png" -t "data/cdi-ground-truth/01-repubblica-italiana.gt.txt" > "data/cdi-ground-truth/01-repubblica-italiana.box"
set -x; \
tesseract "data/cdi-ground-truth/01-repubblica-italiana.png" data/cdi-ground-truth/01-repubblica-italiana --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/01-repubblica-italiana.png data/cdi-ground-truth/01-repubblica-italiana --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/02-ministero-interno.png" -t "data/cdi-ground-truth/02-ministero-interno.gt.txt" > "data/cdi-ground-truth/02-ministero-interno.box"
set -x; \
tesseract "data/cdi-ground-truth/02-ministero-interno.png" data/cdi-ground-truth/02-ministero-interno --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/02-ministero-interno.png data/cdi-ground-truth/02-ministero-interno --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/05-campisano.png" -t "data/cdi-ground-truth/05-campisano.gt.txt" > "data/cdi-ground-truth/05-campisano.box"
set -x; \
tesseract "data/cdi-ground-truth/05-campisano.png" data/cdi-ground-truth/05-campisano --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/05-campisano.png data/cdi-ground-truth/05-campisano --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/07-luogo-data.png" -t "data/cdi-ground-truth/07-luogo-data.gt.txt" > "data/cdi-ground-truth/07-luogo-data.box"
set -x; \
tesseract "data/cdi-ground-truth/07-luogo-data.png" data/cdi-ground-truth/07-luogo-data --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/07-luogo-data.png data/cdi-ground-truth/07-luogo-data --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/03-carta-di-identita.png" -t "data/cdi-ground-truth/03-carta-di-identita.gt.txt" > "data/cdi-ground-truth/03-carta-di-identita.box"
set -x; \
tesseract "data/cdi-ground-truth/03-carta-di-identita.png" data/cdi-ground-truth/03-carta-di-identita --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/03-carta-di-identita.png data/cdi-ground-truth/03-carta-di-identita --psm 13 lstm.train
read_params_file: Can't open lstm.train
python3 shuffle.py 0 "data/cdi/all-lstmf"
+ head -n 13 data/cdi/all-lstmf
+ tail -n 2 data/cdi/all-lstmf
combine_lang_model \
  --input_unicharset data/cdi/unicharset \
  --script_dir data/langdata \
  --numbers data/cdi/cdi.numbers \
  --puncs data/cdi/cdi.punc \
  --words data/cdi/cdi.wordlist \
  --output_dir data \
   \
  --lang cdi
Failed to read data from: data/cdi/cdi.wordlist
Failed to read data from: data/cdi/cdi.punc
Failed to read data from: data/cdi/cdi.numbers
Loaded unicharset of size 32 from file data/cdi/unicharset
Setting unichar properties
Other case c of C is not in unicharset
Other case o of O is not in unicharset
Other case r of R is not in unicharset
Other case a of A is not in unicharset
Other case d of D is not in unicharset
Other case s of S is not in unicharset
Other case n of N is not in unicharset
Other case t of T is not in unicharset
Other case m of M is not in unicharset
Other case i of I is not in unicharset
Other case e of E is not in unicharset
Other case l of L is not in unicharset
Other case f of F is not in unicharset
Other case p of P is not in unicharset
Other case u of U is not in unicharset
Other case b of B is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: data/langdata/cdi/cdi.config
Null char=2
lstmtraining \
  --debug_interval 0 \
  --traineddata data/cdi/cdi.traineddata \
  --learning_rate 0.002 \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c`head -n1 data/cdi/unicharset`]" \
  --model_output data/cdi/checkpoints/cdi \
  --train_listfile data/cdi/list.train \
  --eval_listfile data/cdi/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Warning: given outputs 32 not equal to unicharset of 31.
Num outputs,weights in Series:
  1,36,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  TxyLfys48:48, 12480
  Lfx96:96, 55680
  RxLrx96:96, 74112
  Lfx192:192, 221952
  Fc31:31, 5983
Total weights = 370367
Built network:[1,36,0,1[C3,3Ft16]Mp3,3TxyLfys48Lfx96RxLrx96Lfx192Fc31] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c32]
Training parameters:
  Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=30
Deserialize header failed: data/cdi-ground-truth/12-174.lstmf
Deserialize header failed: data/cdi-ground-truth/13-ita.lstmf
Deserialize header failed: data/cdi-ground-truth/08-ca.lstmf
Deserialize header failed: data/cdi-ground-truth/03-carta-di-identita.lstmf
Deserialize header failed: data/cdi-ground-truth/07-luogo-data.lstmf
Deserialize header failed: data/cdi-ground-truth/11-m.lstmf
Deserialize header failed: data/cdi-ground-truth/09-63452.lstmf
Deserialize header failed: data/cdi-ground-truth/01-repubblica-italiana.lstmf
Deserialize header failed: data/cdi-ground-truth/10-fd.lstmf
Load of page 0 failed!
Load of images failed!!
make: *** [Makefile:326: data/cdi/checkpoints/cdi_checkpoint] Error 1

any hints?

thx and rgrds,
corrado
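One plausible reading of the log above is that the `.lstmf` files were never written (every `tesseract ... lstm.train` call failed to open its config), so `lstmtraining` later had nothing valid to deserialize. A minimal sketch for testing that reading follows; it assumes a list file with one `.lstmf` path per line, such as `data/cdi/all-lstmf` from the log. The function name `check_lstmf_list` is hypothetical and not part of tesstrain.

```shell
# Read a list file (one .lstmf path per line) and print every entry
# that does not exist or has zero size.
# Returns 1 if at least one such entry was found, 0 otherwise.
check_lstmf_list() {
  list_file=$1
  rc=0
  while IFS= read -r entry || [ -n "$entry" ]; do
    # Skip blank lines in the list.
    [ -n "$entry" ] || continue
    # -s is true only for files that exist and are non-empty.
    if [ ! -s "$entry" ]; then
      printf 'missing or empty: %s\n' "$entry"
      rc=1
    fi
  done < "$list_file"
  return "$rc"
}
```

For example, `check_lstmf_list data/cdi/all-lstmf` would print one line per suspect file. A clean result does not prove the files are valid, only that they exist and are not empty.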

@Shawnsdaddy

Having the same issue

@zdenop
Contributor

zdenop commented Dec 17, 2022

Please provide the test case (all files) to reproduce the problem.

@ccampisano
Author

@zdenop here's the training material

thx and regards,
corrado
cdi-ground-truth.zip

@zdenop
Contributor

zdenop commented Dec 20, 2022

Please also post each step (the commands you ran) needed to reproduce the problem.

@ccampisano
Author

the only command I ran was "make training MODEL_NAME=cdi"

@sven-nm

sven-nm commented Dec 21, 2022

Having exactly the same issue here since reinstalling tesseract, despite lstm.train being in tessdata_dir/configs

make training MODEL_NAME=test_trained START_MODEL=grc OUTPUT_DIR=/scratch/sven/ocr_exp/models/test/train GROUND_TRUTH_DIR=/scratch/sven/ocr_exp/datasets/test CORES=12 EPOCHS=1 
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75 --psm 13 lstm.train
read_params_file: Can't open lstm.train
python3 shuffle.py 0 "/scratch/sven/ocr_exp/models/test/train/all-lstmf"
/bin/bash: line 1: bc: command not found
/bin/bash: line 4: bc: command not found
+ head -n '' /scratch/sven/ocr_exp/models/test/train/all-lstmf
head: invalid number of lines: ''
+ tail -n '' /scratch/sven/ocr_exp/models/test/train/all-lstmf
tail: invalid number of lines: ''
make: *** [Makefile:191: /scratch/sven/ocr_exp/models/test/train/list.train] Error 1

@zdenop
Contributor

zdenop commented Dec 30, 2022

read_params_file: Can't open lstm.train indicates that there is a problem with the tesseract installation. How did you install tesseract?

bc: command not found indicates that the bc utility is not in the PATH.
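In the log above, the missing `bc` leaves the line counts passed to `head -n` and `tail -n` empty, which is what stops the build. A minimal sketch of a pre-flight check for required tools follows; the function name `missing_tools` is hypothetical, and the list of tools to pass is an assumption taken from the commands that appear in the logs.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Returns 0 when all commands were found, 1 otherwise.
missing_tools() {
  rc=0
  for tool in "$@"; do
    # command -v is the POSIX way to ask whether a command can be resolved.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

A call such as `missing_tools bc python3 tesseract combine_lang_model lstmtraining` prints nothing when every tool named in the logs can be found.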

@ccampisano
Author

> read_params_file: Can't open lstm.train indicates that there is a problem with the tesseract installation. How did you install tesseract?
>
> bc: command not found indicates that the bc utility is not in the PATH.

I installed tesseract from the git repo, doing configure, make, etc.

How should I install it?

BTW: "bc" is installed (already the newest version, 1.07.1-2+b2)

@zdenop
Contributor

zdenop commented Jan 3, 2023

@ccampisano the 'bc' error belongs to @sven-nm, who thinks they have the same problem as you...
Please post the installation log of tesseract.

@ccampisano
Author

@zdenop I didn't record the installation log, but it went fine.
I'll redo and report here asap.

@zdenop
Contributor

zdenop commented Jan 4, 2023

See the similar issue #325. Please try a clean installation (uninstall everything and install from scratch).
First try the sample data and, if that works, try your own data...

@ccampisano
Author

@zdenop please find attached installation logs, I followed instructions in the repo's readme.

Note that I had a problem during configure and had to run it with --disable-dependency-tracking

install.log
config.log

Please let me know what to do next; my aim is to be able to create a custom traineddata file.

@zdenop
Contributor

zdenop commented Jan 10, 2023

Can you please post the output of the following commands?
echo $TESSDATA_PREFIX
and
tesseract a b -l c

@ccampisano
Author

@zdenop here are the results:

corrado@tesseract:~$ echo $TESSDATA_PREFIX

corrado@tesseract:~$ tesseract a b -l c
Error opening data file /usr/local/share/tessdata/c.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'c'
Tesseract couldn't load any languages!
Could not initialize tesseract.

@zdenop
Contributor

zdenop commented Jan 10, 2023

According to the data you posted, you installed tesseract to /usr/local/bin, and tesseract searches for its data in subdirectories of /usr/local/share/tessdata/ (lstm.train is installed to /usr/local/share/tessdata/configs). So tesseract is installed correctly.
Can you please double-check that there is no other tesseract installation (e.g. in /usr/bin)?

Can you now run make training MODEL_NAME=cdi?
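Before re-running, the data path described above can be checked directly. This is a minimal sketch, assuming the tessdata directory is whatever `TESSDATA_PREFIX` points to and otherwise the `/usr/local/share/tessdata` default reported in the error message; the function name `check_lstm_train` is hypothetical.

```shell
# Report whether the lstm.train config is readable under a tessdata directory.
# $1 is optional; it defaults to TESSDATA_PREFIX, then to the path
# that appeared in the error message in this thread.
check_lstm_train() {
  tessdata_dir=${1:-${TESSDATA_PREFIX:-/usr/local/share/tessdata}}
  config="$tessdata_dir/configs/lstm.train"
  if [ -r "$config" ]; then
    printf 'found: %s\n' "$config"
  else
    printf 'missing: %s\n' "$config"
    return 1
  fi
}
```

Running `check_lstm_train` with no argument tests the default location. A `found:` result does not rule out other causes, since another participant in this thread saw the same error with the file in place.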

@ccampisano
Author

ccampisano commented Jan 10, 2023

@zdenop there is no other tesseract installation:

corrado@tesseract:~$ ls /usr/bin/ | grep tess
corrado@tesseract:~$ which tesseract 
/usr/local/bin/tesseract

root@tesseract:~# apt remove tesseract-ocr
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package 'tesseract-ocr' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
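Since `which` reports only the first match, a second copy later in the search path would go unnoticed by the check above. A minimal sketch that lists every match in PATH order follows; the function name `list_on_path` is hypothetical.

```shell
# Print every executable file named $1 found in the directories of PATH,
# in the order the shell would search them.
list_on_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    # An empty PATH component means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  IFS=$old_ifs
}
```

With a single installation, `list_on_path tesseract` should print exactly one line, here `/usr/local/bin/tesseract`.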

BTW:

  1. I didn't run make training and sudo make training-install yet, should I? (see here)
  2. should I run make training MODEL_NAME=cdi from the tesseract folder where I worked so far, or in the tesstrain folder?
  3. where to put the training data folder?

thanks
corrado

@zdenop
Contributor

zdenop commented Jan 10, 2023

Yes, please run sudo make training-install first.

It may be best to run training on the example data first (see e.g. this tutorial; just skip installing tesseract, as you already did that manually).

You also need to install eng.traineddata and osd.traineddata (make tesseract-langs in tesstrain; see the README).
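The steps mentioned in this thread can be lined up as one sequence. This is a sketch only: the make targets are the ones named in the comments above, while both directory arguments are hypothetical placeholders for the tesseract source checkout and the tesstrain checkout. It prints the commands by default and only executes them when `DRY_RUN=0` is set.

```shell
# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# $1: tesseract source directory (assumed location, adjust as needed)
# $2: tesstrain directory
# $3: model name
training_setup() {
  tesseract_src=$1
  tesstrain_dir=$2
  model=$3
  # Build and install the training tools from the tesseract source tree.
  run make -C "$tesseract_src" training
  run sudo make -C "$tesseract_src" training-install
  # Fetch the traineddata files the tesstrain Makefile expects.
  run make -C "$tesstrain_dir" tesseract-langs
  # Run the actual training.
  run make -C "$tesstrain_dir" training "MODEL_NAME=$model"
}
```

For example, `training_setup ~/tesseract ~/tesstrain cdi` prints the four commands for review before anything is executed.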

@ccampisano
Author

@zdenop thanks for your support, I was able to run the training correctly (and didn't need osd.traineddata).

The trained file was generated correctly, but:

  • it is quite small (3 MB compared to 10 MB for the regular "ita" trained file), maybe because I used only a few images (about 20, not even covering the whole alphabet)
  • its performance is very poor compared to the regular "ita" file

How could I improve this?

@zdenop
Contributor

zdenop commented Jan 10, 2023

Congratulations!

'its performances are very poor, compared to the regular "ita" file'

This is in line with the documentation. Did you read it? Or did you expect that 10 minutes of training would give you a better result than Google got with its resources?
