failure training #323

ccampisano · 2022-12-16T11:18:15Z

Hi,
I was able to build tesseract from git and run the tesstrain script, but the latter failed as follows:

corrado@debian:~/tesstrain$ make training MODEL_NAME=cdi
set -x; \
tesseract "data/cdi-ground-truth/12-174.png" data/cdi-ground-truth/12-174 --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/12-174.png data/cdi-ground-truth/12-174 --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/06-corrado.png" -t "data/cdi-ground-truth/06-corrado.gt.txt" > "data/cdi-ground-truth/06-corrado.box"
set -x; \
tesseract "data/cdi-ground-truth/06-corrado.png" data/cdi-ground-truth/06-corrado --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/06-corrado.png data/cdi-ground-truth/06-corrado --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/04-santa-marinella.png" -t "data/cdi-ground-truth/04-santa-marinella.gt.txt" > "data/cdi-ground-truth/04-santa-marinella.box"
set -x; \
tesseract "data/cdi-ground-truth/04-santa-marinella.png" data/cdi-ground-truth/04-santa-marinella --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/04-santa-marinella.png data/cdi-ground-truth/04-santa-marinella --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/14-emissione.png" -t "data/cdi-ground-truth/14-emissione.gt.txt" > "data/cdi-ground-truth/14-emissione.box"
set -x; \
tesseract "data/cdi-ground-truth/14-emissione.png" data/cdi-ground-truth/14-emissione --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/14-emissione.png data/cdi-ground-truth/14-emissione --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/08-ca.png" -t "data/cdi-ground-truth/08-ca.gt.txt" > "data/cdi-ground-truth/08-ca.box"
set -x; \
tesseract "data/cdi-ground-truth/08-ca.png" data/cdi-ground-truth/08-ca --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/08-ca.png data/cdi-ground-truth/08-ca --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/10-fd.png" -t "data/cdi-ground-truth/10-fd.gt.txt" > "data/cdi-ground-truth/10-fd.box"
set -x; \
tesseract "data/cdi-ground-truth/10-fd.png" data/cdi-ground-truth/10-fd --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/10-fd.png data/cdi-ground-truth/10-fd --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/13-ita.png" -t "data/cdi-ground-truth/13-ita.gt.txt" > "data/cdi-ground-truth/13-ita.box"
set -x; \
tesseract "data/cdi-ground-truth/13-ita.png" data/cdi-ground-truth/13-ita --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/13-ita.png data/cdi-ground-truth/13-ita --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/11-m.png" -t "data/cdi-ground-truth/11-m.gt.txt" > "data/cdi-ground-truth/11-m.box"
set -x; \
tesseract "data/cdi-ground-truth/11-m.png" data/cdi-ground-truth/11-m --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/11-m.png data/cdi-ground-truth/11-m --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/15-scadenza.png" -t "data/cdi-ground-truth/15-scadenza.gt.txt" > "data/cdi-ground-truth/15-scadenza.box"
set -x; \
tesseract "data/cdi-ground-truth/15-scadenza.png" data/cdi-ground-truth/15-scadenza --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/15-scadenza.png data/cdi-ground-truth/15-scadenza --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/09-63452.png" -t "data/cdi-ground-truth/09-63452.gt.txt" > "data/cdi-ground-truth/09-63452.box"
set -x; \
tesseract "data/cdi-ground-truth/09-63452.png" data/cdi-ground-truth/09-63452 --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/09-63452.png data/cdi-ground-truth/09-63452 --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/01-repubblica-italiana.png" -t "data/cdi-ground-truth/01-repubblica-italiana.gt.txt" > "data/cdi-ground-truth/01-repubblica-italiana.box"
set -x; \
tesseract "data/cdi-ground-truth/01-repubblica-italiana.png" data/cdi-ground-truth/01-repubblica-italiana --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/01-repubblica-italiana.png data/cdi-ground-truth/01-repubblica-italiana --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/02-ministero-interno.png" -t "data/cdi-ground-truth/02-ministero-interno.gt.txt" > "data/cdi-ground-truth/02-ministero-interno.box"
set -x; \
tesseract "data/cdi-ground-truth/02-ministero-interno.png" data/cdi-ground-truth/02-ministero-interno --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/02-ministero-interno.png data/cdi-ground-truth/02-ministero-interno --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/05-campisano.png" -t "data/cdi-ground-truth/05-campisano.gt.txt" > "data/cdi-ground-truth/05-campisano.box"
set -x; \
tesseract "data/cdi-ground-truth/05-campisano.png" data/cdi-ground-truth/05-campisano --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/05-campisano.png data/cdi-ground-truth/05-campisano --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/07-luogo-data.png" -t "data/cdi-ground-truth/07-luogo-data.gt.txt" > "data/cdi-ground-truth/07-luogo-data.box"
set -x; \
tesseract "data/cdi-ground-truth/07-luogo-data.png" data/cdi-ground-truth/07-luogo-data --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/07-luogo-data.png data/cdi-ground-truth/07-luogo-data --psm 13 lstm.train
read_params_file: Can't open lstm.train
PYTHONIOENCODING=utf-8 python3 generate_line_box.py -i "data/cdi-ground-truth/03-carta-di-identita.png" -t "data/cdi-ground-truth/03-carta-di-identita.gt.txt" > "data/cdi-ground-truth/03-carta-di-identita.box"
set -x; \
tesseract "data/cdi-ground-truth/03-carta-di-identita.png" data/cdi-ground-truth/03-carta-di-identita --psm 13 lstm.train
+ tesseract data/cdi-ground-truth/03-carta-di-identita.png data/cdi-ground-truth/03-carta-di-identita --psm 13 lstm.train
read_params_file: Can't open lstm.train
python3 shuffle.py 0 "data/cdi/all-lstmf"
+ head -n 13 data/cdi/all-lstmf
+ tail -n 2 data/cdi/all-lstmf
combine_lang_model \
  --input_unicharset data/cdi/unicharset \
  --script_dir data/langdata \
  --numbers data/cdi/cdi.numbers \
  --puncs data/cdi/cdi.punc \
  --words data/cdi/cdi.wordlist \
  --output_dir data \
   \
  --lang cdi
Failed to read data from: data/cdi/cdi.wordlist
Failed to read data from: data/cdi/cdi.punc
Failed to read data from: data/cdi/cdi.numbers
Loaded unicharset of size 32 from file data/cdi/unicharset
Setting unichar properties
Other case c of C is not in unicharset
Other case o of O is not in unicharset
Other case r of R is not in unicharset
Other case a of A is not in unicharset
Other case d of D is not in unicharset
Other case s of S is not in unicharset
Other case n of N is not in unicharset
Other case t of T is not in unicharset
Other case m of M is not in unicharset
Other case i of I is not in unicharset
Other case e of E is not in unicharset
Other case l of L is not in unicharset
Other case f of F is not in unicharset
Other case p of P is not in unicharset
Other case u of U is not in unicharset
Other case b of B is not in unicharset
Setting script properties
Config file is optional, continuing...
Failed to read data from: data/langdata/cdi/cdi.config
Null char=2
lstmtraining \
  --debug_interval 0 \
  --traineddata data/cdi/cdi.traineddata \
  --learning_rate 0.002 \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c`head -n1 data/cdi/unicharset`]" \
  --model_output data/cdi/checkpoints/cdi \
  --train_listfile data/cdi/list.train \
  --eval_listfile data/cdi/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Warning: given outputs 32 not equal to unicharset of 31.
Num outputs,weights in Series:
  1,36,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  TxyLfys48:48, 12480
  Lfx96:96, 55680
  RxLrx96:96, 74112
  Lfx192:192, 221952
  Fc31:31, 5983
Total weights = 370367
Built network:[1,36,0,1[C3,3Ft16]Mp3,3TxyLfys48Lfx96RxLrx96Lfx192Fc31] from request [1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c32]
Training parameters:
  Debug interval = 0, weights = 0.1, learning rate = 0.002, momentum=0.5
null char=30
Deserialize header failed: data/cdi-ground-truth/12-174.lstmf
Deserialize header failed: data/cdi-ground-truth/13-ita.lstmf
Deserialize header failed: data/cdi-ground-truth/08-ca.lstmf
Deserialize header failed: data/cdi-ground-truth/03-carta-di-identita.lstmf
Deserialize header failed: data/cdi-ground-truth/07-luogo-data.lstmf
Deserialize header failed: data/cdi-ground-truth/11-m.lstmf
Deserialize header failed: data/cdi-ground-truth/09-63452.lstmf
Deserialize header failed: data/cdi-ground-truth/01-repubblica-italiana.lstmf
Deserialize header failed: data/cdi-ground-truth/10-fd.lstmf
Load of page 0 failed!
Load of images failed!!
make: *** [Makefile:326: data/cdi/checkpoints/cdi_checkpoint] Error 1

any hints?

thx and rgrds,
corrado

Shawnsdaddy · 2022-12-16T23:33:27Z

Having the same issue

zdenop · 2022-12-17T08:56:21Z

Please provide the test case (all files) to reproduce the problem.

ccampisano · 2022-12-19T07:58:06Z

@zdenop here's the training material

thx and regards,
corrado
cdi-ground-truth.zip

zdenop · 2022-12-20T16:06:22Z

Please also post each step (the commands you ran) needed to reproduce the problem.

ccampisano · 2022-12-20T16:12:59Z

the only command I ran was "make training MODEL_NAME=cdi"

sven-nm · 2022-12-21T15:25:41Z

Having exactly the same issue here since reinstalling tesseract, despite lstm.train being in tessdata_dir/configs

make training MODEL_NAME=test_trained START_MODEL=grc OUTPUT_DIR=/scratch/sven/ocr_exp/models/test/train GROUND_TRUTH_DIR=/scratch/sven/ocr_exp/datasets/test CORES=12 EPOCHS=1 
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_87 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_71 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_88 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_92 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_65 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_17 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_69 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_24 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_73 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_60 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_74 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_91 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_68 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_7 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_64 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_21 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_10 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_93 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_19 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_2 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_82 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_25 --psm 13 lstm.train
read_params_file: Can't open lstm.train
set -x; \
tesseract "/scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75.png" /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75 --psm 13 lstm.train
+ tesseract /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75.png /scratch/sven/ocr_exp/datasets/test/sophoclesplaysa05campgoog_0336_75 --psm 13 lstm.train
read_params_file: Can't open lstm.train
python3 shuffle.py 0 "/scratch/sven/ocr_exp/models/test/train/all-lstmf"
/bin/bash: line 1: bc: command not found
/bin/bash: line 4: bc: command not found
+ head -n '' /scratch/sven/ocr_exp/models/test/train/all-lstmf
head: invalid number of lines: ''
+ tail -n '' /scratch/sven/ocr_exp/models/test/train/all-lstmf
tail: invalid number of lines: ''
make: *** [Makefile:191: /scratch/sven/ocr_exp/models/test/train/list.train] Error 1

zdenop · 2022-12-30T17:47:16Z

read_params_file: Can't open lstm.train indicates that there is a problem with the tesseract installation. How did you install tesseract?

bc: command not found indicates that the bc utility is not on the PATH.
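
The empty argument in head -n '' in the log above follows from the missing bc: the line counts used for the train/eval split come back as empty strings. A minimal sketch of that arithmetic, using shell integer arithmetic instead of bc. The function name split_counts and the 90% train share are assumptions for illustration; they were chosen because they reproduce the head -n 13 / tail -n 2 split seen for the 15 ground-truth lines in the first log.

```shell
#!/bin/sh
# Sketch only: split_counts and the 90% default are illustrative assumptions,
# not the tesstrain Makefile itself.
# Prints "<train> <eval>" for a list with $1 lines and a train share of $2 percent.
split_counts() {
    total=$1
    percent=${2:-90}
    train=$(( total * percent / 100 ))   # integer division rounds down
    eval_count=$(( total - train ))
    echo "$train $eval_count"
}

split_counts 15    # 15 lines -> prints "13 2"
```

If either number comes out empty, as it does when bc is missing, head and tail fail exactly as shown in the log.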

ccampisano · 2023-01-02T07:59:57Z

"read_params_file: Can't open lstm.train indicates that there is a problem with the tesseract installation. How did you install tesseract?"

"bc: command not found indicated that bc utility is not in the path."

I installed tesseract from the git repo, doing configure, make, etc.

How should I install it?

BTW: "bc" was installed (Already to the newest version 1.07.1-2+b2)

zdenop · 2023-01-03T17:36:36Z

@ccampisano the 'bc' problem is @sven-nm's, who thinks they have the same problem as you.
Please post the tesseract installation log.

ccampisano · 2023-01-04T16:11:18Z

@zdenop I didn't record the installation log, but it went fine.
I'll redo and report here asap.

zdenop · 2023-01-04T16:13:59Z

See the similar issue #325. Please try a clean installation (uninstall everything and install from scratch).
First try the sample data and, if that works, try your own data.

ccampisano · 2023-01-09T08:10:09Z

@zdenop please find attached installation logs, I followed instructions in the repo's readme.

Note that I had a problem during configure and had to run it with --disable-dependency-tracking

install.log
config.log

Please let me know what to do next, my aim is to be able to create custom traindata.

zdenop · 2023-01-10T13:18:35Z

Can you please post the output of the following commands?
echo $TESSDATA_PREFIX
and
tesseract a b -l c

ccampisano · 2023-01-10T14:00:50Z

@zdenop here are the results:

corrado@tesseract:~$ echo $TESSDATA_PREFIX

corrado@tesseract:~$ tesseract a b -l c
Error opening data file /usr/local/share/tessdata/c.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'c'
Tesseract couldn't load any languages!
Could not initialize tesseract.

zdenop · 2023-01-10T14:55:54Z

According to the data you posted, you installed tesseract to /usr/local/bin, and tesseract searches for its data in subdirectories of /usr/local/share/tessdata/ (lstm.train is installed to /usr/local/share/tessdata/configs). So tesseract is installed correctly.
Can you please double-check that there is no other tesseract installation (e.g. in /usr/bin)?

Can you now run make training MODEL_NAME=cdi?
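
One way to confirm that the config file is where tesseract looks for it is sketched below. has_lstm_train is a hypothetical helper name, and the default directory is simply the /usr/local/share/tessdata path reported in this thread, replaced by TESSDATA_PREFIX when that variable is set. Other installations may use a different directory.

```shell
#!/bin/sh
# Sketch only: has_lstm_train is a hypothetical helper; the default directory
# is the one reported in this thread, not a universal location.
# Succeeds if <tessdata dir>/configs/lstm.train exists and is non-empty.
has_lstm_train() {
    tessdata_dir=${1:-${TESSDATA_PREFIX:-/usr/local/share/tessdata}}
    [ -s "$tessdata_dir/configs/lstm.train" ]
}

if has_lstm_train; then
    echo "lstm.train found"
else
    echo "lstm.train missing or empty"
fi
```

A directory can also be passed explicitly, for example has_lstm_train /usr/share/tessdata, to check a second installation.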

ccampisano · 2023-01-10T15:16:14Z

@zdenop there is no other tesseract installation:

corrado@tesseract:~$ ls /usr/bin/ | grep tess
corrado@tesseract:~$ which tesseract 
/usr/local/bin/tesseract

root@tesseract:~# apt remove tesseract-ocr
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package 'tesseract-ocr' is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

BTW:

I haven't run make training and sudo make training-install yet. Should I? (see here)
Should I run make training MODEL_NAME=cdi from the tesseract folder where I have worked so far, or from the tesstrain folder?
Where should I put the training data folder?

thanks
corrado

zdenop · 2023-01-10T16:05:09Z

Yes, please run sudo make training-install first.

Please first run training on the example data (see e.g. this tutorial; just skip installing tesseract, as you already did that manually).

Also, you need to install eng.traineddata and osd.traineddata (make tesseract-langs in tesstrain; see the README).
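
The steps mentioned in this comment and in the question before it could be sketched as the sequence below. TESSERACT_SRC is a hypothetical location for the tesseract source tree, ~/tesstrain matches the shell prompt in the first log, and run_in and setup_steps are illustrative names. By default the script only prints the commands, so nothing is built or installed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: TESSERACT_SRC is a hypothetical path; adjust both directories.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
TESSERACT_SRC=${TESSERACT_SRC:-$HOME/tesseract}
TESSTRAIN_DIR=${TESSTRAIN_DIR:-$HOME/tesstrain}
DRY_RUN=${DRY_RUN:-1}

# Run (or just print) a command inside the given directory.
run_in() {
    dir=$1; shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

setup_steps() {
    run_in "$TESSERACT_SRC" make training                 # build the training tools
    run_in "$TESSERACT_SRC" sudo make training-install    # install them
    run_in "$TESSTRAIN_DIR" make tesseract-langs          # fetch eng/osd traineddata
    run_in "$TESSTRAIN_DIR" make training MODEL_NAME=cdi  # then train
}

setup_steps
```

Following the advice above, it may be worth running the last step on the example data before switching to MODEL_NAME=cdi.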

ccampisano · 2023-01-10T16:15:57Z

@zdenop thanks for your support, I was able to run the training correctly (and didn't need osd.traineddata).

The trained file was correctly generated, but:

it is quite small (3 MB compared to 10 MB for the regular "ita" trained file), maybe because I used only a few images (say 20, not even covering the whole alphabet)
its performance is very poor compared to the regular "ita" file

how could I improve this?

zdenop · 2023-01-10T16:52:06Z

Congratulations!

'its performances are very poor, compared to the regular "ita" file'

This is in line with the documentation. Did you read it? Or did you expect that 10 minutes of training would give you a better result than Google got with its resources?

zdenop closed this as completed Jan 10, 2023

Forthoney mentioned this issue Feb 6, 2024

Can't open lstm.train despite (probably) having all training tools #366

Open
