possibly an invalid tessdata path: AT_SYMLINK_NOFOLLOW ? #195

Closed
jbarth-ubhd opened this issue Oct 9, 2023 · 6 comments

jbarth-ubhd commented Oct 9, 2023

I built a Singularity container from the OCR-D Docker image today, like this: singularity build ocrd.sif docker://ocrd/all:maximum

Then I ran this command:

+ /home/hd/hd_hd/xx_xxxxx/local/bin/time singularity exec --bind /home/hd/hd_hd/xx_xxxxx/ocrd_models/tessdata:/usr/local/share/tessdata --bind /home/hd/hd_hd/xx_xxxxx/ocrd_models:/usr/local/share/ocrd-resources -e --env-file /home/hd/hd_hd/xx_xxxxx/ocrd.env --env MAGICK_TEMPORARY_PATH=/scratch/xx_xxxxx_job_1646883_m03n17 --env TMPDIR=/scratch/xx_xxxxx_job_1646883_m03n17 /home/hd/hd_hd/xx_xxxxx/ocrd.sif ocrd-tesserocr-crop -I OCR-D-001 -O OCR-D-002

Here is the output:

GID: readonly variable
UID: readonly variable
12:15:45.718 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-tesserocr-crop'
Traceback (most recent call last):
  File "/build/core/ocrd/ocrd/processor/helpers.py", line 128, in run_processor
    processor.process()
  File "/build/ocrd_tesserocr/ocrd_tesserocr/crop.py", line 59, in process
    with tesserocr.PyTessBaseAPI() as tessapi:
  File "tesserocr.pyx", line 1219, in tesserocr.PyTessBaseAPI.__cinit__
  File "tesserocr.pyx", line 1233, in tesserocr.PyTessBaseAPI._init_api
RuntimeError: Failed to init API, possibly an invalid tessdata path: /usr/local/share/tessdata/
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-tesserocr-crop", line 33, in <module>
    sys.exit(load_entry_point('ocrd-tesserocr', 'console_scripts', 'ocrd-tesserocr-crop')())
...

I have never seen this error message from ocrd-tesserocr-crop before, and I did not change /home/hd/hd_hd/xx_xxxxx/ocrd_models (bound to /usr/local/share/ocrd-resources).

So I ran strace -f. The only line containing the string tessdata was:

[pid 1404221] newfstatat(AT_FDCWD, "/gpfs/bwfor/home/hd/hd_hd/xx_xxxxx/ocrd_models/tessdata", {st_mode=S_IFDIR|0755, st_size=8192, ...}, AT_SYMLINK_NOFOLLOW) = 0

Why AT_SYMLINK_NOFOLLOW?
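
For context on the flag: newfstatat() with AT_SYMLINK_NOFOLLOW is the lstat() form of the call, which reports on the path itself instead of following a symlink in its final component. In the strace line above the call returned 0 with S_IFDIR, so the path resolved to a real directory and the flag alone is unlikely to be the cause of the failure. The following is a minimal sketch of the difference using Python's os.lstat and os.stat; the helper name `describe` and the temporary directory names are made up for illustration.

```python
import os
import stat
import tempfile


def describe(path):
    """Return (is_symlink_without_following, is_directory_when_following)."""
    no_follow = os.lstat(path)  # lstat semantics, i.e. AT_SYMLINK_NOFOLLOW
    follow = os.stat(path)      # follows a symlink in the final path component
    return stat.S_ISLNK(no_follow.st_mode), stat.S_ISDIR(follow.st_mode)


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "tessdata")        # hypothetical real directory
    link = os.path.join(tmp, "tessdata-link")   # hypothetical symlink to it
    os.mkdir(real)
    os.symlink(real, link)
    real_result = describe(real)
    link_result = describe(link)

print("real directory:", real_result)
print("symlink:       ", link_result)
```

A real directory is reported as a directory by both calls, which matches the S_IFDIR seen in the trace; only a symlink would make the two results differ.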

Content of tessdata dir:

[xx_xxxxx@o05i14 tessdata]$ find . -type f -printf "%-50p %10s\n"|sort
./configs/alto                                             23
./configs/ambigs.train                                    146
./configs/api_config                                       26
./configs/bazaar                                          113
./configs/bigram                                          129
./configs/box.train                                       311
./configs/box.train.stderr                                311
./configs/digits                                           37
./configs/get.images                                       24
./configs/hocr                                             40
./configs/inter                                            59
./configs/kannada                                         101
./configs/linebox                                          70
./configs/logfile                                          25
./configs/lstmbox                                          26
./configs/lstmdebug                                        98
./configs/lstm.train                                      282
./configs/makebox                                          26
./configs/Makefile.am                                     365
./configs/pdf                                              22
./configs/quiet                                            21
./configs/rebox                                            65
./configs/strokewidth                                     377
./configs/tsv                                              22
./configs/txt                                             166
./configs/unlv                                             45
./configs/wordstrbox                                       29
./deu.traineddata                                     8628461
./eng.traineddata                                    15400601
./frak2021_1.069.traineddata                          5060763
./frak2021.traineddata                                3421140
./fra.traineddata                                     3972885
./GT4HistOCR_50000000.997_191951.traineddata          4591424
./osd.traineddata                                    10562727
./pdf.ttf                                                 572
./script/Latin.traineddata                          101402885
./tessconfigs/batch                                        49
./tessconfigs/batch.nochop                                 37
./tessconfigs/matdemo                                     243
./tessconfigs/msdemo                                      368
./tessconfigs/nobatch                                       1
./tessconfigs/segdemo                                     295
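
To compare this listing with what the processor sees inside the container, a small check like the one below could be run there against /usr/local/share/tessdata. It is only a sketch using the standard library: the function name `list_models` is hypothetical, and it assumes that Tesseract looks for *.traineddata files directly in the tessdata directory and that tesserocr.PyTessBaseAPI() without arguments defaults to the eng model.

```python
import os
import tempfile
from pathlib import Path


def list_models(tessdata_dir):
    """Return sorted names of readable *.traineddata files directly in tessdata_dir."""
    root = Path(tessdata_dir)
    if not root.is_dir():
        return []  # missing or unmounted path: no models visible
    return sorted(
        p.stem
        for p in root.glob("*.traineddata")
        if p.is_file() and os.access(p, os.R_OK)
    )


# Self-contained demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("eng.traineddata", "deu.traineddata"):
        Path(tmp, name).write_bytes(b"")
    Path(tmp, "script").mkdir()
    Path(tmp, "script", "Latin.traineddata").write_bytes(b"")  # subdirectory, not listed
    demo_models = list_models(tmp)

print(demo_models)
print("eng available:", "eng" in demo_models)
```

If eng were missing from the result inside the container, that would point to the bind mount rather than to the files on the host.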

@jbarth-ubhd
Author

Still a problem: I can't run the Docker→Singularity image on the cluster. Older versions of OCR-D did work in Singularity.

@jbarth-ubhd
Author

I updated ocrd.sif from the latest Docker image today (2024-Feb-21) and managed to process a complete workflow with ocrd-tesserocr-recognize through Singularity.

@bertsky
Collaborator

bertsky commented Feb 22, 2024

@jbarth-ubhd you need to use a named volume now (because we need to mix both the user-downloaded and the pre-installed models). In Docker, this would be -v ocrd-models:/models (or some other name).

Could you please try with a recent (last week) ocrd/tesserocr image?
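
As a rough illustration of the named-volume form, the sketch below assembles such a Docker command line in Python. The volume name ocrd-models, the mount point /models and the image ocrd/tesserocr come from this comment, and the processor call is the one from the original report; the helper `docker_argv` is hypothetical, and the workspace mount that a real run also needs is deliberately left out.

```python
import shlex


def docker_argv(processor_args, volume="ocrd-models", mount="/models",
                image="ocrd/tesserocr"):
    """Build a `docker run` argument list that mounts a named volume for models."""
    return ["docker", "run", "--rm", "-v", f"{volume}:{mount}", image, *processor_args]


argv = docker_argv(["ocrd-tesserocr-crop", "-I", "OCR-D-001", "-O", "OCR-D-002"])
command = shlex.join(argv)  # shell-quoted string for display or logging
print(command)
```

Unlike a bind mount of a host directory, a named volume is managed by Docker, which is what allows pre-installed and user-downloaded models to end up in the same place.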

@jbarth-ubhd
Author

Yes, see #195 (comment). I read the current documentation and changed the Singularity startup parameters accordingly.

@bertsky
Collaborator

bertsky commented Feb 23, 2024

Oh, ok, so this can be closed (is what you are saying)?

@jbarth-ubhd
Author

Yes
