Tesserocr does not read UZN files #305

Closed
DevKretov opened this issue Jul 8, 2022 · 2 comments

Comments

@DevKretov

DevKretov commented Jul 8, 2022

Hello,

When I specify regions of interest via a .uzn file (zones file), tesserocr ignores the file, even though it is set up according to this tutorial.

The code I use:

import os

from tesserocr import PyTessBaseAPI

image_save_path = 'some/path/to/jpg/file.jpg'
# the UZN file sits next to the image: 'some/path/to/jpg/file.uzn'

_tesseract_api = PyTessBaseAPI(
    lang='ces',
    psm=4,
    oem=1,
    path=os.getenv('TESSDATA_PREFIX')
)
_tesseract_api.ReadConfigFile("tsv")
_tesseract_api.ReadConfigFile("logfile")
_tesseract_api.SetImageFile(image_save_path)
_tesseract_api.Recognize()

_tesseract_api.GetUTF8Text()

The code returns the whole contents of the page, not just the regions specified in the UZN file.

Is it a bug or am I doing something wrong? Thanks!

@zdenop
Contributor

zdenop commented Jul 9, 2022

First of all: why do you want to use a uzn file when you can use API/SetRectangle? The uzn file is for users of the tesseract executable...
Next: tesseract-ocr/tesseract#3837
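
For reference, the SetRectangle route suggested above could look roughly like the sketch below. It assumes the region coordinates are already known in pixels. `ocr_regions`, `run_example`, and the `Box` alias are hypothetical names used only for illustration, and the language and page segmentation mode are placeholder choices, not a recommendation.

```python
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def ocr_regions(api, image_path: str, boxes: Iterable[Box]) -> List[str]:
    """Recognize each box separately and return one string per box.

    `api` is expected to be an initialized tesserocr.PyTessBaseAPI.
    """
    api.SetImageFile(image_path)
    texts = []
    for left, top, width, height in boxes:
        # Restrict recognition to this rectangle of the current image.
        api.SetRectangle(left, top, width, height)
        texts.append(api.GetUTF8Text())
    return texts


def run_example(image_path: str, boxes: Iterable[Box]) -> List[str]:
    """Hypothetical entry point; needs tesserocr and the 'ces' traineddata."""
    # Imported here so ocr_regions stays importable without tesserocr.
    from tesserocr import PSM, PyTessBaseAPI

    # Assumption: each box holds one uniform block of text.
    with PyTessBaseAPI(lang="ces", psm=PSM.SINGLE_BLOCK) as api:
        return ocr_regions(api, image_path, boxes)
```

Each rectangle is recognized independently, so the caller decides the regions and Tesseract's page-level layout analysis is only applied inside each box.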

@DevKretov
Author

I want to use a UZN file to bypass Tesseract's internal segmentation, which I cannot control and which fails on my documents: it does not find all text regions when the text is sparsely distributed on a page.

In the end, I got the UZN file working through API/ProcessPage: I passed the path to the image as the filename parameter, with the UZN file located alongside the image.
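
A minimal sketch of the ProcessPage approach described above, not the exact code used in this issue. It assumes the UZN format of one zone per line as `left top width height label`, and that tesserocr's ProcessPage accepts `(outputbase, image, page_index, filename)` with a PIL image; both should be checked against the installed versions. `write_uzn`, `ocr_with_uzn`, and `output_base` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, Tuple

Zone = Tuple[int, int, int, int, str]  # (left, top, width, height, label)


def write_uzn(image_path: str, zones: Iterable[Zone]) -> Path:
    """Write zones to '<image base name>.uzn' next to the image."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    lines = [
        f"{left} {top} {width} {height} {label}"
        for left, top, width, height, label in zones
    ]
    uzn_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return uzn_path


def ocr_with_uzn(image_path: str, output_base: str) -> bool:
    """Hypothetical entry point; needs tesserocr, Pillow and 'ces' traineddata."""
    # Imported here so write_uzn stays usable without these packages.
    from PIL import Image
    from tesserocr import PyTessBaseAPI

    # psm=4 and oem=1 are kept from the snippet in the original report.
    with PyTessBaseAPI(lang="ces", psm=4, oem=1) as api:
        with Image.open(image_path) as image:
            # Assumption: passing the image path as `filename` is what lets
            # Tesseract look for the .uzn file with the same base name.
            return api.ProcessPage(output_base, image, 0, image_path)
```

For example, `write_uzn('some/path/to/jpg/file.jpg', [(10, 20, 300, 40, 'Text')])` would create `some/path/to/jpg/file.uzn` before `ocr_with_uzn` is called on the same image.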

sirfz closed this as completed Aug 30, 2022