The recognition quality is not a Tesseract issue. It depends on the neural network model that is used. In this case, most models were trained on texts and languages that require a space after a comma, for example, so such models are expected to add those spaces.
If you want to decode uuencoded text, training a model on such data would help. Alternatively, try whitelisting the expected characters and blacklisting the unexpected ones.
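A minimal sketch of the whitelist suggestion, in Python. It assumes the uuencoded body uses the backquote variant (characters `!` through `` ` ``, no spaces) and that the installed model honors `tessedit_char_whitelist`, which can depend on the Tesseract version and engine. The file name `uuencoded.png` and the choice of `--psm 6` (treat the image as one uniform block of text) are illustrative assumptions, not something taken from the report. Note that the `begin`/`end` header and trailer lines contain lowercase letters and spaces, so they would need separate handling.

```python
def uuencode_body_alphabet() -> str:
    """Return the 64 characters a backquote-style uuencode body can contain.

    Values 1..63 map to chr(value + 32), i.e. '!' .. '_';
    value 0 maps to '`' (0x60) instead of a space.
    """
    return "".join(chr(code) for code in range(0x21, 0x61))


def build_tesseract_command(image_path: str) -> list:
    """Build an argv list that restricts recognition to the uuencode alphabet.

    Passing the list to subprocess.run() avoids shell quoting problems,
    since the whitelist contains quotes, backslashes and other specials.
    """
    whitelist = uuencode_body_alphabet()
    return [
        "tesseract",
        image_path,          # hypothetical input image
        "stdout",            # write recognized text to standard output
        "--psm", "6",        # assumption: single uniform block of text
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


if __name__ == "__main__":
    command = build_tesseract_command("uuencoded.png")
    print(command)
    # To actually run OCR (requires tesseract on PATH):
    #   import subprocess
    #   result = subprocess.run(command, capture_output=True, text=True)
    #   print(result.stdout)
```

A whitelist only constrains which characters may be emitted; it does not make the model better at recognizing them, so training on uuencoded data remains the more thorough option.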
Current Behavior
This text, which was generated by the UNIX uuencode command, is OCRed incorrectly:
Tesseract added spaces which aren't present, and failed to detect clearly visible back-quotes, among other issues.
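Errors of this kind can be detected after OCR, because each uuencoded body line carries its own length. The following Python sketch flags lines whose character count disagrees with the count encoded in their first character, or that contain spaces. It assumes backquote-style output (no spaces in the body) and untruncated lines; the function names are hypothetical helpers, not part of Tesseract.

```python
def expected_line_length(first_char: str) -> int:
    """Length of a full uuencoded line, given its leading length character.

    The first character encodes the number of raw bytes n (0..45) on the line.
    Every 3 raw bytes become 4 encoded characters, padded up to a multiple of 4.
    """
    n = (ord(first_char) - 32) & 0x3F
    return 1 + 4 * ((n + 2) // 3)


def suspicious_lines(ocr_text: str) -> list:
    """Return (line_number, line) pairs that cannot be valid uuencoded body lines."""
    flagged = []
    for number, line in enumerate(ocr_text.splitlines(), start=1):
        # Skip empty lines and the header/trailer, which are not body lines.
        if not line or line.startswith("begin ") or line == "end":
            continue
        has_space = " " in line  # assumption: backquote variant, no spaces
        wrong_length = len(line) != expected_line_length(line[0])
        if has_space or wrong_length:
            flagged.append((number, line))
    return flagged


if __name__ == "__main__":
    # "#0V%T" is the uuencoding of the 3 bytes b"Cat"; the second body line
    # has a spurious space of the kind described in this report.
    sample = "begin 644 cat.txt\n#0V%T\n#0V %T\n`\nend\n"
    for number, line in suspicious_lines(sample):
        print(f"line {number}: {line!r}")
```

Such a check only locates damaged lines; it cannot tell which character was misrecognized.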
Versions: tesseract-5.3.4, tesseract-data-4.1.0
FreeBSD 14.0
Expected Behavior
n/a
Suggested Fix
n/a
tesseract -v
tesseract 5.3.4
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.1) : libpng 1.6.40 : libtiff 4.4.0 : zlib 1.3 : libwebp 1.3.2
Found SSE4.1
Found OpenMP 201811
Found libarchive 3.7.2 zlib/1.3 liblzma/5.4.4 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
Found libcurl/8.5.0 OpenSSL/3.0.12 zlib/1.3 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.11.0 nghttp2/1.58.0
Operating System
No response
Other Operating System
FreeBSD 14.0
uname -a
FreeBSD xx.xx.xx 14.0-STABLE FreeBSD 14.0-STABLE #1 stable/14-n266076-2001d7f6a272: Sat Dec 30 13:33:21 PST 2023 [email protected]:/disk-samsung/obj/disk-samsung/freebsd-src/amd64.amd64/sys/GENERIC amd64
Compiler
No response
CPU
n/a
Virtualization / Containers
n/a
Other Information
n/a