Selecting multiple languages for OCR #305
OCR is added as a new layer. Image editing (e.g. rotation, crop) is what you want to avoid if you need to keep the original quality.
Great to know, thanks! Also, could I suggest listing the Tesseract version in the Releases section on GitHub (whenever a new version is included) and in the About section of the programme? I'm currently not sure which Tesseract version is included. Version 5.3.4 was recently released, though the Mannheim binaries are still on 5.3.3. Thanks again!
I don't update Tesseract often, as changes rarely affect the functionality NAPS2 uses. You can check the version used here.
Thanks for the information: as far as I can see, it's currently on 5.2.0. It'd be nice to have the latest version, but I understand it takes time to update. However, I'd like to point out that 5.3.3 included a fix for an issue that can affect the quality of the OCR:
Thanks for pointing that out, I'll update it for the next NAPS2 version.
Hi,
Multiple languages can now be selected as an option (in the "OCR language" dropdown) in 7.4.0. 7.4.0 also updates Tesseract to 5.3.4.
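For anyone curious how multi-language OCR looks at the Tesseract level: the Tesseract command line accepts several languages joined with `+` in its `-l` option (e.g. `-l eng+deu`). Below is a minimal sketch of building such a command. Whether NAPS2 invokes Tesseract this way is an assumption, and the helper name `build_tesseract_command` and the file names are hypothetical.

```python
from typing import Sequence


def build_tesseract_command(
    image_path: str, output_base: str, languages: Sequence[str]
) -> list[str]:
    """Build a Tesseract CLI invocation that runs OCR with several languages.

    Hypothetical helper for illustration; not taken from NAPS2.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract takes multiple languages as one "+"-joined value for -l.
    language_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", language_arg]


# Example: English plus German for a bilingual scan (file names are made up).
command = build_tesseract_command("scan.png", "scan_text", ["eng", "deu"])
print(" ".join(command))  # prints: tesseract scan.png scan_text -l eng+deu
```

The resulting list can be passed to `subprocess.run` on a machine where Tesseract and the matching language data files are installed.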
Thank you for adding multiple language selection in the latest release. Appreciated!
I wanted to ask you a question, though: I can see no binaries for Tesseract 5.3.4 from Mannheim. Did you get the binaries from another source, or did you compile them yourself?
I compile them myself. https://github.com/cyanfish/naps2-tesseract has the compiled binaries and my scripts, which include all the flags etc. needed to keep the compiled size under 5 MB.
Interesting: thanks!
Hi,
I wanted to suggest the possibility of selecting more than one language for the OCR engine, which would help with multilingual documents. The way it works now, you can only select one language at a time.
On a separate note, I wanted to ask a question (I apologize if this is explained somewhere else and I couldn't find the information). When you open a PDF document and then apply OCR to it, is the OCR added as a new layer on the document with no further changes made to it, or is a completely new PDF generated, with an inevitable reduction in the quality of the original?
Thanks!