feat: add support for ocrmac OCR engine on macOS #276
@nuridol This looks amazing, thank you for the PR! I enabled the test. It seems you need to run
Let's iterate!
- Integrates `ocrmac` as an OCR engine option for macOS users.
- Adds configuration options and dependencies for `ocrmac`.
- Updates documentation to reflect new engine support.

This change allows macOS users to utilize `ocrmac` for improved OCR performance and compatibility.

Signed-off-by: Suhwan Seo <[email protected]>
c67d2eb to 719cfe9
Thank you for your feedback! I've committed the updates you suggested.
Let me keep running the tests, and let's see what comes out. We have a code freeze on Nov 15th for productization, so I won't merge it before then, to ensure that other dependencies come in. However, it looks like a very nice addition. Just to confirm:
@nuridol Next to do: run the formatting (multiple times):
poetry run pre-commit run --all-files
You should see this after running the command a few times:
taa@Munlochy docling % poetry run pre-commit run --all-files
Black....................................................................Passed
isort....................................................................Passed
MyPy.....................................................................Passed
nbQA Black...............................................................Passed
nbQA isort...............................................................Passed
Poetry check.............................................................Passed
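The advice above to rerun the hooks until everything passes can be scripted. This is a minimal sketch, assuming the hooks are invoked exactly as in the comment; the helper name `run_until_clean`, the attempt limit, and the injectable `runner` parameter are illustrative and not part of docling.

```python
import subprocess
from typing import Callable, Sequence

# Command quoted in the review comment above.
PRE_COMMIT_CMD = ["poetry", "run", "pre-commit", "run", "--all-files"]


def run_until_clean(
    cmd: Sequence[str] = PRE_COMMIT_CMD,
    max_attempts: int = 5,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Rerun `cmd` until it exits with status 0; return the attempts used.

    pre-commit reports failure on a run in which a hook (for example Black
    or isort) rewrote files, so a later run can pass without manual edits.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(list(cmd))
        if result.returncode == 0:
            return attempt
    # Hooks that only report problems (such as a type checker) never fix
    # themselves, so give up instead of looping forever.
    raise RuntimeError(f"hooks still failing after {max_attempts} attempts")
```

Calling `run_until_clean()` from the repository root runs the real command. A persistent failure still needs a manual fix, which is why the loop is bounded.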
I don't see any updates to the tests. Did you see the OCR tests we have (https://github.com/DS4SD/docling/blob/main/tests/test_e2e_ocr_conversion.py)?
…non-Mac systems
- Resolved formatting and linting issues
- Updated `--ocr-engine` CLI option documentation for `ocrmac`
- Added RuntimeError for attempts to use `ocrmac` on non-Mac platforms

Signed-off-by: Suhwan Seo <[email protected]>
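The platform guard described in this commit could look roughly like the sketch below. It is not the code from the PR: the function name `ensure_ocrmac_supported` and the error message are hypothetical. The only firm fact used is that `sys.platform` is `"darwin"` on macOS.

```python
import sys


def ensure_ocrmac_supported(platform: str = sys.platform) -> None:
    """Raise RuntimeError unless `platform` identifies macOS ("darwin")."""
    if platform != "darwin":
        raise RuntimeError(
            f"ocrmac is only available on macOS; detected platform: {platform!r}"
        )
```

Taking the platform as a parameter keeps the check testable on any machine, since a test can pass `"linux"` or `"win32"` explicitly.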
Hi, apologies for the multiple back-and-forth. Thanks again for considering this a valuable addition, and I understand the plan regarding the code freeze. Here’s an update:
Thanks, and I look forward to the next steps!
@nuridol we are ready to finalize this work. Would you have some time to rebase the current branch with the latest main?
- Added `OcrMacOptions` to `custom_convert.py` and `full_page_ocr.py` examples.
- Included usage comments and examples for `OcrMacOptions` in OCR pipelines.
- Updated installation guide to include instructions for installing `ocrmac`, noting macOS version requirements (10.15+).
- Highlighted that `ocrmac` leverages Apple's Vision framework as an OCR backend.

This enhances documentation for users working on macOS to leverage `ocrmac` effectively.

Signed-off-by: Suhwan Seo <[email protected]>
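For readers who have not opened the examples named in this commit, selecting `OcrMacOptions` in a PDF pipeline might look like the sketch below. The import paths and the `PdfPipelineOptions` / `PdfFormatOption` wiring are assumptions based on docling's usual layout, not taken from this PR, and the function name `build_ocrmac_converter` is hypothetical.

```python
def build_ocrmac_converter():
    """Return a DocumentConverter whose PDF pipeline uses ocrmac for OCR."""
    # Imported lazily so this module can be imported without docling installed.
    # The module paths below are assumptions about docling's package layout.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import OcrMacOptions, PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True
    # Swap the default OCR engine for the macOS Vision-backed one.
    pipeline_options.ocr_options = OcrMacOptions()

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
```

On a Mac with `ocrmac` installed, `build_ocrmac_converter().convert(path)` would then run the conversion, where `path` is a placeholder for your own PDF file.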
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 🛡 GitHub branch protections and repository rulesets requirements
Wonderful, this rule succeeded.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
041c847 to 2576e65
Wonderful!
I've updated the branch by rebasing it with the latest main.
- Added `ocrmac` as an optional dependency in `pyproject.toml` and `poetry.lock`.
- Updated the `[tool.poetry.extras]` section to include `ocrmac`.
- Modified end-to-end OCR conversion tests to support `OcrMacOptions` on macOS.

Signed-off-by: Suhwan Seo <[email protected]>
- Added `sys_platform == 'darwin'` marker to the `ocrmac` dependency in `pyproject.toml` to specify macOS compatibility.
- Updated the content hash in `poetry.lock` to reflect the changes.

This ensures the `ocrmac` dependency is only installed on macOS systems.

Signed-off-by: Suhwan Seo <[email protected]>
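The `sys_platform` environment marker corresponds to Python's `sys.platform`, so tests can apply the same condition when deciding whether to exercise `ocrmac`. The sketch below shows that idea only. The engine names in `CROSS_PLATFORM_ENGINES` are illustrative placeholders and `engines_to_test` is a hypothetical helper, neither taken from docling's test suite.

```python
import sys
from typing import List

# Illustrative placeholders, not docling's actual engine list.
CROSS_PLATFORM_ENGINES = ["easyocr", "tesseract"]


def engines_to_test(platform: str = sys.platform) -> List[str]:
    """Return the OCR engines a test run should cover on `platform`."""
    engines = list(CROSS_PLATFORM_ENGINES)
    # Same condition as the `sys_platform == 'darwin'` dependency marker:
    # ocrmac is only installed, and therefore only testable, on macOS.
    if platform == "darwin":
        engines.append("ocrmac")
    return engines
```

Keeping the test condition identical to the dependency marker means a non-macOS run never tries to import a package that was deliberately not installed.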
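- Integrates `ocrmac` as an OCR engine option for macOS users.
- Adds configuration options and dependencies for `ocrmac`.
- Updates documentation to reflect new engine support.

This change allows macOS users to utilize `ocrmac` for improved OCR performance and compatibility.

Checklist: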
conventional commits.