feat(ocr): added support for PaddleOCR engine #393
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Signed-off-by: Swaymaw <[email protected]> Signed-off-by: Swaymaw <[email protected]>
…ling document converter Signed-off-by: Swaymaw <[email protected]>
Signed-off-by: Swaymaw <[email protected]>
Signed-off-by: Swaymaw <[email protected]>
Signed-off-by: Swaymaw <[email protected]>
Signed-off-by: Michele Dolfi <[email protected]> Signed-off-by: Swaymaw <[email protected]>
pyproject.toml
Outdated
@@ -95,6 +95,7 @@ torchvision = [

[tool.poetry.extras]
tesserocr = ["tesserocr"]
paddleocr = ["paddlepaddle", "paddleocr"]
To add extras, these packages must also be listed in the main dependencies with the optional=true flag.
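A minimal sketch of what this could look like in pyproject.toml, assuming Poetry's usual pattern for optional dependencies. The `"*"` version constraints are placeholders, not the versions this PR needs:

```toml
[tool.poetry.dependencies]
# Placeholder constraints; the real ones depend on the numpy resolution.
paddlepaddle = { version = "*", optional = true }
paddleocr = { version = "*", optional = true }

[tool.poetry.extras]
tesserocr = ["tesserocr"]
paddleocr = ["paddlepaddle", "paddleocr"]
```

With both parts in place, the packages are only installed when the extra is requested explicitly.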
- Which Python version are you using?
- Can you please try to rebase onto the latest main branch? We just merged something about numpy conflicts as well.
These are the lines I added in pyproject.toml to add the necessary libraries for paddle_ocr:
This is the error I am getting:
I have rebased onto the latest main branch, but I am still facing the numpy version conflict between paddleocr and deepsearch-glm. For now, I have removed paddlepaddle and paddleocr from the requirements and the extras in pyproject.toml. A simple workaround is to install these libraries manually whenever someone wants to use the PaddleOCR engine. The instructions for this are in installation.md, and if someone tries to use the engine without the libraries installed, we raise an import error that includes the installation instructions.
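The import-error workaround described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `load_optional_module`, `PaddleOcrEngine`, and the hint text are hypothetical names chosen for the example.

```python
import importlib
from types import ModuleType

# Hypothetical hint text; the actual instructions live in installation.md.
PADDLE_INSTALL_HINT = "pip install paddlepaddle paddleocr"


def load_optional_module(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or fail with installation instructions."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it manually with: {install_hint}"
        ) from exc


class PaddleOcrEngine:
    """Illustrative stand-in for an OCR engine wrapper (not the PR's class)."""

    def __init__(self) -> None:
        # The import is deferred to construction time, so users who never
        # select this engine do not need the packages installed.
        self._paddleocr = load_optional_module("paddleocr", PADDLE_INSTALL_HINT)
```

Deferring the import keeps the base install free of the conflicting numpy pin while still giving a clear message when the engine is selected without its packages.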
Hello, would a RapidOCR implementation be possible too? (It is a wrapper around PaddleOCR and a lot easier to install.)
What is the added delta of RapidOCR compared to PaddleOCR?
Signed-off-by: Swaymaw <[email protected]>
It is just the poetry.lock file; nothing much has changed code-wise.
@Swaymaw we will check if adding the packages as extras works for us. Meanwhile, can you please make sure to add those manual dependencies in the CI tests?
Hope this merges successfully!
@dolfim-ibm Should I close this pull request to avoid any confusion?
This change allows users to work seamlessly with the PaddleOCR engine, which provides higher accuracy and performance in use cases that involve complex PDF files.
Checklist:
conventional commits.