community[minor]: 04 - Refactoring PDFMiner parser (#29526)
This is one part of a larger Pull Request (PR) that is too large to be
submitted all at once. This specific part focuses on updating the XXX
parser.

For more details, see [PR 28970](#28970).

---------

Co-authored-by: Eugene Yurtsev <[email protected]>
pprados and eyurtsev authored Feb 6, 2025
1 parent 4460d20 commit 6ff0d5c
Showing 8 changed files with 2,551 additions and 765 deletions.
2,030 changes: 1,975 additions & 55 deletions docs/docs/integrations/document_loaders/pdfminer.ipynb

Large diffs are not rendered by default.

712 changes: 164 additions & 548 deletions docs/docs/integrations/document_loaders/pymupdf.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion libs/community/extended_testing_deps.txt
@@ -59,7 +59,7 @@ openapi-pydantic>=0.3.2,<0.4
 oracle-ads>=2.9.1,<3
 oracledb>=2.2.0,<3
 pandas>=2.0.1,<3
-pdfminer-six>=20221105,<20240706
+pdfminer-six==20231228
 pdfplumber>=0.11
 pgvector>=0.1.6,<0.2
 playwright>=1.48.0,<2
@@ -104,3 +104,4 @@ mlflow[genai]>=2.14.0
 databricks-sdk>=0.30.0
 websocket>=0.2.1,<1
 writer-sdk>=1.2.0
+unstructured[pdf]>=0.15
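
The first hunk in extended_testing_deps.txt swaps the pdfminer-six version range for an exact pin. As a rough illustration of what that narrowing means, here is a minimal sketch that checks date-style version numbers against both constraint strings. The `satisfies` helper is hypothetical and not part of this commit or of pip; it assumes versions are plain integers and handles only simple comma-separated clauses.

```python
# Hypothetical helper, not part of the commit: checks a date-style version
# number against simple comma-separated constraints such as ">=A,<B".
import operator

_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
    ">": operator.gt,
}


def satisfies(version: int, constraints: str) -> bool:
    """Return True if `version` meets every clause in `constraints`."""
    for clause in constraints.split(","):
        clause = clause.strip()
        # Try two-character operators first so ">=" is not read as ">".
        for symbol in ("==", ">=", "<=", "<", ">"):
            if clause.startswith(symbol):
                bound = int(clause[len(symbol):])
                if not _OPS[symbol](version, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


OLD = ">=20221105,<20240706"  # constraint before the change
NEW = "==20231228"            # constraint after the change

# The candidate values are the bounds that appear in the diff itself.
for candidate in (20221105, 20231228, 20240706):
    print(candidate, satisfies(candidate, OLD), satisfies(candidate, NEW))
```

Under these assumptions, only 20231228 passes both constraints, so the exact pin accepts a single version out of the range the old line allowed.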