Loading a pdf results in a StopIteration error #383

Closed
charlescearl opened this issue Nov 19, 2024 · 9 comments

Labels: bug, error-handling

Comments

@charlescearl

Bug

Running spacy-layout on an Apple M3 Pro with 36 GB of memory.
Python version 3.11.7

The following code is invoked in a Python Jupyter notebook:

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("a4b3a1f45daf416a950584c918f0a007.pdf", max_file_size=10)

Here, a4b3a1f45daf416a950584c918f0a007.pdf is a 33-page, 1.6 MB PDF containing text, pictures, and tables.

The following error occurs:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".venv/lib/python3.11/site-packages/pydantic/validate_call_decorator.py", line 60, in wrapper_function
    return validate_call_wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/pydantic/_internal/_validate_call.py", line 96, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/docling/document_converter.py", line 161, in convert
    return next(all_res)
           ^^^^^^^^^^^^^
StopIteration

Steps to reproduce

Start a Python 3.11.7 interpreter in an environment where docling is installed.
Run the following:

from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("a4b3a1f45daf416a950584c918f0a007.pdf", max_file_size=10)

Docling version

Docling version: 2.5.2
Docling Core version: 2.4.0
Docling IBM Models version: 2.0.3
Docling Parse version: 2.0.4

Python version

Python 3.11.7

charlescearl added the bug label Nov 19, 2024
@dolfim-ibm
Contributor

Thanks for the report. We are not able to reproduce this error. However, while trying, we found another small bug, which is fixed in #388.

Could you please check again after the fix in #388? If the error persists, can you share the document?

@cau-git
Contributor

cau-git commented Nov 20, 2024

@charlescearl could you share the document you see trouble with, or is that private/confidential data?

@charlescearl
Author

Hi @cau-git. I am unfortunately seeing the same error with the new version:

Python 3.12.0 (main, Oct  2 2023, 20:56:14) [Clang 16.0.3 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from docling.document_converter import DocumentConverter
>>> converter = DocumentConverter()
>>> result = converter.convert("a4b3a1f45daf416a950584c918f0a007.pdf")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".venv/lib/python3.12/site-packages/pydantic/validate_call_decorator.py", line 60, in wrapper_function
    return validate_call_wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 96, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/docling/document_converter.py", line 170, in convert
    return next(all_res)
           ^^^^^^^^^^^^^
StopIteration
$ docling --version
Docling version: 2.6.0
Docling Core version: 2.4.0
Docling IBM Models version: 2.0.5
Docling Parse version: 2.0.4

Unfortunately, the document is proprietary -- checking today whether it can be released.
Thanks for the help though.

@PeterStaar-IBM
Contributor

@charlescearl If you want, you can share it via the email in the CONTRIBUTORS.md file.

@niderhoff

I am also receiving StopIteration, but I am using a text file:

alembic_autogen_error.txt

Traceback (most recent call last):
  File "/Users/A78751003/projects/genai/viper/src/viper/app/app.py", line 155, in upload_file
    result = converter.convert(source)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/pydantic/validate_call_decorator.py", line 60, in wrapper_function
    return validate_call_wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 96, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/docling/document_converter.py", line 170, in convert
    return next(all_res)
           ^^^^^^^^^^^^^
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/A78751003/projects/genai/viper/.venv/lib/python3.12/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: coroutine raised StopIteration
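
A note on the final RuntimeError: since PEP 479, Python converts a StopIteration that escapes a coroutine body into "RuntimeError: coroutine raised StopIteration", which matches the chained traceback above. The sketch below reproduces that behaviour with plain asyncio, without Docling or FastAPI. The names `handler` and `run_handler` are hypothetical and used only for illustration.

```python
import asyncio


async def handler() -> str:
    # next() on an empty iterator raises StopIteration inside the coroutine.
    return next(iter([]))


def run_handler() -> str:
    """Run the coroutine and return the message of the error that surfaces."""
    try:
        asyncio.run(handler())
    except RuntimeError as exc:
        # The original StopIteration is kept as exc.__cause__.
        return str(exc)
    return "no error"
```

This is why a web endpoint reports a RuntimeError even though the underlying problem is an exhausted iterator further down the stack.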

@cau-git
Contributor

cau-git commented Nov 25, 2024

@niderhoff in your case, the error is seen because you are using an input file format txt that is not recognized as a supported format by Docling.

The following PR improves the error handling: #429
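
For illustration, the failure mode can be sketched without Docling. This assumes, based only on the traceback, that the single-file `convert` call takes the first item from an iterator of results, and that an unsupported input leaves that iterator empty. `SUPPORTED_SUFFIXES`, `convert_all`, `convert_one_unsafe`, and `convert_one` are hypothetical stand-ins, not Docling's API, and this is not necessarily how #429 addresses the problem.

```python
from pathlib import Path
from typing import Iterator

# Hypothetical allow-list for illustration only; not Docling's actual list.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".md", ".html"}


def convert_all(paths: list[str]) -> Iterator[str]:
    """Stand-in batch converter that silently skips unsupported inputs."""
    for p in paths:
        if Path(p).suffix.lower() in SUPPORTED_SUFFIXES:
            yield f"converted:{p}"


def convert_one_unsafe(path: str) -> str:
    # A bare next() leaks StopIteration when the generator yields nothing.
    return next(convert_all([path]))


def convert_one(path: str) -> str:
    # next() with a default lets the caller raise a descriptive error instead.
    result = next(convert_all([path]), None)
    if result is None:
        raise ValueError(f"Unsupported or invalid input: {path}")
    return result
```

Until a fix is available, callers could also wrap the real `converter.convert(...)` call in `try`/`except StopIteration` and report the input as unsupported or invalid.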

@niderhoff

> @niderhoff in your case, the error is seen because you are using an input file format txt that is not recognized as a supported format by Docling.
>
> The following PR improves the error handling: #429

Ah yes, I guess I was just assuming it supports .txt, because all the other document parsing tools I have come across so far support it.

Wouldn't it be an option to treat .txt as markdown since it's just markdown without the markup?

@cau-git
Contributor

cau-git commented Nov 26, 2024

> Wouldn't it be an option to treat .txt as markdown since it's just markdown without the markup?

Yes, we could easily add plain-text support as you outline; let us put that on the roadmap. Independently, we will also fix the StopIteration, since users should never see it.
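
In the meantime, one possible workaround along the lines of the suggestion above is to hand the text file over under a Markdown file name. The sketch below only prepares such a copy. `as_markdown_copy` is a hypothetical helper, and it is an untested assumption that the installed Docling version accepts .md input and handles arbitrary plain text well.

```python
import shutil
import tempfile
from pathlib import Path


def as_markdown_copy(txt_path: str) -> Path:
    """Copy a .txt file to a temporary .md file and return the new path."""
    src = Path(txt_path)
    tmp_dir = Path(tempfile.mkdtemp())
    dst = tmp_dir / (src.stem + ".md")
    shutil.copyfile(src, dst)  # content is copied unchanged
    return dst
```

The returned path could then be passed to `converter.convert(...)` in place of the original .txt file.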

@cau-git
Contributor

cau-git commented Jan 30, 2025

The reported issues appear to be handled correctly in the current version. Closing.

cau-git closed this as completed Jan 30, 2025