Error KeyError: 'TABLE_TITLE' #153

Open
OGiesecke opened this issue Jul 19, 2023 · 2 comments
Labels: bug (Something isn't working), python (Relates to the Python version of TRP)

Comments


OGiesecke commented Jul 19, 2023

I run:

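import trp
from textractcaller.t_call import call_textract, Textract_Features
from trp.t_pipeline import order_blocks_by_geo
from trp.trp2 import TDocumentSchema

# awspath and newfile are defined earlier in the notebook
j = call_textract(input_document=f"{awspath}/images/{newfile}_table.jpeg", features=[Textract_Features.TABLES])
# t_doc is not ordered yet
t_doc = TDocumentSchema().load(j)
# ordered_doc has its elements ordered by y-coordinate (top to bottom of the page)
ordered_doc = order_blocks_by_geo(t_doc)
# pass the ordered document to trp for further processing
trp_doc = trp.Document(TDocumentSchema().dump(ordered_doc))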

And get the following error:

  File "/var/folders/6t/kcngxw3s50z4zg416dhcckjc0000gn/T/ipykernel_59247/1703830676.py", line 1, in <cell line: 1>
    t_doc = TDocumentSchema().load(j)

  File "/Users/olivergiesecke/opt/anaconda3/envs/labelstudioenv/lib/python3.9/site-packages/marshmallow/schema.py", line 719, in load
    return self._do_load(

  File "/Users/olivergiesecke/opt/anaconda3/envs/labelstudioenv/lib/python3.9/site-packages/marshmallow/schema.py", line 892, in _do_load
    result = self._invoke_load_processors(

  File "/Users/olivergiesecke/opt/anaconda3/envs/labelstudioenv/lib/python3.9/site-packages/marshmallow/schema.py", line 1090, in _invoke_load_processors
    data = self._invoke_processors(

  File "/Users/olivergiesecke/opt/anaconda3/envs/labelstudioenv/lib/python3.9/site-packages/marshmallow/schema.py", line 1220, in _invoke_processors
    data = processor(data, many=many, **kwargs)

  File "/Users/olivergiesecke/opt/anaconda3/envs/labelstudioenv/lib/python3.9/site-packages/trp/trp2.py", line 848, in make_tdocument
    return TDocument(**data)

  File "<string>", line 14, in __init__

  File "/Users/olivergiesecke/opt/anaconda3/envs/labelstudioenv/lib/python3.9/site-packages/trp/trp2.py", line 468, in __post_init__
    self._block_id_maps[blk.block_type][blk.id] = blk_i

KeyError: 'TABLE_TITLE'
schadem self-assigned this Jul 20, 2023
athewsey added the python (Relates to the Python version of TRP) label Aug 28, 2023
athewsey added the bug (Something isn't working) label Jun 7, 2024
athewsey added a commit to athewsey/amazon-textract-response-parser that referenced this issue Jun 7, 2024
As reported in aws-samples#153, calls to block_id_map and block_map for block
types not present at all in the target document were raising an error
athewsey (Contributor) commented Jun 7, 2024

For the immediate error, it appears that _block_id_maps only gets initialized for the block types that are present in the document. I believe that is a bug, because block_id_map(block_type) and block_map(block_type) are documented, exposed functions.
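
To illustrate the failure mode outside of TRP, here is a minimal sketch. It is not the trp2 implementation: build_strict, build_tolerant, and the sample blocks are hypothetical names made up for this example, and the actual fix may look different. It only shows how writing into a per-type map that was never created raises the same kind of KeyError, and how creating the inner map on first use avoids it.

```python
from collections import defaultdict

# Hypothetical stand-ins for Textract blocks: (block_type, block_id) pairs.
blocks = [("PAGE", "p1"), ("LINE", "l1"), ("TABLE_TITLE", "t1")]


def build_strict(blocks, known_types):
    """Index blocks by type; inner maps exist only for known_types."""
    maps = {block_type: {} for block_type in known_types}
    for index, (block_type, block_id) in enumerate(blocks):
        # Raises KeyError when block_type has no inner map yet.
        maps[block_type][block_id] = index
    return maps


def build_tolerant(blocks):
    """Index blocks by type; inner maps are created on first use."""
    maps = defaultdict(dict)
    for index, (block_type, block_id) in enumerate(blocks):
        maps[block_type][block_id] = index
    return maps


strict_error = None
try:
    build_strict(blocks, known_types=["PAGE", "LINE"])
except KeyError as err:
    strict_error = str(err)

tolerant = build_tolerant(blocks)
print("strict build failed with KeyError:", strict_error)
print("tolerant build:", dict(tolerant))
```

With a defaultdict, looking up a block type that never occurred returns an empty map instead of raising, which matches what a caller of a documented lookup function would likely expect.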

However, I suspect initialising the map alone won't solve the issue, because there must be some reason the loader is searching for a TABLE_TITLE block when the TDocument state hasn't seen any.

Would you be able to share a non-confidential document/image that reproduces this issue?

athewsey (Contributor) commented

_block_id_maps initialization was addressed in the linked PR, and the fix is now released on PyPI as v1.0.3.

I appreciate this issue was originally reported quite some time ago. If anybody's able to share a document that reproduces it (or, even better, to test on v1.0.3+ and confirm whether the fix helped), we can dive deeper. Otherwise, we'll probably close it out.
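
For anyone retesting, a quick way to confirm which version is installed is sketched below. It assumes the package was installed from PyPI under the distribution name amazon-textract-response-parser; parse_release and has_fix are hypothetical helpers written for this example, and the comparison is deliberately naive (it ignores pre-release suffixes such as rc1).

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGE = "amazon-textract-response-parser"  # assumed PyPI distribution name
MINIMUM = (1, 0, 3)  # first release mentioned as containing the fix


def parse_release(version_string):
    """Return the leading numeric components, e.g. '1.0.3' -> (1, 0, 3)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(version_string):
    """True if the version is at least MINIMUM (naive tuple comparison)."""
    return parse_release(version_string) >= MINIMUM


try:
    installed = version(PACKAGE)
except PackageNotFoundError:
    installed = None

if installed is None:
    print(f"{PACKAGE} is not installed in this environment")
else:
    print(f"{PACKAGE} {installed}: includes fix = {has_fix(installed)}")
```

If the check reports an older version, upgrading with pip before rerunning the original snippet would make the retest meaningful.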
