Elasticsearch sanitization does not work for bulk queries #1868

Closed
phillipuniverse opened this issue Jun 20, 2023 · 3 comments · Fixed by #1870
Labels
bug Something isn't working

@phillipuniverse
Contributor

Describe your environment

Discovered with elasticsearch 5.5.3 and elasticsearch-dsl 5.4.0; caused by the move to default sanitization in #1758.

The issue is illustrated here, where body arrives as a string rather than a dictionary:

image

This is caused specifically by the bulk flow, where the body is converted to a string here:

image

which looks like this:

image

Steps to reproduce

I don't have a super straightforward way to reproduce other than to use the bulk API from elasticsearch.

What is the expected behavior?
What did you expect to see?

What is the actual behavior?

The following stack trace:

  File "/Users/phillip/Library/Caches/pypoetry/virtualenvs/someenv/lib/python3.11/site-packages/elasticsearch/helpers/__init__.py", line 95, in _process_bulk_chunk
    resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/phillip/Library/Caches/pypoetry/virtualenvs/someenv/lib/python3.11/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/phillip/Library/Caches/pypoetry/virtualenvs/someenv/lib/python3.11/site-packages/elasticsearch/client/__init__.py", line 1173, in bulk
    return self.transport.perform_request('POST', _make_path(index,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/phillip/Library/Caches/pypoetry/virtualenvs/someenv/lib/python3.11/site-packages/opentelemetry/instrumentation/elasticsearch/__init__.py", line 224, in wrapper
    attributes[SpanAttributes.DB_STATEMENT] = sanitize_body(
                                              ^^^^^^^^^^^^^^
  File "/Users/phillip/Library/Caches/pypoetry/virtualenvs/someenv/lib/python3.11/site-packages/opentelemetry/instrumentation/elasticsearch/utils.py", line 54, in sanitize_body
    flatten_body = _flatten_dict(body)
                   ^^^^^^^^^^^^^^^^^^^
  File "/Users/phillip/Library/Caches/pypoetry/virtualenvs/someenv/lib/python3.11/site-packages/opentelemetry/instrumentation/elasticsearch/utils.py", line 30, in _flatten_dict
    for k, v in d.items():
                ^^^^^^^
AttributeError: 'str' object has no attribute 'items'
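
The failure at the bottom of this trace can be reproduced without an Elasticsearch server. The sketch below is only an illustration: flatten_dict_stand_in is a hypothetical stand-in that iterates d.items() the way the last frame does, not the actual OpenTelemetry helper, and the index name and documents are made up. The body is built the same way the helpers frame builds it, by joining serialized actions with newlines.

```python
import json


def flatten_dict_stand_in(d, parent_key=""):
    """Hypothetical stand-in that flattens nested dicts via d.items().

    It is NOT the OpenTelemetry implementation; it only mirrors the
    `for k, v in d.items()` loop visible in the stack trace.
    """
    flat = {}
    for k, v in d.items():
        new_key = f"{parent_key}.{k}" if parent_key else k
        if isinstance(v, dict):
            flat.update(flatten_dict_stand_in(v, new_key))
        else:
            flat[new_key] = v
    return flat


# Made-up bulk actions, serialized one JSON object per line.
bulk_actions = [
    json.dumps({"index": {"_index": "docs", "_id": "1"}}),
    json.dumps({"title": "hello"}),
]

# Same shape as the body in the trace: a single newline-delimited string.
body = "\n".join(bulk_actions) + "\n"

try:
    flatten_dict_stand_in(body)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Passing a dictionary instead of the joined string goes through the same function without error, which is why the problem only shows up on the bulk path.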


@nemoshlag
Member

Hi @phillipuniverse, thanks, I'll look into it.

@glebignatieff

glebignatieff commented Jun 29, 2023

The issue isn't really fixed, because in the bulk flow the actions are joined with \n, as in the stacktrace:

client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)

As a result, you get a string like { dict 1 }\n{ dict 2 }\n...\n, which is not a single JSON document, so the json.loads call suggested in #1870 fails on it.
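
To make the point concrete, here is a minimal sketch, assuming the body is newline-delimited JSON as described above. parse_bulk_body is a hypothetical helper name and not part of the instrumentation or of #1870; it only shows that json.loads rejects the whole string while parsing line by line works.

```python
import json


def parse_bulk_body(body):
    """Hypothetical helper: return the body as a list of dicts.

    Assumes a str/bytes body holds one JSON object per line.
    A dict body is passed through as a single-item list.
    """
    if isinstance(body, dict):
        return [body]
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    # Split on "\n" only and skip blank lines such as the trailing one.
    return [json.loads(line) for line in body.split("\n") if line.strip()]


# Made-up bulk body with two newline-separated JSON objects.
bulk_body = '{"index": {"_index": "docs"}}\n{"title": "hello"}\n'

try:
    json.loads(bulk_body)
    whole_body_parses = True
except json.JSONDecodeError:
    # Two top-level objects in one string are not valid JSON.
    whole_body_parses = False

documents = parse_bulk_body(bulk_body)
print(whole_body_parses, documents)
```

Each parsed dictionary could then be sanitized individually, though whether that is the right fix for the instrumentation is a separate design question.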

@anpr

anpr commented Sep 13, 2023

This issue is preventing us from upgrading to the newest package versions. It would be nice to have a proper fix.
