Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sub aggregations can overwrite bucket fields #17652

Closed
colings86 opened this issue Apr 11, 2016 · 4 comments
Closed

sub aggregations can overwrite bucket fields #17652

colings86 opened this issue Apr 11, 2016 · 4 comments
Assignees
Labels
:Analytics/Aggregations Aggregations >bug help wanted adoptme PITA Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) won't fix

Comments

@colings86
Copy link
Contributor

Consider the following sense script:

GET test/_search
{
  "aggs": {
    "terms": {
      "terms": {
        "field": "i",
        "size": 10
      },
      "aggs": {
        "key": {
          "max": {
            "field": "i"
          }
        },
        "doc_count": {
          "bucket_script": {
            "buckets_path": "_count",
            "script": {
              "inline": "_value",
              "lang": "expression"
            }
          }
        }
      }
    }
  }
}

The sub aggregations to the terms aggregation are named key and doc_count. In the response these sub-aggregations overwrite the key and doc_count fields of the bucket (for comparison the normal response without any sub aggregations is at the bottom of this bug report):

{
  ...
  "aggregations": {
    "terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": {
            "value": 2
          },
          "doc_count": {
            "value": 1
          }
        }
      ]
    }
  }

Sub aggregations should not be able to overwrite fields in the bucket. Note that this affects all bucket aggregations and other bucket aggregations use other fields. More generally a bucket aggregation has (and should have) no restrictions on the fields it can use in each bucket response.

We should not fix this by not allowing aggregations to be named doc_count or key, etc. That would be a hacky fix that would be hard to maintain (if a bucket aggregation defines a new field in the bucket we would have to remember to add it to the exceptions which is easily missed).

I think we should solve this by name-spacing the sub aggregations under an aggregations object in the bucket. This would make the output safer (and avoid this bug), and would match the request format better (where sub aggregations are nested under an aggs/aggregations object). This would obviously be a breaking change but I think it is an important change to make. If we implement #11184 we could effectively deprecate the old response format by having it output the new format by default but output the old format if the version parameter is specified on the request (with a version number relevant for the old format). Then we could remove the old format in the next major version.

Response with no sub-aggregations:

{
  ...
  "aggregations": {
    "terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 2,
          "doc_count": 1
        }
      ]
    }
  }
}
@colings86
Copy link
Contributor Author

To change this we would require #11184 to be done first

@markharwood
Copy link
Contributor

cc @elastic/es-search-aggs

@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@not-napoleon
Copy link
Member

We discussed this issue on 2020-08-19, and have decided not to fix it. We believe this to be a rare issue, based on the fact that it's been open for years and has no upvotes. Also, a user has to go out of their way to hit this, and the resulting failure is relatively obvious. On the other hand, fixing it either requires a huge breaking change to how aggregation results are formatted, or requires us to maintain a list of forbidden aggregation names, which would be a breaking change every time we added to it. Neither option is appealing. Based on this analysis, we feel making any change here would cause more harm than the bug is currently causing.

@not-napoleon
Copy link
Member

I'm removing the breaking label so this stops showing up in searches for breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >bug help wanted adoptme PITA Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) won't fix
Projects
None yet
Development

No branches or pull requests

9 participants