Implement TopOne metrics aggregations #62801

Open
stfaun opened this issue Sep 23, 2020 · 6 comments

Comments

@stfaun

stfaun commented Sep 23, 2020

Relates to #35639

Initially, I thought new first/last metrics aggregations should be implemented as SingleValue metrics aggregations to support this feature.

However, separate first/last metrics aggregations seem redundant: both can be expressed as a single top metrics aggregation, sorted ascending or descending.

I found that the top_metrics aggregation may suit my requirement. However, when the first sorted document has no value for the target field, top_metrics returns null as the result instead of skipping that document.

I understand why top_metrics behaves this way. A top_metrics aggregation may return several fields at the same time, so it should not skip any document in the bucket.
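For reference, here is a sketch of a request body for `GET /exams/_search` that uses top_metrics as a "first value" aggregation. It reuses the hypothetical `exams` index and `grade`/`timestamp` fields from the top_one example later in this thread. Changing the sort to `"desc"` would return the "last" value instead. If the earliest document in the bucket has no `grade`, the returned metric is null, which is the behaviour described above.

```json
{
  "size": 0,
  "aggs": {
    "first_grade": {
      "top_metrics": {
        "metrics": { "field": "grade" },
        "sort": { "timestamp": "asc" }
      }
    }
  }
}
```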

stfaun added the >enhancement and needs:triage labels Sep 23, 2020
@stfaun
Author

stfaun commented Sep 23, 2020

Given the problem above, implementing a top_one metrics aggregation as a SingleValue metrics aggregation may be an alternative solution.

Like the top_hits and top_metrics aggregations, the new top_one aggregation needs a sort parameter to determine how the documents in a bucket are ordered.
Unlike the top_hits and top_metrics aggregations, the new top_one aggregation would be a SingleValue metrics aggregation. It could therefore extract only one field, taken from the first document as ordered by the specified sort fields.

The new top_one aggregation would also provide an ignore_null parameter that determines whether null values of the target field are ignored when picking the first document.

The new top_one aggregation might be used as follows:

GET /exams/_search
{
  "size": 0,
  "aggs": {
    "first_grade": {
      "top_one": {
        "value": {
          "field": "grade"
        },
        "sort": {
          "timestamp": "asc"
        },
        "ignore_null": true
      }
    }
  }
}

This would yield a response like:

{
  ...
  "aggregations": {
    "first_grade": {
      "value": 70.0
    }
  }
}

Because the new top_one aggregation is a SingleValue metrics aggregation, its result can be used in the buckets_path of bucket_script/bucket_selector/bucket_sort.
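Here is a sketch of that use. It assumes the proposed top_one syntax above, which is only a proposal and not an existing Elasticsearch aggregation. It also assumes a hypothetical `student_id` keyword field. Under those assumptions, a per-student first grade could feed a bucket_selector like this:

```json
{
  "size": 0,
  "aggs": {
    "by_student": {
      "terms": { "field": "student_id" },
      "aggs": {
        "first_grade": {
          "top_one": {
            "value": { "field": "grade" },
            "sort": { "timestamp": "asc" },
            "ignore_null": true
          }
        },
        "first_grade_passed": {
          "bucket_selector": {
            "buckets_path": { "first": "first_grade" },
            "script": "params.first >= 60"
          }
        }
      }
    }
  }
}
```

Because top_one would be single-valued, `buckets_path` can reference `first_grade` by name alone. The threshold of 60 is arbitrary.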

@stfaun
Author

stfaun commented Sep 23, 2020

I would like to implement this feature, but I'm not sure whether a top_one metrics aggregation is a good design, or whether the previously proposed first/last metrics aggregations would be better.

cbuescher added the :Analytics/Aggregations label Sep 23, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

elasticmachine added the Team:Analytics label Sep 23, 2020
cbuescher removed the Team:Analytics and needs:triage labels Sep 23, 2020
@polyfractal
Contributor

Hiya @stfaun, thanks for opening this issue! I'm going to mark this as team-discuss, so that the analytics team can chat about it. I think it's an interesting use-case, but I'm personally not sure if it'd be better served as a modification to top-metrics (some kind of flag or mode when only one field is needed?) or as a whole new agg as you suggest. Both approaches have pros/cons.

Will write back once we've discussed!

@polyfractal
Contributor

Hiya @stfaun, we chatted about this and were curious if a filter aggregation + exists query would solve your needs?

The filter aggregation will ensure that all documents inside the bucket match the provided query/filter, and the exists query can be used to ensure that all documents have the desired field (so that the "top one" document doesn't have a null value). You can then use the top-metrics agg specifying a single field, and you should get the "top" doc that has a non-null value.
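Here is a minimal sketch of this suggestion as a `GET /exams/_search` request body, reusing the hypothetical `exams` fields from the top_one proposal. The `filter` aggregation keeps only documents where `grade` exists. The top_metrics sub-aggregation, sorted by `timestamp`, should then return the earliest non-null grade.

```json
{
  "size": 0,
  "aggs": {
    "has_grade": {
      "filter": { "exists": { "field": "grade" } },
      "aggs": {
        "first_grade": {
          "top_metrics": {
            "metrics": { "field": "grade" },
            "sort": { "timestamp": "asc" }
          }
        }
      }
    }
  }
}
```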

@stfaun
Author

stfaun commented Sep 24, 2020

Hi @polyfractal, the solution you suggested does work for me. I have confirmed that it can be used in the buckets_path of bucket_script/bucket_selector/bucket_sort.
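Here is a sketch of how that could look inside a `terms` aggregation, with the filtered top_metrics value passed to a bucket_selector. The `student_id` field and the threshold are hypothetical. The path `has_grade>first_grade[grade]` uses `>` to step through the single-bucket `filter` aggregation and then selects the `grade` metric. That path is an assumption based on the general `buckets_path` syntax and is not confirmed in this thread, so it is worth checking against the pipeline aggregation docs.

```json
{
  "size": 0,
  "aggs": {
    "by_student": {
      "terms": { "field": "student_id" },
      "aggs": {
        "has_grade": {
          "filter": { "exists": { "field": "grade" } },
          "aggs": {
            "first_grade": {
              "top_metrics": {
                "metrics": { "field": "grade" },
                "sort": { "timestamp": "asc" }
              }
            }
          }
        },
        "first_grade_passed": {
          "bucket_selector": {
            "buckets_path": { "first": "has_grade>first_grade[grade]" },
            "script": "params.first >= 60"
          }
        }
      }
    }
  }
}
```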

Thanks for your replies.

4 participants