composite aggregation as child aggregation #28611

Closed
flefebure opened this issue Feb 9, 2018 · 15 comments · Fixed by #37178

Comments

@flefebure

The new composite aggregation in 6.x provides a way to scroll and page through terms buckets, which is a good thing.
However, it is not (for now?) possible to set a composite aggregation as a child of another bucket aggregation.
So it is impossible, for example, to run a composite aggregation on a nested or child entity. This request is not allowed:

{
  "aggs": {
    "metadatas": {
      "nested": { "path": "metaDatas" },
      "aggs": {
        "msisdn": {
          "filter": { "term": { "metaDatas.name": "msisdn" } },
          "aggs": {
            "msisdns": {
              "composite": {
                "size": 5,
                "sources": [
                  { "value": { "terms": { "field": "metaDatas.value.raw" } } }
                ]
              }
            }
          }
        }
      }
    }
  }
}

I don't understand this limitation.
Furthermore, an ES 6.2 build patched to remove the check seems to work as expected:

@Override
   protected AggregatorFactory<?> doBuild(SearchContext context, AggregatorFactory<?> parent,
                                          AggregatorFactories.Builder subfactoriesBuilder) throws IOException {
       /*if (parent != null) {
           throw new IllegalArgumentException("[composite] aggregation cannot be used with a parent aggregation");
       }*/

Any insight?

@jimczi
Contributor

jimczi commented Feb 16, 2018

The limitation was added because the use case was not clear, but it is possible as long as all sources in the composite are in the same nested context. I'll mark this issue with the feature label, but as a low priority for now.

@jimczi jimczi self-assigned this Feb 16, 2018
@colings86
Contributor

@elastic/es-search-aggs

@tchiotludo

This feature would be really useful, since nested fields are also not allowed in the sources of a composite.
Without this feature, we simply can't use composite with nested documents and must use the deprecated terms aggregation with size = Infinity.

@neelam28m

I also feel it should be possible to use a composite aggregation on nested objects. I have come across a situation where using nested object fields as sources is the only option, and sources do not support nested types either. I will have to look for another workaround, maybe using multiple queries to get the desired result.

@rooboo

rooboo commented Jun 27, 2018

I would also find this feature very helpful for getting all values of a specific field in a nested object, grouped by an identifier.

@justinmcp88

@jimczi Can you possibly provide an example of how to write the composite aggregation when "all sources in the composite are in the same nested context"? I'm trying to do this, but so far have not been successful.

@jimczi
Contributor

jimczi commented Jul 11, 2018

@justinmcp88 this is not possible currently, which is why this issue is still open. I mentioned that all fields in the composite must be in the same nested context as a way to implement the feature described in this issue.

@jimczi jimczi added the good first issue, low hanging fruit, help wanted, and adoptme labels Jul 11, 2018
@jimczi jimczi removed their assignment Nov 20, 2018
jimczi added a commit to jimczi/elasticsearch that referenced this issue Jan 7, 2019
This change adds support for handling `nested` fields in the `composite`
aggregation. A `nested` aggregation can be used as the parent of a `composite`
aggregation in order to target `nested` fields in the `sources`.

Closes elastic#28611
jimczi added a commit that referenced this issue Jan 25, 2019
This change adds support for handling `nested` fields in the `composite`
aggregation. A `nested` aggregation can be used as the parent of a `composite`
aggregation in order to target `nested` fields in the `sources`.

Closes #28611
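
As an illustration of what the commit message describes, here is a minimal Python sketch that builds a search body with a `composite` aggregation placed directly under a `nested` parent. It assumes a version that includes the change from #37178. The helper name `nested_composite_request` is hypothetical; the field names and aggregation names are borrowed from the request at the top of this issue. No intermediate `filter` aggregation is used, since the thread only confirms support for the `nested` parent.

```python
from typing import Any, Dict, Optional


def nested_composite_request(
    path: str,
    field: str,
    size: int = 5,
    after: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a search body with a composite aggregation under a nested parent.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    composite: Dict[str, Any] = {
        "size": size,
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume after the last composite key returned by the previous page.
        composite["after"] = after
    return {
        "size": 0,  # only aggregation results are needed
        "aggs": {
            "metadatas": {
                "nested": {"path": path},
                "aggs": {"msisdns": {"composite": composite}},
            }
        },
    }


first_page = nested_composite_request("metaDatas", "metaDatas.value.raw")
next_page = nested_composite_request(
    "metaDatas", "metaDatas.value.raw", after={"value": "some-last-key"}
)
```

The `after` value shown for `next_page` is a placeholder; in practice it would be the key taken from the previous response.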
@fredgalvao

The nested aggregation is only one of the many bucket aggregations that have this limitation. Fixing the nested aggregation scenario (making it work with inner composite aggregations) only solves one fraction of the issue. Using a composite aggregation as a child of a filter aggregation is another very common scenario (for me, at least), and it's still forbidden after #37178.

Can we reopen this? Are all of the other bucket aggregations gonna be considered a hard limitation and be left out of the game? Did I miss some other issue that is responsible for the general scenario?

@nakulm95

+1 for the request. Would be extremely helpful to have this capability ported to 6.3 as well so there's a way to paginate term bucket documents in an efficient manner

@fredgalvao

@jimczi Sorry to ping you, but I fear this could go unnoticed for 7, which would be very sad.

Composite aggregations have the incredible power to actually make Elasticsearch amazing at aggregation-level analytics, which is something that people (me included) currently have to work around to be productive. Setting "size": Long.MAX_VALUE on terms aggregations to paginate at the application level is terrible, building intermediate rollup/derived types/indexes is not as intuitive as so many other Elasticsearch solutions, and more elaborate solutions are not maintenance-friendly.

Do we still/{at least} have plans to support all bucket aggregations as parents of composite aggregations?

@jimczi
Contributor

jimczi commented Feb 18, 2019

> Do we still/{at least} have plans to support all bucket aggregations as parents of composite aggregations?

No, and we never had such a plan ;) We added support for nested aggregations because otherwise it is impossible to paginate over nested fields, but I don't see why other bucket aggregations would be useful as parents. The composite aggregation must be the root aggregation to allow pagination; that's the design. Can you explain why you'd need to use the composite as a sub-aggregation (other than switching to a nested context)?

> Using a composite aggregation as a child of a filter aggregation is another very common scenario (for me, at least), and it's still forbidden after #37178.

Can you move the filter aggregation to the query? It should be equivalent. There is also a PR open to support a nested/filter combo, but for the main context the query should be preferred.
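
To make this suggestion concrete, here is a rough Python sketch of a root-level `composite` aggregation whose filter lives in the query, together with a pagination loop. The names `composite_body`, `iter_buckets`, and the aggregation name `paged` are hypothetical, and `search` stands in for whatever client call sends the body and returns the parsed JSON response. The `after_key` field is used when the response includes it, falling back to the key of the last bucket otherwise.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical signature: takes a search body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_body(
    filter_clause: Dict[str, Any],
    sources: List[Dict[str, Any]],
    size: int,
    after: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Root-level composite aggregation with the filter moved into the query."""
    composite: Dict[str, Any] = {"size": size, "sources": sources}
    if after is not None:
        composite["after"] = after
    return {
        "size": 0,
        "query": {"bool": {"filter": [filter_clause]}},
        "aggs": {"paged": {"composite": composite}},
    }


def iter_buckets(
    search: SearchFn,
    filter_clause: Dict[str, Any],
    sources: List[Dict[str, Any]],
    size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every composite bucket, requesting one page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        response = search(composite_body(filter_clause, sources, size, after))
        agg = response["aggregations"]["paged"]
        buckets = agg["buckets"]
        if not buckets:
            return  # an empty page means everything has been consumed
        yield from buckets
        # Prefer after_key when present, else use the last bucket's key.
        after = agg.get("after_key", buckets[-1]["key"])
```

A caller would pass the same clause that was previously used in the `filter` aggregation, for example `{"term": {"metaDatas.name": "msisdn"}}`, as `filter_clause`.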

@fredgalvao

I didn't mean to imply there was such a plan, but the original topic seemed broad enough to cover them all, my bad. But I digress.

> Can you explain why you'd need to use the composite as a sub-aggregation (other than switching to a nested context) ?

Sure, I'll try.

Many of our queries here target many indexes at the same time, so that we can do "joins" in memory on the application side without requiring N>1 steps/queries (that would elevate complexity on our side a bunch).

So let's say I have indexes iA and iB, both already on the 7.0 mindset of index==type. Those indexes/types have many fields in common, for denormalization purposes. They relate to each other in some way (think fkeys), and we need extra data from one another when in a bigger context.
In a sample query, we could be doing something like:

GET /iA,iB/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [
              { "type": { "value": "iA" } },
              /* some filter on iA */
            ]
          }
        },
        {
          "bool": {
            "filter": [
              { "type": { "value": "iB" } },
              /* some filter on iB */
            ]
          }
        }
      ]
    },
    /* there is some application logic to apply common filters here to all indexes involved */
  },
  "aggs": {
    "filtered_iA": {
      "filter": { "type": { "value": "iA" } },
      "aggs": {
        "and_then_grouped": {
          "terms": { /*========= this is the aggregation we wanted to paginate ==========*/
            "field": "ia_some_field",
            "size": 9999
          },
          "aggs": {
            /* a bunch of metric aggregations, sometimes top_hits too */
          }
        }
      }
    },
    "filtered_iB": {
      "filter": { "type": { "value": "iB" } },
      "aggs": {
        "raw": {
          "terms": {
            "field": "_id",
            /* sometimes this iB is small enough for us to not even try to paginate it */
            /* so we bring it all, since we can't join easily on elasticsearch */
            "size": 1000
          },
          "aggs": {
            /* top_hit to project fields we need to augment iA */
          }
        }
      }
    }
  }
}

> Can you move the filter aggregation to the query? It should be equivalent.

Using the model/approach we currently do, we cannot. It would break the aggregation because there could be multiple indexes involved in the query, but "this" aggregation would want to deal with only a subset of them, hence the filtering on aggregation level.

I was trying to "defend" the feature for all other bucket aggregations, but truth is I would be 99% happy if Filter Aggregation was added to the "allowed parents" list for composite aggregation.
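
One possible workaround for the multi-index case, not proposed anywhere in this thread and offered only as a sketch, is to keep a single round trip by sending one search per index through the multi search (`_msearch`) endpoint, so that the composite aggregation can stay at the root of the iA search. The helper `msearch_payload` is hypothetical; it only assembles the newline-delimited body (a header line followed by a body line for each search). The per-index filters and sub-aggregations from the sample query are omitted.

```python
import json
from typing import Any, Dict, List, Tuple


def msearch_payload(searches: List[Tuple[str, Dict[str, Any]]]) -> str:
    """Serialize (index, body) pairs into a newline-delimited _msearch body."""
    lines: List[str] = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # search body line
    return "\n".join(lines) + "\n"  # the body must end with a newline


# iA: the aggregation to paginate, now a root-level composite.
ia_body = {
    "size": 0,
    "aggs": {
        "and_then_grouped": {
            "composite": {
                "size": 100,
                "sources": [
                    {"value": {"terms": {"field": "ia_some_field"}}}
                ],
            }
        }
    },
}

# iB: small enough to fetch in one go, as in the sample query.
ib_body = {
    "size": 0,
    "aggs": {"raw": {"terms": {"field": "_id", "size": 1000}}},
}

payload = msearch_payload([("iA", ia_body), ("iB", ib_body)])
```

The responses come back in the same order as the searches, so the in-memory "join" on the application side would work on two separate result sets instead of two named aggregations.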

@bernardmo

Another example where composite would also be useful is sorting by string fields (the terms aggregation only supports sorting on numeric fields or on single-bucket numeric sub-aggregations).

Example data structure which describes detection:

{
  "id": {
    "type": "long"
  },
  "time": {
    "type": "date"
  },
  "objectId": {
    "type": "long"
  },
  "objectName": {
    "type": "text",
    "fields": {
      "key": {
        "type": "keyword"
      }
    }
  }
}

Example request: create an auto_date_histogram aggregation on the detection time. For each bucket, return 10 unique objects along with their number of detections, sorted by objectName.

Translated to composite sub-aggregation it would look something like this:

{
  "size": 0,
  "aggs": {
    "ranges": {
      "auto_date_histogram": {
        "field": "time",
        "buckets": 10
      },
      "aggregations": {
        "objects": {
          "composite": {
            "size": 10,
            "sources": [
              {
                "_sortBy": {
                  "terms": {
                    "field": "objectName.key",
                    "order": "desc"
                  }
                }
              },
              {
                "_groupBy": {
                  "terms": {
                    "field": "objectId"
                  }
                }
              }
            ]
          },
          "aggregations": {
            "count": {
              "value_count": {
                "field": "objectId"
              }
            }
          }
        }
      }
    }
  }
}
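
Since the request above is not accepted, one rough approximation is to flatten it into a root-level composite and trim the results on the client. This is only a sketch under several assumptions: `auto_date_histogram` is replaced by a fixed-interval `date_histogram` source (named `day` here), the `calendar_interval` parameter applies to recent versions while older ones use `interval`, and the helper names are hypothetical. Composite buckets are ordered by their sources in sequence, so all buckets of one interval are adjacent.

```python
from itertools import groupby, islice
from typing import Any, Dict, List, Optional


def flat_composite_body(
    size: int = 1000, after: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Root-level composite: time interval first, then name (desc), then id."""
    composite: Dict[str, Any] = {
        "size": size,
        "sources": [
            # Assumption: fixed daily interval instead of auto_date_histogram.
            {"day": {"date_histogram": {"field": "time", "calendar_interval": "1d"}}},
            {"_sortBy": {"terms": {"field": "objectName.key", "order": "desc"}}},
            {"_groupBy": {"terms": {"field": "objectId"}}},
        ],
    }
    if after is not None:
        composite["after"] = after
    return {"size": 0, "aggs": {"objects": {"composite": composite}}}


def first_n_per_interval(
    buckets: List[Dict[str, Any]], n: int = 10
) -> Dict[Any, List[Dict[str, Any]]]:
    """Keep the first n buckets of each interval, preserving composite order.

    Relies on the buckets already being sorted by the "day" source.
    """
    return {
        day: list(islice(group, n))
        for day, group in groupby(buckets, key=lambda b: b["key"]["day"])
    }
```

Each bucket's `doc_count` can stand in for the `value_count` sub-aggregation as long as every document carries exactly one `objectId`. Note that this reads every bucket of every interval, so it trades extra transfer for the ordering that a sub-aggregation would have provided.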

@mynkow

mynkow commented Jan 23, 2020

What is the reason for this being closed?

@champloo11

+1 for having composite aggregations be usable as a child aggregation of a filter aggregation. To illustrate a real-world scenario where this would be helpful, let me explain our use case:

We want to use composite aggregations as a method of creating hierarchical faceted searches. For a visual example of the final product that we want, here is a screenshot:

image

Each document in our index has a parent "category" field denoting a category, e.g. "Men's Shoes", and several SKUs contained within each document, each with its own size ("12", "13", etc.).

Now, we might be able to accomplish a similar effect by using multiple terms sub-aggregations to simulate parent/child hierarchies. But we wanted these hierarchies to be extendable (i.e. deeper than just a single parent-child level); an example of this can be seen here:

image

While recursive request/response generation for Elasticsearch is fun, it makes debugging a nightmare when you end up with a large list of configurable aggregations, all of potentially varying depth. Composite aggregations gave us a way to express the keys of our aggregation very easily (and flatly), get our hierarchy back, and flatly parse it into a response that a UI can render by knowing where each key sits in the hierarchy array.
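
The flat-to-hierarchy parsing described above could look roughly like the following Python sketch. The function name `build_hierarchy`, the tree layout, and the sample source names `category` and `size` are assumptions made for illustration; the input is the list of composite buckets, each carrying a `key` object and a `doc_count`.

```python
from typing import Any, Dict, List


def build_hierarchy(
    buckets: List[Dict[str, Any]], levels: List[str]
) -> Dict[str, Any]:
    """Fold flat composite buckets into a nested tree.

    `levels` lists the composite source names from the top of the hierarchy
    down, for example ["category", "size"].
    """
    root: Dict[str, Any] = {"doc_count": 0, "children": {}}
    for bucket in buckets:
        count = bucket["doc_count"]
        node = root
        node["doc_count"] += count
        for level in levels:
            value = bucket["key"][level]
            # Create the child node on first sight, then descend into it.
            node = node["children"].setdefault(
                value, {"doc_count": 0, "children": {}}
            )
            node["doc_count"] += count
    return root


buckets = [
    {"key": {"category": "Men's Shoes", "size": "12"}, "doc_count": 3},
    {"key": {"category": "Men's Shoes", "size": "13"}, "doc_count": 2},
    {"key": {"category": "Women's Shoes", "size": "8"}, "doc_count": 4},
]
tree = build_hierarchy(buckets, ["category", "size"])
```

Adding a deeper level only means adding another source to the composite and another name to `levels`. One caveat: when a document holds several values for a source (several SKUs, for instance), it lands in more than one composite bucket, so the summed parent counts may exceed the number of distinct documents.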

We built a quick proof of concept of rendering these hierarchies using composite aggregations and recently got down to working on the faceted part (where filters applied from an aggregation don't impact the aggregation itself). This is usually built using global aggregations, and then filtering on any filters that are not filtering on that aggregation's fields:

{
    "aggs" : {
        "brand_global" : {
            "global": {},
            "aggs" : {
                "brand_filtered" : {
                    "filter" : {  ...  },  // ... filters that DON'T filter on the brand field
                    "aggs" : {
                        "brand" : { "terms" : { "field" : "brand" } }
                    }
                }
            }
        }
    }
}

However, you cannot have a composite aggregation be the child of a filter aggregation and we're now having to reconsider our approach.
