Range Field Histogram Aggregation #41545

not-napoleon · 2019-04-25T17:55:17Z

Work in progress

TODO:
[x] Range ValuesSource
[x] Fork the HistogramAggregationFactory to return different concrete classes based on field type (similar to Terms agg)
[x] Binary decode logic for ranges - see #41206
[x] Implement RangeHistogramAggregation builder logic
[x] Implement RangeHistogramAggregation leaf collector logic
[x] Deal with IP range related edge cases
[ ] Yaml & Doc tests

elasticmachine · 2019-04-25T17:55:19Z

Pinging @elastic/es-analytics-geo

polyfractal · 2019-04-30T12:26:11Z

.../java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

+        if (valuesSource instanceof ValuesSource.Numeric) {
+            return createAggregator((ValuesSource.Numeric) valuesSource, parent, pipelineAggregators, metaData);
+        }
+        else if (valuesSource instanceof ValuesSource.Bytes) {


Note to our future-selves: we should make sure this doesn't "bleed" over to other types that also use bytes... I think Strings will show up as ValuesSource.Bytes (from ValueType.String -> ValuesSourceType.BYTES -> ValuesSource.Bytes)

Unless that's handled elsewhere and we can ignore :)

Oh, actually, we should have a more specific ValuesSource here, now that I've added a range specific choice. Nice catch.
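To illustrate the concern in this thread, here is a minimal sketch of dispatching on a range-specific source type so that plain bytes sources (such as strings) never reach the range path. The nested types and the `chooseAggregator` method are hypothetical stand-ins invented for this example; they are not the real Elasticsearch `ValuesSource` classes.

```java
// Hypothetical stand-ins for illustration only; these are NOT the real
// Elasticsearch ValuesSource classes or aggregator constructors.
final class HistogramDispatchSketch {
    interface ValuesSource {}
    static final class Numeric implements ValuesSource {}
    static final class Bytes implements ValuesSource {}   // strings would land here
    static final class Range implements ValuesSource {}   // range-specific choice

    /** Returns the name of the aggregator that would be built for the source. */
    static String chooseAggregator(ValuesSource vs) {
        if (vs instanceof Numeric) {
            return "NumericHistogramAggregator";
        } else if (vs instanceof Range) {
            return "RangeHistogramAggregator";
        } else {
            // Plain Bytes (e.g. strings) must not silently reach the range path.
            throw new IllegalArgumentException(
                "histogram is not supported for " + vs.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseAggregator(new Numeric()));
        System.out.println(chooseAggregator(new Range()));
    }
}
```

If the range source were instead modelled as a subclass of the bytes source, the range check would have to come before any generic bytes check.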

not-napoleon · 2019-05-01T14:19:08Z

server/src/main/java/org/elasticsearch/index/fielddata/plain/DocValuesIndexFieldData.java

@@ -87,7 +88,7 @@ public Builder scriptFunction(Function<SortedSetDocValues, ScriptDocValues<?>> s
                                       CircuitBreakerService breakerService, MapperService mapperService) {
            // Ignore Circuit Breaker
            final String fieldName = fieldType.name();
-            if (BINARY_INDEX_FIELD_NAMES.contains(fieldName)) {
+            if (BINARY_INDEX_FIELD_NAMES.contains(fieldName)|| fieldType.getClass() == RangeFieldMapper.RangeFieldType.class) {


Note to self: @polyfractal and I kicked this around, and we think the correct short term solution (i.e. for this PR) is to add a rangeType() method or similar to flag this, similar to how numeric and script work, rather than relying on the field type.

…stogram Conflicts: server/src/main/java/org/elasticsearch/index/mapper/RangeFieldMapper.java server/src/test/java/org/elasticsearch/search/aggregations/pipeline/DerivativeAggregatorTests.java x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/RollupJobIdentifierUtilTests.java

not-napoleon · 2019-06-04T14:43:24Z

server/src/main/java/org/elasticsearch/index/fielddata/plain/DocValuesIndexFieldData.java

@@ -71,6 +72,7 @@ public final Index index() {

        private NumericType numericType;
        private Function<SortedSetDocValues, ScriptDocValues<?>> scriptFunction = AbstractAtomicOrdinalsFieldData.DEFAULT_SCRIPT_FUNCTION;
+        private RangeType rangeType;


This is currently only being used as a flag, but the overhead between storing a boolean and storing an enum reference isn't high, so it seemed worth leaving ourselves access to the more robust data.
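A rough sketch of the flag-on-the-builder idea, assuming a builder that stores an enum reference and treats null as "not a range field". The class name, the `setRangeType` method, and the returned strings are made up for illustration and do not reflect the actual `DocValuesIndexFieldData.Builder`.

```java
// Hypothetical sketch; names echo the discussion but this is not the
// actual DocValuesIndexFieldData.Builder.
final class FieldDataBuilderSketch {
    enum RangeType { IP, DATE, LONG, DOUBLE }   // illustrative subset

    static final class Builder {
        private RangeType rangeType;            // null means "not a range field"

        Builder setRangeType(RangeType rangeType) {
            this.rangeType = rangeType;
            return this;
        }

        /** Picks a doc-values representation without inspecting the field type class. */
        String build(String fieldName) {
            if (rangeType != null) {
                return "binary doc values for " + rangeType + " range field " + fieldName;
            }
            return "sorted-set doc values for " + fieldName;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Builder().setRangeType(RangeType.LONG).build("duration"));
        System.out.println(new Builder().build("tag"));
    }
}
```

Keeping the enum rather than a boolean means later code can ask which kind of range it is dealing with, not just whether the field is a range.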

not-napoleon · 2019-06-04T15:03:29Z

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSource.java

@@ -179,6 +180,17 @@ public FieldData(IndexFieldData<?> indexFieldData) {
            public SortedBinaryDocValues bytesValues(LeafReaderContext context) {
                return indexFieldData.load(context).getBytesValues();
            }
+
+            public static class RangeFieldData extends FieldData {


The type hierarchy in ValuesSource is a little arcane, and I just picked something that looked reasonable.

Yeah, was wondering about this as well. Perhaps we should add another "top-level" VS instead of extending Bytes.FieldData? So we'd have a total of Numeric, Bytes, GeoPoint, WithScript and now Range?

Not fully sure of the consequences of doing that. But it seems like Ranges are more of a distinct "thing" rather than a specialization of Bytes (even though they technically are).

Not sure, thoughts?

Having just done some more thinking on the ValuesSource refactor, I'm in agreement that Range should be a peer to Numeric, GeoPoint, etc. I'll wire that up shortly.

not-napoleon · 2019-06-04T16:00:03Z

.../java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

    }

    @Override
    protected Aggregator createUnmapped(Aggregator parent, List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData)
            throws IOException {
-        return createAggregator(null, parent, pipelineAggregators, metaData);
+        return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound,
+            null, config.format(), context, parent, pipelineAggregators, metaData);


I'm almost positive this is the wrong thing to do here, but I'm not sure what the right thing to do is.

Yeah, so this is a bit of a split personality in the framework right now. Setting the valuesSource to null will use a no-op collector in the histo's getLeafCollector(). Then it uses the same histo aggregator to build an empty result.

Terms agg (and a few others) instead have an "unmapped" version of the agg (UnmappedTerms) which is used instead. This unmapped version uses the same no-op collector, and knows how to build an empty response.

I'm not sure if there is a preference or why it's a split-personality. I've pinged @colings86 to see if he knows the history here :)
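The two styles described above can be sketched side by side. This is a simplified model with hypothetical classes (`HistoAggregator`, `UnmappedHisto`, a `double[]` standing in for the values source), not Elasticsearch code: style 1 lets the regular aggregator tolerate a null values source, style 2 uses a dedicated unmapped class that only knows how to produce an empty result.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two "unmapped" styles; not Elasticsearch code.
final class UnmappedSketch {
    interface Collector { void collect(double value); }
    static final Collector NO_OP = value -> {};

    /** Style 1: the regular aggregator tolerates a null values source. */
    static final class HistoAggregator {
        private final double[] valuesSource;   // null when the field is unmapped
        private final double interval;
        private final Map<Double, Long> counts = new TreeMap<>();

        HistoAggregator(double[] valuesSource, double interval) {
            this.valuesSource = valuesSource;
            this.interval = interval;
        }

        Collector getLeafCollector() {
            if (valuesSource == null) {
                return NO_OP;                  // nothing to collect
            }
            return v -> counts.merge(Math.floor(v / interval) * interval, 1L, Long::sum);
        }

        Map<Double, Long> run() {
            Collector collector = getLeafCollector();
            if (valuesSource != null) {
                for (double v : valuesSource) {
                    collector.collect(v);
                }
            }
            return counts;                     // stays empty in the unmapped case
        }
    }

    /** Style 2: a dedicated unmapped aggregator that only builds an empty result. */
    static final class UnmappedHisto {
        Map<Double, Long> run() {
            return new TreeMap<>();
        }
    }

    public static void main(String[] args) {
        System.out.println(new HistoAggregator(null, 5).run());
        System.out.println(new UnmappedHisto().run());
        System.out.println(new HistoAggregator(new double[] {1, 6, 7}, 5).run());
    }
}
```

Both styles yield the same empty result; the difference is whether the null check lives inside the main aggregator or in a separate class.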

polyfractal

Looking good! Did a pass and left some comments/questions, and a few style nits (sorry!). :)

polyfractal · 2019-06-05T17:03:21Z

.../java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

+        if (valuesSource instanceof ValuesSource.Numeric) {
+            return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound,
+                (ValuesSource.Numeric) valuesSource, config.format(), context, parent, pipelineAggregators, metaData);
+        }


Style nit: else if/else should go on the same line as the previous clause's closing brace

polyfractal · 2019-06-05T17:09:02Z

server/src/main/java/org/elasticsearch/index/mapper/RangeType.java

+        @Override
+        public Double doubleValue(Object endpointValue) {
+            assert endpointValue instanceof Double;
+            return (Double)endpointValue;


Style nit: should have space between cast and variable

polyfractal · 2019-06-05T17:09:26Z

server/src/main/java/org/elasticsearch/index/mapper/RangeType.java

@@ -407,6 +429,11 @@ public BytesRef encodeRanges(Set<RangeFieldMapper.Range> ranges) throws IOExcept
            return LONG.decodeRanges(bytes);
        }

+        @Override
+        public Double doubleValue (Object endpointValue) {


Style nit: there shouldn't be a space between the method name and the argument list

polyfractal · 2019-06-05T17:53:28Z

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSource.java

@@ -179,6 +180,17 @@ public FieldData(IndexFieldData<?> indexFieldData) {
            public SortedBinaryDocValues bytesValues(LeafReaderContext context) {
                return indexFieldData.load(context).getBytesValues();
            }
+
+            public static class RangeFieldData extends FieldData {


Yeah, was wondering about this as well. Perhaps we should add another "top-level" VS instead of extending Bytes.FieldData? So we'd have a total of Numeric, Bytes, GeoPoint, WithScript and now Range?

Not fully sure of the consequences of doing that. But it seems like Ranges are more of a distinct "thing" rather than a specialization of Bytes (even though they technically are).

Not sure, thoughts?

polyfractal · 2019-06-05T17:54:21Z

...in/java/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java

+    private final LongHash bucketOrds;
+
+    RangeHistogramAggregator(String name, AggregatorFactories factories, double interval, double offset,
+                               BucketOrder order, boolean keyed, long minDocCount, double minBound, double maxBound,


The nittiest of nits: the indentation of the remaining args isn't consistent with the first line :)

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceType.java

polyfractal · 2019-06-05T18:23:11Z

.../java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

    }

    @Override
    protected Aggregator createUnmapped(Aggregator parent, List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData)
            throws IOException {
-        return createAggregator(null, parent, pipelineAggregators, metaData);
+        return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound,
+            null, config.format(), context, parent, pipelineAggregators, metaData);


Yeah, so this is a bit of a split personality in the framework right now. Setting the valuesSource to null will use a no-op collector in the histo's getLeafCollector(). Then it uses the same histo aggregator to build an empty result.

Terms agg (and a few others) instead have an "unmapped" version of the agg (UnmappedTerms) which is used instead. This unmapped version uses the same no-op collector, and knows how to build an empty response.

I'm not sure if there is a preference or why it's a split-personality. I've pinged @colings86 to see if he knows the history here :)

...va/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregatorTests.java

polyfractal

Left two questions, I think this is good to go once the ANY/missing stuff is integrated! Gimme a ping once that gets merged into this PR and I'll do a quick skim. 👍

polyfractal · 2019-07-01T14:42:50Z

...in/java/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java

+                assert bucket == 0;
+                if (values.advanceExact(doc)) {
+                    // Is it possible for valuesCount to be > 1 here? Multiple ranges are encoded into the same BytesRef in the binary doc
+                    // values, so it isn't clear what we'd be iterating over.


polyfractal · 2019-07-01T14:45:27Z

...in/java/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java

+
+                    for (int i = 0; i < valuesCount; i++) {
+                        BytesRef encodedRanges = values.nextValue();
+                        // This list should be sorted by start-of-range, I think?


Will things go pear-shaped if we ever get non-sorted ranges? Should we toss an assertion in here?

Or does it not particularly matter what order the ranges come in?

It matters, but the encoding sorts them. Doesn't hurt to throw an assert in though, I'll do that.
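As a sketch of why the order matters: assuming each document should be counted at most once per bucket, sorted input lets the collector remember the last bucket it touched and skip buckets an earlier range already covered. The `Range` class and both methods below are hypothetical; only the key formula, floor((value - offset) / interval), is the standard histogram rounding. The thread suggests an assert; an exception is used here so the check runs even without `-ea`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Range and the method names are illustrative, not the
// classes used by RangeHistogramAggregator.
final class RangeBucketsSketch {
    static final class Range {
        final double from;
        final double to;

        Range(double from, double to) {
            this.from = from;
            this.to = to;
        }
    }

    /** True when ranges are ordered by non-decreasing start-of-range. */
    static boolean sortedByStart(List<Range> ranges) {
        for (int i = 1; i < ranges.size(); i++) {
            if (ranges.get(i - 1).from > ranges.get(i).from) {
                return false;
            }
        }
        return true;
    }

    /** Bucket keys touched by one document's ranges, each key listed once. */
    static List<Double> bucketKeys(List<Range> ranges, double interval, double offset) {
        if (!sortedByStart(ranges)) {
            throw new IllegalStateException("ranges must be sorted by start");
        }
        List<Double> keys = new ArrayList<>();
        double previousKey = Double.NEGATIVE_INFINITY;
        for (Range r : ranges) {
            double startKey = Math.floor((r.from - offset) / interval);
            double endKey = Math.floor((r.to - offset) / interval);
            // Sorted input lets us skip buckets an earlier range already covered.
            for (double k = Math.max(startKey, previousKey + 1); k <= endKey; k++) {
                keys.add(k * interval + offset);
            }
            previousKey = Math.max(previousKey, endKey);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range(1, 12));
        ranges.add(new Range(8, 22));
        System.out.println(bucketKeys(ranges, 5, 0));
    }
}
```

With unsorted input the skip logic could miss or double count buckets, which is why a cheap sortedness check is worth having.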

Conflicts: server/src/test/java/org/elasticsearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java

…stogram Conflicts: server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregationBuilder.java server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java

not-napoleon · 2019-07-11T19:02:07Z

@elasticmachine run elasticsearch-ci/bwc

not-napoleon · 2019-07-12T16:45:43Z

@elasticmachine update branch

polyfractal

Think there are a few out-of-date comments, and curious about comments left in last review for RangeHistogramAggregator (about assertions, sorted input), but otherwise LGTM! 🎉

polyfractal · 2019-07-15T16:36:35Z

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java

@@ -252,6 +269,7 @@ public VS toValuesSource(QueryShardContext context, Function<Object, ValuesSourc
            } else if (valueSourceType() == ValuesSourceType.ANY) {
                vs = (VS) resolveMissingAny.apply(missing());
            } else {
+                // TODO: Do we need a missing case for Range values type?


Don't think we need this comment anymore?

polyfractal · 2019-07-15T16:37:04Z

server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java

@@ -293,6 +311,7 @@ private VS originalValuesSource() {
            if (valueSourceType() == ValuesSourceType.BYTES) {
                return (VS) bytesScript();
            }
+            // TODO: Do we need a range script case?


Ditto, think this is outdated?

polyfractal · 2019-07-16T13:58:13Z

🎉

not-napoleon added 6 commits March 14, 2019 14:12

Enable range field type

5c09a48

very minimal range histo test

6d986f1

Forking the internal histogram implementations based on field type

b9f906e

add apache license to new files

4ad0a0c

outline for leaf bucket collector

0b8c814

ValuesSource for Range doc values

40a60eb

not-napoleon added WIP :Analytics/Aggregations Aggregations v8.0.0 labels Apr 25, 2019

Copy numeric histogram build aggregation methods for range histogram

164c40e

polyfractal mentioned this pull request Apr 29, 2019

Aggregations on Range Fields #34644

Closed

11 tasks

polyfractal reviewed Apr 30, 2019

More range values source stuff

7d93c56

not-napoleon commented May 1, 2019

not-napoleon added 2 commits May 7, 2019 11:05

Use the RangeFieldData values source in the RangeHistogram

10fee2e

not-napoleon changed the base branch from master to feature-range-aggregations May 7, 2019 19:20

not-napoleon added 9 commits May 9, 2019 09:27

Wire up decoder logic

c9d36a7

Merge branch 'feature-range-aggregations' into feature/range-field-hi…

28a1cfc

…stogram

Put endpoints in the right order

d962c1e

release bucketOrds hash when cleaning up

d93139c

Clean up DocValuesIndexFieldData kludge

b13b0ea

Fix histogram serialization

9af8074

fix failing tests

40d3190

reject histograms over IPs

9b7448a

Clean up ValueType kludge from prototype phase

37a50ad

not-napoleon commented Jun 4, 2019

Docs and small cleanup

e90a192

not-napoleon commented Jun 4, 2019

polyfractal reviewed Jun 5, 2019

not-napoleon added 9 commits June 6, 2019 14:21

Fix nits

2ddb99d

Make Range a top level value source

076cdad

Support for offsets in RangeHistogram

ea2f9cc

Test for minDocCount

0b3208d

Support for multiple ranges on one doc

0933933

Merge branch 'feature-range-aggregations' into feature/range-field-hi…

f016c01

…stogram

ValuesSource serialization test

b48efdc

Better toString() implementations for debugging missing values

db9df49

Merge branch 'feature-range-aggregations' into feature/range-field-hi…

b63c395

…stogram

polyfractal reviewed Jul 1, 2019

not-napoleon added 5 commits July 3, 2019 16:07

Merge branch 'master' into feature/range-field-histogram

0e5a901

Conflicts: server/src/test/java/org/elasticsearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java

Fix test for unmapped missing

78bce8d

More test fixes

fa5db26

Fix BCW serialization issue

ac3e717

elasticmachine and others added 2 commits July 13, 2019 02:45

Merge branch 'feature-range-aggregations' into feature/range-field-hi…

56b0641

…stogram

Merge branch 'feature-range-aggregations' into feature/range-field-hi…

854cc86

…stogram

polyfractal approved these changes Jul 15, 2019

not-napoleon mentioned this pull request Jul 15, 2019

Support Missing Range Values #44381

Closed

Response to PR feedback

4da53b8

not-napoleon merged commit 1d83f4d into elastic:feature-range-aggregations Jul 16, 2019

not-napoleon deleted the feature/range-field-histogram branch July 16, 2019 13:36

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
