-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Range Field Histogram Aggregation #41545
Range Field Histogram Aggregation #41545
Conversation
Pinging @elastic/es-analytics-geo |
if (valuesSource instanceof ValuesSource.Numeric) { | ||
return createAggregator((ValuesSource.Numeric) valuesSource, parent, pipelineAggregators, metaData); | ||
} | ||
else if (valuesSource instanceof ValuesSource.Bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to our future-selves: we should make sure this doesn't "bleed" over to other types that also use bytes... I think Strings will show up as ValuesSource.Bytes
(from ValueType.String
-> ValuesSourceType.BYTES
-> ValuesSource.Bytes
)
Unless that's handled elsewhere and we can ignore :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually, we should have a more specific ValuesSource
here, now that I've added a range specific choice. Nice catch.
@@ -87,7 +88,7 @@ public Builder scriptFunction(Function<SortedSetDocValues, ScriptDocValues<?>> s | |||
CircuitBreakerService breakerService, MapperService mapperService) { | |||
// Ignore Circuit Breaker | |||
final String fieldName = fieldType.name(); | |||
if (BINARY_INDEX_FIELD_NAMES.contains(fieldName)) { | |||
if (BINARY_INDEX_FIELD_NAMES.contains(fieldName)|| fieldType.getClass() == RangeFieldMapper.RangeFieldType.class) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: @polyfractal and I kicked this around, and we think the correct short term solution (i.e. for this PR) is to add a rangeType()
method or similar to flag this, similar to how numeric and script work, rather than relying on the field type.
…stogram Conflicts: server/src/main/java/org/elasticsearch/index/mapper/RangeFieldMapper.java server/src/test/java/org/elasticsearch/search/aggregations/pipeline/DerivativeAggregatorTests.java x-pack/plugin/rollup/src/test/java/org/elasticsearch/xpack/rollup/RollupJobIdentifierUtilTests.java
@@ -71,6 +72,7 @@ public final Index index() { | |||
|
|||
private NumericType numericType; | |||
private Function<SortedSetDocValues, ScriptDocValues<?>> scriptFunction = AbstractAtomicOrdinalsFieldData.DEFAULT_SCRIPT_FUNCTION; | |||
private RangeType rangeType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently only being used as a flag, but the overhead between storing a boolean and storing an enum reference isn't high, so it seemed worth leaving ourselves access to the more robust data.
@@ -179,6 +180,17 @@ public FieldData(IndexFieldData<?> indexFieldData) { | |||
public SortedBinaryDocValues bytesValues(LeafReaderContext context) { | |||
return indexFieldData.load(context).getBytesValues(); | |||
} | |||
|
|||
public static class RangeFieldData extends FieldData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hierarchy in ValuesSource is a little arcane, and I just picked something that looked reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, was wondering about this as well. Perhaps we should add another "top-level" VS instead of extending Bytes.FieldData
? So we'd have a total of Numeric
, Bytes
, GeoPoint
, WithScript
and now Range
?
Not fully sure of the consequences of doing that. But it seems like Ranges are more of a distinct "thing" rather than a specialization of Bytes
(even though it technically is).
Not sure, thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having just done some more thinking on the ValuesSource
refactor, I'm in agreement that Range should be a peer to Numeric
, GeoPoint
, etc. I'll wire that up shortly.
} | ||
|
||
@Override | ||
protected Aggregator createUnmapped(Aggregator parent, List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) | ||
throws IOException { | ||
return createAggregator(null, parent, pipelineAggregators, metaData); | ||
return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound, | ||
null, config.format(), context, parent, pipelineAggregators, metaData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm almost positive this is the wrong thing to do here, but I'm not sure what the right thing to do is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so this is a bit of a split personality in the framework right now. Setting the valuesSource to null will use a no-op collector in the histo's getLeafCollector()
. Then it uses the same histo aggregator to build an empty result.
Terms agg (and a few others) instead have an "unmapped" version of the agg (UnmappedTerms
) which is used instead. This unmapped version uses the same no-op collector, and knows how to build an empty response.
I'm not sure if there is a preference or why it's a split-personality. I've pinged @colings86 to see if he knows the history here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Did a pass and left some comments/questions, and a few style nits (sorry!). :)
if (valuesSource instanceof ValuesSource.Numeric) { | ||
return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound, | ||
(ValuesSource.Numeric) valuesSource, config.format(), context, parent, pipelineAggregators, metaData); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: else if
/else
should go next to last clause's bracket
@Override | ||
public Double doubleValue(Object endpointValue) { | ||
assert endpointValue instanceof Double; | ||
return (Double)endpointValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: should have space between cast and variable
@@ -407,6 +429,11 @@ public BytesRef encodeRanges(Set<RangeFieldMapper.Range> ranges) throws IOExcept | |||
return LONG.decodeRanges(bytes); | |||
} | |||
|
|||
@Override | |||
public Double doubleValue (Object endpointValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: space between method name and arguments
@@ -179,6 +180,17 @@ public FieldData(IndexFieldData<?> indexFieldData) { | |||
public SortedBinaryDocValues bytesValues(LeafReaderContext context) { | |||
return indexFieldData.load(context).getBytesValues(); | |||
} | |||
|
|||
public static class RangeFieldData extends FieldData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, was wondering about this as well. Perhaps we should add another "top-level" VS instead of extending Bytes.FieldData
? So we'd have a total of Numeric
, Bytes
, GeoPoint
, WithScript
and now Range
?
Not fully sure of the consequences of doing that. But it seems like Ranges are more of a distinct "thing" rather than a specialization of Bytes
(even though it technically is).
Not sure, thoughts?
private final LongHash bucketOrds; | ||
|
||
RangeHistogramAggregator(String name, AggregatorFactories factories, double interval, double offset, | ||
BucketOrder order, boolean keyed, long minDocCount, double minBound, double maxBound, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nittiest of nits: spacing of the rest of the args aren't consistent with the first line :)
server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceType.java
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
protected Aggregator createUnmapped(Aggregator parent, List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) | ||
throws IOException { | ||
return createAggregator(null, parent, pipelineAggregators, metaData); | ||
return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound, | ||
null, config.format(), context, parent, pipelineAggregators, metaData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so this is a bit of a split personality in the framework right now. Setting the valuesSource to null will use a no-op collector in the histo's getLeafCollector()
. Then it uses the same histo aggregator to build an empty result.
Terms agg (and a few others) instead have an "unmapped" version of the agg (UnmappedTerms
) which is used instead. This unmapped version uses the same no-op collector, and knows how to build an empty response.
I'm not sure if there is a preference or why it's a split-personality. I've pinged @colings86 to see if he knows the history here :)
...va/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregatorTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two questions, I think this is good to go once the ANY/missing stuff is integrated! Gimme a ping once that gets merged into this PR and I'll do a quick skim. 👍
assert bucket == 0; | ||
if (values.advanceExact(doc)) { | ||
// Is it possible for valuesCount to be > 1 here? Multiple ranges are encoded into the same BytesRef in the binary doc | ||
// values, so it isn't clear what we'd be iterating over. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^^^ ?
|
||
for (int i = 0; i < valuesCount; i++) { | ||
BytesRef encodedRanges = values.nextValue(); | ||
// This list should be sorted by start-of-range, I think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will things go pear shaped if we ever get non-sorted ranges? Should we toss an assertion in here?
Or does it not particularly matter what order the ranges come in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It matters, but the encoding sorts them. Doesn't hurt to throw an assert in though, I'll do that.
Conflicts: server/src/test/java/org/elasticsearch/search/aggregations/bucket/histogram/NumericHistogramAggregatorTests.java
…stogram Conflicts: server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregationBuilder.java server/src/main/java/org/elasticsearch/search/aggregations/bucket/histogram/HistogramAggregatorFactory.java
@elasticmachine run elasticsearch-ci/bwc |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think there are a few out-of-date comments, and curious about comments left in last review for RangeHistogramAggregator
(about assertions, sorted input), but otherwise LGTM! 🎉
@@ -252,6 +269,7 @@ public VS toValuesSource(QueryShardContext context, Function<Object, ValuesSourc | |||
} else if (valueSourceType() == ValuesSourceType.ANY) { | |||
vs = (VS) resolveMissingAny.apply(missing()); | |||
} else { | |||
// TODO: Do we need a missing case for Range values type? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we need this comment anymore?
@@ -293,6 +311,7 @@ private VS originalValuesSource() { | |||
if (valueSourceType() == ValuesSourceType.BYTES) { | |||
return (VS) bytesScript(); | |||
} | |||
// TODO: Do we need a range script case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, think this is outdated?
🎉 |
Work in progress
TODO:
[x] Range ValuesSource
[x] Fork the
HistogramAggregationFactory
to return different concrete classes based on field type (similar to Terms agg)[x] Binary decode logic for ranges - see #41206
[x] Implement RangeHistogramAggregation builder logic
[x] Implement RangeHistogramAggregation leaf collector logic
[x] Deal with IP range related edge cases
[ ] Yaml & Doc tests