Range Field Histogram Aggregation #41545

Merged
Changes from 29 commits (37 commits total)

Commits
5c09a48
Enable range field type
not-napoleon Mar 14, 2019
6d986f1
very minimal range histo test
not-napoleon Mar 14, 2019
b9f906e
Forking the internal histogram implementations based on field type
not-napoleon Mar 26, 2019
4ad0a0c
add apache license to new files
not-napoleon Mar 26, 2019
0b8c814
outline for leaf bucket collector
not-napoleon Mar 27, 2019
40a60eb
ValuesSource for Range doc values
not-napoleon Apr 24, 2019
164c40e
Copy numeric histogram build aggregation methods for range histogram
not-napoleon Apr 29, 2019
7d93c56
More range values source stuff
not-napoleon May 1, 2019
10fee2e
Use the RangeFieldData values source in the RangeHistogram
not-napoleon May 7, 2019
4cfcebe
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
not-napoleon May 7, 2019
c9d36a7
Wire up decoder logic
not-napoleon May 9, 2019
28a1cfc
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
not-napoleon May 9, 2019
d962c1e
Put endpoints in the right order
not-napoleon May 9, 2019
d93139c
release bucketOrds hash when cleaning up
not-napoleon May 9, 2019
b13b0ea
Clean up DocValuesIndexFieldData kludge
not-napoleon May 10, 2019
9af8074
Fix histogram serialization
not-napoleon May 10, 2019
40d3190
fix failing tests
not-napoleon May 31, 2019
9b7448a
reject histograms over IPs
not-napoleon May 31, 2019
37a50ad
Clean up ValueType kludge from prototype phase
not-napoleon Jun 3, 2019
e90a192
Docs and small cleanup
not-napoleon Jun 4, 2019
2ddb99d
Fix nits
not-napoleon Jun 6, 2019
076cdad
Make Range a top level value source
not-napoleon Jun 6, 2019
ea2f9cc
Support for offsets in RangeHistogram
not-napoleon Jun 13, 2019
0b3208d
Test for minDocCount
not-napoleon Jun 17, 2019
0933933
Support for multiple ranges on one doc
not-napoleon Jun 18, 2019
f016c01
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
not-napoleon Jun 18, 2019
b48efdc
ValuesSource serialization test
not-napoleon Jun 18, 2019
db9df49
Better toString() implementations for debugging missing values
not-napoleon Jun 20, 2019
b63c395
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
not-napoleon Jun 20, 2019
0e5a901
Merge branch 'master' into feature/range-field-histogram
not-napoleon Jul 3, 2019
78bce8d
Fix test for unmapped missing
not-napoleon Jul 3, 2019
fa5db26
More test fixes
not-napoleon Jul 9, 2019
0183469
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
not-napoleon Jul 10, 2019
ac3e717
Fix BCW serialization issue
not-napoleon Jul 11, 2019
56b0641
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
elasticmachine Jul 12, 2019
854cc86
Merge branch 'feature-range-aggregations' into feature/range-field-hi…
not-napoleon Jul 15, 2019
4da53b8
Response to PR feedback
not-napoleon Jul 15, 2019
DocValuesIndexFieldData.java
@@ -30,6 +30,7 @@
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.index.mapper.MappedFieldType;
import org.elasticsearch.index.mapper.MapperService;
import org.elasticsearch.index.mapper.RangeType;
import org.elasticsearch.indices.breaker.CircuitBreakerService;

import java.util.Set;
@@ -71,6 +72,7 @@ public static class Builder implements IndexFieldData.Builder {

private NumericType numericType;
private Function<SortedSetDocValues, ScriptDocValues<?>> scriptFunction = AbstractAtomicOrdinalsFieldData.DEFAULT_SCRIPT_FUNCTION;
private RangeType rangeType;
Member Author (not-napoleon) commented:
This is currently only being used as a flag, but the difference in overhead between storing a boolean and storing an enum reference is negligible, so it seemed worth leaving ourselves access to the more robust data.
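(A rough sketch of that trade-off, not code from this PR — the helper below is hypothetical: carrying the enum rather than a boolean lets later code ask which kind of range it is dealing with, not merely whether it is one.)

    import org.elasticsearch.index.mapper.RangeType;

    class RangeTypeVersusBooleanSketch {
        // A boolean could only answer "is this a range field?"; the enum also
        // lets callers branch on the concrete range kind.
        static String describe(RangeType rangeType) {
            if (rangeType == null) {
                return "not a range field";
            }
            if (rangeType == RangeType.IP) {
                return "IP range field (no numeric interpretation)";
            }
            return "numeric range field [" + rangeType.name + "]";
        }
    }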


public Builder numericType(NumericType type) {
this.numericType = type;
@@ -82,12 +84,17 @@ public Builder scriptFunction(Function<SortedSetDocValues, ScriptDocValues<?>> s
return this;
}

public Builder setRangeType(RangeType rangeType) {
this.rangeType = rangeType;
return this;
}

@Override
public IndexFieldData<?> build(IndexSettings indexSettings, MappedFieldType fieldType, IndexFieldDataCache cache,
CircuitBreakerService breakerService, MapperService mapperService) {
// Ignore Circuit Breaker
final String fieldName = fieldType.name();
if (BINARY_INDEX_FIELD_NAMES.contains(fieldName)) {
if (BINARY_INDEX_FIELD_NAMES.contains(fieldName) || rangeType != null) {
assert numericType == null;
return new BinaryDVIndexFieldData(indexSettings.getIndex(), fieldName);
} else if (numericType != null) {
RangeFieldMapper.java
@@ -41,6 +41,8 @@
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentParser;
import org.elasticsearch.common.xcontent.support.XContentMapValues;
import org.elasticsearch.index.fielddata.IndexFieldData;
import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData;
import org.elasticsearch.index.query.QueryShardContext;

import java.io.IOException;
@@ -210,6 +212,8 @@ public static final class RangeFieldType extends MappedFieldType {
}
}

public RangeType rangeType() { return rangeType; }

@Override
public MappedFieldType clone() {
return new RangeFieldType(this);
@@ -230,6 +234,12 @@ public int hashCode() {
return Objects.hash(super.hashCode(), rangeType, dateTimeFormatter);
}

@Override
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
failIfNoDocValues();
return new DocValuesIndexFieldData.Builder().setRangeType(rangeType);
}

@Override
public String typeName() {
return rangeType.name;
@@ -468,6 +478,14 @@ public String toString() {
sb.append(includeTo ? ']' : ')');
return sb.toString();
}

public Object getFrom() {
return from;
}

public Object getTo() {
return to;
}
}

static class BinaryRangesDocValuesField extends CustomDocValuesField {
46 changes: 46 additions & 0 deletions server/src/main/java/org/elasticsearch/index/mapper/RangeType.java
@@ -108,6 +108,11 @@ public List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes) {
throw new UnsupportedOperationException();
}

@Override
public Double doubleValue (Object endpointValue) {
throw new UnsupportedOperationException("IP ranges cannot be safely converted to doubles");
}

@Override
public Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to, boolean includeFrom,
boolean includeTo) {
@@ -208,6 +213,11 @@ public List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes) {
return LONG.decodeRanges(bytes);
}

@Override
public Double doubleValue (Object endpointValue) {
return LONG.doubleValue(endpointValue);
}

@Override
public Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to, boolean includeFrom,
boolean includeTo) {
@@ -274,6 +284,12 @@ public List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes) {
return BinaryRangeUtil.decodeFloatRanges(bytes);
}

@Override
public Double doubleValue(Object endpointValue) {
assert endpointValue instanceof Float;
return ((Float) endpointValue).doubleValue();
}

@Override
public Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to, boolean includeFrom,
boolean includeTo) {
@@ -339,6 +355,12 @@ public List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes) {
return BinaryRangeUtil.decodeDoubleRanges(bytes);
}

@Override
public Double doubleValue(Object endpointValue) {
assert endpointValue instanceof Double;
return (Double) endpointValue;
}

@Override
public Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to, boolean includeFrom,
boolean includeTo) {
@@ -407,6 +429,11 @@ public List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes) {
return LONG.decodeRanges(bytes);
}

@Override
public Double doubleValue(Object endpointValue) {
return LONG.doubleValue(endpointValue);
}

@Override
public Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to, boolean includeFrom,
boolean includeTo) {
@@ -461,6 +488,12 @@ public List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes) {
return BinaryRangeUtil.decodeLongRanges(bytes);
}

@Override
public Double doubleValue(Object endpointValue) {
assert endpointValue instanceof Long;
return ((Long) endpointValue).doubleValue();
}

@Override
public Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to, boolean includeFrom,
boolean includeTo) {
@@ -621,6 +654,19 @@ public Query rangeQuery(String field, boolean hasDocValues, Object from, Object
public abstract BytesRef encodeRanges(Set<RangeFieldMapper.Range> ranges) throws IOException;
public abstract List<RangeFieldMapper.Range> decodeRanges(BytesRef bytes);

/**
* Given the Range.to or Range.from Object value from a Range instance, converts that value into a Double. Before converting, it
* asserts that the object is of the expected type. Operation is not supported on IP ranges (because of loss of precision)
*
* @param endpointValue Object value for Range.to or Range.from
* @return endpointValue as a Double
*/
public abstract Double doubleValue(Object endpointValue);

public boolean isNumeric() {
return numberType != null;
}

public abstract Query dvRangeQuery(String field, BinaryDocValuesRangeQuery.QueryType queryType, Object from, Object to,
boolean includeFrom, boolean includeTo);
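(A minimal sketch, not part of this PR, of how decodeRanges, the new endpoint accessors, and doubleValue fit together when bucketing a range into histogram keys; the class and method names are illustrative, and open/closed endpoint handling is ignored.)

    import org.apache.lucene.util.BytesRef;
    import org.elasticsearch.index.mapper.RangeFieldMapper;
    import org.elasticsearch.index.mapper.RangeType;

    class RangeEndpointSketch {
        // Decode one document's binary doc value, convert each range's endpoints
        // to doubles, and walk the histogram keys that the range overlaps.
        static void visitBuckets(RangeType rangeType, BytesRef encoded, double interval, double offset) {
            for (RangeFieldMapper.Range range : rangeType.decodeRanges(encoded)) {
                double from = rangeType.doubleValue(range.getFrom());
                double to = rangeType.doubleValue(range.getTo());
                double firstKey = Math.floor((from - offset) / interval) * interval + offset;
                double lastKey = Math.floor((to - offset) / interval) * interval + offset;
                for (double key = firstKey; key <= lastKey; key += interval) {
                    System.out.printf("range [%s, %s] overlaps bucket %s%n", from, to, key);
                }
            }
        }
    }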

HistogramAggregationBuilder.java
@@ -32,9 +32,7 @@
import org.elasticsearch.search.aggregations.InternalOrder;
import org.elasticsearch.search.aggregations.InternalOrder.CompoundOrder;
import org.elasticsearch.search.aggregations.bucket.MultiBucketAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ValueType;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.aggregations.support.ValuesSource.Numeric;
import org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder;
import org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory;
import org.elasticsearch.search.aggregations.support.ValuesSourceConfig;
@@ -48,9 +46,10 @@
import java.util.Objects;

/**
* A builder for histograms on numeric fields.
* A builder for histograms on numeric fields. This builder can operate on either base numeric fields, or numeric range fields. IP range
* fields are unsupported, and will throw at the factory layer.
*/
public class HistogramAggregationBuilder extends ValuesSourceAggregationBuilder<ValuesSource.Numeric, HistogramAggregationBuilder>
public class HistogramAggregationBuilder extends ValuesSourceAggregationBuilder<ValuesSource, HistogramAggregationBuilder>
implements MultiBucketAggregationBuilder {
public static final String NAME = "histogram";
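(For orientation, a minimal sketch of what the relaxed builder allows; the field name "price_range" stands in for a hypothetical double_range mapping and is not taken from this PR.)

    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.histogram.HistogramAggregationBuilder;

    class RangeHistogramRequestSketch {
        static HistogramAggregationBuilder pricesHistogram() {
            // The same histogram builder now accepts a numeric range field;
            // an ip_range field is rejected later, at the factory layer.
            return AggregationBuilders.histogram("prices")
                    .field("price_range")
                    .interval(50);
        }
    }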

Expand All @@ -65,7 +64,7 @@ public class HistogramAggregationBuilder extends ValuesSourceAggregationBuilder<
private static final ObjectParser<HistogramAggregationBuilder, Void> PARSER;
static {
PARSER = new ObjectParser<>(HistogramAggregationBuilder.NAME);
ValuesSourceParserHelper.declareNumericFields(PARSER, true, true, false);
ValuesSourceParserHelper.declareAnyFields(PARSER, true, true);

PARSER.declareDouble(HistogramAggregationBuilder::interval, Histogram.INTERVAL_FIELD);

@@ -97,7 +96,7 @@ public static HistogramAggregationBuilder parse(String aggregationName, XContent

/** Create a new builder with the given name. */
public HistogramAggregationBuilder(String name) {
super(name, ValuesSourceType.NUMERIC, ValueType.DOUBLE);
super(name, ValuesSourceType.ANY, null);
}

protected HistogramAggregationBuilder(HistogramAggregationBuilder clone, Builder factoriesBuilder, Map<String, Object> metaData) {
@@ -118,7 +117,7 @@ protected AggregationBuilder shallowCopy(Builder factoriesBuilder, Map<String, O

/** Read from a stream, for internal use only. */
public HistogramAggregationBuilder(StreamInput in) throws IOException {
super(in, ValuesSourceType.NUMERIC, ValueType.DOUBLE);
super(in, ValuesSourceType.ANY);
order = InternalOrder.Streams.readHistogramOrder(in);
keyed = in.readBoolean();
minDocCount = in.readVLong();
@@ -128,6 +127,11 @@ public HistogramAggregationBuilder(StreamInput in) throws IOException {
maxBound = in.readDouble();
}

@Override
protected boolean serializeTargetValueType() {
return true;
}

@Override
protected void innerWriteTo(StreamOutput out) throws IOException {
InternalOrder.Streams.writeHistogramOrder(order, out);
@@ -295,7 +299,7 @@ public String getType() {
}

@Override
protected ValuesSourceAggregatorFactory<Numeric, ?> innerBuild(SearchContext context, ValuesSourceConfig<Numeric> config,
protected ValuesSourceAggregatorFactory<ValuesSource, ?> innerBuild(SearchContext context, ValuesSourceConfig<ValuesSource> config,
AggregatorFactory<?> parent, Builder subFactoriesBuilder) throws IOException {
return new HistogramAggregatorFactory(name, config, interval, offset, order, keyed, minDocCount, minBound, maxBound,
context, parent, subFactoriesBuilder, metaData);
HistogramAggregatorFactory.java
@@ -22,10 +22,9 @@
import org.elasticsearch.search.aggregations.Aggregator;
import org.elasticsearch.search.aggregations.AggregatorFactories;
import org.elasticsearch.search.aggregations.AggregatorFactory;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
import org.elasticsearch.search.aggregations.support.ValuesSource;
import org.elasticsearch.search.aggregations.support.ValuesSource.Numeric;
import org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory;
import org.elasticsearch.search.aggregations.support.ValuesSourceConfig;
import org.elasticsearch.search.internal.SearchContext;
@@ -34,15 +33,19 @@
import java.util.List;
import java.util.Map;

public final class HistogramAggregatorFactory extends ValuesSourceAggregatorFactory<ValuesSource.Numeric, HistogramAggregatorFactory> {
/**
* Constructs the per-shard aggregator instance for histogram aggregation. Selects the numeric or range field implementation based on the
* field type.
*/
public final class HistogramAggregatorFactory extends ValuesSourceAggregatorFactory<ValuesSource, HistogramAggregatorFactory> {

private final double interval, offset;
private final BucketOrder order;
private final boolean keyed;
private final long minDocCount;
private final double minBound, maxBound;

public HistogramAggregatorFactory(String name, ValuesSourceConfig<Numeric> config, double interval, double offset,
public HistogramAggregatorFactory(String name, ValuesSourceConfig<ValuesSource> config, double interval, double offset,
BucketOrder order, boolean keyed, long minDocCount, double minBound, double maxBound,
SearchContext context, AggregatorFactory<?> parent,
AggregatorFactories.Builder subFactoriesBuilder, Map<String, Object> metaData) throws IOException {
@@ -61,24 +64,34 @@ public long minDocCount() {
}

@Override
protected Aggregator doCreateInternal(ValuesSource.Numeric valuesSource, Aggregator parent, boolean collectsFromSingleBucket,
protected Aggregator doCreateInternal(ValuesSource valuesSource, Aggregator parent, boolean collectsFromSingleBucket,
List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
if (collectsFromSingleBucket == false) {
return asMultiBucketAggregator(this, context, parent);
}
return createAggregator(valuesSource, parent, pipelineAggregators, metaData);
}

private Aggregator createAggregator(ValuesSource.Numeric valuesSource, Aggregator parent, List<PipelineAggregator> pipelineAggregators,
Map<String, Object> metaData) throws IOException {

return new HistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound, valuesSource,
config.format(), context, parent, pipelineAggregators, metaData);
if (valuesSource instanceof ValuesSource.Numeric) {
return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound,
(ValuesSource.Numeric) valuesSource, config.format(), context, parent, pipelineAggregators, metaData);
} else if (valuesSource instanceof ValuesSource.Range) {
ValuesSource.Range rangeValueSource = (ValuesSource.Range) valuesSource;
if (rangeValueSource.rangeType().isNumeric() == false) {
throw new IllegalArgumentException("Expected numeric range type but found non-numeric range ["
+ rangeValueSource.rangeType().name + "]");
}
return new RangeHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound,
(ValuesSource.Range) valuesSource, config.format(), context, parent, pipelineAggregators,
metaData);
}
else {
throw new IllegalArgumentException("Expected one of [Numeric, Range] values source, found ["
+ valuesSource.toString() + "]");
}
}

@Override
protected Aggregator createUnmapped(Aggregator parent, List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData)
throws IOException {
return createAggregator(null, parent, pipelineAggregators, metaData);
return new NumericHistogramAggregator(name, factories, interval, offset, order, keyed, minDocCount, minBound, maxBound,
null, config.format(), context, parent, pipelineAggregators, metaData);
Member Author (not-napoleon) commented:
I'm almost positive this is the wrong thing to do here, but I'm not sure what the right thing to do is.

Contributor commented:
Yeah, so this is a bit of a split personality in the framework right now. Setting the valuesSource to null will use a no-op collector in the histo's getLeafCollector(). Then it uses the same histo aggregator to build an empty result.

Terms agg (and a few others) instead have an "unmapped" version of the agg (UnmappedTerms) which is used instead. This unmapped version uses the same no-op collector, and knows how to build an empty response.

I'm not sure if there is a preference or why it's a split-personality. I've pinged @colings86 to see if he knows the history here :)

}
}
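(To make the discussion above concrete: a rough paraphrase, not this PR's exact code, of the "null values source" pattern as it might look inside the numeric histogram aggregator — assuming a class that holds a @Nullable ValuesSource.Numeric valuesSource field.)

    @Override
    public LeafBucketCollector getLeafCollector(LeafReaderContext ctx, LeafBucketCollector sub) throws IOException {
        if (valuesSource == null) {
            // unmapped field: collect nothing; buildAggregation() still returns an empty histogram
            return LeafBucketCollector.NO_OP_COLLECTOR;
        }
        // mapped field: read this segment's doc values and bucket them as usual
        SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);
        return new LeafBucketCollectorBase(sub, values) {
            @Override
            public void collect(int doc, long bucket) throws IOException {
                // for each value of this doc: compute its bucket key and increment that bucket
            }
        };
    }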
NumericHistogramAggregator.java (renamed from HistogramAggregator.java)
@@ -52,7 +52,7 @@
* written as {@code interval * x + offset} and yet is less than or equal to
* {@code value}.
*/
class HistogramAggregator extends BucketsAggregator {
class NumericHistogramAggregator extends BucketsAggregator {

private final ValuesSource.Numeric valuesSource;
private final DocValueFormat formatter;
@@ -64,11 +64,11 @@ class HistogramAggregator extends BucketsAggregator {

private final LongHash bucketOrds;

HistogramAggregator(String name, AggregatorFactories factories, double interval, double offset,
BucketOrder order, boolean keyed, long minDocCount, double minBound, double maxBound,
@Nullable ValuesSource.Numeric valuesSource, DocValueFormat formatter,
SearchContext context, Aggregator parent,
List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {
NumericHistogramAggregator(String name, AggregatorFactories factories, double interval, double offset,
BucketOrder order, boolean keyed, long minDocCount, double minBound, double maxBound,
@Nullable ValuesSource.Numeric valuesSource, DocValueFormat formatter,
SearchContext context, Aggregator parent,
List<PipelineAggregator> pipelineAggregators, Map<String, Object> metaData) throws IOException {

super(name, factories, context, parent, pipelineAggregators, metaData);
if (interval <= 0) {