New Histogram field mapper that supports percentiles aggregations. #48580
Changes from 25 commits
c4bfdb7
550394c
9d4f9c4
4e3eed7
a168d32
038d429
edc2faf
c527aec
bd59238
71886a8
793a257
579c05c
af1249f
1cb8f53
93229e5
996f8fc
edec448
adf12a4
3c5892e
19f15a2
1f6383d
fe039ee
40f679d
79f7fd9
fbabf1c
f1a1ead
c8a1f12
0045a8b
f8cf1a7
2e8649a
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,9 +2,9 @@ | |
=== Percentile Ranks Aggregation | ||
|
||
A `multi-value` metrics aggregation that calculates one or more percentile ranks | ||
over numeric values extracted from the aggregated documents. These values | ||
can be extracted either from specific numeric fields in the documents, or | ||
be generated by a provided script. | ||
over numeric values extracted from the aggregated documents. These values can be | ||
generated by a provided script or extracted from specific numeric or histogram | ||
Review comment: add a link to the histogram field? |
||
fields in the documents. | ||
|
||
[NOTE] | ||
================================================== | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
[role="xpack"] | ||
[testenv="basic"] | ||
[[histogram]] | ||
=== Histogram datatype | ||
++++ | ||
<titleabbrev>Histogram</titleabbrev> | ||
++++ | ||
|
||
A field to store pre-aggregated numerical data representing a histogram. | ||
This data is defined using two paired arrays: | ||
|
||
* A `values` array of <<number, `double`>> numbers, representing the buckets for | ||
the histogram. These values must be provided in ascending order. | ||
* A corresponding `counts` array of <<number, `integer`>> numbers, representing how | ||
many values fall into each bucket. These numbers must be positive or zero. | ||
|
||
Because the elements in the `values` array correspond to the elements in the | ||
same position of the `counts` array, these two arrays must have the same length. | |
|
||
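The constraints listed above (equal lengths, ascending `values`, non-negative `counts`) can be sketched as a small standalone check. This is only an illustration: `HistogramArraysCheck` and `validate` are hypothetical names, not code from this PR, and the sketch assumes that equal neighbouring values are acceptable, which the text above does not state either way.

```java
/** Hypothetical helper, not part of the PR: checks the documented constraints on a histogram value. */
final class HistogramArraysCheck {

    /** Returns null when the arrays satisfy the documented rules, otherwise a description of the first problem. */
    static String validate(double[] values, int[] counts) {
        if (values.length != counts.length) {
            return "values and counts must have the same length";
        }
        for (int i = 0; i < values.length; i++) {
            if (counts[i] < 0) {
                return "counts[" + i + "] is negative";
            }
            // Assumption: only a strictly smaller value breaks the ascending order.
            if (i > 0 && values[i] < values[i - 1]) {
                return "values[" + i + "] is smaller than the previous value";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3, 0.4, 0.5};
        int[] counts = {3, 7, 23, 12, 6};
        String problem = validate(values, counts);
        System.out.println(problem == null ? "valid" : problem);
    }
}
```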
Review comment: Not strictly needed for MVP, but it might be nice to add some text for context and split up the index creation and document indexing snippets. For example:
Providing an example use case for the data may also be helpful. For example, the histograms could represent load time, similar to the percentile aggs docs. Not required for MVP though. |
||
[IMPORTANT] | ||
======== | ||
* A `histogram` field can only store a single pair of `values` and `counts` arrays | |
per document. Nested arrays are not supported. | ||
* `histogram` fields do not support sorting. | ||
======== | ||
|
||
[[histogram-uses]] | ||
==== Uses | ||
|
||
`histogram` fields are primarily intended for use with aggregations. To make it | ||
more readily accessible for aggregations, `histogram` field data is stored as a | ||
binary <<doc-values,doc values>> and is not indexed. Its size in bytes is at most | |
`12 * numValues`, where `numValues` is the length of the provided arrays. | ||
Review comment: I think it's actually 13 since vints can take up to 5 bytes. |
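The difference between the 12 and 13 bytes discussed here can be checked with a little arithmetic. The sketch below assumes each bucket is serialized as one 8-byte double plus a variable-length int count that carries 7 payload bits per byte. That encoding is an assumption made for illustration, since the mapper source is not shown in this part of the diff, and `HistogramSizeBound` is a hypothetical name.

```java
/** Hypothetical sketch: worst-case bytes per bucket under the assumed encoding. */
final class HistogramSizeBound {

    /** Bytes needed by a variable-length int that stores 7 payload bits per byte. */
    static int vIntSize(int value) {
        int bytes = 1;
        // Keep shifting while there are bits left above the low 7.
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** One 8-byte double value plus the largest possible variable-length count. */
    static int maxBytesPerBucket() {
        return Double.BYTES + vIntSize(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println("vint(127) bytes: " + vIntSize(127));
        System.out.println("vint(128) bytes: " + vIntSize(128));
        System.out.println("max bytes per bucket: " + maxBytesPerBucket());
    }
}
```

Under that assumption a 31-bit count needs five 7-bit groups, so the bound per bucket is 8 + 5 = 13 bytes rather than 12.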
||
|
||
Because the data is not indexed, you can only use `histogram` fields for the | |
following aggregations and queries: | ||
|
||
* <<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation | ||
* <<search-aggregations-metrics-percentile-rank-aggregation,percentile ranks>> aggregation | ||
* <<query-dsl-exists-query,exists>> query | ||
|
||
We recommend you define the buckets in the `values` array based on the type of aggregation you intend to use. | |
|
||
[[mapping-types-histogram-building-histogram]] | ||
==== Building a histogram | ||
|
||
When using a histogram as part of an aggregation, the accuracy of the results will depend on how the | ||
histogram was constructed. It is important to consider the percentiles aggregation mode that will be used | ||
to build it. Some possibilities include: | ||
|
||
- For the <<search-aggregations-metrics-percentile-aggregation, T-Digest>> mode, histograms | ||
Review comment: Hmm, trying to tweak this a little to make it more explicit, so the user knows what the value/count fields do.
WDYT? |
||
can be built by using the mean value of the centroids and the centroid's count. If the algorithm has already | ||
started to approximate the percentiles, this inaccuracy is carried over in the histogram. | ||
|
||
- For the <<_hdr_histogram,High Dynamic Range (HDR)>> histogram mode, histograms | ||
Review comment: Similarly,
?? |
||
can be created by using the recorded values and the count at that value. This implementation maintains a fixed worst-case | |
percentage error (specified as a number of significant digits), therefore the value used when generating the histogram | ||
would be the maximum accuracy you can achieve at aggregation time. | ||
|
||
Review comment: Perhaps another sentence/paragraph at the end?
Or something similar... trying to convey to the user that how they index the data is important and they should choose up front. |
||
[[histogram-ex]] | ||
==== Examples | ||
|
||
The following <<indices-create-index, create index>> API request creates a new index with two field mappings: | ||
|
||
* `my_histogram`, a `histogram` field used to store percentile data | ||
* `my_text`, a `keyword` field used to store a title for the histogram | ||
|
||
[source,console] | ||
-------------------------------------------------- | ||
PUT my_index | ||
{ | ||
"mappings": { | ||
"properties": { | ||
"my_histogram": { | ||
"type" : "histogram" | ||
}, | ||
"my_text" : { | ||
"type" : "keyword" | ||
} | ||
} | ||
} | ||
} | ||
-------------------------------------------------- | ||
|
||
The following <<docs-index_,index>> API requests store pre-aggregated data for | |
two histograms: `histogram_1` and `histogram_2`. | ||
|
||
[source,console] | ||
-------------------------------------------------- | ||
PUT my_index/_doc/1 | ||
{ | ||
"my_text" : "histogram_1", | ||
"my_histogram" : { | ||
"values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1> | ||
"counts" : [3, 7, 23, 12, 6] <2> | ||
} | ||
} | ||
|
||
PUT my_index/_doc/2 | ||
{ | ||
"my_text" : "histogram_2", | ||
"my_histogram" : { | ||
"values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5], <1> | ||
"counts" : [8, 17, 8, 7, 6, 2] <2> | ||
} | ||
} | ||
-------------------------------------------------- | ||
<1> Values for each bucket. Values in the array are treated as doubles and must be given in | ||
increasing order. For <<search-aggregations-metrics-percentile-aggregation-approximation, T-Digest>> | ||
histograms this value represents the mean value. In case of HDR histograms this represents the value iterated to. | ||
<2> Count for each bucket. Values in the arrays are treated as integers and must be positive or zero. | ||
Negative values will be rejected. The relation between a bucket and a count is given by the position in the array. | ||
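To make the positional pairing of `values` and `counts` concrete, here is a minimal sketch that reads a percentile straight off the two arrays with the nearest-rank method. This is not how the percentiles aggregation computes its result (it uses T-Digest or HDR histograms); `BucketPercentile` is a hypothetical name and the code only illustrates how each count weights the value at the same index.

```java
/** Hypothetical sketch: nearest-rank percentile over pre-aggregated buckets. */
final class BucketPercentile {

    /**
     * Returns the value of the first bucket whose cumulative count reaches the requested rank.
     * Expects ascending values, non-negative counts, and at least one non-zero count.
     */
    static double percentile(double[] values, int[] counts, double percent) {
        long total = 0;
        for (int count : counts) {
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("histogram is empty");
        }
        // Nearest-rank: the smallest rank covering the requested share, never below 1.
        long rank = Math.max(1, (long) Math.ceil(percent / 100.0 * total));
        long seen = 0;
        for (int i = 0; i < values.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return values[i];
            }
        }
        return values[values.length - 1];
    }

    public static void main(String[] args) {
        // Data from the first example document above.
        double[] values = {0.1, 0.2, 0.3, 0.4, 0.5};
        int[] counts = {3, 7, 23, 12, 6};
        System.out.println("p50 = " + percentile(values, counts, 50));
        System.out.println("p95 = " + percentile(values, counts, 95));
    }
}
```

For the first document the counts sum to 51, so the 50th percentile falls at rank 26, which lands in the third bucket.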
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.elasticsearch.index.fielddata; | ||
|
||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* {@link AtomicFieldData} specialization for histogram data. | ||
*/ | ||
public interface AtomicHistogramFieldData extends AtomicFieldData { | ||
|
||
/** | ||
* Return Histogram values. | ||
*/ | ||
HistogramValues getHistogramValues() throws IOException; | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.fielddata; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* Per-document histogram value. Every value of the histogram consists of | |
* a value and a count. | ||
*/ | ||
public abstract class HistogramValue { | ||
|
||
/** | ||
* Advance this instance to the next value of the histogram | ||
* @return true if there is a next value | ||
*/ | ||
public abstract boolean next() throws IOException; | ||
|
||
/** | ||
* The current value of the histogram | |
* @return the current value of the histogram | ||
*/ | ||
public abstract double value(); | ||
|
||
/** | ||
* The current count of the histogram | ||
* @return the current count of the histogram | ||
*/ | ||
public abstract int count(); | ||
|
||
} |
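The cursor-style contract of `HistogramValue` (call `next()` first, then read `value()` and `count()`) can be illustrated with a small standalone sketch. The abstract class is copied locally so the example compiles on its own. `ArrayHistogramValue` and `HistogramValueDemo` are hypothetical names for illustration and are not part of this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Local copy of the abstract class from the diff, so the sketch is self-contained. */
abstract class HistogramValue {
    public abstract boolean next() throws IOException;
    public abstract double value();
    public abstract int count();
}

/** Hypothetical array-backed implementation, for illustration only. */
final class ArrayHistogramValue extends HistogramValue {
    private final double[] values;
    private final int[] counts;
    private int position = -1; // sits before the first bucket until next() is called

    ArrayHistogramValue(double[] values, int[] counts) {
        this.values = values;
        this.counts = counts;
    }

    @Override
    public boolean next() {
        position++;
        return position < values.length;
    }

    @Override
    public double value() {
        return values[position];
    }

    @Override
    public int count() {
        return counts[position];
    }
}

final class HistogramValueDemo {

    /** Sums the counts of all buckets by walking the cursor to the end. */
    static long totalCount(HistogramValue histogram) throws IOException {
        long total = 0;
        while (histogram.next()) {
            total += histogram.count();
        }
        return total;
    }

    /** Convenience wrapper that hides the checked exception. */
    static long totalCountOf(double[] values, int[] counts) {
        try {
            return totalCount(new ArrayHistogramValue(values, counts));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalCountOf(new double[]{0.1, 0.2, 0.3}, new int[]{3, 7, 23}));
    }
}
```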
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.fielddata; | ||
|
||
import java.io.IOException; | ||
|
||
/** | ||
* Per-segment histogram values. | ||
*/ | ||
public abstract class HistogramValues { | ||
|
||
/** | ||
* Advance this instance to the given document id | ||
* @return true if there is a value for this document | ||
*/ | ||
public abstract boolean advanceExact(int doc) throws IOException; | ||
|
||
/** | ||
* Get the {@link HistogramValue} associated with the current document. | ||
* The returned {@link HistogramValue} might be reused across calls. | ||
*/ | ||
public abstract HistogramValue histogram() throws IOException; | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.fielddata; | ||
|
||
|
||
import org.elasticsearch.index.Index; | ||
import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData; | ||
|
||
/** | ||
* Specialization of {@link IndexFieldData} for histograms. | ||
*/ | ||
public abstract class IndexHistogramFieldData extends DocValuesIndexFieldData implements IndexFieldData<AtomicHistogramFieldData> { | ||
|
||
public IndexHistogramFieldData(Index index, String fieldName) { | ||
super(index, fieldName); | ||
} | ||
} |