Add PercentileSmartTDigestAggregationFunction #8565
@@ -52,6 +52,9 @@ public static AggregationFunction getAggregationFunction(FunctionContext functio
ExpressionContext firstArgument = arguments.get(0);
if (upperCaseFunctionName.startsWith("PERCENTILE")) {
String remainingFunctionName = upperCaseFunctionName.substring(10);
if (remainingFunctionName.equals("SMARTTDIGEST")) {
Should this be a case-insensitive match?
It's already canonicalized, so no need to ignore case
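To illustrate why an exact match is enough here, the following is a minimal sketch of the kind of canonicalization the reply refers to. It assumes the function name is upper-cased and stripped of underscores before the `startsWith("PERCENTILE")` check; the class `FunctionNameCanonicalizer` and its `canonicalize` method are hypothetical names for illustration, not the project's actual code.

```java
import java.util.Locale;

public class FunctionNameCanonicalizer {
  // Hypothetical helper (assumption): remove underscores and upper-case the
  // name so that later comparisons can use plain equals().
  public static String canonicalize(String functionName) {
    return functionName.replace("_", "").toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String canonical = canonicalize("percentile_smart_tdigest");
    // Drop the "PERCENTILE" prefix, mirroring substring(10) in the diff.
    String remaining = canonical.substring("PERCENTILE".length());
    System.out.println(remaining.equals("SMARTTDIGEST"));
  }
}
```

Under that assumption, `percentile_smart_tdigest`, `PERCENTILE_SMART_TDIGEST`, and `PercentileSmartTDigest` all reach the comparison as the same string.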
public void aggregate(int length, AggregationResultHolder aggregationResultHolder,
Map<ExpressionContext, BlockValSet> blockValSetMap) {
BlockValSet blockValSet = blockValSetMap.get(_expression);
validateValueType(blockValSet);
Wouldn't doing this once (on the first call to aggregate) be enough?
Yes, but because the aggregation function itself is stateless (shared across threads), we cannot add a variable to the function to track whether it is the first call. The overhead of this call should be minimal. Once we enforce schema, we should be able to perform all these checks on the broker side.
Description
Adds `PercentileSmartTDigestAggregationFunction`, which automatically converts the `DoubleArrayList` to a `TDigest` when the list grows too large, protecting the servers from running out of memory. This conversion applies only to aggregation-only queries, not to group-by queries.
By default, when the list size exceeds 100K, it is converted to a TDigest with a compression of 100.
The threshold and compression can be configured using the third argument (a literal) of the function:
- threshold: list size threshold that triggers the conversion; a non-positive value means never convert (default 100K)
- compression: compression of the converted TDigest (default 100)
Example query:
SELECT PERCENTILE_SMART_TDIGEST(myCol, 95, 'threshold=10;compression=50') FROM myTable
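As a rough illustration of how a literal such as `'threshold=10;compression=50'` could be interpreted, here is a minimal sketch. The class `SmartTDigestParameters`, its method names, the case-insensitive key matching, and the error handling are all assumptions made for this example and are not taken from the PR; only the `key=value;key=value` format, the defaults, and the "non-positive means never convert" rule come from the description.

```java
import java.util.Locale;

public class SmartTDigestParameters {
  // Defaults as stated in the description: 100K threshold, compression 100.
  public static final int DEFAULT_THRESHOLD = 100_000;
  public static final int DEFAULT_COMPRESSION = 100;

  public final int threshold;
  public final int compression;

  private SmartTDigestParameters(int threshold, int compression) {
    this.threshold = threshold;
    this.compression = compression;
  }

  // Parses a "key=value;key=value" literal; keys not given keep their default.
  public static SmartTDigestParameters parse(String literal) {
    int threshold = DEFAULT_THRESHOLD;
    int compression = DEFAULT_COMPRESSION;
    for (String pair : literal.split(";")) {
      String trimmed = pair.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate an empty literal or a trailing ';'
      }
      String[] keyValue = trimmed.split("=", 2);
      if (keyValue.length != 2) {
        throw new IllegalArgumentException("Invalid parameter: " + trimmed);
      }
      // Case-insensitive keys are an assumption of this sketch.
      String key = keyValue[0].trim().toUpperCase(Locale.ROOT);
      int value = Integer.parseInt(keyValue[1].trim());
      switch (key) {
        case "THRESHOLD":
          threshold = value;
          break;
        case "COMPRESSION":
          compression = value;
          break;
        default:
          throw new IllegalArgumentException("Unknown parameter: " + keyValue[0]);
      }
    }
    return new SmartTDigestParameters(threshold, compression);
  }

  // A non-positive threshold disables the conversion entirely.
  public boolean shouldConvert(int listSize) {
    return threshold > 0 && listSize > threshold;
  }
}
```

With the example query's literal, this sketch would report that a list of 11 values should be converted and a list of 10 values should not.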
Release Notes
Adds `PercentileSmartTDigestAggregationFunction`, which automatically stores values in a DoubleArrayList or a TDigest based on the number of values.