Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PercentileSmartTDigestAggregationFunction #8565

Merged
merged 1 commit into from
Apr 20, 2022

Conversation

Jackie-Jiang
Copy link
Contributor

Description

Adds PercentileSmartTDigestAggregationFunction which can automatically convert the DoubleArrayList to TDigest if the list size grows too big to protect the servers from running out of memory. This conversion only applies to aggregation only queries, but not the group-by queries.

By default, when the list size exceeds 100K, it will be converted to a TDigest with compression of 100.
The threshold and compression can be configured using the third argument (literal) of the function:

  • threshold: list size threshold to trigger the conversion, non-positive means never convert (default 100K)
  • compression: compression of the converted TDigest (default 100)

Example query:
SELECT PERCENTILE_SMART_TDIGEST(myCol, 95, 'threshold=10;compression=50') FROM myTable

Release Notes

Adds PercentileSmartTDigestAggregationFunction which automatically stores values in DoubleArrayList or TDigest based on the number of values

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Apr 19, 2022
@Jackie-Jiang Jackie-Jiang requested a review from xiangfu0 April 19, 2022 18:41
@Jackie-Jiang Jackie-Jiang merged commit b025f43 into apache:master Apr 20, 2022
@Jackie-Jiang Jackie-Jiang deleted the percentile_smart_tdigest branch April 20, 2022 17:05
@@ -52,6 +52,9 @@ public static AggregationFunction getAggregationFunction(FunctionContext functio
ExpressionContext firstArgument = arguments.get(0);
if (upperCaseFunctionName.startsWith("PERCENTILE")) {
String remainingFunctionName = upperCaseFunctionName.substring(10);
if (remainingFunctionName.equals("SMARTTDIGEST")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case ignore match ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already canonicalized, so no need to ignore case

public void aggregate(int length, AggregationResultHolder aggregationResultHolder,
Map<ExpressionContext, BlockValSet> blockValSetMap) {
BlockValSet blockValSet = blockValSetMap.get(_expression);
validateValueType(blockValSet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this once (upon first call to aggregate) should be enough ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but because the aggregation function itself is stateless (shared across threads), we cannot add a variable to the function to track if it is the first call. The overhead of this call should be minimal. Once we enforce schema, we should be able to perform all these checks on the broker side

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants