-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PercentileKLL aggregation function #10643
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10643 +/- ##
=============================================
- Coverage 64.40% 13.66% -50.75%
+ Complexity 6446 439 -6007
=============================================
Files 2098 2103 +5
Lines 113315 113521 +206
Branches 17204 17255 +51
=============================================
- Hits 72984 15511 -57473
- Misses 35082 96740 +61658
+ Partials 5249 1270 -3979
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1444 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
pinot-core/src/main/java/org/apache/pinot/core/common/ObjectSerDeUtils.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/pinot/core/query/aggregation/function/PercentileKLLAggregationFunction.java
Show resolved
Hide resolved
|
||
_percentile = arguments.get(1).getLiteral().getDoubleValue(); | ||
if (numArguments == 3) { | ||
_kValue = arguments.get(2).getLiteral().getIntValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar check should be done on the K value. There's could be some memory implications from setting the K to be too high: https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library already returns a good exception (SketchesArgumentException: K must be >= 8 and <= 65535
), so I didn't see much value in doing another check here.
For the memory implication, I think it should be up to the user to decide if the size of the sketch is going to be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Did not know about SketchesArgumentException. That should be good enough if bubbled up properly
...t-segment-local/src/main/java/org/apache/pinot/segment/local/customobject/SerializedKLL.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/AggregationFunctionType.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Show resolved
Hide resolved
.../java/org/apache/pinot/core/query/aggregation/function/PercentileKLLAggregationFunction.java
Show resolved
Hide resolved
pinot-core/src/test/java/org/apache/pinot/queries/PercentileKLLQueriesTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
||
_percentile = arguments.get(1).getLiteral().getDoubleValue(); | ||
if (numArguments == 3) { | ||
_kValue = arguments.get(2).getLiteral().getIntValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Did not know about SketchesArgumentException. That should be good enough if bubbled up properly
* Compares two sketches for a specific percentile value. | ||
*/ | ||
public class SerializedKLL implements Comparable<SerializedKLL> { | ||
private final double _quantile; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need _quantile inside this SerializedKLL?
if we want to return the KllDoublesSketch as byte[], the toString() is for that purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. That is used when we want to sort (ORDER BY) sketches.
@navina, not yet. I'll add it to OSS documentation soon. |
Introducing a new approximate percentile calculation function
PercentileKLL
and its variations (MV & Raw), using Apache Datasketches libraries 'KLL'.This is part of a proposal to improve Apache Datasketches support in Pinot:
(Google Docs Link) [Proposal] Improved Apache DataSketches Support in Pinot
Some advantages listed and discussed in the linked document:
Please leave design related comments on the linked document and code related comments in this PR.
Testing
PercentileKLL
,PercentileKLLMV
,PercentileRawKLL
,PercentileRawKLLMV
feature
performance