Add histogram aggregation function #8724
Codecov Report
@@ Coverage Diff @@
## master #8724 +/- ##
============================================
- Coverage 69.68% 69.67% -0.02%
- Complexity 4577 4618 +41
============================================
Files 1729 1736 +7
Lines 90183 91180 +997
Branches 13415 13632 +217
============================================
+ Hits 62848 63529 +681
- Misses 22971 23226 +255
- Partials 4364 4425 +61
pinot-common/src/main/java/org/apache/pinot/segment/local/customobject/DoubleVector.java
Suggest changing the function name to |
Thanks for adding the feature. Please also update the pinot doc to include this new function.
  }

  public static DoubleArrayList vectorAdd(DoubleArrayList a, DoubleArrayList b) {
    Preconditions.checkState(a.size() == b.size(), "The two operand arrays are not of the same size! provided %s, %s",
Can be simplified to:
return vectorAdd(a, b.elements());
  }

  public static DoubleArrayList vectorAdd(DoubleArrayList a, double[] b) {
    Preconditions.checkState(a.size() == b.length, "The two operand arrays are not of the same size! provided %s, %s",
Call `a.elements()` first and cache the value, then operate on the two `double[]` arrays for better performance. I'm not sure the JVM will inline all the method calls. Same for `incrementElement()`.
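The suggestion above can be sketched roughly as follows. This is only an illustration, not the code from the PR: `DoubleListSketch` is a hypothetical stand-in for fastutil's `DoubleArrayList` so the example compiles without dependencies, and adding in place into `a` is an assumption. In fastutil the array returned by `elements()` can be longer than `size()`, so the loop is bounded by `size()` rather than by the array length.

```java
import java.util.Arrays;

/** Hypothetical stand-in for fastutil's DoubleArrayList, for illustration only. */
final class DoubleListSketch {
  private final double[] backing;
  private final int size;

  DoubleListSketch(double... values) {
    // Over-allocate on purpose: as in fastutil, the backing array may be longer than size().
    backing = Arrays.copyOf(values, values.length + 4);
    size = values.length;
  }

  int size() {
    return size;
  }

  /** Direct access to the backing array; its length can exceed size(). */
  double[] elements() {
    return backing;
  }

  /** Copy of the logical contents (first size() elements only). */
  double[] toArray() {
    return Arrays.copyOf(backing, size);
  }
}

final class VectorAddSketch {
  /** Adds b into a element by element (in place, by assumption) and returns a. */
  static DoubleListSketch vectorAdd(DoubleListSketch a, DoubleListSketch b) {
    if (a.size() != b.size()) {
      throw new IllegalStateException(
          "The two operand arrays are not of the same size! provided " + a.size() + ", " + b.size());
    }
    // Fetch each backing array once instead of calling accessor methods per element.
    double[] aElements = a.elements();
    double[] bElements = b.elements();
    int n = a.size();
    for (int i = 0; i < n; i++) {
      aElements[i] += bElements[i];
    }
    return a;
  }

  public static void main(String[] args) {
    DoubleListSketch sum =
        vectorAdd(new DoubleListSketch(1.0, 2.0, 3.0), new DoubleListSketch(10.0, 20.0, 30.0));
    System.out.println(Arrays.toString(sum.toArray())); // prints [11.0, 22.0, 33.0]
  }
}
```

Whether this is measurably faster than per-element accessor calls depends on JIT inlining, so it is worth benchmarking before relying on it.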
  public static DoubleArrayList vectorAdd(DoubleArrayList a, double[] b) {
    Preconditions.checkState(a.size() == b.length, "The two operand arrays are not of the same size! provided %s, %s",
        a.size(), b.length);
    ;
Remove this stray semicolon.
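For readers who want the bin semantics used in the PR description below spelled out, here is a minimal sketch. It assumes bins are left-closed and right-open except the last one, which is closed on both sides, as the usage examples indicate. The class and method names are hypothetical and are not taken from `HistogramAggregationFunction`; skipping out-of-range values is also an assumption.

```java
import java.util.Arrays;

final class HistogramBinsSketch {
  /** Builds numBins + 1 equally spaced edges covering [lower, upper]. */
  static double[] equalLengthEdges(double lower, double upper, int numBins) {
    double[] edges = new double[numBins + 1];
    double width = (upper - lower) / numBins;
    for (int i = 0; i < numBins; i++) {
      edges[i] = lower + i * width;
    }
    edges[numBins] = upper; // set exactly to avoid floating-point drift on the last edge
    return edges;
  }

  /**
   * Returns the bin index for value, or -1 if it lies outside [edges[0], edges[last]].
   * Edges are assumed to be strictly increasing. A linear scan keeps the sketch simple;
   * a binary search would be the natural choice for many bins.
   */
  static int binIndex(double[] edges, double value) {
    int last = edges.length - 1;
    if (Double.isNaN(value) || value < edges[0] || value > edges[last]) {
      return -1;
    }
    for (int i = 0; i < last; i++) {
      if (value < edges[i + 1]) {
        return i;
      }
    }
    return last - 1; // value equals the upper edge, which the last bin includes
  }

  /** Counts values per bin; out-of-range values are skipped (an assumption of this sketch). */
  static long[] histogram(double[] edges, double[] values) {
    long[] counts = new long[edges.length - 1];
    for (double v : values) {
      int bin = binIndex(edges, v);
      if (bin >= 0) {
        counts[bin]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    double[] edges = {0, 1, 10, 100}; // bins [0,1), [1,10), [10,100]
    double[] values = {0, 0.5, 1, 5, 10, 100, 250};
    System.out.println(Arrays.toString(histogram(edges, values))); // prints [2, 2, 2]
  }
}
```

With `equalLengthEdges(0, 1000, 10)` the same lookup yields the ten bins [0,100), [100,200), ..., [900,1000].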
This effort is part of issue #8493.
The histogram aggregation function for single-value numerical columns is implemented.
Usage examples:
- Histogram(columnName, ARRAY[0,1,10,100]) to specify bins [0,1), [1,10), [10,100]
- HISTOGRAM(intColumn, ARRAY["-Infinity",0,1,10,100,1000,"+Infinity"]) to specify bins (-inf,0), [0,1), [1,10), [10,100), [100,1000), [1000,+inf)
- Histogram(columnName, 0, 1000, 10) to specify 10 equal-length bins [0,100), [100,200), ..., [900,1000]

This is a proof-of-concept histogram function; the following are some TODOs that we want to address in a follow-up PR: