Add documentation for PercentileKLL and its variants #206
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILEKLL function. | ||
--- | ||
|
||
# PERCENTILEKLL | ||
|
||
`KLL Sketch` is an approximate quantiles algorithm that targets optimal space for a given accuracy. `PERCENTILEKLL` is a percentile aggregation function based on the Apache DataSketches [KLL Doubles Sketch](https://datasketches.apache.org/docs/KLL/KLLSketch.html) implementation. | ||
|
||
Pinot also offers a 'raw' variant, `PERCENTILERAWKLL`, which returns the serialized sketch so that it can be used to calculate a 'rank' or 'histogram'. | ||
|
||
All of the variants of `PercentileKLL` also support raw sketches in Pinot columns. This means you can create KLL Doubles sketches outside of Pinot and ingest them into columns as binary strings. `PercentileKLL` will identify these columns and merge them to produce aggregate results. | ||
|
||
## Signature | ||
|
||
> PercentileKLL(column, percentile, kValue) -> Double | ||
|
||
* `column` (required): Name of the column to aggregate on. If the column is a multi-value column, use the `PERCENTILEKLLMV` variant. | ||
* `percentile` (required): Percentile value to be calculated [0..100] | ||
* `kValue`: Integer value which determines the size of the sketch. The default value is `200`, which corresponds to a normalized rank error of about 1.65%. For details, please see the [accuracy vs size chart](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html). | ||
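To make the "normalized rank error" figure more concrete, here is a minimal Python sketch of what a 1.65% error band could mean for a requested percentile. It does not use Pinot or DataSketches; the function name `acceptable_rank_window` and the default `eps=0.0165` are illustrative assumptions taken from the figure quoted above, and the error bound of a real sketch is probabilistic rather than absolute.

```python
import math


def acceptable_rank_window(n, percentile, eps=0.0165):
    """Return (lowest, highest) 0-based positions in the sorted data whose
    normalized rank lies within +/- eps of the requested percentile.

    Assumption: eps is the normalized rank error quoted for the default
    kValue of 200; real sketches only meet this bound with high probability.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be in [0, 100]")
    target = percentile / 100.0
    lo = max(0.0, target - eps)   # lowest acceptable normalized rank
    hi = min(1.0, target + eps)   # highest acceptable normalized rank
    lo_idx = math.floor(lo * n)
    hi_idx = min(n - 1, math.ceil(hi * n))
    return lo_idx, hi_idx


# For 1,000 rows and the 90th percentile, an approximate answer would be
# expected to fall roughly between these sorted positions.
window = acceptable_rank_window(1000, 90)
```

Under these assumptions, a larger `kValue` narrows the window at the cost of a larger sketch.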
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileKLL(ArrDelayMinutes, 90) as DelayP90 | ||
from airlineStats | ||
``` | ||
|
||
| DelayP90 | | ||
| -------- | | ||
| 40 | | ||
|
||
```sql | ||
select Carrier, percentileKll(ArrDelay, 50, 600) as MedianDelay | ||
from airlineStats | ||
where ArrDelay > 0 | ||
group by Carrier | ||
order by 2 desc | ||
limit 3 | ||
``` | ||
|
||
| Carrier | MedianDelay | | ||
| ------- | ----------- | | ||
| MQ | 28 | | ||
| B6 | 28 | | ||
| EV | 24 | | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILEKLLMV function. | ||
--- | ||
|
||
# PERCENTILEKLLMV | ||
|
||
Variant of the `PERCENTILEKLL` aggregation function which accepts multi-value columns. Values in the given column are 'flattened' before aggregation, so the function will produce a single value for the given percentile. | ||
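As an illustration of the 'flattening' described above, the following Python sketch concatenates the values of every row and then computes one percentile over the combined list. It uses an exact nearest-rank percentile rather than a KLL sketch, so it only shows the shape of the computation; `flatten` and `nearest_rank_percentile` are hypothetical helper names, not Pinot functions.

```python
import math
from itertools import chain


def flatten(rows):
    """Concatenate the value lists of all rows into a single list."""
    return list(chain.from_iterable(rows))


def nearest_rank_percentile(values, percentile):
    """Exact nearest-rank percentile (a stand-in for the approximate sketch)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to aggregate")
    # Nearest-rank method: 1-based rank, clamped so percentile 0 maps to the minimum.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Three rows of a hypothetical multi-value column.
rows = [[1, 2, 3], [4], [5, 6]]
all_values = flatten(rows)                     # one list across all rows
p90 = nearest_rank_percentile(all_values, 90)  # a single result for the whole column
```

The point of the example is that row boundaries play no role in the result: only the combined set of values matters.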
|
||
## Signature | ||
|
||
> PercentileKLLMV(column, percentile, kValue) -> Double | ||
|
||
* `column` (required): Name of the column to aggregate on. | ||
* `percentile` (required): Percentile value to be calculated [0..100] | ||
* `kValue`: Integer value which determines the size of the sketch. The default value is `200`, which corresponds to a normalized rank error of about 1.65%. For details, please see the [accuracy vs size chart](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html). | ||
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileKLLMV(ArrOfInts, 90) as value | ||
from MyTable | ||
``` | ||
|
||
| value | | ||
| ------ | | ||
| 40 | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILERAWKLL function. | ||
--- | ||
|
||
# PERCENTILERAWKLL | ||
|
||
Raw variant of `PERCENTILEKLL` which returns a Base64-encoded string of the KLLSketch object. The response can be deserialized back into a KLLSketch using the Apache DataSketches library and used for further analysis. For example, you can use this approach to calculate the CDF (Cumulative Distribution Function) or PMF (Probability Mass Function) of a dataset. | ||
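For readers unfamiliar with the CDF and PMF mentioned above, here is a small Python sketch that computes both exactly over a plain list for a set of split points. It is not the DataSketches API: `exact_cdf` and `exact_pmf` are hypothetical helpers, and treating each split point as inclusive is an assumption made for this illustration. A deserialized sketch would answer the same kind of question approximately.

```python
from bisect import bisect_right


def exact_cdf(values, split_points):
    """Fraction of values <= each split point, followed by a final 1.0."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    n = len(ordered)
    cdf = [bisect_right(ordered, s) / n for s in sorted(split_points)]
    cdf.append(1.0)  # everything lies at or below +infinity
    return cdf


def exact_pmf(values, split_points):
    """Fraction of values falling into each interval between split points."""
    cdf = exact_cdf(values, split_points)
    # Each mass is the difference between consecutive cumulative fractions.
    return [cdf[0]] + [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical delay values in minutes, split at 10 and 30 minutes.
delays = [0, 5, 10, 15, 20, 30, 45, 60]
cdf = exact_cdf(delays, [10, 30])
pmf = exact_pmf(delays, [10, 30])
```

With `m` split points, both functions return `m + 1` numbers: the CDF ends at 1.0 and the PMF masses sum to 1.0.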
|
||
## Signature | ||
|
||
> PercentileRawKLL(column, percentile, kValue) -> String | ||
|
||
* `column` (required): Name of the column to aggregate on. If the column is a multi-value column, use the `PERCENTILERAWKLLMV` variant. | ||
* `percentile` (required): Percentile value to be calculated [0..100] | ||
* `kValue`: Integer value which determines the size of the sketch. The default value is `200`, which corresponds to a normalized rank error of about 1.65%. | ||
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileRawKll(ArrDelayMinutes, 90) as sketch | ||
from airlineStats | ||
``` | ||
|
||
| sketch | | ||
| -------- | | ||
| BQEPC... | |
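The result cell above is a Base64 string, so the first step on the client side is to turn it back into bytes. Below is a minimal Python sketch of that step using only the standard library. The helper name `decode_sketch_cell` is hypothetical, and the placeholder payload is generated locally because the example value above is truncated. Handing the resulting bytes to a DataSketches library for deserialization is a separate step not shown here.

```python
import base64
import binascii


def decode_sketch_cell(cell: str) -> bytes:
    """Decode one Base64 result cell into the raw serialized sketch bytes."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently skipping them.
        return base64.b64decode(cell, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid Base64: {exc}") from exc


# Hypothetical stand-in payload; a real cell would come from the query response.
placeholder_bytes = bytes(range(16))
placeholder_cell = base64.b64encode(placeholder_bytes).decode("ascii")
raw = decode_sketch_cell(placeholder_cell)
```

Strict validation is a deliberate choice in this sketch: a truncated or corrupted cell fails loudly here rather than producing bytes that fail later during deserialization.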
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILERAWKLLMV function. | ||
--- | ||
|
||
# PERCENTILERAWKLLMV | ||
|
||
Variant of the `PERCENTILERAWKLL` aggregation function which accepts multi-value columns. Values in the given column are 'flattened' before aggregation. | ||
|
||
## Signature | ||
|
||
> PercentileRawKLLMV(column, percentile, kValue) -> String | ||
|
||
* `column` (required): Name of the column to aggregate on. | ||
* `percentile` (required): Percentile value to be calculated [0..100] | ||
* `kValue`: Integer value which determines the size of the sketch. The default value is `200`, which corresponds to a normalized rank error of about 1.65%. For details, please see the [accuracy vs size chart](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html). | ||
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileRawKLLMV(ArrOfInts, 90) as sketch | ||
from MyTable | ||
``` | ||
|
||
| sketch | | ||
| -------- | | ||
| BQEPC... | |
nit: s/defails/details/ in all 4 `.md` files
Good catch, thanks! Fixed.