Merge pull request #206 from cbalci/percentil-kll-docs
Add documentation for PercentileKLL and its variants
Showing 6 changed files with 146 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILEKLL function. | ||
--- | ||
|
||
# PERCENTILEKLL | ||
|
||
`KLL Sketch` is an approximate quantiles algorithm that targets optimal space usage for a given accuracy. `PERCENTILEKLL` is a percentile aggregation function based on the Apache DataSketches [KLL Doubles Sketch](https://datasketches.apache.org/docs/KLL/KLLSketch.html) implementation. | ||
|
||
Pinot also offers a 'raw' variant, `PERCENTILERAWKLL`, which returns the serialized sketch. The serialized sketch can then be used to calculate a 'rank' or a 'histogram'. | ||
|
||
All variants of `PercentileKLL` also support raw sketches stored in Pinot columns. This means you can create KLL Doubles sketches outside of Pinot and ingest them into columns as binary strings. `PercentileKLL` will identify these columns and merge the sketches to produce aggregate results. | ||
|
||
## Signature | ||
|
||
> PercentileKLL(column, percentile, kValue) -> Double | ||
* `column` (required): Name of the column to aggregate on. If the column is a multi-value column, use the `PERCENTILEKLLMV` variant. | ||
* `percentile` (required): Percentile value to be calculated, in the range [0..100]. | ||
* `kValue` (optional): Integer value that determines the size of the sketch. The default value is `200`, which corresponds to a normalized rank error of about 1.65%. For details, see the [accuracy vs size chart](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html). | ||
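To make the "normalized rank error" figure above more concrete, the following is a minimal Python sketch of the arithmetic only. It assumes the error is expressed as a fraction of the total item count and applies to the rank of the returned value rather than to the value itself; the helper name `rank_error_bounds` and the one-million-row input are hypothetical and not part of Pinot or DataSketches.

```python
def rank_error_bounds(n_items, percentile, normalized_rank_error=0.0165):
    """Return (low, target, high) ranks for an approximate percentile query.

    Assumption: the normalized rank error is a fraction of n_items, so the
    true rank of the returned value should (with high probability) fall
    within target +/- normalized_rank_error * n_items.
    """
    target = percentile / 100.0 * n_items
    slack = normalized_rank_error * n_items
    low = max(0.0, target - slack)
    high = min(float(n_items), target + slack)
    return low, target, high


# Hypothetical table with one million rows, asking for the 90th percentile
# with the default kValue of 200 (about 1.65% normalized rank error).
low, target, high = rank_error_bounds(1_000_000, 90)
print(f"target rank {target:.0f}, plausible true rank in [{low:.0f}, {high:.0f}]")
```

Under these assumptions, a P90 query over one million rows may return a value whose true rank lies roughly between the 88.35th and 91.65th percentiles. A larger `kValue` narrows that window at the cost of a larger sketch.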
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileKLL(ArrDelayMinutes, 90) as DelayP90 | ||
from airlineStats | ||
``` | ||
|
||
| DelayP90 | | ||
| -------- | | ||
| 40 | | ||
|
||
```sql | ||
select Carrier, percentileKll(ArrDelay, 50, 600) as MedianDelay | ||
from airlineStats | ||
where ArrDelay > 0 | ||
group by Carrier | ||
order by 2 desc | ||
limit 3 | ||
``` | ||
|
||
| Carrier | MedianDelay | | ||
| ------- | ----------- | | ||
| MQ | 28 | | ||
| B6 | 28 | | ||
| EV | 24 | | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILEKLLMV function. | ||
--- | ||
|
||
# PERCENTILEKLLMV | ||
|
||
Variant of the `PERCENTILEKLL` aggregation function which accepts multi-value columns. Values in the given column are 'flattened' before aggregation, so the function will produce a single value for the given percentile. | ||
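The 'flattening' described above can be illustrated with a small Python sketch. It computes an exact nearest-rank percentile over the flattened values, which is an assumption made for illustration: the real function uses an approximate KLL sketch, and Pinot's exact percentile definition may differ. The helper name `flattened_percentile` and the sample rows are hypothetical.

```python
import math
from itertools import chain


def flattened_percentile(rows, percentile):
    """Flatten multi-value rows, then take a nearest-rank percentile."""
    # Every element of every row contributes one value to the aggregation.
    values = sorted(chain.from_iterable(rows))
    if not values:
        raise ValueError("no values to aggregate")
    # Nearest-rank definition: smallest value whose rank covers the percentile.
    rank = max(1, math.ceil(percentile / 100.0 * len(values)))
    return values[rank - 1]


# Three rows of a hypothetical multi-value column.
rows = [[10, 20], [30], [40, 50, 60]]
print(flattened_percentile(rows, 50))
```

The point of the sketch is that the row boundaries disappear: the result is a single value computed over all six elements, not one value per row.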
|
||
## Signature | ||
|
||
> PercentileKLLMV(column, percentile, kValue) -> Double | ||
* `column` (required): Name of the column to aggregate on. | ||
* `percentile` (required): Percentile value to be calculated, in the range [0..100]. | ||
* `kValue` (optional): Integer value that determines the size of the sketch. The default value is `200`, which corresponds to a normalized rank error of about 1.65%. For details, see the [accuracy vs size chart](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html). | ||
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileKLLMV(ArrOfInts, 90) as value | ||
from MyTable | ||
``` | ||
|
||
| value | | ||
| ------ | | ||
| 40 | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILERAWKLL function. | ||
--- | ||
|
||
# PERCENTILERAWKLL | ||
|
||
Raw variant of `PERCENTILEKLL` which returns a Base64-encoded string of the KLLSketch object. The response can be deserialized back into a KLLSketch using the Apache DataSketches library and used for further analysis. For example, you can use this approach to calculate the CDF (Cumulative Distribution Function) or PMF (Probability Mass Function) of a dataset. | ||
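As a rough illustration of what a CDF and a PMF over split points look like, here is a minimal Python sketch that computes both exactly from raw values. It does not deserialize a sketch or call the DataSketches library; a real workflow would ask the deserialized sketch for approximate answers instead. The helper name `cdf_and_pmf`, the inclusive (`<=`) bucketing, and the sample data are all assumptions.

```python
from bisect import bisect_right


def cdf_and_pmf(values, split_points):
    """Exact CDF and PMF of `values` at ascending `split_points`.

    cdf[i] is the fraction of values <= split_points[i]; a final 1.0 entry
    covers everything above the last split point. pmf[i] is the fraction of
    values falling into the i-th interval defined by the split points.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no values")
    cdf = [bisect_right(ordered, s) / n for s in split_points] + [1.0]
    pmf = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]
    return cdf, pmf


# Hypothetical delay values in minutes, with split points at 2 and 4.
cdf, pmf = cdf_and_pmf([1, 2, 3, 4, 5, 6, 7, 8], [2, 4])
print(cdf)
print(pmf)
```

The PMF entries always sum to 1, and each CDF entry is the running total of the PMF up to that split point, which is why either can be derived from the other.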
|
||
## Signature | ||
|
||
> PercentileRawKLL(column, percentile, kValue) -> String | ||
* `column` (required): Name of the column to aggregate on. If the column is a multi-value column, use the `PERCENTILERAWKLLMV` variant. | ||
* `percentile` (required): Percentile value to be calculated, in the range [0..100]. For 'raw' versions of the function, this value is used for ordering (ORDER BY). | ||
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileRawKll(ArrDelayMinutes, 90) as sketch | ||
from airlineStats | ||
``` | ||
|
||
| sketch | | ||
| -------- | | ||
| BQEPC... | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
description: >- | ||
This section contains reference documentation for the PERCENTILERAWKLLMV function. | ||
--- | ||
|
||
# PERCENTILERAWKLLMV | ||
|
||
Variant of the `PERCENTILERAWKLL` aggregation function which accepts multi-value columns. Values in the given column are 'flattened' before aggregation. | ||
|
||
## Signature | ||
|
||
> PercentileRawKLLMV(column, percentile) -> String | ||
* `column` (required): Name of the column to aggregate on. | ||
* `percentile` (required): Percentile value to be calculated [0..100]. For raw versions of the function, this value is used for ordering (ORDER BY). | ||
|
||
## Usage Examples | ||
|
||
```sql | ||
select percentileRawKLLMV(ArrOfInts, 90) as sketch | ||
from MyTable | ||
``` | ||
|
||
| sketch | | ||
| -------- | | ||
| BQEPC... | |