This is a cross-platform library with implementations of various percentile algorithms on streams of data. These algorithms allow you to calculate approximate percentiles (e.g. 50th percentile, 95th percentile) in a single pass over a data set. They are particularly useful for calculating percentiles for immense data sets, for extremely high-throughput systems, or for near-real time use cases.
The library supports the following languages:
- C++
- JavaScript
The library implements the following streaming percentile algorithms:
- Greenwald-Khanna
- Cormode-Korn-Muthukrishnan-Srivastava for high-biased, low-biased, uniform, and targeted quantiles
- Q-Digest
For more background on streaming percentiles, see Calculating Percentiles on Streaming Data.
The current version of the library is 3.1.0, and it was released March 28, 2019.
You can download the latest release of the library from the streaming-percentiles latest release page.
If you use NPM, npm install streaming-percentiles
. Otherwise, download
the latest release of the library from the streaming-percentiles latest
release
page
page.
For convenience, you can also use the latest release JS directly
from sengelha.github.io
:
You can download the latest release's source code from the streaming-percentiles latest release page.
See CONTRIBUTING.md for instructions on how to build the release from source.
Historical releases may be downloaded from the streaming-percentiles release page.
Here's an example on how to use the Greenwald-Khanna streaming percentile algorithm from C++:
#include <stmpct/gk.hpp>
double epsilon = 0.1;
stmpct::gk<double> g(epsilon);
for (int i = 0; i < 1000; ++i)
g.insert(rand());
double p50 = g.quantile(0.5); // Approx. median
double p95 = g.quantile(0.95); // Approx. 95th percentile
Here's an example of how to use the library from Node.JS:
var sp = require('streaming-percentiles');
var epsilon = 0.1;
var g = new sp.GK(epsilon);
for (var i = 0; i < 1000; ++i)
g.insert(Math.random());
var p50 = g.quantile(0.5); // Approx. median
var p95 = g.quantile(0.95); // Approx. 95th percentile
Here's an example of how to use the library from a browser. Note that the
default module name is streamingPercentiles
:
<script src="//sengelha.github.io/streaming-percentiles/streamingPercentiles.v1.min.js"></script>
<script>
var epsilon = 0.1;
var g = new streamingPercentiles.GK(epsilon);
for (var i = 0; i < 1000; ++i)
g.insert(Math.random());
var p50 = g.quantile(0.5);
</script>
Below is the performance of the C++ implementation of these algorithms tested on December 22, 2021 on a Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz:
Algorithm | Epsilon | N | Input Type | Throughput (thousands of insertions/s) |
---|---|---|---|---|
ckms_hbq | 0.1 | 1,000,000 | sorted | 1,517.7 |
ckms_hbq | 0.01 | 1,000,000 | sorted | 1,094.1 |
ckms_hbq | 0.001 | 1,000,000 | sorted | 289.8 |
ckms_uq | 0.1 | 1,000,000 | sorted | 1,533.9 |
ckms_uq | 0.01 | 1,000,000 | sorted | 1,654.2 |
ckms_uq | 0.001 | 1,000,000 | sorted | 399.2 |
gk | 0.1 | 1,000,000 | sorted | 1,392.1 |
gk | 0.01 | 1,000,000 | sorted | 1,444.9 |
gk | 0.001 | 1,000,000 | sorted | 1,114.9 |
See //cpp:benchmark
for the benchmark program.
Benchmarks are auto-generated on every master
build. They can be viewed
at https://sengelha.github.io/streaming-percentiles/dev/bench/.
See doc/api_reference/ for detailed API reference documentation.
This project is licensed under the MIT License. See LICENSE for more information.
If you are interested in contributing to the library, please see CONTRIBUTING.md.