This repository contains the Python implementation of QuantTree [Boracchi et al. 2018] and its extensions QT-EWMA [Frittoli et al. 2021, Frittoli et al. 2022], CDM [Stucchi et al. 2022] and Kernel QuantTree [Stucchi et al. 2023].
Python 3 with the packages in requirements.txt
- Clone the repository
- Run `pip install -e path/to/quanttree`
In this section, we illustrate the methods that are implemented in this library. For in-depth explanation, we recommend checking out the papers associated with each algorithm.
QuantTree monitors batches of data to detect any distribution change.
The main parameters of QuantTree are:
- the number of bins $K$;
- the desired percentage of points per bin $\{\pi_1, \dots, \pi_K\}$;
- the target False Positive Rate $\alpha$;
- the test statistic to be employed.
QuantTree is implemented in `quanttree/quanttree_src.py`, in a class called `QuantTree`. We refer to the inline documentation of `QuantTree` for a detailed explanation of the `__init__` arguments. In `demos/demo_quanttree.py`, you can see QuantTree in action (with comments!).
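For intuition only, the following self-contained sketch mimics the QuantTree idea: bins with approximately equal training probability are carved out by a sequence of axis-aligned splits, and a batch is then tested with the Pearson statistic. This is a simplified, didactic approximation (random split choices, shifted test batch, no thresholds), not the library implementation; use the `QuantTree` class and `demos/demo_quanttree.py` in practice.

```python
import numpy as np

def build_quanttree(train, K, rng):
    """Simplified QuantTree construction (didactic sketch, not the library code).
    Each bin is defined by a dimension, a threshold and a direction; bins are carved
    out one at a time so that each captures roughly N/K training points."""
    remaining = train.copy()
    splits = []
    for k in range(K - 1):
        n_k = round(len(remaining) / (K - k))        # target number of training points in bin k
        dim = rng.integers(train.shape[1])           # random component to split on
        go_left = rng.random() < 0.5                 # split from the lower or the upper tail
        values = np.sort(remaining[:, dim])
        threshold = values[n_k - 1] if go_left else values[-n_k]
        mask = remaining[:, dim] <= threshold if go_left else remaining[:, dim] >= threshold
        splits.append((dim, threshold, go_left))
        remaining = remaining[~mask]
    return splits

def bin_counts(batch, splits, K):
    """Assign each batch point to the first bin whose condition it satisfies."""
    counts = np.zeros(K, dtype=int)
    assigned = np.zeros(len(batch), dtype=bool)
    for k, (dim, threshold, go_left) in enumerate(splits):
        cond = batch[:, dim] <= threshold if go_left else batch[:, dim] >= threshold
        cond = cond & ~assigned
        counts[k] = cond.sum()
        assigned |= cond
    counts[K - 1] = (~assigned).sum()                # last bin collects the leftover points
    return counts

rng = np.random.default_rng(0)
K, nu = 16, 128
train = rng.standard_normal((4096, 4))               # stationary training data
splits = build_quanttree(train, K, rng)

batch = rng.standard_normal((nu, 4)) + 0.6           # shifted batch: a change should be flagged
counts = bin_counts(batch, splits, K)
pearson = np.sum((counts - nu / K) ** 2 / (nu / K))  # Pearson statistic with pi_k = 1/K
print("Pearson statistic on the shifted batch:", pearson)
```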
QuantTree Exponentially Weighted Moving Average (QT-EWMA) monitors datastreams by means of an online statistical test that tracks the bin frequencies of a QuantTree histogram through EWMA statistics. During training, a QuantTree histogram is constructed over the training set, which is drawn from the stationary distribution.
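Concretely (notation adapted from [Frittoli et al. 2021]; $S_k$ denotes the $k$-th bin of the histogram and $\hat{\pi}_k$ its estimated probability), each incoming sample $x_t$ updates the EWMA statistics

$$Z_{t,k} = (1-\lambda)\,Z_{t-1,k} + \lambda\,\mathbb{1}[x_t \in S_k], \qquad Z_{0,k} = \hat{\pi}_k,$$

which are compared against the expected bin probabilities through

$$T_t = \sum_{k=1}^{K} \frac{(Z_{t,k} - \hat{\pi}_k)^2}{\hat{\pi}_k},$$

and a change is reported at the first $t$ such that $T_t$ exceeds a time-varying threshold set to yield the target ARL0.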
The main parameters of QT-EWMA are:
- the number of bins $K$;
- the target ARL0;
- the EWMA forgetting factor $\lambda$.
QT-EWMA is implemented in `quanttree/qtewma_src.py`, in a class called `QT_EWMA`. We refer to the inline documentation of `QT_EWMA` for a detailed explanation of the `__init__` arguments. In `demos/demo_qtewma.py`, QT-EWMA is used in a small experiment over synthetic data.
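As a purely illustrative, self-contained toy example of the statistic above (not the `QT_EWMA` class, whose thresholds are pre-computed and time-varying to guarantee the target ARL0), one can monitor a simulated stream of bin indices as follows; the fixed threshold `h` is an arbitrary placeholder:

```python
import numpy as np

rng = np.random.default_rng(1)
K, lam = 32, 0.03
pi_hat = np.full(K, 1.0 / K)              # estimated bin probabilities (uniform target)
Z = pi_hat.copy()                         # EWMA statistics initialised at the bin probabilities
h = 1.2                                   # arbitrary fixed threshold, for this toy example only

# Simulated stream of bin indices: stationary for 500 samples, then a change at t = 501
post = np.r_[np.full(K // 4, 3.0 / K), np.full(3 * K // 4, 1.0 / (3 * K))]
post /= post.sum()
stream = np.concatenate([rng.integers(0, K, 500), rng.choice(K, 500, p=post)])

for t, k in enumerate(stream, start=1):
    one_hot = np.zeros(K)
    one_hot[k] = 1.0
    Z = (1 - lam) * Z + lam * one_hot     # EWMA update of the bin frequencies
    T = np.sum((Z - pi_hat) ** 2 / pi_hat)
    if T > h:
        print(f"Change detected at t = {t}")
        break
```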
QT-EWMA-update is a change-detection algorithm based on QT-EWMA that enables online monitoring even when the training set is extremely small. In QT-EWMA-update, new samples are used to update the estimated bin probabilities of the initial QuantTree histogram (namely, the estimated expected values of the EWMA statistics), as long as no change is detected. This update improves the model, thus increasing the detection power. The updating procedure is compatible with the computational requirements of online monitoring schemes, and the distribution of the QT-EWMA-update statistic is also independent of the stationary distribution, enabling the computation of thresholds controlling the ARL0 through the same procedure as in QT-EWMA.
To set up QT-EWMA-update, the following parameters are required:
- the number of bins $K$;
- the target ARL0;
- the EWMA forgetting factor $\lambda$;
- the weight of the latest sample during the update $\beta$;
- the number of samples after which the update stops $S$ (optional).
QT-EWMA-update can be used by setting the correct parameters in the initialization of a `QT_EWMA` instance.
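The snippet below is only a toy illustration of the update idea, built from the parameter descriptions above: as long as no change is detected and at most $S$ samples have been processed, the estimated bin probabilities are refined by giving weight $\beta$ to the bin hit by the latest sample. The exact weighting scheme of QT-EWMA-update is described in [Frittoli et al. 2022] and may differ from this sketch.

```python
import numpy as np

# Toy illustration only: refine the estimated bin probabilities while monitoring,
# giving weight beta to the bin hit by the latest sample, and stop after S samples.
# The exact QT-EWMA-update scheme is described in [Frittoli et al. 2022].
K, beta, S = 16, 0.01, 500
rng = np.random.default_rng(2)
pi_hat = np.full(K, 1.0 / K)             # rough estimates from a small training set

for t in range(1, S + 1):
    k = rng.integers(K)                  # bin hit by the t-th (stationary) sample
    one_hot = np.zeros(K)
    one_hot[k] = 1.0
    pi_hat = (1 - beta) * pi_hat + beta * one_hot

print("Updated probabilities still sum to", pi_hat.sum())
```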
Class Distribution Monitoring (CDM) employs separate instances of QT-EWMA to monitor the class-conditional distributions. We report a concept drift after detecting a change in the class-conditional distribution of at least one class. The main advantages of CDM are:
- it can detect any relevant drift, including virtual drifts that have little impact on the classification error and are by design ignored by methods that monitor the error rate;
- it can detect concept drifts affecting only a subset of classes more promptly than methods that monitor the overall data distribution, since the other class-conditional distributions do not change;
- it provides insights on which classes have been affected by concept drift, which might be crucial for diagnostics and adaptation;
- it effectively controls false alarms by maintaining a target ARL0, set before monitoring.
The setup of CDM simply consists of setting the parameters for the underlying QT-EWMA (see previous sections).
CDM is implemented in `quanttree/cdm_src.py`, in a class called `CDM_QT_EWMA`. We refer to the inline documentation of `CDM_QT_EWMA` for a detailed explanation of the `__init__` arguments. In `demos/demo_cdm.py`, CDM is compared against QT-EWMA in an experiment over a synthetic datastream comprising two classes.
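The routing logic of CDM can be sketched as follows; `OnlineDetector` is a hypothetical placeholder standing in for a per-class QT-EWMA instance (the actual implementation is `CDM_QT_EWMA`, see `demos/demo_cdm.py`):

```python
from collections import defaultdict

class OnlineDetector:
    """Hypothetical placeholder for a per-class online change-detection test (e.g., QT-EWMA)."""
    def update(self, x) -> bool:
        """Process one sample of this class; return True if a change is detected."""
        raise NotImplementedError

def class_distribution_monitoring(labelled_stream, detector_factory=OnlineDetector):
    """Report a concept drift as soon as the distribution of at least one class changes."""
    detectors = defaultdict(detector_factory)      # one detector per observed class label
    for t, (x, y) in enumerate(labelled_stream):
        if detectors[y].update(x):                 # each class-conditional stream is monitored separately
            return t, y                            # detection time and the class that drifted
    return None
```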
We remark that the control of the ARL0 holds for any CDM defined by any online change-detection algorithm that can be configured to yield the desired ARL0 by setting a constant false alarm probability over time (see Proposition 1 in [Stucchi et al. 2022]). This means that, in principle, we can define CDM using other change-detection tests. However, to the best of our knowledge, QT-EWMA is the only nonparametric and online change-detection test for multivariate datastreams where the ARL0 is controlled by setting a constant false alarm probability.
Kernel QuantTree monitors batches of data to detect any distribution change.
The main parameters of Kernel QuantTree are:
- the kernel functions $f_k$ to be employed;
- the number of bins $K$;
- the desired percentage of points per bin $\{\pi_1, \dots, \pi_K\}$;
- the target False Positive Rate $\alpha$;
- the test statistic to be employed.
The Kernel QuantTree is implemented in the following files:
- `quanttree/kqt_eucliean.py`, in a class called `EuclideanKernelQuantTree`;
- `quanttree/kqt_mahalanobis.py`, in a class called `MahalanobisKernelQuantTree`;
- `quanttree/kqt_weighted_mahalanobis.py`, in a class called `WeightedMahalanobisKernelQuantTree`.
We refer to the inline documentation of each class for a detailed explanation of the `__init__` arguments. In `demos/demo_kqt.py`, you can see the proposed Kernel QuantTrees in action (with comments!).
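Analogously to the QuantTree sketch above, the following self-contained snippet illustrates, in a simplified and didactic way, bins defined through a kernel, here the Euclidean distance to a centroid, instead of axis-aligned splits; the Mahalanobis and weighted-Mahalanobis variants change the distance, not the scheme. This is not the library construction; use the classes listed above and `demos/demo_kqt.py` in practice.

```python
import numpy as np

def build_kernel_bins(train, K, rng):
    """Didactic sketch of kernel-defined equal-probability bins (Euclidean kernel):
    each bin collects the remaining training points closest to a centroid, with a
    radius fixed so that the bin holds roughly N/K training points."""
    remaining = train.copy()
    bins = []
    for k in range(K - 1):
        n_k = round(len(remaining) / (K - k))
        center = remaining[rng.integers(len(remaining))]
        dists = np.linalg.norm(remaining - center, axis=1)
        radius = np.sort(dists)[n_k - 1]
        bins.append((center, radius))
        remaining = remaining[dists > radius]
    return bins

def kernel_bin_counts(batch, bins, K):
    """Assign each batch point to the first kernel bin that contains it."""
    counts = np.zeros(K, dtype=int)
    assigned = np.zeros(len(batch), dtype=bool)
    for k, (center, radius) in enumerate(bins):
        cond = (np.linalg.norm(batch - center, axis=1) <= radius) & ~assigned
        counts[k] = cond.sum()
        assigned |= cond
    counts[K - 1] = (~assigned).sum()
    return counts

rng = np.random.default_rng(3)
K, nu = 16, 128
train = rng.standard_normal((4096, 4))
bins = build_kernel_bins(train, K, rng)

batch = 1.5 * rng.standard_normal((nu, 4))            # scale change in the batch
counts = kernel_bin_counts(batch, bins, K)
print("Pearson statistic:", np.sum((counts - nu / K) ** 2 / (nu / K)))
```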
Coming soon.
The theoretical properties of QuantTree enable efficient monitoring: the detection thresholds are independent of the stationary distribution and can be pre-computed via Monte Carlo simulations, as detailed in the papers listed in the References. Here, we provide pre-computed thresholds for the settings addressed in the experimental sections of the works involving QuantTree and its extensions. For any question about thresholds, see Contacts.
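As a minimal sketch of how such thresholds can be pre-computed, assuming the Pearson statistic, $K$ equal-probability bins, training size $N$ and batch size $\nu$: since the distribution of the statistic does not depend on the data distribution, the simulation can be run on univariate uniform data. The numbers below are illustrative and are not the pre-computed thresholds shipped with the library.

```python
import numpy as np

# Monte Carlo estimate of the (1 - alpha) threshold for the Pearson statistic.
# The statistic of a QuantTree depends only on (N, nu, K), not on the data
# distribution, so the simulation can use univariate uniform data.
rng = np.random.default_rng(4)
N, nu, K, alpha, n_sim = 4096, 64, 16, 0.05, 10_000

stats = np.empty(n_sim)
for i in range(n_sim):
    train = rng.random(N)                                        # stationary 1-D training sample
    inner = np.quantile(train, np.linspace(0, 1, K + 1))[1:-1]   # K - 1 inner split points
    bins = np.searchsorted(inner, rng.random(nu))                # bin index of each batch point
    counts = np.bincount(bins, minlength=K)
    stats[i] = np.sum((counts - nu / K) ** 2 / (nu / K))         # Pearson statistic

print("Estimated threshold:", np.quantile(stats, 1 - alpha))
```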
The main contributors to this repo are Diego Stucchi, Diego Carrera and Luca Frittoli. For any question or bug report, please contact [email protected].
This software is released under a NonCommercial-ShareAlike license issued by Politecnico di Milano. The adaptation of this software is allowed under specific conditions that are designed to enable most non-commercial uses. See LICENSE.pdf for the complete terms and conditions.
"QuantTree: Histograms for Change Detection in Multivariate Data Streams"
G. Boracchi, D. Carrera, C. Cervellera, D. Macciò. International Conference on Machine Learning (ICML) 2018.
"Change Detection in Multivariate Datastreams Controlling False Alarms"
L. Frittoli, D. Carrera, G. Boracchi. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD) 2021.
"Nonparametric and Online Change Detection in Multivariate Datastreams using QuantTree"
L. Frittoli, D. Carrera, G. Boracchi. IEEE Transactions on Knowledge and Data Engineering 2022.
"Class Distribution Monitoring for Concept Drift Detection"
D. Stucchi, L. Frittoli, G. Boracchi. IEEE-INNS International Joint Conference on Neural Networks (IJCNN) 2022.
"Kernel QuantTree" D. Stucchi, P. Rizzo, N. Folloni, G. Boracchi. International Conference on Machine Learning (ICML) 2023.