docs: Ensemble dimensions #248

Merged: 24 commits, Apr 15, 2022
Changes from 2 commits
Commits
35b421b
Add example on ensemble dimensions
theovincent Apr 6, 2022
5ef3879
Add example in docs
theovincent Apr 6, 2022
174100f
Update docs/examples/ensemble-dimensions.ipynb spelling mistake
theovincent Apr 13, 2022
fa6be64
Update docs/examples/ensemble-dimensions.ipynb formulation
theovincent Apr 13, 2022
d245af4
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
e40ad64
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
2c196cf
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
ce3136c
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
358738c
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
e66e45b
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
a73d44b
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
1c4918b
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
885c28b
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
5ce4705
Update docs/examples/ensemble-dimensions.ipynb
theovincent Apr 13, 2022
3c3ed5c
Fix bug comma (ensemble-dimensions)
theovincent Apr 13, 2022
3ad84d5
docs: add corrections to notebook
deepcharles Apr 13, 2022
658f242
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 13, 2022
eba433a
docs(example): typo
deepcharles Apr 13, 2022
a3b9a87
docs(example): typo + formatting
deepcharles Apr 13, 2022
5db321a
docs: typo
deepcharles Apr 13, 2022
85917a9
docs(example): typo
deepcharles Apr 13, 2022
18e24d3
docs(example): change title
deepcharles Apr 13, 2022
af959c8
docs(example): renaming notebook + fmt
deepcharles Apr 14, 2022
cd55775
docs(example): style
deepcharles Apr 14, 2022
308 changes: 308 additions & 0 deletions docs/examples/ensemble-dimensions.ipynb
@@ -0,0 +1,308 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ensemble dimensions: splitting changepoint detection algorithms along the dimensions\n",
"\n",
"<!-- {{ add_binder_block(page) }} -->\n",
"\n",
"## Introduction\n",
"\n",
"In `ruptures`, change point detection procedures make use of the same cost function along all the dimensions.\n",
"The choice of the cost function is critical as it is related to the type of change to find.\n",
"For instance, [CostL2](../user-guide/costs/costl2.md) can detect shifts in the mean, [CostNormal](../user-guide/costs/costnormal.md) can detect shifts in the mean and the covariance structure, [CostAR](../user-guide/costs/costautoregressive.md) can detect shifts in the auto-regressive structure, etc.\n",
"\n",
"However, in many settings, not all dimensions exhibit the same type of change, and a single cost function cannot detect all changes simultaneously.\n",
"To cope with this issue, a procedure that studies the cost on each dimension independently is presented here. It is inspired by the paper [[Katser2021]](#Katser2021), which introduces a procedure to merge several cost functions.\n",
"In a nutshell, different costs along each dimension can be combined to yield an aggregated cost function that is sensitive to several types of changes.\n",
"The aggregated cost can then be used with any search method (such as the [window search method](../user-guide/detection/window.md)) to create a change point detection algorithm.\n",
"\n",
"This example illustrates the aggregation procedure, also referred to as an ensemble model.\n",
"Here, only [CostL2](../user-guide/costs/costl2.md) is considered for all dimensions, but all other costs could be used. The focus is then on the way the costs are combined (see [intersection](#intersection) and [union](#union)).\n",
"In addition, the number of changes is assumed to be known by the user."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"First, we make the necessary imports and generate a multivariate toy signal which contains different mean shifts along the dimensions. Notice that only one changepoint is shared between the two dimensions. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from itertools import cycle\n",
"import matplotlib.pyplot as plt\n",
"import ruptures as rpt\n",
"\n",
"\n",
"# Scaling function: rescale each column (dimension) to [0, 1]\n",
"def minmax(array):\n",
"    return (array - np.min(array, axis=0)) / (\n",
"        np.max(array, axis=0) - np.min(array, axis=0) + 1e-8\n",
"    )\n",
"\n",
"\n",
"# Aggregation functions: combine the dimensions at each time index\n",
"def min_(array):\n",
"    return np.min(array, axis=1).T\n",
"\n",
"\n",
"def max_(array):\n",
"    return np.max(array, axis=1).T"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bkps = [329, 656, 972, 1291, 1642, 2000]\n",
"n_samples = bkps[-1]\n",
"bkps_1 = [bkps[0], bkps[1], bkps[4], n_samples]\n",
"bkps_2 = [bkps[1], bkps[2], bkps[3], n_samples]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"signal = np.zeros((n_samples, 2))\n",
"val_cycle = cycle([0, 1])\n",
"for (start, end) in rpt.utils.pairwise([0] + bkps_1):\n",
"    signal[start:end, 0] = next(val_cycle)\n",
"for (start, end) in rpt.utils.pairwise([0] + bkps_2):\n",
"    signal[start:end, 1] = next(val_cycle)\n",
"\n",
"fig, axes = rpt.display(signal, bkps)\n",
"axes[0].set_title(\"Noise free signal\")\n",
"\n",
"signal += np.random.normal(size=signal.shape)\n",
"fig, axes = rpt.display(signal, bkps)\n",
"_ = axes[0].set_title(\"Toy signal\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell defines a custom cost that computes the [CostL2](../user-guide/costs/costl2.md) on a given dimension. To define a custom cost that applies a different cost to each dimension, an `if` statement on the value of `self.dim` could be used, for instance in the `error` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from ruptures.base import BaseCost\n",
"\n",
"\n",
"class MyCost(BaseCost):\n",
"\n",
"    \"\"\"Custom cost: L2 cost computed on a single dimension of the signal.\"\"\"\n",
"\n",
"    # The 2 following attributes must be specified for compatibility.\n",
"    model = \"\"\n",
"    min_size = 2\n",
"\n",
"    def __init__(self, dim):\n",
"        super().__init__()\n",
"        self.dim = dim\n",
"\n",
"    def fit(self, signal):\n",
"        \"\"\"Store the selected dimension of the signal as a column vector.\"\"\"\n",
"        self.signal = signal[:, self.dim].reshape(-1, 1)\n",
"        return self\n",
"\n",
"    def error(self, start, end) -> float:\n",
"        \"\"\"Return the approximation cost on the segment [start:end].\n",
"\n",
"        Args:\n",
"            start (int): start of the segment\n",
"            end (int): end of the segment\n",
"\n",
"        Returns:\n",
"            segment cost\n",
"\n",
"        Raises:\n",
"            NotEnoughPoints: when the segment is too short (less than `min_size` samples).\n",
"        \"\"\"\n",
"        if end - start < self.min_size:\n",
"            raise rpt.exceptions.NotEnoughPoints\n",
"\n",
"        return self.signal[start:end].var(axis=0).sum() * (end - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Looking at the costs along each dimension\n",
"\n",
"Here, the [window search method](../user-guide/detection/window.md) is used. Since this method turns the costs into a score for each index (peaks indicate likely changepoints), the scores are considered instead of the costs.\n",
"\n",
"The following cell shows the scores along each dimension:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"window_size = 200\n",
"\n",
"list_of_costs = [MyCost(dim=0), MyCost(dim=1)]\n",
"\n",
"scores = []\n",
"\n",
"for cost in list_of_costs:\n",
"    algo = rpt.Window(width=window_size, custom_cost=cost, jump=1).fit(signal)\n",
"    scores.append(algo.score)\n",
"scores = np.array(scores).T\n",
"\n",
"# For display purpose\n",
"appended_scores = np.append(\n",
"    np.ones((window_size // 2, 2)) * float(\"inf\"), scores, axis=0\n",
")\n",
"fig, axes = rpt.display(appended_scores, bkps)\n",
"_ = axes[0].set_title(\"Scores along each dimension\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Intersection\n",
"\n",
"A first way of aggregating the scores is to treat the score along each dimension as an expert. The idea is then to take the __intersection__ of the experts, so that the predicted changepoints are those on which all the experts are confident. In the code below, this is done by scaling each score to [0, 1] and taking the minimum over the dimensions at each index. This method is useful when the user is interested in the changepoints that are detected on all dimensions at the same time step.\n",
"\n",
"In the following cell, the intersection procedure correctly predicts the only changepoint common to the two dimensions. The aggregated score clearly shows this single changepoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bkps_intersection = [bkps[1], n_samples]\n",
"n_bkps_intersection = 1\n",
"\n",
"# intersection aggregation: overwrite the score of the last fitted `algo`\n",
"algo.score = min_(minmax(scores))\n",
"bkps_intersection_predicted = algo.predict(n_bkps=n_bkps_intersection)\n",
"\n",
"# For display purpose\n",
"appended_intersection_scores = np.append(\n",
"    np.ones(window_size // 2) * float(\"inf\"), algo.score\n",
")\n",
"\n",
"fig, (ax,) = rpt.display(\n",
"    appended_intersection_scores, bkps_intersection, bkps_intersection_predicted\n",
")\n",
"_ = ax.set_title(\"Aggregated score for intersection purpose\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Union\n",
"\n",
"Another way of aggregating the scores is to take the __union__ of the experts, so that the predicted changepoints are those on which at least one expert is confident. In the code below, this is done by scaling each score to [0, 1] and taking the maximum over the dimensions at each index. This method is useful when the user is interested in collecting all the changepoints that are present along at least one dimension.\n",
"\n",
"In the following cell, the union procedure correctly predicts all the changepoints. The aggregated score shows 5 clear peaks from which to take the changepoints."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bkps_union = bkps\n",
"n_bkps_union = 5\n",
"\n",
"# union aggregation\n",
"algo.score = max_(minmax(scores))\n",
"bkps_union_predicted = algo.predict(n_bkps=n_bkps_union)\n",
"\n",
"# For display purpose\n",
"appended_union_scores = np.append(np.ones(window_size // 2) * float(\"inf\"), algo.score)\n",
"\n",
"fig, (ax,) = rpt.display(appended_union_scores, bkps_union, bkps_union_predicted)\n",
"_ = ax.set_title(\"Aggregated score for union purpose\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"This example shows a way of crafting a changepoint detection algorithm at the level of individual dimensions, which is finer than the usual level of the whole signal.\n",
"\n",
"Two options are presented: [intersection](#intersection), where a changepoint is detected only if it is detected along all dimensions, and [union](#union), where a changepoint is detected as soon as one dimension has detected it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Authors\n",
"\n",
"This example notebook has been authored by [Théo VINCENT](https://github.com/theovincent) and edited by [Olivier Boulant](https://github.com/oboulant) and [Charles Truong](https://github.com/deepcharles)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"<a id=\"Katser2021\">[Katser2021]</a>\n",
"Katser, I., Kozitsin, V., Lobachev, V., & Maksimov, I. (2021). Unsupervised Offline Changepoint Detection Ensembles. Applied Sciences, 11(9), 4280."
]
}
],
"metadata": {
"interpreter": {
"hash": "de7c5fd504cc39d35018b8f7c7b90638a48a11123680d0b8595c478c27913e5d"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
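A note for readers of this diff: the block below is a minimal, dependency-free sketch of the two aggregation rules that the notebook above applies with its NumPy helpers (`minmax`, `min_`, `max_`). The toy scores and the helper names `minmax_columns`, `aggregate` and `top_indexes` are hypothetical and introduced only for illustration. In particular, `top_indexes` simply keeps the largest values; it is a simplified stand-in for the peak selection done by `rpt.Window.predict`, not a reimplementation of it.

```python
def minmax_columns(rows):
    """Scale each column (dimension) of a list of rows to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo + 1e-8) for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def aggregate(rows, how):
    """Combine the per-dimension scores at each time index with `how`."""
    return [how(row) for row in rows]


def top_indexes(score, n):
    """Return the sorted indexes of the n largest values (simplified peak picking)."""
    ranked = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    return sorted(ranked[:n])


# Hypothetical scores: one row per time index, one column per dimension.
# Index 1: both dimensions peak; index 3: only dimension 0; index 5: only dimension 1.
toy_scores = [
    [0.0, 0.0],
    [10.0, 4.0],
    [0.0, 0.0],
    [9.0, 0.0],
    [0.0, 0.0],
    [0.0, 3.0],
    [0.0, 0.0],
]

scaled = minmax_columns(toy_scores)
intersection = aggregate(scaled, min)  # high only where every dimension is high
union = aggregate(scaled, max)  # high where at least one dimension is high

intersection_peaks = top_indexes(intersection, 1)
union_peaks = top_indexes(union, 3)
print(intersection_peaks, union_peaks)
```

With these toy scores, the minimum keeps only the index where both dimensions peak, while the maximum also recovers the two indexes where a single dimension peaks. The scaling step matters because the raw scores of the two dimensions are not on the same scale.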
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -77,9 +77,11 @@ nav:
- 'Simple usages':
- 'Basic usage': examples/basic-usage.ipynb
- 'Advanced usages':
- 'Dimensions ensemble model': examples/ensemble-dimensions.ipynb
- 'Kernel change point detection: a performance comparison': examples/kernel-cpd-performance-comparison.ipynb
- 'Music segmentation': examples/music-segmentation.ipynb
- 'Text segmentation': examples/text-segmentation.ipynb

- Code reference:
- code-reference/index.md
- Base classes: code-reference/base-reference.md