deepcharles · deepcharles · Apr 15, 2022 · Apr 6, 2022 · Apr 6, 2022 · Apr 13, 2022
diff --git a/docs/examples/ensemble-dimensions.ipynb b/docs/examples/ensemble-dimensions.ipynb
@@ -0,0 +1,308 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Ensemble dimensions: splitting changepoint detection algorithms along the dimensions\n",
+    "\n",
+    "<!-- {{ add_binder_block(page) }} -->\n",
+    "\n",
+    "## Introduction\n",
+    "\n",
+    "In `ruptures`, change point detection procedures make use of the same cost function along all the dimensions.\n",
+    "The choice of the cost function is critical as it is related to the type of change to find. \n",
+    "For instance, [CostL2](../user-guide/costs/costl2.md) can detect shifts in the mean, [CostNormal](../user-guide/costs/costnormal.md) can detect shifts in the mean and the covariance structure, [CostAR](../user-guide/costs/costautoregressive.md) can detect shifts in the auto-regressive structure, etc.\n",
+    "\n",
+    "However, in many settings, the dimensions do not all exhibit the same type of change, and a single cost function cannot detect all changes simultaneously.\n",
+    "To cope with this issue, a procedure that studies the cost on each dimension independently is presented here. It is inspired by the paper [[Katser2021]](#Katser2021), which introduces a procedure to merge several cost functions.\n",
+    "In a nutshell, different costs along each dimension can be combined to yield an aggregated cost function which is sensitive to several types of changes.\n",
+    "The aggregated cost can then be used with any search method (such as the [window search method](../user-guide/detection/window.md)) to create a change point detection algorithm.\n",
+    "\n",
+    "This example illustrates the aggregation procedure, also referred to as an ensemble model.\n",
+    "Here, only [CostL2](../user-guide/costs/costl2.md) is considered for all dimensions, but any other cost could be used. The focus is therefore on the way the costs are combined (see [intersection](#intersection) and [union](#union)).\n",
+    "In addition, the number of changes is assumed to be known by the user."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "First, we make the necessary imports and generate a multivariate toy signal which contains different mean shifts along the dimensions. Notice that only one changepoint is shared between the two dimensions. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from itertools import cycle\n",
+    "import matplotlib.pyplot as plt\n",
+    "import ruptures as rpt\n",
+    "\n",
+    "\n",
+    "# Scaling function: rescale each column (dimension) to the range [0, 1]\n",
+    "def minmax(array):\n",
+    "    return (array - np.min(array, axis=0)) / (\n",
+    "        np.max(array, axis=0) - np.min(array, axis=0) + 1e-8\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "# Aggregation functions: combine the per-dimension scores at each time step\n",
+    "def min_(array):\n",
+    "    return np.min(array, axis=1).T\n",
+    "\n",
+    "\n",
+    "def max_(array):\n",
+    "    return np.max(array, axis=1).T"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "bkps = [329, 656, 972, 1291, 1642, 2000]\n",
+    "n_samples = bkps[-1]\n",
+    "bkps_1 = [bkps[0], bkps[1], bkps[4], n_samples]\n",
+    "bkps_2 = [bkps[1], bkps[2], bkps[3], n_samples]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "signal = np.zeros((n_samples, 2))\n",
+    "val_cycle = cycle([0, 1])\n",
+    "for (start, end) in rpt.utils.pairwise([0] + bkps_1):\n",
+    "    signal[start:end, 0] = next(val_cycle)\n",
+    "for (start, end) in rpt.utils.pairwise([0] + bkps_2):\n",
+    "    signal[start:end, 1] = next(val_cycle)\n",
+    "\n",
+    "fig, axes = rpt.display(signal, bkps)\n",
+    "axes[0].set_title(\"Noise-free signal\")\n",
+    "\n",
+    "signal += np.random.normal(size=signal.shape)\n",
+    "fig, axes = rpt.display(signal, bkps)\n",
+    "_ = axes[0].set_title(\"Toy signal\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following cell defines a custom cost that computes the [CostL2](../user-guide/costs/costl2.md) on a single, given dimension. To define a custom cost that applies a different cost to each dimension, one could add an `if` statement relying on the value of `self.dim`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from ruptures.base import BaseCost\n",
+    "\n",
+    "\n",
+    "class MyCost(BaseCost):\n",
+    "\n",
+    "    \"\"\"Custom cost: L2 cost (as in CostL2) computed on a single dimension.\"\"\"\n",
+    "\n",
+    "    # The 2 following attributes must be specified for compatibility.\n",
+    "    model = \"\"\n",
+    "    min_size = 2\n",
+    "\n",
+    "    def __init__(self, dim):\n",
+    "        super().__init__()\n",
+    "        self.dim = dim\n",
+    "\n",
+    "    def fit(self, signal):\n",
+    "        \"\"\"Set the internal parameter.\"\"\"\n",
+    "        self.signal = signal[:, self.dim].reshape(-1, 1)\n",
+    "        return self\n",
+    "\n",
+    "    def error(self, start, end) -> float:\n",
+    "        \"\"\"Return the approximation cost on the segment [start:end].\n",
+    "\n",
+    "        Args:\n",
+    "            start (int): start of the segment\n",
+    "            end (int): end of the segment\n",
+    "\n",
+    "        Returns:\n",
+    "            segment cost\n",
+    "\n",
+    "        Raises:\n",
+    "            NotEnoughPoints: when the segment is too short (less than `min_size` samples).\n",
+    "        \"\"\"\n",
+    "        if end - start < self.min_size:\n",
+    "            raise rpt.exceptions.NotEnoughPoints\n",
+    "\n",
+    "        return self.signal[start:end].var(axis=0).sum() * (end - start)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Looking at the costs along each dimension\n",
+    "\n",
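+    "Here, the [window search method](../user-guide/detection/window.md) is used. This method does not work with the costs directly: at each time step, it computes a score, namely the discrepancy in cost between the two adjacent windows around that time step. The scores are therefore considered here instead of the costs.\n",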
+    "\n",
+    "The following cell shows the scores along each dimension:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "window_size = 200\n",
+    "\n",
+    "list_of_costs = [MyCost(dim=0), MyCost(dim=1)]\n",
+    "\n",
+    "scores = []\n",
+    "\n",
+    "for cost in list_of_costs:\n",
+    "    algo = rpt.Window(width=window_size, custom_cost=cost, jump=1).fit(signal)\n",
+    "    scores.append(algo.score)\n",
+    "scores = np.array(scores).T\n",
+    "\n",
+    "# For display purposes: pad the first half-window, where no score is computed\n",
+    "appended_scores = np.append(\n",
+    "    np.ones((window_size // 2, 2)) * float(\"inf\"), scores, axis=0\n",
+    ")\n",
+    "fig, axes = rpt.display(appended_scores, bkps)\n",
+    "_ = axes[0].set_title(\"Scores along each dimension\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Intersection\n",
+    "\n",
+    "A first way of aggregating the scores is to consider each score along a dimension as being an expert. The idea is then to take the __intersection__ of the experts so that the predicted changepoints are the changepoints where all the experts are confident. This method is useful when the user is interested in collecting the changepoints that have been detected on all dimensions at the same timestep.\n",
+    "\n",
+    "In the following cell, the intersection procedure correctly predicts the only changepoint common to the two dimensions. The aggregated score clearly shows the single changepoint to predict."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "bkps_intersection = [bkps[1], n_samples]\n",
+    "n_bkps_intersection = 1\n",
+    "\n",
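+    "# intersection aggregation: keep the minimum of the rescaled scores\n",
+    "# (`algo` is the last `Window` instance fitted above; its score is overwritten)\n",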
+    "algo.score = min_(minmax(scores))\n",
+    "bkps_intersection_predicted = algo.predict(n_bkps=n_bkps_intersection)\n",
+    "\n",
+    "# For display purposes: pad the first half-window, where no score is computed\n",
+    "appended_intersection_scores = np.append(\n",
+    "    np.ones(window_size // 2) * float(\"inf\"), algo.score\n",
+    ")\n",
+    "\n",
+    "fig, (ax,) = rpt.display(\n",
+    "    appended_intersection_scores, bkps_intersection, bkps_intersection_predicted\n",
+    ")\n",
+    "_ = ax.set_title(\"Aggregated score for intersection purpose\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Union\n",
+    "\n",
+    "Another way of aggregating the scores is to take the __union__ of the experts so that the predicted changepoints are the changepoints where at least one expert is confident. This method is useful when the user is interested in collecting all the changepoints that are present on at least one dimension.\n",
+    "\n",
+    "In the following cell, the union procedure correctly predicts all the changepoints. The aggregated score shows 5 clear peaks from which to take the changepoints."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "bkps_union = bkps\n",
+    "n_bkps_union = 5\n",
+    "\n",
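+    "# union aggregation: keep the maximum of the rescaled scores\n",
+    "# (`algo` is the last `Window` instance fitted above; its score is overwritten)\n",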
+    "algo.score = max_(minmax(scores))\n",
+    "bkps_union_predicted = algo.predict(n_bkps=n_bkps_union)\n",
+    "\n",
+    "# For display purposes: pad the first half-window, where no score is computed\n",
+    "appended_union_scores = np.append(np.ones(window_size // 2) * float(\"inf\"), algo.score)\n",
+    "\n",
+    "fig, (ax,) = rpt.display(appended_union_scores, bkps_union, bkps_union_predicted)\n",
+    "_ = ax.set_title(\"Aggregated score for union purpose\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "This example shows a way of crafting a changepoint detection algorithm at the level of the individual dimensions, which is finer than the usual level of the whole signal.\n",
+    "\n",
+    "Two options are presented: [intersection](#intersection), where a changepoint is detected only if it is detected along all dimensions, and [union](#union), where a changepoint is detected as soon as one dimension has detected it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Authors\n",
+    "\n",
+    "This example notebook has been authored by [Théo VINCENT](https://github.com/theovincent) and edited by [Olivier Boulant](https://github.com/oboulant) and [Charles Truong](https://github.com/deepcharles)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## References\n",
+    "\n",
+    "<a id=\"Katser2021\">[Katser2021]</a>\n",
+    "Katser, I., Kozitsin, V., Lobachev, V., & Maksimov, I. (2021). Unsupervised Offline Changepoint Detection Ensembles. Applied Sciences, 11(9), 4280."
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "de7c5fd504cc39d35018b8f7c7b90638a48a11123680d0b8595c478c27913e5d"
+  },
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -77,9 +77,11 @@ nav:
     - 'Simple usages':
       - 'Basic usage': examples/basic-usage.ipynb
     - 'Advanced usages':
+      - 'Dimensions ensemble model': examples/ensemble-dimensions.ipynb
       - 'Kernel change point detection: a performance comparison': examples/kernel-cpd-performance-comparison.ipynb
       - 'Music segmentation': examples/music-segmentation.ipynb
       - 'Text segmentation': examples/text-segmentation.ipynb
+
   - Code reference:
       - code-reference/index.md
       - Base classes: code-reference/base-reference.md