
docs: Ensemble dimensions (old) #240

Closed
wants to merge 19 commits

Conversation

theovincent
Contributor

Hi,

I have implemented an ensemble method on top of ruptures. The contribution is only a Jupyter notebook showing how to replicate the algorithm presented in Katser2021. This notebook is added to the gallery of examples in the docs. This subject was first raised in the following issue and is related to the following one.

The code was developed following the guidelines. The docs work on my computer when I use mkdocs serve. The only problem is that I don't see the output of the cells of the Jupyter notebook. The problem also happens when I try to view the other examples in the gallery, even on the master branch.

[Katser2021]
Katser, I., Kozitsin, V., Lobachev, V., & Maksimov, I. (2021). Unsupervised Offline Changepoint Detection Ensembles. Applied Sciences, 11(9), 4280.

@theovincent theovincent changed the title Ensemble window Gallery of examples: Ensemble window Mar 2, 2022
@theovincent theovincent changed the title Gallery of examples: Ensemble window docs: Ensemble window Mar 2, 2022
@theovincent
Contributor Author

The pre-commit.ci workflow is not passing. The failure comes from the blacken-docs check; I don't think it comes from my code. Indeed, the log message reads:

Traceback (most recent call last):
  File "/pc/clone/dh2xB3GrSU6cvd4ucIO-Kw/py_env-python3/bin/blacken-docs", line 8, in <module>
    sys.exit(main())
  File "/pc/clone/dh2xB3GrSU6cvd4ucIO-Kw/py_env-python3/lib/python3.8/site-packages/blacken_docs.py", line 238, in main
    black_mode = black.FileMode(
  File "<string>", line 3, in __init__
TypeError: set object expected; got list

In ruptures, the pinned tag of blacken-docs is v1.12.0. Interestingly, in the new version of blacken-docs (v1.12.1), line 238 of blacken_docs.py has changed, and the commit message is "Fix target_versions for FileMode". I will therefore bump the version of the blacken-docs pre-commit hook.
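The change should amount to a one-line rev bump in .pre-commit-config.yaml. A sketch below, assuming the hook comes from the usual asottile/blacken-docs repository and is declared in the standard way; ruptures' actual config entry may differ:

```yaml
repos:
  - repo: https://github.com/asottile/blacken-docs
    rev: v1.12.1  # was v1.12.0; v1.12.1 contains "Fix target_versions for FileMode"
    hooks:
      - id: blacken-docs
```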

@oboulant
Collaborator

oboulant commented Mar 3, 2022

Hi @theovincent ,

Thanks for your interest in ruptures !

FYI, since in your description you say that:

I don't see the output of the cells of the Jupyter Notebook

If you want to see the results of the docs built in the CI, you can look at the artifacts generated by the GHA. For instance, the last passing one related to your PR is here. Download the artifact and you can browse it locally. Spoiler alert: your cell outputs are there!

We will look into your PR in the following days !

Thx !

"from ruptures.metrics import randindex\n",
"\n",
"# generate a signal\n",
"n_samples, dim, sigma = 250, 3, 4\n",
Collaborator


Here, you might have a better showcase if you took the regular sigma = 1.
Indeed, what we see from the results is that all the costs (including CostLinear to some extent) perform pretty poorly on the second half of the signal.
To show the power of the combination, we might want to have a "better" score on the second half with CostLinear.
WDYT?

Contributor Author

@theovincent theovincent Mar 8, 2022


@oboulant The problem I see with setting sigma = 1 is that the non-ensemble methods perform too well, which makes the ensemble method useless. This is what I have for sigma = 1:

ar 0.923
mahalanobis 0.962
l1 0.959
l2 0.962
linear 0.74
ensemble 0.959

I also tried to put the noise sigma = 4 only on the piecewise-constant signal, but this also leads to results similar to the ones we have:

ar 0.929
mahalanobis 0.941
l1 0.942
l2 0.941
linear 0.898
ensemble 0.942

As a reminder, this is what we get with sigma = 4 on both parts of the signal:

ar 0.941
mahalanobis 0.941
l1 0.953
l2 0.941
linear 0.897
ensemble 0.953
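The figures above are Rand index values comparing predicted and true segmentations. As a toy illustration of what this score measures, a minimal pure-Python sketch (not the actual ruptures.metrics.randindex implementation, which is what the notebook uses):

```python
from itertools import combinations


def labels_from_bkps(bkps):
    """Expand a breakpoint list, e.g. [100, 250], into per-sample segment labels."""
    labels, start = [], 0
    for seg, end in enumerate(bkps):
        labels.extend([seg] * (end - start))
        start = end
    return labels


def rand_index(bkps_true, bkps_pred):
    """Fraction of sample pairs on which the two segmentations agree.

    A pair agrees when both segmentations either group the two samples
    together or keep them apart; 1.0 means identical segmentations.
    """
    a = labels_from_bkps(bkps_true)
    b = labels_from_bkps(bkps_pred)
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j]) for i, j in combinations(range(n), 2)
    )
    return agree / (n * (n - 1) / 2)


# a prediction that misplaces one breakpoint by 10 samples out of 250
print(round(rand_index([100, 250], [110, 250]), 3))  # → 0.923
```

This makes it easy to see why even a poorly placed breakpoint still yields a score above 0.9: most sample pairs are far from the error and agree in both segmentations.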

Collaborator


My puzzle

All right, here are a few elements after I had a look into it. I was puzzled that the linear cost model was returning really bad scores (in my opinion). Indeed, since the signal is supposed to be tailor-made for this cost model, I expected a cleaner score profile on the second half of the signal. See the following graph, where I highlighted the section I am referring to ⬇️
[figure: score_puzzle]

The root problem

So I managed to understand why we witness this behaviour: it is because you normalize over the complete signal. If you rather normalize each signal before concatenating them, the score profile for the linear cost model is much cleaner and matches what I expect. See the following graph ⬇️
[figure: score profile with per-signal normalization]
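The per-signal normalization point can be illustrated with a toy sketch (plain z-normalization on made-up numbers; the notebook's actual preprocessing may differ):

```python
import statistics


def znorm(xs):
    """Center and scale a sequence to zero mean and unit (population) std."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]


# two made-up regimes with very different amplitudes
part1 = [0.0, 1.0, 0.0, 1.0]          # stands in for the piecewise-constant section
part2 = [100.0, 300.0, 100.0, 300.0]  # stands in for the large-amplitude section

# normalizing the concatenated signal lets part2 dominate: part1 is squashed flat
whole = znorm(part1 + part2)
# normalizing each part before concatenating preserves the structure of both
per_part = znorm(part1) + znorm(part2)

print(max(whole[:4]) - min(whole[:4]))      # tiny spread: part1's variations nearly vanish
print(max(per_part[:4]) - min(per_part[:4]))  # spread 2.0: part1's structure survives
```

With global normalization, the mean and standard deviation are dictated by the large-amplitude part, so the other part's variations become numerically invisible to the cost functions.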

Still an issue

The problem is not fully solved, since the aggregated score is still not satisfactory in my opinion. See the following graph ⬇️
[figure: aggregated score]
I think the aggregated score is not good because the scaling and aggregating functions do not do their job. But I have to give it some more thought and dive deeper to identify the real issue.

Please let me know if what I present here is unclear !

Contributor Author

@theovincent theovincent Mar 9, 2022


@oboulant it is really clear. I agree: to my mind, the scaling and aggregating functions are not the best here, but they are the ones with which the authors got the best scores:
[screenshot: table of the authors' scores]

Collaborator


The thing is, I think:

  1. those figures are really data dependent
  2. the individual score profiles from the paper seem really different from what we have. While I quite agree on the need for the two steps (scale and aggregate), I do not think that, given our individual score profiles, MinAbs is doing the job of scaling. The question of the choice of the aggregating function is more open, since in my opinion the main failure here is the scaling function

So moving forward I would look into a better way to scale the individual scores. I tried to normalize by the max value because, given our individual scores, it at least allows for comparison afterward. But it fails at the aggregation step because the individual scores are noisy (which seems not to be the case in the paper's sample graph): even with "reasonable" lambdas, the noise cancels out the signal (on the linear part of the signal). Maybe we have to denoise the individual scores before aggregation / before scaling.
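A toy sketch of the max-value scaling mentioned above, with a point-wise sum as a stand-in aggregation function (the score profiles and names are made up for illustration; this is not the paper's MinAbs/aggregation code):

```python
def max_scale(score):
    """Scale a score profile by its maximum absolute value."""
    m = max(abs(s) for s in score)
    return [s / m for s in score]


def aggregate(profiles):
    """Point-wise sum of the scaled individual score profiles."""
    return [sum(vals) for vals in zip(*profiles)]


# toy score profiles from two hypothetical detectors, on very different scales
score_a = [0.1, 0.2, 2.0, 0.2, 0.1]
score_b = [10.0, 20.0, 400.0, 20.0, 10.0]

combined = aggregate([max_scale(score_a), max_scale(score_b)])
print(combined.index(max(combined)))  # → 2, the shared peak survives aggregation
```

On clean profiles like these the shared peak dominates; on noisy profiles, as discussed above, spurious local maxima of similar magnitude get rescaled to comparable heights, which is exactly where this scheme breaks down.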

@theovincent theovincent requested a review from oboulant March 8, 2022 12:41
@theovincent theovincent changed the title docs: Ensemble window docs: Ensemble dimensions Apr 5, 2022
@deepcharles
Owner

I'm ok with the code.

@theovincent theovincent changed the title docs: Ensemble dimensions docs: Ensemble dimensions (old) Apr 6, 2022
@deepcharles
Owner

replaced by #248
