docs: Ensemble dimensions (old) #240
Conversation
The workflow pre-commit.ci is not passing.
For more information, see https://pre-commit.ci
Hi @theovincent, thanks for your interest in ruptures! FYI, since in your description you say that you don't see the cell outputs:
If you want to see the results of the docs build in the CI, you can look at the artifacts generated by the GHA. For instance, the last passing one related to your PR is here. Download the artifact and you can browse it locally. Spoiler alert: your cell outputs are there!
We will look into your PR in the following days. Thanks!
docs/examples/ensemble-window.ipynb
Outdated
"from ruptures.metrics import randindex\n",
"\n",
"# generate a signal\n",
"n_samples, dim, sigma = 250, 3, 4\n",
Here, you might have a better showcase if you took the regular sigma = 1. Indeed, what we see from the results is that all the costs (including CostLinear to some extent) perform pretty poorly on the second half of the signal. To show the power of the combination, we might want a "better" score on the second half with CostLinear. WDYT?
@oboulant The problem I see with setting sigma = 1 is that the non-ensemble methods perform too well, which makes the ensemble method useless. This is what I get for sigma = 1:
ar 0.923
mahalanobis 0.962
l1 0.959
l2 0.962
linear 0.74
ensemble 0.959
I also tried to put the noise sigma = 4 only on the piecewise-constant signal, but this also leads to results similar to the ones we already have:
ar 0.929
mahalanobis 0.941
l1 0.942
l2 0.941
linear 0.898
ensemble 0.942
As a reminder, this is what we get with sigma = 4 on both parts of the signal:
ar 0.941
mahalanobis 0.941
l1 0.953
l2 0.941
linear 0.897
ensemble 0.953
My puzzle
All right, here are a few elements after I had a look into it. I was puzzled that the linear cost model was returning really bad scores (in my opinion). Since the signal is supposed to be tailor-made for this cost model, I expected a cleaner score profile on the second half of the signal. See the following graph, where I highlighted the section I am referring to ⬇️
The root problem
I managed to understand why we witness this behaviour: it is because you normalize over the complete signal. If you instead normalize each signal before concatenating them, the score profile for the linear cost model is much cleaner and matches what I expect. See the following graph ⬇️
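To illustrate the normalization point with a small numpy sketch (the scales and shapes below are made up, not the notebook's actual signal):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two sub-signals with very different scales, standing in for the
# piecewise-constant and linear halves discussed above
part1 = rng.normal(loc=0.0, scale=4.0, size=(125, 3))
part2 = rng.normal(loc=50.0, scale=1.0, size=(125, 3))

def zscore(x):
    """Column-wise standardization."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Normalizing the concatenated signal lets the offset between the two
# halves dominate, squashing the structure inside each half...
global_norm = zscore(np.concatenate([part1, part2]))

# ...whereas normalizing each sub-signal before concatenating preserves
# the local structure of both halves
local_norm = np.concatenate([zscore(part1), zscore(part2)])
print(local_norm[:125].std(axis=0))  # ~1 in each dimension
```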
Still an issue
The problem is not fully solved, since the aggregated score is still not satisfactory in my opinion. See the following graph ⬇️
I think the aggregated score is off because the scaling and aggregating functions do not do their job. But I have to give it some more thought and dive deeper to identify the real issue.
Please let me know if anything I present here is unclear!
@oboulant It is really clear. I agree: to my mind, the scaling and aggregating functions are not the best here, but they are the ones with which the authors got the best scores:
The thing is, I think:
- those figures are really data-dependent;
- the individual score profiles from the paper seem really different from what we have. While I quite agree on the need for the two steps (scale and aggregate), I do not think that, given our individual score profiles, MinAbs is doing the job of scaling. The question of the choice of aggregating function is more open, since in my view the main failure here is the scaling function.

So, moving forward, I would look into a better way to scale the individual scores. I tried to normalize by the max value because, given our individual scores, it at least allows comparison afterwards. But it fails at the aggregation step because the individual scores are noisy (which does not seem to be the case in the paper's sample graph): even with "reasonable" lambdas, the noise cancels out the signal (on the linear part of the signal). Maybe we have to denoise the individual scores before scaling/aggregation.
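A toy sketch of the max-scaling idea mentioned above (the score profiles are synthetic, and aggregation by summation is an assumption made for illustration, not Katser et al.'s exact scheme):

```python
import numpy as np

rng = np.random.default_rng(1)
# Three synthetic score profiles on very different scales, standing in
# for the individual cost models' statistics
profiles = [rng.random(200) * s for s in (1.0, 10.0, 1000.0)]

def max_scale(p):
    # Normalize by the max value so profiles are comparable in [0, 1]
    return p / p.max()

# Aggregate by summing the scaled profiles (a simple illustrative choice)
aggregated = np.sum([max_scale(p) for p in profiles], axis=0)
print(aggregated.shape)  # (200,)
```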
Co-authored-by: Olivier Boulant <[email protected]>
I'm OK with the code.
Replaced by #248
Hi,
I have implemented an ensemble method on top of ruptures. The contribution is only a Jupyter notebook showing how to replicate the algorithm described in [Katser2021]. This notebook is added to the gallery of examples in the docs. This subject was first raised in the following issue and is related to the following one.
The code was developed following the guidelines. The docs work for me on my computer when I use mkdocs serve. The only problem is that I don't see the output of the cells of the Jupyter notebook. The problem also happens if I try to view the other examples in the gallery, even when I am on the master branch.
[Katser2021] Katser, I., Kozitsin, V., Lobachev, V., & Maksimov, I. (2021). Unsupervised Offline Changepoint Detection Ensembles. Applied Sciences, 11(9), 4280.