Change detection improvements #38
When the test contains sequential data (e.g. per-second throughput), rather than just averaging the values for comparison with other runs we could treat the sequence as a multi-dimensional vector and compute the distance between those vectors. When the sequences are not of equal length we could pad the missing dimensions with the average value. Truncating the longer vector is another option, but that would lose information.
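A minimal sketch of that idea, assuming mean-padding and a Euclidean distance (the function names and the distance metric are illustrative, not part of any existing Horreum API):

```python
import numpy as np

def pad_with_mean(seq, length):
    """Pad a sequence to the target length using its own mean value."""
    seq = np.asarray(seq, dtype=float)
    if len(seq) >= length:
        return seq
    return np.concatenate([seq, np.full(length - len(seq), seq.mean())])

def sequence_distance(a, b):
    """Treat two per-second series as vectors and return their Euclidean
    distance, padding the shorter one with its mean so dimensions match."""
    n = max(len(a), len(b))
    return float(np.linalg.norm(pad_with_mean(a, n) - pad_with_mean(b, n)))

# Example: a baseline run vs. a run with a throughput dip in the middle
baseline = [1000, 1010, 995, 1005, 1002]
candidate = [1001, 990, 700, 710, 1003, 998]
print(sequence_distance(baseline, candidate))
```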
The critical problem with the current approach of running a t-test for each variable is that as the number of (non-independent) variables grows, the likelihood of a false positive in at least one of them grows as well. It's useful to incorporate more performance counters into the dataset, but clustering them by covariance could have better properties.
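To illustrate how fast that inflates, here is a rough sketch of the family-wise error rate, simplified by assuming the per-variable tests are independent at α = 0.05 (the real counters are correlated, so the exact numbers will differ):

```python
# Probability that at least one of m independent tests at level alpha
# rejects a true null hypothesis (family-wise error rate).
alpha = 0.05
for m in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:2d} variables -> {fwer:.1%} chance of at least one false positive")
#  1 variables ->  5.0%
#  5 variables -> 22.6%
# 10 variables -> 40.1%
# 20 variables -> 64.2%
```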
I ran an experiment with the current algorithm: I generated 200 dummy runs using a ~normal distribution (10 + sum of 5
Actually, when using minWindow = 5 and trying the 2 * stddev test, I got 10 changes, and with the t-test I got 9 changes. These numbers roughly fit the expected share of the population outside mean ± 2*stddev (4.55%) or the confidence level (5% chance of rejecting the null hypothesis while it holds). We get what we ask for from the statistics, even though we wish for no false positives.
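A quick back-of-the-envelope check of those counts, assuming the 200 dummy runs above and treating every flag as a false positive since the runs come from one distribution:

```python
runs = 200
print(runs * 0.0455)  # ~9.1 runs expected outside mean +- 2*stddev
print(runs * 0.05)    # ~10 rejections expected at a 5% significance level
```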
@johnaohara Thinking about comparing histograms, I think it could be done, and the method could be used for any constant-size vector: we could average the values to obtain the baseline, then calculate the square root of the difference in each item and average these. Then the regular thresholds would apply. I am not sure how useful this would be in practice, but it is something that makes sense to try and is not possible currently (you could compare each vector item individually, but you'd probably need higher thresholds - it is not possible to diff each vector item first now). The UI would not need to be more complicated: this could be the default for any regression variable returning an array. If the vector size differs, though, the comparison would fail and a notification would be sent. Charting would be a bit more difficult: I can imagine an interactive time axis, using the whole chart to display just a single histogram, with a gray 'average histogram' in the background. Optional log scale? (with some primitive heuristic choosing the default). If users choose to plot data with completely different scales, such a chart wouldn't be too useful, but hey, they can normalize them in the calculation function (without affecting the regression algorithm at all).
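A rough sketch of what that reduction could look like. The per-item step below uses the square root of the absolute difference from the baseline (absolute value added so negative differences don't break the square root); all function names are hypothetical, not existing Horreum code:

```python
import numpy as np

def baseline_histogram(histograms):
    """Average a list of equal-size histograms item by item to get the baseline."""
    return np.mean(np.asarray(histograms, dtype=float), axis=0)

def histogram_distance(histogram, baseline):
    """Reduce a histogram to one scalar: square root of the absolute per-item
    difference from the baseline, averaged over all items. The scalar can then
    be checked against the regular thresholds."""
    histogram = np.asarray(histogram, dtype=float)
    if histogram.shape != baseline.shape:
        raise ValueError("vector sizes differ - comparison fails, notify instead")
    return float(np.mean(np.sqrt(np.abs(histogram - baseline))))

previous = [[10, 50, 30, 10], [12, 48, 31, 9], [11, 49, 29, 11]]
base = baseline_histogram(previous)
print(histogram_distance([11, 50, 30, 9], base))   # close to baseline -> small value
print(histogram_distance([30, 20, 20, 30], base))  # shifted shape -> larger value
```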
If we decide to adopt some form of statistical tests again, we should use https://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method to compensate for the multiple comparisons.
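For reference, a minimal sketch of the Holm–Bonferroni step-down procedure (if a library is preferred, statsmodels exposes the same correction via `multipletests(..., method='holm')`):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected
    after Holm-Bonferroni correction for multiple comparisons."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are accepted too
    return reject

# Example: five variables tested per run; a naive 0.05 cutoff would flag four
# of them, but only the first survives the correction.
print(holm_bonferroni([0.001, 0.02, 0.03, 0.04, 0.20]))
```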
When monitoring performance on a branch it might happen that a regression is introduced and later fixed. Horreum does not let us confirm whether the performance after the fix is equal to the performance before the regression.
A paper on change detection worth reading: https://arxiv.org/pdf/1101.1438.pdf
Hey @rvansa, hope you are doing well! Thanks for the link to the paper, will take a look.
Hi John, yep, except no AC in my home office :) I actually found the paper when I stumbled upon this Python library: https://centre-borelli.github.io/ruptures-docs/
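For anyone curious, here is a small example of offline change point detection with ruptures; the synthetic signal and the penalty value are made up for illustration:

```python
import numpy as np
import ruptures as rpt

# Synthetic per-build throughput: a stable period, a regression, then a fix.
rng = np.random.default_rng(42)
signal = np.concatenate([
    rng.normal(1000, 10, 50),   # baseline
    rng.normal(900, 10, 30),    # regression introduced
    rng.normal(1000, 10, 40),   # regression fixed
])

# PELT searches for an unknown number of change points; the penalty controls
# how eagerly new breakpoints are added.
algo = rpt.Pelt(model="l2", min_size=5).fit(signal)
breakpoints = algo.predict(pen=500)
print(breakpoints)  # indices where the signal's statistics change, e.g. [50, 80, 120]
```

The detected segments would also make it possible to compare the post-fix segment against the pre-regression one, which relates to the earlier point about confirming that a fix restored the original performance.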
A pseudo-issue tracking ideas for improvements in regression monitoring