Incorrect classification for low-variance data #35

kohlert · 2024-08-28T03:40:21Z

I often work with time-series data in log space, where the absolute variance can be fairly small. I believe the fix for issue #26 has introduced another bug: the added random noise (up to 0.1, based on the distribution used) can be large enough to introduce significant errors into the resulting peak detection.
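To give a rough sense of scale, here is a minimal sketch that compares the noise bound to the spread of a log-space series. The helper name `noise_to_signal_ratio` and the example data are hypothetical and not part of the library; the 0.1 bound is the figure quoted above.

```python
import numpy as np

def noise_to_signal_ratio(values, noise_bound=0.1):
    """Ratio of the maximum added noise to the full range of the data.

    Hypothetical helper for illustration only. A ratio near or above 1
    means the noise can be as large as the whole signal.
    """
    values = np.asarray(values, dtype=float)
    signal_range = values.max() - values.min()
    if signal_range == 0:
        return float("inf")  # constant data: any noise dominates
    return noise_bound / signal_range

# Illustrative data: values between 100 and 110, viewed in log10 space.
log_values = np.log10(np.linspace(100.0, 110.0, 50))
ratio = noise_to_signal_ratio(log_values)
print(f"noise bound is {ratio:.1f}x the data range")
```

In this example the whole series spans about 0.04 in log space, so noise of up to 0.1 can exceed the full range of the data.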

The ideal solution would probably be to use indexing instead of relying on unique values for de-duplication, but that might require some additional refactoring that I haven't fully scoped out.

A simple alternative solution could be to use the smallest available increment to de-duplicate the data instead of relying on random noise.
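A minimal sketch of what I have in mind, assuming "smallest available increment" means the next representable float64 value (via `np.nextafter`). The function name `dedupe_by_smallest_increment` is hypothetical, and this is not necessarily how the library would implement it.

```python
import numpy as np

def dedupe_by_smallest_increment(values):
    """Return a float64 copy in which repeated values are nudged upward
    by the smallest representable step until every element is unique.

    Hypothetical sketch; assumes all values are finite.
    """
    out = np.asarray(values, dtype=np.float64).copy()
    seen = set()
    for i, v in enumerate(out):
        # Step to the next representable float until the value is unused.
        while v in seen:
            v = np.nextafter(v, np.inf)
        seen.add(v)
        out[i] = v
    return out

x = np.array([1.0, 1.0, 1.0, 2.0])
y = dedupe_by_smallest_increment(x)
print(len(np.unique(y)) == len(y))  # all values are now distinct
print(np.max(np.abs(y - x)))        # perturbation is on the order of machine epsilon
```

Because each step is the spacing between adjacent floats at that magnitude, the perturbation stays many orders of magnitude below the variance of the data, even in log space.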

erdogant · 2024-08-28T19:29:54Z

Thank you for this contribution! Your solution is much better than adding random noise. I created a new version.

kohlert · 2024-08-28T20:54:27Z

Glad I could help out. Thanks for all the work you do on the library. This is a great resource!

kohlert mentioned this issue Aug 28, 2024

Incremental de-duplication #36

Merged

kohlert closed this as completed Aug 28, 2024
