Use absolute features when doing training/prediction. #11876

vkalintiris · 2021-12-08T14:37:06Z

Summary

Add a final step to the preprocessing of feature vectors so that they always contain absolute values only. This follows internal research showing how this affects the distance measure used to generate anomaly scores (illustrative colab notebook).

In the chart below, we add some contamination to the metric in the last part of the chart.

The orange line shows the effect of making all feature vector elements absolute, as opposed to allowing both positive and negative values. Mixed signs can bias the distance measures of metrics such as system.cpu that tend to move up and down randomly from second to second.

In this case the orange line behaves better as an anomaly score: it jumps consistently during the period of contaminated data, whereas the implementation that does not force absolute values performs worse. This is because allowing positive and negative values "naturally" suppresses distance measures, since elements of the feature vector can cancel each other out when distances are compared.
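As a rough illustration of the cancellation argument, here is a minimal Python sketch. It is not the code from this PR: the helper names (`absolute_features`, `centroid`) and the toy vectors are hypothetical, and averaging vectors into a centroid is only one possible reading of how signed elements cancel. It shows that for a metric that flips direction every second, the signed vectors average to zero, while the absolute vectors keep the typical magnitude.

```python
def absolute_features(vector):
    """Final preprocessing step: replace every element with its absolute value."""
    return [abs(x) for x in vector]


def centroid(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical feature vectors for a metric that moves up and down
# from one second to the next (toy values, not real system.cpu data).
signed = [
    [1.0, -1.0, 1.0],
    [-1.0, 1.0, -1.0],
    [1.0, -1.0, 1.0],
    [-1.0, 1.0, -1.0],
]
absolute = [absolute_features(v) for v in signed]

# With mixed signs, the opposite movements cancel in the average.
signed_centroid = centroid(signed)

# With absolute values, the typical size of the movement is preserved.
absolute_centroid = centroid(absolute)

print("signed centroid:  ", signed_centroid)
print("absolute centroid:", absolute_centroid)
```

Whether this translates into a better anomaly score depends on the model and the distance measure, which is what the chart in this PR and the linked notebook examine.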

Component Name

area/ml

Test Plan

CI & verification with the ML team.

andrewm4894

LGTM. Tested and verified it behaves as expected on one of our devml nodes.

andrewm4894

LGTM

vkalintiris added the area/ml Machine Learning Related Issues label Dec 8, 2021

vkalintiris requested review from andrewm4894 and siamaktavakoli December 8, 2021 14:37

andrewm4894 previously approved these changes Dec 9, 2021


vkalintiris added 2 commits January 13, 2022 14:20

Use absolute features when doing training/prediction.

de97ddd

Rebase.

c5d3fb3

vkalintiris dismissed andrewm4894’s stale review via c5d3fb3 January 13, 2022 12:22

vkalintiris force-pushed the abs-samples branch from fd2d65c to c5d3fb3 Compare January 13, 2022 12:22

andrewm4894 approved these changes Jan 13, 2022


siamaktavakoli approved these changes Jan 13, 2022


vkalintiris merged commit 90ceb55 into netdata:master Jan 13, 2022

vkalintiris deleted the abs-samples branch July 5, 2024 09:33
