Add "skipped" conditional for near zero metric value fast fails #412

Merged

DanielHHowell
Contributor

No description provided.

linear bot commented Mar 17, 2022

ENG-565 Servox fast-fail logic for handling values/thresholds at or near zero

During release candidate testing for 2.4 we ran into a failing baseline validation stage caused by the error_rate fast fail configuration:

        - keep: below
          metric: tuning_error_rate
          threshold_metric: main_error_rate
          threshold_multiplier: 4
          trigger_count: 3
          trigger_window: 5

The problem was that main_error_rate was 0 whereas tuning_error_rate was 0.009090909090909090467524933388. The tuning error rate is within acceptable margins, but because 0.009090909090909090467524933388 > 4 * 0, the measurement fast-failed. We need to add logic to the fast fail that either ignores values less than 1 or applies some sort of minimum value (e.g. 0.25) when the threshold_metric is detected as 0.

https://console.opsani.com/accounts/dev.opsani.com/applications/opsani-dev24-rc/logs?sort_by=-last_activity_time&show_apps=all&log_details_for=dfLK4rv7wgit03hCD9tf
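
The arithmetic behind this failure, and the minimum-value idea suggested in the issue, can be sketched roughly as follows. This is an illustration only, not the servox implementation: the function names and the `minimum_threshold` parameter are hypothetical, and the 0.25 floor is just the example value mentioned in the issue.

```python
from decimal import Decimal


def exceeds_threshold(metric_value, threshold_value, multiplier):
    """Hypothetical 'keep: below' check: True means the check is violated."""
    return metric_value > threshold_value * multiplier


def exceeds_threshold_with_floor(
    metric_value, threshold_value, multiplier, minimum_threshold=Decimal("0.25")
):
    """Same check, but fall back to a floor when the threshold metric is 0.

    `minimum_threshold` is an assumed parameter, not a real servox setting.
    """
    threshold = threshold_value * multiplier
    if threshold_value == 0:
        threshold = minimum_threshold
    return metric_value > threshold


tuning_error_rate = Decimal("0.009090909090909090467524933388")
main_error_rate = Decimal("0")
multiplier = Decimal(4)

# Any positive tuning value violates the check when the threshold is 4 * 0.
plain_result = exceeds_threshold(tuning_error_rate, main_error_rate, multiplier)

# With a floor, the small tuning error rate no longer triggers a violation.
floored_result = exceeds_threshold_with_floor(
    tuning_error_rate, main_error_rate, multiplier
)
```

With the values from the issue, the plain check reports a violation while the floored variant does not.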

@DanielHHowell DanielHHowell requested a review from linkous8 March 17, 2022 18:14
Contributor

@linkous8 linkous8 left a comment

Please also add a new property to the FastFailConfiguration class that toggles whether to treat absolute zero values (== 0, not close to 0) as missing. Some metric systems return 0 for a metric that has data in the system even when it has no data (or null data) for the time frame being queried. The new configuration lets us account for that as needed.
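
One possible shape for such a toggle, combined with the "skipped" outcome named in the PR title, is sketched below. All names here (`FastFailSketchConfig`, `treat_zero_as_missing`, `CheckStatus`, `evaluate_check`) are hypothetical and chosen for illustration; the actual property added to FastFailConfiguration is not shown in this conversation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CheckStatus(str, Enum):
    passed = "passed"
    failed = "failed"
    skipped = "skipped"


@dataclass
class FastFailSketchConfig:
    # Hypothetical toggle: treat values that are exactly 0 as missing data.
    treat_zero_as_missing: bool = False


def evaluate_check(
    metric_value: Optional[float],
    threshold_value: Optional[float],
    multiplier: float,
    config: FastFailSketchConfig,
) -> CheckStatus:
    """Evaluate a 'keep: below' style check, skipping when data is missing."""
    if config.treat_zero_as_missing:
        # Only exact zeros are converted; values merely close to 0 are kept.
        metric_value = None if metric_value == 0 else metric_value
        threshold_value = None if threshold_value == 0 else threshold_value

    if metric_value is None or threshold_value is None:
        return CheckStatus.skipped

    if metric_value > threshold_value * multiplier:
        return CheckStatus.failed
    return CheckStatus.passed


# With the toggle on, a zero threshold metric is skipped rather than failed.
status_on = evaluate_check(
    0.00909, 0.0, 4, FastFailSketchConfig(treat_zero_as_missing=True)
)
status_off = evaluate_check(0.00909, 0.0, 4, FastFailSketchConfig())
```

Defaulting the toggle to off in this sketch preserves the previous behaviour for metric systems where 0 is a genuine reading.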

@DanielHHowell DanielHHowell requested a review from linkous8 April 7, 2022 18:14
Co-authored-by: Fred L Sharp <[email protected]>
@DanielHHowell DanielHHowell merged commit ca4fc2f into main Apr 12, 2022
@DanielHHowell DanielHHowell deleted the danielh/eng-565-servox-fast-fail-logic-for-handling branch April 12, 2022 14:41
2 participants