[ResponseOps][flapping] change action behavior when flapping #147810
…-action-behavior-when-flapping
x-pack/plugins/alerting/server/alert/create_alert_factory.test.ts
Pinging @elastic/response-ops (Team:ResponseOps)
`infra` plugin code changes LGTM
AO changes LGTM
## Summary

As part of [this PR](#148751) to encapsulate alert-related functionality into a `LegacyAlertsClient`, we removed some generics for `ActionGroupId` because they did not seem necessary. However, they are needed to support scheduling actions, which we add in [this PR](#147810). This PR is just for restoring the generics for alert-related functions.
x-pack/plugins/rule_registry/server/utils/get_alerts_for_notification.ts
LGTM
Tested locally, works as expected.
code LGTM
Only thing I'm curious about is the new `pendingRecoveredCount` property on the alert meta. Seems like this would be highly correlated with `flappingHistory`. Like, the number of `true` entries is probably close to the `pendingRecoveredCount`. But since we don't know which state was flipped to in the flapping history, just that it changed between active and recovered, we can't actually calculate the precise number that `pendingRecoveredCount` is tracking. Is that right?
Yes, that is correct! We also need to reset the value once it's reported active or finally recovered, and I don't think we can use the flapping history to do that.
💚 Build Succeeded
Resolves #143445
Summary
This PR changes how alerting actions behave when an alert is flapping.
To recover a flapping alert, the alert must be reported as recovered for 4 consecutive runs (this threshold will eventually be configurable) before the framework marks the alert as recovered.
Link to a spreadsheet that helps visualize the changes: https://docs.google.com/spreadsheets/d/1amoUDFx-Sdxirnabd_E4OSnyur57sLhY6Ce79oDYgJA/edit#gid=0
Also Resolves #147205 to recover those alerts
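For illustration only, here is a rough sketch of the recovery rule described above, including the `pendingRecoveredCount` reset discussed in the review. It is not the code from this PR: `nextStatus`, `simulate`, `AlertMeta`, and `RECOVERY_THRESHOLD` are hypothetical names, and it assumes that a non-flapping alert recovers immediately and that a flapping alert is marked recovered on the 4th consecutive recovered report.

```typescript
// Hypothetical sketch; none of these names come from the Kibana alerting framework.
const RECOVERY_THRESHOLD = 4; // assumed from the description; intended to be configurable later

type Status = 'active' | 'recovered';

interface AlertMeta {
  flapping: boolean;
  // Number of consecutive runs in which a flapping alert was reported recovered.
  pendingRecoveredCount: number;
}

// Given what the rule executor reported this run, decide what the framework records.
function nextStatus(meta: AlertMeta, reported: Status): { status: Status; meta: AlertMeta } {
  if (reported === 'active') {
    // Reported active: reset the pending count.
    return { status: 'active', meta: { ...meta, pendingRecoveredCount: 0 } };
  }
  if (!meta.flapping) {
    // Assumption: non-flapping alerts recover right away.
    return { status: 'recovered', meta: { ...meta, pendingRecoveredCount: 0 } };
  }
  const count = meta.pendingRecoveredCount + 1;
  if (count >= RECOVERY_THRESHOLD) {
    // Finally recovered: reset the pending count.
    return { status: 'recovered', meta: { ...meta, pendingRecoveredCount: 0 } };
  }
  // Still pending: keep the alert active until the threshold is reached.
  return { status: 'active', meta: { ...meta, pendingRecoveredCount: count } };
}

// Run a sequence of reported statuses through the rule and collect the recorded statuses.
function simulate(reports: Status[], flapping: boolean): Status[] {
  let meta: AlertMeta = { flapping, pendingRecoveredCount: 0 };
  const recorded: Status[] = [];
  for (const reported of reports) {
    const result = nextStatus(meta, reported);
    meta = result.meta;
    recorded.push(result.status);
  }
  return recorded;
}
```

Under these assumptions, a flapping alert reported as recovered, recovered, active, recovered, recovered, recovered, recovered stays active for the first six runs and is only recorded as recovered on the seventh, because the active report in the third run resets the count.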
Checklist
To verify