[ResponseOps] Discrepancy between the rule state and task manager when performing bulkEnable or bulkDisable rules #192207
Comments
Pinging @elastic/response-ops (Team:ResponseOps)
From a comment in a previous attempted PR: #189041 ... In order to make the double update (rule and task) more resilient, we determined that this should be the order of updates:
analysis

Trying to capture what happens when an enable/disable request is run but one of the updates to the task doc or the rule doc fails. In this case, it is the second update that fails ("task updated then rule" means the task doc is updated first, then the rule doc).

There are two tables for each of enable and disable, showing the difference between the two orderings of the updates (rule then task, or task then rule). The first column is the input state of the task and rule docs, and the second column is the final state after the failed update. The mismatched input states would be the result of an earlier bad enable/disable update, or some other bad thing that happened.

A final state of "task: !enabled, rule: enabled" should be avoided if possible. The rule will look enabled in the UX, but will not actually be running.

A final state of "task: enabled, rule: !enabled" is acceptable. In this case, the rule is the source of truth for enablement, so when the task runs, it checks whether the rule is enabled. If it isn't, the rule execution code disables the task instead, so the task won't be polled for until the rule is re-enabled. Even if that update fails, the same check happens again on the next run.

enable: task updated then rule update fails
enable: rule updated then task update fails
disable: task updated then rule update fails
disable: rule updated then task update fails
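The four scenarios listed above can be enumerated with a small model. The sketch below is a hypothetical in-memory simplification, not Kibana's code: the names `EnabledState`, `applyWithSecondUpdateFailing`, `isStuck`, and `stuckOutcomes` are made up for illustration, and it assumes the first doc update always succeeds while the second always fails and leaves its doc untouched.

```typescript
// Hypothetical in-memory model; these are not Kibana's actual types.
type Order = 'taskThenRule' | 'ruleThenTask';

interface EnabledState {
  task: boolean; // task doc enabled flag
  rule: boolean; // rule doc enabled flag
}

// Apply an enable (target = true) or disable (target = false) request where
// the first doc update succeeds and the second fails, leaving that doc as it was.
function applyWithSecondUpdateFailing(
  input: EnabledState,
  target: boolean,
  order: Order
): EnabledState {
  return order === 'taskThenRule'
    ? { task: target, rule: input.rule }
    : { task: input.task, rule: target };
}

// The state to avoid: the rule looks enabled in the UX but its task is not running.
function isStuck(state: EnabledState): boolean {
  return state.rule && !state.task;
}

// All input states, including the mismatched ones left by an earlier bad update.
const inputs: EnabledState[] = [
  { task: false, rule: false },
  { task: false, rule: true },
  { task: true, rule: false },
  { task: true, rule: true },
];

// Final states that end up stuck for a given request and update ordering.
function stuckOutcomes(target: boolean, order: Order): EnabledState[] {
  return inputs
    .map((input) => applyWithSecondUpdateFailing(input, target, order))
    .filter(isStuck);
}

for (const target of [true, false]) {
  for (const order of ['taskThenRule', 'ruleThenTask'] as Order[]) {
    const label = `${target ? 'enable' : 'disable'} / ${order}`;
    console.log(label, 'stuck final states:', stuckOutcomes(target, order).length);
  }
}
```

Under these assumptions, only two combinations can end in "task: !enabled, rule: enabled": enabling with the rule updated first, and disabling with the task updated first. The other two orderings end either consistent or in the acceptable "task: enabled, rule: !enabled" state.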
In one of the previous related PRs, I believe I noticed we did NOT have a test where we created a rule in a disabled state and then did a bulk update of it. We should add a functional test (FT) for this case, and for anything similar or related. You can repro this today, manually, by importing a rule that has been exported: it will be disabled on import (with no backing task doc). You can then enable it via the rule list, which uses the bulk enable/disable functionality.
Assuming this PR gets merged, it may change the way we fix this problem: Add bulk update function that directly updates using the esClient #191760. Since this allows a partial update, what I'm wondering is if we just blast the
Related to: #181050
When enabling rules, the Alerting framework skips rules if their saved object has `alert.attributes.enabled: true`. This behavior creates issues described here and in the attached SDH.
There might be a situation where a rule is marked as enabled, but no corresponding task exists in the Task Manager.
In the UI, these rules will appear enabled but will never run. Users expect that all rules affected by the bulk enable action will get a corresponding task created in the Task Manager and be scheduled for execution.
Therefore, it would be best to also check if the rules to be enabled have tasks in the Task Manager, instead of relying solely on the rule's current enabled state.
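A minimal sketch of the proposed check, assuming hypothetical shapes: `RuleSummary`, `rulesNeedingEnable`, and the `existingTaskIds` set are illustrative names, not the Alerting framework's real types or API, and how the set of existing task IDs is obtained from the Task Manager is left out.

```typescript
// Hypothetical shapes for illustration; not the Alerting framework's real types.
interface RuleSummary {
  id: string;
  enabled: boolean;
  scheduledTaskId?: string; // assumed link from the rule to its backing task
}

// Select the rules a bulk enable should act on: rules that are disabled,
// plus rules marked enabled whose backing task is missing.
function rulesNeedingEnable(
  rules: RuleSummary[],
  existingTaskIds: ReadonlySet<string>
): RuleSummary[] {
  return rules.filter((rule) => {
    const hasTask =
      rule.scheduledTaskId !== undefined && existingTaskIds.has(rule.scheduledTaskId);
    // Skip a rule only when it is enabled AND its task actually exists.
    return !rule.enabled || !hasTask;
  });
}

const sampleRules: RuleSummary[] = [
  { id: 'a', enabled: false },                        // disabled, no task
  { id: 'b', enabled: true, scheduledTaskId: 't-b' }, // enabled, task exists
  { id: 'c', enabled: true, scheduledTaskId: 't-c' }, // enabled, task missing
  { id: 'd', enabled: true },                         // enabled, no task id at all
];

const toEnable = rulesNeedingEnable(sampleRules, new Set(['t-b']));
console.log(toEnable.map((rule) => rule.id));
```

In this sketch only rule `b` is skipped, because it is both marked enabled and backed by an existing task. Filtering on the enabled flag alone would also have skipped `c` and `d`, which is the mismatch described above.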