[Security Solution] Extend Detection Engine Health API with top N rules by metrics #181169
+727
−206
Relates to: #125642
Summary
This PR extends the Detection Engine Health API by adding the top N rules (10 by default) for each of several metrics, such as execution duration or schedule delay.
Details
This PR is part of my OnWeek! project, which investigates whether LLMs (for example ChatGPT, provided by OpenAI) can perform automatic rule monitoring by summarising the problems found in Detection Engine Health API responses and giving users instructions and advice on how to solve them.
Extending the Detection Engine Health API with top N rules is beneficial on its own, since it makes it easy to spot problematic rules and investigate them further manually. It could be very helpful when working on SDH.
The following API endpoints were extended:
/internal/detection_engine/health/_cluster
/internal/detection_engine/health/_space
The number of top rules returned is controlled by the `num_of_top_rules` body param, which defaults to 10. This param can be set only via an HTTP POST request (the same applies to `interval`). When an HTTP GET request is used, a maximum of 10 top rules is always returned for each metric.
The following metrics were added to show the top N rules for each of them (measured in milliseconds)
The following response parts were extended by adding a section under the `top_rules` key
`stats_over_interval`
`history_over_interval`
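As a rough illustration of the GET/POST behavior described above, here is a minimal TypeScript sketch that builds a request description for either endpoint. The helper `buildHealthRequest` and the `HealthRequest` type are hypothetical names introduced only for this example; they are not part of Kibana. Only the endpoint paths and the `num_of_top_rules` body param come from this PR description.

```typescript
// Hypothetical helper (not part of Kibana): describes a request to the
// Detection Engine Health API endpoints extended by this PR.
type HealthScope = '_cluster' | '_space';

interface HealthRequest {
  method: 'GET' | 'POST';
  path: string;
  // Present only for POST: per the description, num_of_top_rules
  // cannot be set through a GET request.
  body?: { num_of_top_rules: number };
}

function buildHealthRequest(scope: HealthScope, numOfTopRules?: number): HealthRequest {
  const path = `/internal/detection_engine/health/${scope}`;

  // No explicit value: a plain GET is enough, and the API falls back
  // to its default of 10 top rules per metric.
  if (numOfTopRules === undefined) {
    return { method: 'GET', path };
  }

  // A custom value has to travel in the body of a POST request.
  return { method: 'POST', path, body: { num_of_top_rules: numOfTopRules } };
}

// Example: ask the space-level endpoint for the top 25 rules per metric.
const request = buildHealthRequest('_space', 25);
console.log(request.method, request.path, JSON.stringify(request.body));
```

The sketch only builds the request description. Actually sending it would additionally require authentication and whatever headers Kibana expects for internal APIs, which are outside the scope of this description.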
Response example
Cluster health response (truncated)