[Security Solution] Add pagination to the upgrade/_review endpoint
#208361
Closed
Tracked by
#201502
Labels
8.18 candidate
bug
Fixes for quality problems that affect the customer experience
Feature:Prebuilt Detection Rules
Security Solution Prebuilt Detection Rules area
impact:high
Addressing this issue will have a high level of impact on the quality/strength of our product.
performance
Team:Detection Rule Management
Security Detection Rule Management Team
Team:Detections and Resp
Security Detection Response Team
Team: SecuritySolution
Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc.
v8.18.0
Comments
Pinging @elastic/security-solution (Team: SecuritySolution)
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)
Pinging @elastic/security-detections-response (Team:Detections and Resp)
This was referenced Jan 27, 2025
xcrzx added a commit that referenced this issue Mar 3, 2025

… size (#211045)

**Resolves: #208361**
**Resolves: #210544**

## Summary

This PR introduces significant memory consumption improvements to the prebuilt rule endpoints, ensuring users won't encounter OOM errors on memory-limited Kibana instances. Memory consumption testing results are provided in #211045 (comment).

## Details

This PR implements a number of memory usage optimizations to the prebuilt rule endpoints, with the final goal of reducing the chance of OOM errors. The changes are extensive and require thorough testing before merging. They are described by the following bullets:

- The most significant change is the addition of pagination to the `upgrade/_review` endpoint. This endpoint was known for causing OOM errors due to its large and ever-growing response size. With pagination, it now returns upgrade information for no more than 20-100 rules at a time, significantly reducing its memory footprint.
- New backend methods, such as `ruleObjectsClient.fetchInstalledRuleVersions`, have been introduced. These methods return rule IDs with their corresponding installed versions, which makes it possible to build a map of outdated rules without loading all available rules into memory. Previously, all installed rules, along with their base and target versions, were fetched unconditionally before filtering for updates.
- The `stats` data structure of the review endpoint has been deprecated (it can be safely removed after one Serverless release cycle). Since the endpoint now returns paginated results, building stats is no longer feasible due to the limited rule set size fetched on the server side. As a side effect, this required removing the related Cypress tests asserting that `Update All` is disabled when rules can't be updated.
- All changes to the endpoints are backward-compatible. All previously required returned structures are still present in the response. All newly added structures are optional.
- Upgradeable rule tags are now returned from the prebuilt rule status endpoint.
- The frontend logic has been updated to move sorting and filtering of prebuilt rules from the client side to the server side.
- The `upgrade/_perform` endpoint has been rewritten to use lightweight rule version information rather than full rules to determine upgradeable rules. Additionally, upgrades are now performed in batches of up to 100 rules, further reducing memory usage.
- A dry run option has been added to the upgrade perform endpoint. This is needed for the "Update all" rules scenario to determine whether any rules contain conflicts and to display a confirmation modal to the user.
- An option to skip conflicting rules has been added to the upgrade endpoint when called with the `ALL_RULES` mode.
- The `install/_review` endpoint's memory consumption has been optimized by avoiding loading all rules into memory to determine available rules for installation. Redundant fetching of all base versions has also been removed, as they do not participate in the calculation.

---------

Co-authored-by: Maxim Palenov <[email protected]>
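To make the first two bullets of the commit message more concrete, here is a minimal sketch of how a map of outdated rules could be built from lightweight version records and then served one page at a time. This is not the code from the PR: the `RuleVersion` and `Page` shapes and the `findOutdatedRules` and `paginate` helpers are hypothetical names chosen for illustration, and the real `ruleObjectsClient.fetchInstalledRuleVersions` may return a different structure.

```typescript
// Hypothetical shapes for illustration only; not Kibana's actual types.
interface RuleVersion {
  ruleId: string;
  version: number;
}

interface Page<T> {
  page: number; // 1-based page number
  perPage: number;
  total: number; // total number of items across all pages
  items: T[];
}

// Compare installed versions against the latest available versions using only
// (ruleId, version) pairs, so no full rule objects need to be held in memory.
// Returns each outdated rule together with its target (latest) version.
function findOutdatedRules(installed: RuleVersion[], latest: RuleVersion[]): RuleVersion[] {
  const latestById = new Map<string, number>();
  for (const rule of latest) {
    latestById.set(rule.ruleId, rule.version);
  }

  const outdated: RuleVersion[] = [];
  for (const rule of installed) {
    const target = latestById.get(rule.ruleId);
    if (target !== undefined && target > rule.version) {
      outdated.push({ ruleId: rule.ruleId, version: target });
    }
  }
  return outdated;
}

// Return a single page of items; full rule data would then be fetched only
// for the rule IDs on that page.
function paginate<T>(items: T[], page: number, perPage: number): Page<T> {
  const start = (page - 1) * perPage;
  return {
    page,
    perPage,
    total: items.length,
    items: items.slice(start, start + perPage),
  };
}
```

In this sketch the server would call `findOutdatedRules` once per request on the lightweight records and then load the heavy base, current, and target rule versions only for the IDs in `paginate(...).items`, which is what keeps the per-request memory bounded by the page size.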
kibanamachine pushed three commits to kibanamachine/kibana that referenced this issue Mar 3, 2025. Each repeats the commit message above (referencing elastic#211045) and is marked "(cherry picked from commit c4a016e)".
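The batching, dry run, and skip-conflicts behavior that the commit message describes for `upgrade/_perform` can be illustrated with a small planning helper. This is a sketch under assumptions, not the endpoint's implementation: `chunk`, `planUpgrade`, `UpgradePlan`, and the `hasConflict` callback are hypothetical names, and the only detail taken from the commit message is the batch size of up to 100 rules.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) {
    throw new Error('size must be positive');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical result shape for illustration only.
interface UpgradePlan {
  batches: string[][]; // rule IDs to upgrade, grouped into batches
  conflicting: string[]; // rule IDs skipped because they have conflicts
}

// Separate conflicting rules from upgradeable ones, then group the
// upgradeable rule IDs into batches. Computing the plan without executing it
// is one way to model a dry run: the caller can inspect `conflicting` and ask
// the user for confirmation before any rule is modified.
function planUpgrade(
  ruleIds: string[],
  hasConflict: (ruleId: string) => boolean,
  batchSize: number = 100
): UpgradePlan {
  const upgradeable: string[] = [];
  const conflicting: string[] = [];
  for (const ruleId of ruleIds) {
    if (hasConflict(ruleId)) {
      conflicting.push(ruleId);
    } else {
      upgradeable.push(ruleId);
    }
  }
  return { batches: chunk(upgradeable, batchSize), conflicting };
}
```

A caller would then process `plan.batches` one batch at a time, so that only the rules in the current batch are loaded in full.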
Epic: #174168
Related to: https://github.com/elastic/security-team/issues/11822, #209551, #210544
Summary
This is related to the investigation of OOM errors (see #incident-1039-multiple-projects-oomkilled-when-installing-updating-fleet-pa in Slack for more details). Here's a short summary of what happened.

- The logs show `/detection_engine/prebuilt_rules/_bootstrap` calls, but since the endpoint responded in less than 1 second, the package installation didn't occur. The endpoint skips installation if the package is already installed, which was the case here.
- The logs also show parallel `detection_engine/prebuilt_rules/installation/_perform` requests. These requests are used to install all prebuilt Elastic rules. Normally, browsers don't send parallel requests like this, and it's not something that can be easily triggered through the UI. Given there are many other duplicate requests in the proxy logs, I strongly suspect this was a testing environment where two rule installation requests were deliberately sent at the same time from separate browser tabs to test the workflow.
- The `_perform` requests were handled successfully, even though each took almost one minute to complete.
- A `detection_engine/prebuilt_rules/installation/_review` request was sent to fetch updated information about prebuilt Elastic rules. This request is memory-intensive, because all rules (more than 1,000) are loaded into memory on the server side to calculate their installation status.
- Two `_review` requests were sent, both logged with a 499 status (client closed connection). Immediately after, the browser sent two additional `_review` requests, resulting in Kibana handling four concurrent `_review` requests at one point.
- Handling these concurrent `_review` requests appears to have caused the OOM. This conclusion is also supported by the proxy logs, which show many heavy Elasticsearch responses (around 80 MB in total).

To avoid OOM errors in the future, we need to implement pagination for the `_review` endpoint so that not all rules are fetched into memory. This might not be straightforward and is considered a long-term solution. A short-term solution would be to introduce a caching mechanism to skip redundant calculations: #208357
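As a rough illustration of the short-term caching idea, the sketch below keeps the most recent result of an expensive calculation for a fixed time window and reuses it for subsequent calls. It is an assumption about what such a mechanism could look like, not the approach taken in #208357: the `TtlCache` class, its `ttlMs` parameter, and the injectable `now` clock are all hypothetical.

```typescript
// A single-entry cache that reuses a computed value until it expires.
// Hypothetical helper for illustration; not part of Kibana.
class TtlCache<T> {
  private readonly ttlMs: number;
  private readonly now: () => number;
  private entry: { value: T; expiresAt: number } | undefined;

  // `now` is injectable so the expiry logic can be tested without real time.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entry = undefined;
  }

  // Return the cached value if it is still fresh; otherwise run `compute`,
  // store its result, and return it.
  get(compute: () => T): T {
    const currentTime = this.now();
    if (this.entry !== undefined && this.entry.expiresAt > currentTime) {
      return this.entry.value;
    }
    const value = compute();
    this.entry = { value, expiresAt: currentTime + this.ttlMs };
    return value;
  }
}
```

If `T` is a `Promise`, the promise itself is stored as soon as `compute` returns, so concurrent callers (such as the four simultaneous `_review` requests described above) would await the same in-flight calculation instead of each loading all rules into memory. A real implementation would also need to evict rejected promises and invalidate the entry when rules are installed or upgraded; both are omitted here.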