[Response Ops][Alerting] Create FAAD API for use by rule type executors #145103

Closed
ymao1 opened this issue Nov 14, 2022 · 6 comments
Labels
Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments


ymao1 commented Nov 14, 2022

As part of Phase 2 of framework alerts-as-data, we need to provide an API for rule type executors to report alerts that will be written out as FAAD documents.

POC for possible implementation here. This issue would cover the AlertsClient portion of the POC.

Rule type executors will have to opt into using the new API, which should provide the same services as the existing AlertsFactory (i.e., recovered alerts determination, alert limit checking, categorizing into new/active/recovered alerts, etc.) as well as writing out alert documents. The existing AlertsFactory should be deprecated but not removed.

@botelastic botelastic bot added the needs-team Issues missing a team label label Nov 14, 2022
@ymao1 ymao1 added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/RulesFramework Issues related to the Alerting Rules Framework labels Nov 14, 2022
@elasticmachine

Pinging @elastic/response-ops (Team:ResponseOps)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Nov 14, 2022

ymao1 commented Nov 14, 2022

This is a large issue, so work on it can start, but a complete implementation may be blocked by #145100.


ymao1 commented Jan 11, 2023

This issue will need to be broken down into multiple steps to ensure that the new API has the same functionality as the existing implementation. While planning for this issue, I found it useful to encapsulate all alert-related functionality in the alerting task runner into a LegacyAlertsClient. #148751

Here's a recommendation for how we can break this down:

1. Create new AlertsClient that works with alerts-as-data

This new client should eventually replace the LegacyAlertsClient, so it should contain all the current framework functionality:

  • read existing alerts from the FAAD index instead of task manager state
  • allow executors to report alerts back during rule execution
  • allow executors to get recovered alerts
  • run "processAlerts", which determines which alerts are new/active/recovered
  • respect the configured alert limit
  • set the flapping flag and flapping history for each alert
  • write the appropriate event log docs
  • write alerts to the FAAD index instead of persisting them to task manager state
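
To make the new/active/recovered categorization and the alert limit concrete, here is a minimal sketch. It is not the framework's actual `processAlerts` implementation: the names `categorizeAlerts`, `CategorizedAlerts` and `maxAlerts` are hypothetical, and skipping recovery when the limit is hit is an assumption made for illustration.

```typescript
// Hypothetical sketch: these names are illustrative, not Kibana's actual API.
type AlertId = string;

interface CategorizedAlerts {
  newAlerts: AlertId[];
  activeAlerts: AlertId[]; // every alert kept for this run, including new ones
  recoveredAlerts: AlertId[];
  hasReachedLimit: boolean;
}

function categorizeAlerts(
  previouslyActive: ReadonlySet<AlertId>,
  reported: readonly AlertId[],
  maxAlerts: number
): CategorizedAlerts {
  // De-duplicate while preserving the order in which alerts were reported.
  const uniqueReported = Array.from(new Set(reported));
  const hasReachedLimit = uniqueReported.length > maxAlerts;

  // Keep at most `maxAlerts` alerts for this run.
  const activeAlerts = uniqueReported.slice(0, maxAlerts);
  const activeSet = new Set(activeAlerts);

  // An alert is new if it was not active after the previous run.
  const newAlerts = activeAlerts.filter((id) => !previouslyActive.has(id));

  // Assumption: when the limit is hit, the reported set is incomplete,
  // so no previously active alert is treated as recovered.
  const recoveredAlerts = hasReachedLimit
    ? []
    : Array.from(previouslyActive).filter((id) => !activeSet.has(id));

  return { newAlerts, activeAlerts, recoveredAlerts, hasReachedLimit };
}
```

For example, with `a` and `b` previously active and `b` and `c` reported in this run, `c` is new, `b` and `c` are active, and `a` is recovered.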

In addition, the client should check to ensure the context-specific resources have been installed prior to rule execution and retry installation if they have not been.
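
The check-then-retry behavior described above could look roughly like the sketch below. The function `ensureResourcesInstalled`, its callbacks and the `maxAttempts` default are all hypothetical; how the framework actually detects and installs context-specific resources is not specified in this issue.

```typescript
// Hypothetical sketch: the callbacks stand in for whatever the framework uses
// to check for and install context-specific resources.
async function ensureResourcesInstalled(
  isInstalled: () => Promise<boolean>,
  install: () => Promise<void>,
  maxAttempts: number = 3
): Promise<boolean> {
  // Fast path: nothing to do if the resources already exist.
  if (await isInstalled()) {
    return true;
  }

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await install();
      if (await isInstalled()) {
        return true;
      }
    } catch {
      // A failed attempt is not fatal; fall through and retry.
    }
  }

  // The caller decides what to do when installation never succeeded,
  // for example skipping the FAAD write for this execution.
  return false;
}
```

Returning a boolean rather than throwing leaves the decision with the caller, which fits a task runner that should still execute the rule even if alert documents cannot be written.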

Additionally, we should try to proxy the AlertsClient with the LegacyAlertsClient so that if a rule type has registered an alert context with the framework but has not yet updated the executor to use the new AlertsClient, FAAD docs will still be written with just the common framework-level fields.

It may be helpful to split the flapping portion out from this step so we can ensure things work as expected with the FAAD documents. For example, we now store recovered alert history information in the task manager state to support flapping detection. What is the equivalent in the FAAD doc?
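
As a reference point for that question, flapping detection over a per-alert history can be sketched as follows. This assumes the history is an array of booleans recording whether the alert changed state on each run; the function names, the look-back window and the threshold are hypothetical values, not the framework's settings. Whatever the FAAD equivalent is, it would need to carry enough of this history to evaluate the same check.

```typescript
// Hypothetical sketch: `true` means the alert changed state on that run.
function updateFlappingHistory(
  history: readonly boolean[],
  stateChanged: boolean,
  lookBackWindow: number
): boolean[] {
  const next = [...history, stateChanged];
  // Keep only the most recent `lookBackWindow` entries.
  return next.slice(Math.max(0, next.length - lookBackWindow));
}

function isFlapping(history: readonly boolean[], threshold: number): boolean {
  // The alert is flapping if it changed state at least `threshold` times
  // within the retained history.
  const changes = history.filter((changed) => changed).length;
  return changes >= threshold;
}
```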

2. Update action scheduling to work with FAAD

Currently, action scheduling is handled by the ExecutionHandler class, which takes active and recovered (legacy) alerts that contain context and state information. Updates will need to be made so:

3. Update AlertsClient to perform lifecycle executor functionality

Finally, in order to deprecate the lifecycle executor from the rule registry, we need to absorb the functionality currently provided by that executor. Some of that functionality is already duplicating what the framework does (for example, recovery calculation) but some of it is not available in the framework.

ymao1 added a commit that referenced this issue Jan 13, 2023
…askRunner in `LegacyAlertClient` (#148751)

## Summary

While planning out the FAAD API for
#145103, I found it useful to
create a `LegacyAlertsClient` that encapsulates all of the alert related
processing that's happening in the alerting task runner. This acts as a
proxy to the `AlertFactory` and to the various library functions we use
to process and log alerts to determine recovery status, flapping status.
I added basic unit tests for the client that just test that the correct
library functions are being proxied but since no functionality should be
changing, I did not add or update any integration tests.

ymao1 commented Jan 26, 2023

After some discussion, we are going to take a more incremental approach to creating this new FAAD API. For the first step, we will be creating an AlertsClient that proxies much of the functionality of the LegacyAlertsClient.

We will:

  • continue serializing and deserializing alerts from the task manager state at the start and end of rule execution
  • continue using the existing Alert class (in x-pack/plugins/alerting/server/alert/alert.ts) to store alert data in memory; this allows us to move faster by not having to rewrite all of the processAlerts, setFlapping, trimRecoveredAlerts, getAlertsForNotification, logAlerts and determineAlertsToReturn helper functions that expect this data model
  • continue using the AlertFactory as the underlying method for reporting alerts to the framework.

The new AlertsClient:

  • is instantiated in the task runner if framework alerts are enabled in the config and the rule type has registered with FAAD
  • reads active alert documents from the FAAD indices upon initialization
  • exposes a public API to rule executors for reporting alerts back to the framework; this public API works similarly to the rule registry APIs, where the rule executors report back the document that they want indexed into the FAAD indices. Under the hood, however, the AlertsClient converts this to an AlertFactory.create request.
  • bulk writes FAAD documents at the end of rule execution.
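
The report-then-bulk-write flow in the list above can be sketched as below. Everything here is a hypothetical stand-in: `AlertFactoryLike`, `SketchAlertsClient`, `report`, `flush` and the `alertId`/`ruleId` field names are illustrative and are not the framework's real types or the FAAD schema. The write is synchronous only to keep the sketch short; a real bulk write to Elasticsearch would be asynchronous.

```typescript
// Hypothetical stand-ins for the existing factory and the alert it returns.
interface AlertLike {
  scheduleActions(actionGroup: string, context: Record<string, unknown>): void;
}
interface AlertFactoryLike {
  create(id: string): AlertLike;
}

// What an executor hands back: alert identity plus the document to index.
interface ReportedAlert {
  id: string;
  actionGroup: string;
  context?: Record<string, unknown>;
  payload: Record<string, unknown>;
}

type AlertDoc = Record<string, unknown>;
type BulkWriter = (docs: AlertDoc[]) => void;

class SketchAlertsClient {
  private readonly factory: AlertFactoryLike;
  private readonly bulkWrite: BulkWriter;
  private readonly ruleId: string;
  private readonly pending = new Map<string, AlertDoc>();

  constructor(factory: AlertFactoryLike, bulkWrite: BulkWriter, ruleId: string) {
    this.factory = factory;
    this.bulkWrite = bulkWrite;
    this.ruleId = ruleId;
  }

  // Public API for executors: delegate to the factory under the hood and
  // remember the document so it can be written after execution.
  report(alert: ReportedAlert): void {
    this.factory.create(alert.id).scheduleActions(alert.actionGroup, alert.context ?? {});
    // Field names below are illustrative only.
    this.pending.set(alert.id, { ...alert.payload, alertId: alert.id, ruleId: this.ruleId });
  }

  // Called once at the end of rule execution; returns the number of docs written.
  flush(): number {
    const docs = Array.from(this.pending.values());
    if (docs.length > 0) {
      this.bulkWrite(docs);
    }
    this.pending.clear();
    return docs.length;
  }
}
```

Keying the pending documents by alert ID means a second report for the same alert within one execution replaces the first, so only one document per alert is written.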

We will create (many) followup issues for the subsequent steps after this initial issue is complete.


ymao1 commented Jan 26, 2023

While working on the PR for this issue, I found myself blocked by a separate issue for moving alert UUID generation to the framework. That issue was put on hold as not needed, but we will be reviving it and resolving it before moving forward with the PR for this issue.

@ymao1 ymao1 added the blocked label Feb 6, 2023
@ymao1 ymao1 moved this from In Progress to Blocked / On hold in AppEx: ResponseOps - Execution & Connectors Feb 6, 2023

ymao1 commented May 2, 2023

Closing in favor of #156442 and #156443
