[ML][AIOps] Telemetry: track analysis endpoint usage #166988

Merged
alvarezmelissa87 merged 13 commits into elastic:main from ml-aiops-add-telemetry on Sep 29, 2023

Conversation

alvarezmelissa87
Contributor

@alvarezmelissa87 alvarezmelissa87 commented Sep 21, 2023

Summary

This PR adds tracking for Log Rate Analysis and Log Pattern Analysis endpoints for AIOps.

  • Tracks the type of analysis and its source (where the analysis is run from)

@alvarezmelissa87 alvarezmelissa87 added the :ml, release_note:skip, Feature:ML/AIOps, and v8.11.0 labels Sep 21, 2023
@alvarezmelissa87 alvarezmelissa87 self-assigned this Sep 21, 2023
@alvarezmelissa87 alvarezmelissa87 requested review from a team as code owners September 21, 2023 19:39
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@alvarezmelissa87
Contributor Author

cc @benakansara for the infra plugin addition.

@@ -268,6 +268,7 @@ export const LogRateAnalysis: FC<AlertDetailsLogRateAnalysisSectionProps> = ({ r
</EuiFlexItem>
<EuiFlexItem>
<LogRateAnalysisContent
source="observability_alerts"
Contributor

Maybe observability_alert_details or log_threshold_alert_details?

Contributor Author

Updated in 8c6bc9d

Contributor

@walterra walterra left a comment

We should make the header name less generic than just source. Please have a look at how Kibana structures other header names, like Kbn-Build-Number, Kbn-System-Request, Kbn-Version, etc.

@walterra - good call. Updated to aiops-analysis-run-origin in bc27b46

Comment on lines 23 to 29
export const LOG_RATE_ANALYSIS_RUN = 'aiops_log_rate_analysis_run';

export const LOG_PATTERN_ANALYSIS_RUN = 'aiops_log_pattern_analysis_run';

export const CHANGE_POINT_DETECTION_RUN = 'aiops_change_point_detection_run';

export const AIOPS_DEFAULT_SOURCE = 'ml_aiops_labs';
Contributor

I suggest turning these single consts into an enum-like object with as const, something like:

export const AIOPS_TELEMETRY_ID = {
    LOG_RATE_ANALYSIS_RUN: ..., 
    LOG_PATTERN_ANALYSIS_RUN: ...
} as const;
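
For illustration, a minimal sketch of what the suggested enum-like object could look like once filled in. The string values are taken from the single consts in the diff above; the derived type name AiopsTelemetryId and the guard isAiopsTelemetryId are hypothetical and only show what as const makes possible, not what the PR ended up with.

```typescript
// Run identifiers grouped in one enum-like object. The string values are the
// ones from the original single consts in this diff.
const AIOPS_TELEMETRY_ID = {
  LOG_RATE_ANALYSIS_RUN: 'aiops_log_rate_analysis_run',
  LOG_PATTERN_ANALYSIS_RUN: 'aiops_log_pattern_analysis_run',
  CHANGE_POINT_DETECTION_RUN: 'aiops_change_point_detection_run',
} as const;

// Union of the literal values, derived from the object (hypothetical name).
type AiopsTelemetryId = (typeof AIOPS_TELEMETRY_ID)[keyof typeof AIOPS_TELEMETRY_ID];

// Hypothetical runtime guard: true only for one of the known identifiers.
function isAiopsTelemetryId(value: string): value is AiopsTelemetryId {
  return Object.keys(AIOPS_TELEMETRY_ID).some(
    (key) => AIOPS_TELEMETRY_ID[key as keyof typeof AIOPS_TELEMETRY_ID] === value
  );
}
```

Because of as const, each property keeps its literal type, so a typo in an identifier can be caught at compile time instead of silently producing a new counter name.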

Contributor Author

Updated in bc27b46

@@ -64,6 +65,8 @@ export interface LogRateAnalysisContentProps {
barHighlightColorOverride?: string;
/** Optional callback that exposes data of the completed analysis */
onAnalysisCompleted?: (d: LogRateAnalysisResultsData) => void;
/** Optional identifier to indicate the plugin utilizing the component */
source?: string;
Contributor

I wonder if we should make this non-optional to enforce correct telemetry on new/future embeddings?

Contributor Author

Good call - updated in bc27b46
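
As a rough illustration of why a required prop helps here: with a non-optional field, any new embedding that forgets to name itself fails type checking. The interface below is a simplified stand-in for LogRateAnalysisContentProps, and describeEmbedding is a hypothetical helper standing in for the component.

```typescript
// Simplified stand-in for the component props; only the relevant fields are shown.
interface LogRateAnalysisContentPropsSketch {
  /** Identifier to indicate the plugin utilizing the component (now required) */
  source: string;
  /** Optional callback, kept optional as in the original interface */
  onAnalysisCompleted?: (d: unknown) => void;
}

// Hypothetical helper standing in for the component itself.
function describeEmbedding(props: LogRateAnalysisContentPropsSketch): string {
  return `log rate analysis embedded from ${props.source}`;
}

// describeEmbedding({}) would not compile: property 'source' is missing.
const embeddingLabel = describeEmbedding({ source: 'observability_alert_details' });
```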

@@ -268,6 +268,7 @@ export const LogRateAnalysis: FC<AlertDetailsLogRateAnalysisSectionProps> = ({ r
</EuiFlexItem>
<EuiFlexItem>
<LogRateAnalysisContent
source="observability_alert_details"
Contributor

We have other alert details pages for other rule types. Could we add a prefix/suffix to specify this is related to Log threshold?

Contributor Author

Added in bc27b46

@alvarezmelissa87
Contributor Author

@elasticmachine merge upstream

@alvarezmelissa87
Contributor Author

This has been updated and is ready for another look when you get a chance 🙏 cc @walterra, @peteharverson, @fkanout

CHANGE_POINT_DETECTION_RUN: 'aiops_change_point_detection_run',
AIOPS_DEFAULT_SOURCE: 'ml_aiops_labs',
AIOPS_ANALYSIS_RUN_ORIGIN: 'aiops-analysis-run-origin',
} as const;
Contributor

Sorry for the continued nitpicking 😅. Looking at the attributes now, they serve different purposes, so we probably shouldn't combine them all into one enum. I suggest breaking out the run ones.

If I read the code right, AIOPS_ANALYSIS_RUN_ORIGIN is now the header name? If so, I suggest prefixing it with kbn-. We now also have a mix of source and origin in the naming. Related to that, if you align this, maybe we can also make the prop name source more specific. What do you think?

Contributor Author

The header name can't begin with 'kbn' - I got an error. But good call on renaming the prop to match up.

Contributor Author

Renamed prop name to 'embeddingOrigin' in c973e76
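
To make the naming discussion concrete, here is a minimal sketch of how an embeddingOrigin value could end up in the request headers. The header name comes from this thread; the constant name RUN_ORIGIN_HEADER_NAME and the helper buildAnalysisHeaders are hypothetical, and the actual client code in the PR may be structured differently.

```typescript
// Header name taken from the thread; the constant name is hypothetical.
const RUN_ORIGIN_HEADER_NAME = 'aiops-analysis-run-origin';

// Hypothetical helper: builds the extra request headers for an analysis call
// from the embeddingOrigin passed to the component.
function buildAnalysisHeaders(embeddingOrigin: string): Record<string, string> {
  return { [RUN_ORIGIN_HEADER_NAME]: embeddingOrigin };
}

// Example: headers for a request issued from the log threshold alert details page.
const analysisRequestHeaders: Record<string, string> = {
  'Content-Type': 'application/json',
  ...buildAnalysisHeaders('observability_log_threshold_alert_details'),
};
```

Keeping the header name in one constant means the prop, the header, and the server-side lookup all refer to the same "origin" term.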

Contributor

@fkanout fkanout left a comment

AO changes LGTM.

@@ -32,6 +36,13 @@ export const defineLogCategorizationRoutes = (
},
},
async (context, request, response) => {
const { headers } = request;
trackAIOpsRouteUsage(
Contributor

@peteharverson peteharverson Sep 26, 2023

@jgowdyelastic is this in the right place? We want to track the number of times the analysis is run from the ML AIOps Labs Log Pattern Analysis page, or from within Discover. Currently I don't see this counter incrementing when I run the analysis from either page.

Member

@jgowdyelastic jgowdyelastic Sep 26, 2023

This endpoint will be called whenever the analysis is run, but it also has the risk that it could be called without the categorization analysis being run.

Debugging this, it appears there is a bug. The AIOPS_TELEMETRY_ID.AIOPS_ANALYSIS_RUN_ORIGIN header is not being added to the request and so the telemetry is not being counted.
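
A minimal sketch of why a missing header means nothing gets counted, assuming the server only increments a counter when it finds a string origin in the request headers. The UsageCounterLike interface, the trackRouteUsage function (a stand-in for the PR's trackAIOpsRouteUsage, whose real signature is not shown in this thread), and the in-memory counter are all assumptions for illustration.

```typescript
// Minimal stand-in for a usage counter; this interface is an assumption.
interface UsageCounterLike {
  incrementCounter(params: { counterName: string; counterType: string; incrementBy: number }): void;
}

const RUN_ORIGIN_HEADER = 'aiops-analysis-run-origin';

// Hypothetical stand-in for trackAIOpsRouteUsage. Returns true if a run was counted.
function trackRouteUsage(
  counterName: string,
  origin: string | string[] | undefined,
  usageCounter?: UsageCounterLike
): boolean {
  if (!usageCounter || typeof origin !== 'string' || origin === '') {
    return false; // nothing is counted, e.g. when the header is missing
  }
  usageCounter.incrementCounter({ counterName, counterType: `run_via_${origin}`, incrementBy: 1 });
  return true;
}

// In-memory counter for trying the sketch out.
const totals: Record<string, number> = {};
const counter: UsageCounterLike = {
  incrementCounter: ({ counterName, counterType, incrementBy }) => {
    const key = `${counterName} | ${counterType}`;
    totals[key] = (totals[key] || 0) + incrementBy;
  },
};

const requestHeaders: Record<string, string | string[] | undefined> = {
  [RUN_ORIGIN_HEADER]: 'ml_aiops_labs',
};
// Header present: the run is counted.
trackRouteUsage('POST /internal/aiops/log_rate_analysis', requestHeaders[RUN_ORIGIN_HEADER], counter);
// Header missing: the call is silently skipped, which matches the symptom described above.
trackRouteUsage('POST /internal/aiops/log_rate_analysis', undefined, counter);
```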

Member

One alternative would be to also implement client side telemetry to count this on click of the analyse button or in the case of the Discover embedded version, on render of the component.

Contributor Author

Fixed categorization tracking in c973e76

@walterra
Contributor

Did some more testing and was able to generate the following data with the latest state:

                {
                  "domainId": "aiops",
                  "counterName": "POST /internal/aiops/log_rate_analysis",
                  "counterType": "run_via_observability_log_threshold_alert_details",
                  "lastUpdatedAt": "2023-09-27T11:31:13.882Z",
                  "fromTimestamp": "2023-09-27T00:00:00Z",
                  "total": 1
                },
                {
                  "domainId": "aiops",
                  "counterName": "POST /internal/aiops/log_rate_analysis",
                  "counterType": "run_via_ml_aiops_labs",
                  "lastUpdatedAt": "2023-09-27T11:11:53.590Z",
                  "fromTimestamp": "2023-09-27T00:00:00Z",
                  "total": 2
                },
                {
                  "domainId": "aiops",
                  "counterName": "POST /internal/aiops/categorization_field_validation",
                  "counterType": "run_via_ml_aiops_labs",
                  "lastUpdatedAt": "2023-09-27T10:59:08.426Z",
                  "fromTimestamp": "2023-09-27T00:00:00Z",
                  "total": 1
                },

Looks good IMHO!

When I run log pattern analysis via Discover it still tracks it as run_via_ml_aiops_labs though.

Contributor

@walterra walterra left a comment

The latest state correctly tracks usage of log pattern analysis across AIOps Labs/Discover:

                {
                  "domainId": "aiops",
                  "counterName": "POST /internal/aiops/categorization_field_validation",
                  "counterType": "run_via_discover_run_pattern_analysis",
                  "lastUpdatedAt": "2023-09-27T17:16:35.721Z",
                  "fromTimestamp": "2023-09-27T00:00:00Z",
                  "total": 1
                },
                {
                  "domainId": "aiops",
                  "counterName": "POST /internal/aiops/categorization_field_validation",
                  "counterType": "run_via_ml_aiops_labs",
                  "lastUpdatedAt": "2023-09-27T17:16:15.710Z",
                  "fromTimestamp": "2023-09-27T00:00:00Z",
                  "total": 4
                },

Just a note: I think it's good enough for now, but should the log pattern analysis flyout ever be used in multiple places we need to make sure that the origin gets passed on from the place where it is embedded from. The way the code is structured now, it would report any flyout usage as a trigger via Discover.

@@ -70,6 +70,7 @@ export async function showCategorizeFlyout(
savedSearch={null}
selectedField={field}
onClose={onFlyoutClose}
embeddingOrigin={'discover_run_pattern_analysis'}
Member

Unfortunately I don't think hardcoding the ID here is sufficient. This flyout could (in the future) be opened from other places in Kibana.
To get the real origin of the action, you'll need to have the value passed into this component as a prop and set by the UI action that has triggered it.
The context that is passed to the action's execute function does have an originatingApp variable which I added just in case when this was first written.
This has the value discover when called from Discover.
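
One possible shape for this, as a sketch only: derive the origin from the action context instead of hardcoding it. The context interface, the resolveEmbeddingOrigin helper, the _run_pattern_analysis suffix convention, and the fallback value are assumptions; only originatingApp and its value discover are mentioned in the comment above.

```typescript
// Hypothetical, simplified shape of the UI action context; only
// originatingApp is mentioned in the review comment.
interface CategorizeActionContextLike {
  originatingApp?: string;
}

// Hypothetical helper: derives the telemetry origin from the app that
// triggered the action, falling back to a default when it is not set.
function resolveEmbeddingOrigin(context: CategorizeActionContextLike, fallback: string): string {
  const app = context.originatingApp;
  return app !== undefined && app.trim() !== '' ? `${app}_run_pattern_analysis` : fallback;
}

// When triggered from Discover this yields the value currently hardcoded in the diff.
const originFromDiscover = resolveEmbeddingOrigin({ originatingApp: 'discover' }, 'ml_aiops_labs');
```

The resolved value would then be passed down as the embeddingOrigin prop, so a future embedding in another app reports its own origin instead of being counted as Discover.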

@alvarezmelissa87 alvarezmelissa87 requested a review from a team as a code owner September 27, 2023 20:48
Contributor

@benakansara benakansara left a comment

LGTM. Tested analysis endpoint with Log threshold alert details page.

Contributor

@peteharverson peteharverson left a comment

Tested and LGTM

Member

@jgowdyelastic jgowdyelastic left a comment

LGTM

@alvarezmelissa87
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id     before   after    diff
aiops  594.8KB  595.3KB  +458.0B
infra  1.9MB    1.9MB    +60.0B
total                    +518.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id     before  after  diff
aiops  7.9KB   8.0KB  +129.0B

Unknown metric groups

API count

id     before  after  diff
aiops  66      67     +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @alvarezmelissa87

@alvarezmelissa87 alvarezmelissa87 merged commit 0bdbcc0 into elastic:main Sep 29, 2023
@kibanamachine kibanamachine added the backport:skip label Sep 29, 2023
@alvarezmelissa87 alvarezmelissa87 deleted the ml-aiops-add-telemetry branch September 29, 2023 16:12