Update diagnostic sending logic so it doesn't use EP alerts queue. (#171381)

## Summary

Currently, the diagnostic task enqueues its alerts onto the production
alerts queue. This is problematic and likely causes significant endpoint
(EP) alert telemetry loss in busy clusters. The queue is also capped at
100 events per minute (100/1m), which is a bottleneck for the diagnostic
feed. This change sends diagnostic alerts on demand instead of queueing
them. I'm following up with a larger PR to move this query to a
[PIT](https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html)
query.
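
To make the contention concrete, here is a minimal sketch of how a shared, capped queue can starve production alerts. It assumes the cap behaves as "at most 100 events per flush interval"; `CappedQueue`, `makeAlerts`, and the event shape are hypothetical stand-ins for illustration and are not the actual sender implementation.

```typescript
// Hypothetical model of a rate-capped telemetry queue (not the real sender).
interface TelemetryEvent {
  kind: string;
  id: number;
}

class CappedQueue {
  private events: TelemetryEvent[] = [];
  private readonly maxEvents: number;
  public dropped = 0;

  constructor(maxEvents: number) {
    this.maxEvents = maxEvents;
  }

  // Accept events until the cap is reached; count everything else as dropped.
  enqueue(batch: TelemetryEvent[]): void {
    for (const event of batch) {
      if (this.events.length < this.maxEvents) {
        this.events.push(event);
      } else {
        this.dropped += 1;
      }
    }
  }

  // Drain everything queued during one flush interval.
  flush(): TelemetryEvent[] {
    const out = this.events;
    this.events = [];
    return out;
  }
}

const makeAlerts = (kind: string, count: number): TelemetryEvent[] =>
  Array.from({ length: count }, (_, i) => ({ kind, id: i }));

// Shared queue: 80 diagnostic alerts arrive first, then 50 production alerts.
const shared = new CappedQueue(100);
shared.enqueue(makeAlerts('diagnostic', 80));
shared.enqueue(makeAlerts('endpoint', 50));
const endpointDelivered = shared.flush().filter((e) => e.kind === 'endpoint').length;
// endpointDelivered is 20 and shared.dropped is 30

// Separate paths: diagnostic alerts bypass the queue entirely.
const productionOnly = new CappedQueue(100);
productionOnly.enqueue(makeAlerts('endpoint', 50));
const productionDelivered = productionOnly.flush().length;
// productionDelivered is 50 and productionOnly.dropped is 0
```

In this toy model the diagnostic batch consumes most of the interval's budget, so 30 of the 50 production alerts are lost; once diagnostics stop using the queue, all 50 get through.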


### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
pjhampton authored Nov 22, 2023
1 parent 668f856 commit 0e2ef90
Showing 3 changed files with 8 additions and 10 deletions.
@@ -9,6 +9,8 @@ export const TELEMETRY_CHANNEL_LISTS = 'security-lists-v2';
 
 export const TELEMETRY_CHANNEL_ENDPOINT_META = 'endpoint-metadata';
 
+export const TELEMETRY_CHANNEL_ENDPOINT_ALERTS = 'alerts-endpoint';
+
 export const TELEMETRY_CHANNEL_DETECTION_ALERTS = 'alerts-detections';
 
 export const TELEMETRY_CHANNEL_TIMELINE = 'alerts-timeline';
@@ -37,15 +37,10 @@ describe('diagnostics telemetry task test', () => {
       mockTelemetryEventsSender,
       testTaskExecutionPeriod
     );
-
     expect(mockTelemetryReceiver.fetchDiagnosticAlerts).toHaveBeenCalledWith(
       testTaskExecutionPeriod.last,
       testTaskExecutionPeriod.current
     );
-
-    expect(mockTelemetryEventsSender.queueTelemetryEvents).toHaveBeenCalledWith(
-      testDiagnosticsAlerts.hits.hits.flatMap((doc) => [doc._source])
-    );
-    expect(mockTelemetryEventsSender.sendOnDemand).toBeCalledTimes(1);
+    expect(mockTelemetryEventsSender.sendOnDemand).toBeCalledTimes(2);
   });
 });
@@ -11,7 +11,7 @@ import type { ITelemetryEventsSender } from '../sender';
 import type { TelemetryEvent } from '../types';
 import type { ITelemetryReceiver } from '../receiver';
 import type { TaskExecutionPeriod } from '../task';
-import { TASK_METRICS_CHANNEL } from '../constants';
+import { TELEMETRY_CHANNEL_ENDPOINT_ALERTS, TASK_METRICS_CHANNEL } from '../constants';
 
 export function createTelemetryDiagnosticsTaskConfig() {
   return {
@@ -49,14 +49,15 @@ export function createTelemetryDiagnosticsTaskConfig() {
           return 0;
         }
         tlog(logger, `Received ${hits.length} diagnostic alerts`);
-        const diagAlerts: TelemetryEvent[] = hits.flatMap((h) =>
+        const alerts: TelemetryEvent[] = hits.flatMap((h) =>
           h._source != null ? [h._source] : []
         );
-        sender.queueTelemetryEvents(diagAlerts);
+
+        await sender.sendOnDemand(TELEMETRY_CHANNEL_ENDPOINT_ALERTS, alerts);
         await sender.sendOnDemand(TASK_METRICS_CHANNEL, [
           createTaskMetric(taskName, true, startTime),
         ]);
-        return diagAlerts.length;
+        return alerts.length;
       } catch (err) {
         await sender.sendOnDemand(TASK_METRICS_CHANNEL, [
           createTaskMetric(taskName, false, startTime, err.message),
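
The new control flow, and the reason the test now expects two `sendOnDemand` calls, can be sketched in isolation. This is a simplified model rather than the plugin's code: `Sender`, `RecordingSender`, `runDiagnosticTask`, the metric payload, and the `'task-metrics'` channel value are all assumptions made for illustration; only `'alerts-endpoint'` is taken from the diff.

```typescript
// Illustrative stand-ins modelled on the diff; these are not Kibana's real types.
interface DiagnosticEvent {
  [key: string]: unknown;
}

interface Hit {
  _source?: DiagnosticEvent | null;
}

interface Sender {
  queueTelemetryEvents(events: DiagnosticEvent[]): void;
  sendOnDemand(channel: string, events: unknown[]): Promise<void>;
}

const ALERTS_CHANNEL = 'alerts-endpoint'; // value shown in the diff
const METRICS_CHANNEL = 'task-metrics'; // assumed placeholder value

// Mirrors the post-change flow: no queueing, two on-demand sends.
async function runDiagnosticTask(hits: Hit[], sender: Sender): Promise<number> {
  if (hits.length === 0) {
    return 0;
  }
  // Keep only hits that actually carry a document body.
  const alerts = hits.flatMap((h) => (h._source != null ? [h._source] : []));
  await sender.sendOnDemand(ALERTS_CHANNEL, alerts);
  await sender.sendOnDemand(METRICS_CHANNEL, [{ passed: true }]);
  return alerts.length;
}

// Test double that records how it was called.
class RecordingSender implements Sender {
  queuedCalls = 0;
  sent: Array<{ channel: string; count: number }> = [];

  queueTelemetryEvents(_events: DiagnosticEvent[]): void {
    this.queuedCalls += 1;
  }

  async sendOnDemand(channel: string, events: unknown[]): Promise<void> {
    this.sent.push({ channel, count: events.length });
  }
}

const recordingSender = new RecordingSender();
const sampleHits: Hit[] = [{ _source: { a: 1 } }, { _source: null }, { _source: { b: 2 } }];
const taskResult = runDiagnosticTask(sampleHits, recordingSender);
```

With three hits, one of which has no `_source`, the sketch sends two alerts on the alerts channel, one metric on the metrics channel, and never touches the queue.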