Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(alertmanager-dropevents-config): dropped events treatment config #439

Merged

Conversation

Lowaiz
Copy link
Contributor

@Lowaiz Lowaiz commented Mar 28, 2023

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area config

/area outputs

/area tests

What this PR does / why we need it:

Alertmanager output make some work on Event Drop Alerts, to re-calibrate priority and enabled Alertmanager to aggregate alerts (implemented here: #79 ). This was hard coded, and non configurable, re evaluating these alerts to Critical when the number of dropped event is > 100. This type of alert occurs time to time in our infra, and the re-evaluation to critical is a bit overkill.

So in order to make it more configurable, I created two more config flags for Alertmanager:

  • DropEventThresholdList: list of Threshold{"value": int64, "Priority": PriorityType}
  • DropEventDefaultPriority: Default Priority to override with, if no threshold has been triggered

With the default value of thresholds, it mimic the old behavior, except for value between 1 and 10 included, where the <10 will become >1, in order to keep a simple config structure, avoiding the parse of an operator.

I also kinda fix this behavior with the if current prio < new prio; current prio = new prio, as the falcopayload.Priority could be override with every new n_drops_* labels.

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

I'll update the docs if changes are accepted

@Issif
Copy link
Member

Issif commented Apr 25, 2023

Can you also update the README and config_example.yaml please?

@Issif Issif self-assigned this Apr 27, 2023
README.md Outdated Show resolved Hide resolved
config.go Outdated Show resolved Hide resolved
config.go Outdated Show resolved Hide resolved
@Lowaiz Lowaiz force-pushed the feat/alertmanager-dropped-events/config branch 2 times, most recently from 9ec3b97 to a5ef039 Compare May 12, 2023 14:38
@Lowaiz
Copy link
Contributor Author

Lowaiz commented May 12, 2023

Now both MR are ready !

outputs/alertmanager_test.go Show resolved Hide resolved
outputs/alertmanager.go Show resolved Hide resolved
@Issif
Copy link
Member

Issif commented Jul 5, 2023

Can you rebase your two PR please?

@Lowaiz Lowaiz force-pushed the feat/alertmanager-dropped-events/config branch from f53bc14 to a5cdcb4 Compare July 6, 2023 15:35
Issif
Issif previously approved these changes Jul 6, 2023
@poiana poiana added the lgtm label Jul 6, 2023
@poiana
Copy link

poiana commented Jul 6, 2023

LGTM label has been added.

Git tree hash: f8a1e4db7560ccbb9bb39538417f0495e78bb117

@poiana poiana added the approved label Jul 6, 2023
@Lowaiz Lowaiz force-pushed the feat/alertmanager-dropped-events/config branch from a5cdcb4 to da64478 Compare July 6, 2023 16:06
@poiana poiana removed the lgtm label Jul 6, 2023
@poiana poiana requested a review from Issif July 6, 2023 16:06
@poiana poiana removed the approved label Jul 6, 2023
@Lowaiz
Copy link
Contributor Author

Lowaiz commented Jul 6, 2023

@Issif It needed a second rebase, as the previous merge introduced conflict

Issif
Issif previously approved these changes Jul 6, 2023
@poiana poiana added the lgtm label Jul 6, 2023
@poiana
Copy link

poiana commented Jul 6, 2023

LGTM label has been added.

Git tree hash: 77a0f2711ff5eab11881042e1533e4350ffd4208

@poiana poiana added the approved label Jul 6, 2023
@Issif
Copy link
Member

Issif commented Jul 6, 2023

A unit test failed:

=== RUN   TestNewAlertmanagerPayloadDropEvent
    alertmanager_test.go:59: 
        	Error Trace:	/home/runner/work/falcosidekick/falcosidekick/outputs/alertmanager_test.go:59
        	Error:      	Not equal: 
        	            	expected: []outputs.alertmanagerPayload{outputs.alertmanagerPayload{Labels:map[string]string{"ebpf_enabled":"1", "eventsource":"", "hostname":"host", "n_drops":">10000", "n_drops_buffer_clone_fork_enter":"0", "n_drops_buffer_clone_fork_exit":"0", "n_drops_buffer_connect_enter":"0", "n_drops_buffer_connect_exit":"0", "n_drops_buffer_dir_file_enter":">100", "n_drops_buffer_dir_file_exit":">100", "n_drops_buffer_execve_enter":"0", "n_drops_buffer_execve_exit":"0", "n_drops_buffer_open_enter":">100", "n_drops_buffer_open_exit":">100", "n_drops_buffer_other_interest_enter":"0", "n_drops_buffer_other_interest_exit":"0", "n_drops_buffer_total":">10000", "n_drops_bug":"0", "n_drops_page_faults":"0", "n_drops_scratch_map":"0", "priority":"Critical", "rule":"Falco internal: syscall event drop", "source":"falco"}, Annotations:map[string]string{"description":"Falco internal: syscall event drop. 815508 system calls dropped in last second.", "info":"Falco internal: syscall event drop. 815508 system calls dropped in last second.", "summary":"Falco internal: syscall event drop"}, EndsAt:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC)}}
        	            	actual  : []outputs.alertmanagerPayload{outputs.alertmanagerPayload{Labels:map[string]string{"ebpf_enabled":"1", "eventsource":"", "hostname":"host", "n_drops":">10000", "n_drops_buffer_clone_fork_enter":"0", "n_drops_buffer_clone_fork_exit":"0", "n_drops_buffer_connect_enter":"0", "n_drops_buffer_connect_exit":"0", "n_drops_buffer_dir_file_enter":">100", "n_drops_buffer_dir_file_exit":">100", "n_drops_buffer_execve_enter":"0", "n_drops_buffer_execve_exit":"0", "n_drops_buffer_open_enter":">100", "n_drops_buffer_open_exit":">100", "n_drops_buffer_other_interest_enter":"0", "n_drops_buffer_other_interest_exit":"0", "n_drops_buffer_total":">10000", "n_drops_bug":"0", "n_drops_page_faults":"0", "n_drops_scratch_map":"0", "priority":"Critical", "rule":"Falco internal: syscall event drop", "severity":"critical", "source":"falco"}, Annotations:map[string]string{"description":"Falco internal: syscall event drop. 815508 system calls dropped in last second.", "info":"Falco internal: syscall event drop. 815508 system calls dropped in last second.", "summary":"Falco internal: syscall event drop"}, EndsAt:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC)}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -2,3 +2,3 @@
        	            	  (outputs.alertmanagerPayload) {
        	            	-  Labels: (map[string]string) (len=23) {
        	            	+  Labels: (map[string]string) (len=24) {
        	            	    (string) (len=12) "ebpf_enabled": (string) (len=1) "1",
        	            	@@ -25,2 +25,3 @@
        	            	    (string) (len=4) "rule": (string) (len=34) "Falco internal: syscall event drop",
        	            	+   (string) (len=8) "severity": (string) (len=8) "critical",
        	            	    (string) (len=6) "source": (string) (len=5) "falco"
        	Test:       	TestNewAlertmanagerPayloadDropEvent
--- FAIL: TestNewAlertmanagerPayloadDropEvent (0.00s)

@poiana poiana removed the lgtm label Jul 7, 2023
@poiana poiana requested a review from Issif July 7, 2023 10:30
@poiana poiana added the lgtm label Jul 7, 2023
@poiana
Copy link

poiana commented Jul 7, 2023

LGTM label has been added.

Git tree hash: 6adf4f16c536c4ba1c665b1e623c284221efed6d

@poiana
Copy link

poiana commented Jul 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Issif, Lowaiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana added the approved label Jul 7, 2023
@poiana poiana merged commit 3bafeb6 into falcosecurity:master Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants