Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix revision discarded on event rate limiting key calculation #517

Merged
merged 1 commit into from
May 17, 2023

Conversation

matheuscscp
Copy link
Member

@matheuscscp matheuscscp commented Apr 30, 2023

Problem found while investigating fluxcd/helm-controller#678

@matheuscscp matheuscscp force-pushed the fix-rate-limit branch 3 times, most recently from 8f1eebc to 0556872 Compare April 30, 2023 23:55
@matheuscscp matheuscscp changed the title Fix event rate limiting discarding metadata on key Fix revision discarded on event rate limiting key calculation Apr 30, 2023
@matheuscscp
Copy link
Member Author

@hiddeco @darkowlzz I think this PR is ready to go as-is (except maybe for the change in the cleanupMetadata() function, I updated the PR description to explain my decision), the only thing I wanna fix in this PR is the revision key not being considered in the rate limiting (and print more debug info for events that were rate-limited). #518 builds on this one because it touches the same part of the code, but is in fact completely orthogonal and can be reviewed/merged later.

Copy link
Contributor

@darkowlzz darkowlzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding this bug and fixing it.
I've left a few inline comments which may help improve it.

I did some work on trying to figure out how to test this and found some issues with the current state of our tests. I've created #521 which should get merged soon. Currently, the event handler is being tested along with the controller (I've a refactor branch to change that, mentioned in the other PR). For now, you can do the following on top of those test changes to test the rate limit fix. In order to handle and check rate limited requests, I had to modify the request sent check.

diff --git a/internal/controllers/alert_controller_test.go b/internal/controllers/alert_controller_test.go
index 6aa7be1..7fe24e2 100644
--- a/internal/controllers/alert_controller_test.go
+++ b/internal/controllers/alert_controller_test.go
@@ -302,13 +302,13 @@ func TestAlertReconciler_EventHandler(t *testing.T) {
 	}
 	event := *eventFixture.DeepCopy()
 
-	testSent := func(g *WithT) {
+	testSent := func(g *WithT, reqFail bool) {
 		g.THelper()
 		buf := &bytes.Buffer{}
 		g.Expect(json.NewEncoder(buf).Encode(&event)).To(Succeed())
 		res, err := http.Post("http://localhost:56789/", "application/json", buf)
 		g.Expect(err).ToNot(HaveOccurred())
-		g.Expect(res.StatusCode).To(Equal(202)) // event_server responds with 202 Accepted
+		g.Expect(res.StatusCode != http.StatusAccepted).To(Equal(reqFail))
 	}
 
 	testForwarded := func(g *WithT) {
@@ -334,6 +334,7 @@ func TestAlertReconciler_EventHandler(t *testing.T) {
 		name            string
 		modifyEventFunc func(e eventv1.Event) eventv1.Event
 		forwarded       bool
+		reqFail         bool
 	}{
 		{
 			name:            "forwards when source is a match",
@@ -418,6 +419,52 @@ func TestAlertReconciler_EventHandler(t *testing.T) {
 			},
 			forwarded: false,
 		},
+		{
+			name: "forwards events with revision metadata",
+			modifyEventFunc: func(e eventv1.Event) eventv1.Event {
+				e.InvolvedObject.Kind = "GitRepository"
+				e.InvolvedObject.Name = "podinfo"
+				e.InvolvedObject.APIVersion = "source.toolkit.fluxcd.io/v1"
+				e.InvolvedObject.Namespace = namespace
+				e.Message = "test"
+				e.Metadata = map[string]string{
+					fmt.Sprintf("%s/%s", "source.toolkit.fluxcd.io", eventv1.MetaRevisionKey): "rev1",
+				}
+				return e
+			},
+			forwarded: true,
+		},
+		{
+			name: "rate limit events with same revision metadata",
+			modifyEventFunc: func(e eventv1.Event) eventv1.Event {
+				e.InvolvedObject.Kind = "GitRepository"
+				e.InvolvedObject.Name = "podinfo"
+				e.InvolvedObject.APIVersion = "source.toolkit.fluxcd.io/v1"
+				e.InvolvedObject.Namespace = namespace
+				e.Message = "test"
+				e.Metadata = map[string]string{
+					fmt.Sprintf("%s/%s", "source.toolkit.fluxcd.io", eventv1.MetaRevisionKey): "rev1",
+				}
+				return e
+			},
+			forwarded: false,
+			reqFail:   true,
+		},
+		{
+			name: "forwards events with different revision metadata",
+			modifyEventFunc: func(e eventv1.Event) eventv1.Event {
+				e.InvolvedObject.Kind = "GitRepository"
+				e.InvolvedObject.Name = "podinfo"
+				e.InvolvedObject.APIVersion = "source.toolkit.fluxcd.io/v1"
+				e.InvolvedObject.Namespace = namespace
+				e.Message = "test"
+				e.Metadata = map[string]string{
+					fmt.Sprintf("%s/%s", "source.toolkit.fluxcd.io", eventv1.MetaRevisionKey): "rev2",
+				}
+				return e
+			},
+			forwarded: true,
+		},
 	}
 
 	for _, tt := range tests {
@@ -428,7 +475,7 @@ func TestAlertReconciler_EventHandler(t *testing.T) {
 			event = *eventFixture.DeepCopy()
 
 			event = tt.modifyEventFunc(event)
-			testSent(g)
+			testSent(g, tt.reqFail)
 			if tt.forwarded {
 				testForwarded(g)
 			} else {
@@ -470,6 +517,7 @@ func TestAlertReconciler_EventHandler(t *testing.T) {
 		name            string
 		modifyEventFunc func(e eventv1.Event) eventv1.Event
 		forwarded       bool
+		reqFail         bool
 	}{
 		{
 			name:            "forwards when message matches inclusion list",
@@ -502,7 +550,7 @@ func TestAlertReconciler_EventHandler(t *testing.T) {
 			event = *eventFixture2.DeepCopy()
 
 			event = tt.modifyEventFunc(event)
-			testSent(g)
+			testSent(g, tt.reqFail)
 			if tt.forwarded {
 				testForwarded(g)
 			} else {

Update: Due to the modification of cleanupMetadata(), initially I thought it'd be good to have e2e test for this, which the above controller test does. Later, I found out that just the eventKeyFunc can be tested with the rate limit middleware in a unit test in isolation:

diff --git a/internal/server/event_server_test.go b/internal/server/event_server_test.go
index 00a6e33..1709d15 100644
--- a/internal/server/event_server_test.go
+++ b/internal/server/event_server_test.go
@@ -53,6 +53,7 @@ func TestEventKeyFunc(t *testing.T) {
 		severity       string
 		message        string
 		rateLimit      bool
+		metadata       map[string]string
 	}{
 		{
 			involvedObject: corev1.ObjectReference{
@@ -120,6 +121,48 @@ func TestEventKeyFunc(t *testing.T) {
 			message:   "Health check passed",
 			rateLimit: true,
 		},
+		{
+			involvedObject: corev1.ObjectReference{
+				APIVersion: "kustomize.toolkit.fluxcd.io/v1beta1",
+				Kind:       "Kustomization",
+				Name:       "4",
+				Namespace:  "4",
+			},
+			severity: eventv1.EventSeverityInfo,
+			message:  "Health check passed",
+			metadata: map[string]string{
+				fmt.Sprintf("%s/%s", "kustomize.toolkit.fluxcd.io", eventv1.MetaRevisionKey): "rev1",
+			},
+			rateLimit: false,
+		},
+		{
+			involvedObject: corev1.ObjectReference{
+				APIVersion: "kustomize.toolkit.fluxcd.io/v1beta1",
+				Kind:       "Kustomization",
+				Name:       "4",
+				Namespace:  "4",
+			},
+			severity: eventv1.EventSeverityInfo,
+			message:  "Health check passed",
+			metadata: map[string]string{
+				fmt.Sprintf("%s/%s", "kustomize.toolkit.fluxcd.io", eventv1.MetaRevisionKey): "rev1",
+			},
+			rateLimit: true,
+		},
+		{
+			involvedObject: corev1.ObjectReference{
+				APIVersion: "kustomize.toolkit.fluxcd.io/v1beta1",
+				Kind:       "Kustomization",
+				Name:       "4",
+				Namespace:  "4",
+			},
+			severity: eventv1.EventSeverityInfo,
+			message:  "Health check passed",
+			metadata: map[string]string{
+				fmt.Sprintf("%s/%s", "kustomize.toolkit.fluxcd.io", eventv1.MetaRevisionKey): "rev2",
+			},
+			rateLimit: false,
+		},
 	}
 	for i, tt := range tests {
 		t.Run(fmt.Sprintf("%v", i), func(t *testing.T) {
@@ -127,6 +170,7 @@ func TestEventKeyFunc(t *testing.T) {
 				InvolvedObject: tt.involvedObject,
 				Severity:       tt.severity,
 				Message:        tt.message,
+				Metadata:       tt.metadata,
 			}
 			eventData, err := json.Marshal(event)
 			g.Expect(err).ShouldNot(gomega.HaveOccurred())

Maybe we can have both or just the unit test would be fine.

@darkowlzz darkowlzz added bug Something isn't working area/alerting Alerting related issues and PRs labels May 8, 2023
@matheuscscp matheuscscp force-pushed the fix-rate-limit branch 7 times, most recently from a9406fe to f95559e Compare May 11, 2023 19:01
@matheuscscp
Copy link
Member Author

@stefanprodan @darkowlzz I finished a new iteration of the PR. PTAL 🙏

@matheuscscp matheuscscp requested a review from darkowlzz May 12, 2023 15:57
Copy link
Contributor

@darkowlzz darkowlzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very neat implementation. LGTM! Thanks.

Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome, thanks @matheuscscp! 🥇

@hiddeco hiddeco merged commit 49122b9 into fluxcd:main May 17, 2023
@matheuscscp matheuscscp deleted the fix-rate-limit branch May 17, 2023 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/alerting Alerting related issues and PRs bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants