
setHeaderRoute error and memory leak #3276

Open
dtelaroli opened this issue Dec 27, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@dtelaroli

dtelaroli commented Dec 27, 2023

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

Problem 1:
argo-rollouts is adding duplicated header routes, flooding the VirtualService with content larger than etcd supports.

Problem 2:
After Problem 1 occurs, the argo-rollouts pod leaks memory until it consumes all of the node's memory, then it restarts and the cycle starts again.
This happens whenever anything generates a very large manifest; I saw the same behavior using an AnalysisRun collecting metrics for 24h.

time="2023-12-27T15:31:07Z" level=warning msg="Request entity too large: limit is 3145728" event_reason=TrafficRoutingError namespace=psm-test rollout=clismo

To Reproduce

I don't know how to reproduce Problem 1.
Problem 2 can be reproduced by creating a VirtualService with this route duplicated:

    - match:
        - headers:
            x-version:
              exact: PR-132-b36d66a
      name: header-route-version
      route:
        - destination:
            host: clismo
            subset: canary
          weight: 100

The manifest needs more than 6k lines for the error to happen.
After that, make a change to the Rollout to start a new rollout version.
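
For reference, a minimal sketch (not from the original report) of the kind of canary configuration that produces such a header route; the managed route name, header value, and the VirtualService route name primary are assumptions based on the snippet above:

# Hypothetical Rollout excerpt: declares the managed route and the
# setHeaderRoute step that argo-rollouts turns into the match block above.
spec:
  strategy:
    canary:
      trafficRouting:
        managedRoutes:
          - name: header-route-version
        istio:
          virtualService:
            name: clismo
            routes:
              - primary   # assumed name of the primary HTTP route
      steps:
        - setHeaderRoute:
            name: header-route-version
            match:
              - headerName: x-version
                headerValue:
                  exact: PR-132-b36d66a
        - pause: {}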

Expected behavior

Screenshots


Version

v1.5.0

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>

time="2023-12-27T15:38:47Z" level=info msg="Started syncing rollout" generation=359 namespace=psm-test resourceVersion=3287885544 rollout=clismo
time="2023-12-27T15:38:48Z" level=info msg="Found 1 TrafficRouting Reconcilers" namespace=psm-test rollout=clismo
time="2023-12-27T15:38:48Z" level=info msg="Reconciling TrafficRouting with type 'Istio'" namespace=psm-test rollout=clismo
time="2023-12-27T15:38:50Z" level=warning msg="Request entity too large: limit is 3145728" event_reason=TrafficRoutingError namespace=psm-test rollout=clismo
time="2023-12-27T15:38:50Z" level=error msg="roCtx.reconcile err Request entity too large: limit is 3145728" generation=359 namespace=psm-test resourceVersion=3287885544 rollout=clismo
time="2023-12-27T15:38:50Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"psm-test\", Name:\"clismo\", UID:\"15051ab3-a968-4673-b1af-55ac0a8c525d\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"3287885544\", FieldPath:\"\"}): type: 'Warning' reason: 'TrafficRoutingError' Request entity too large: limit is 3145728"

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@dtelaroli dtelaroli added the bug Something isn't working label Dec 27, 2023
@zachaller
Collaborator

zachaller commented Dec 27, 2023

I think this is possibly fixed in 1.6, could you try 1.6.4?

#2887

@dtelaroli
Author

dtelaroli commented Dec 27, 2023

Hi @zachaller
I have another issue that blocks me from upgrading argo-rollouts.
#3223

@dtelaroli
Author

Anyway, PR #2887 fixes Problem 1, but it doesn't solve Problem 2.

@andyliuliming
Contributor

@dtelaroli did you have any findings on the memory footprint issue?
We observed a potential memory leak in our environment too (usually memory usage is around 200Mi, but after 15 days it grows to 600Mi, although we only have about 5 rollouts in our cluster).

@dtelaroli
Author

@andyliuliming I've discovered that the issue happens when the application syncs a very large manifest.
There is a size limit, and once the manifest exceeds it argo-rollouts raises an error on every sync cycle (Request entity too large: limit is 3145728), which generates the memory leak.
Fixing the oversized manifest makes the issue disappear.
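
As a quick way to confirm this (a sketch, not part of the original comment; the namespace and VirtualService name are taken from the logs above), the size of the rendered manifest can be compared against the 3145728-byte (3 MiB) request limit:

# Roughly how many bytes the VirtualService occupies when serialized as YAML
kubectl get virtualservice clismo -n psm-test -o yaml | wc -c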

Another issue I had is that the rollout adds an empty step during setHeaderRoute: - {}
This also breaks the rollout and generates a memory leak.
