Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of native SidecarContainers (init container with restartPolicy: Always) results in pod status getting stuck on Init #3366

Closed
martynd opened this issue Feb 13, 2024 · 6 comments · Fixed by #3639 or argoproj/homebrew-tap#34
Labels
bug Something isn't working

Comments

@martynd
Copy link
Contributor

martynd commented Feb 13, 2024

K8s 1.29 enables the SidecarContainers feature gate by default (more info on the native sidecars here https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/)
These are defined as initContainers with the restartPolicy set to Always

This change in logic results in pods being perpetually (visually) stuck in the Init phase

This appears to have affected various software which interacts with k8s, eg the following issue on k9s
derailed/k9s#2520

Screenshots
Screenshot 2024-02-13 at 11 07 29 AM

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
@martynd martynd added the bug Something isn't working label Feb 13, 2024
@phyzical
Copy link

phyzical commented May 20, 2024

We will also need to add logic to allow the spec to support redinessProbe and startupProbe for initContainers with the restartPolicy set to Always

 spec.template.spec.initContainers[1].readinessProbe: Forbidden: may not be set for init containers

side question,

@martynd are you able to confirm that the rollout spec works with restartPolicy when i try to add this i find that the live manifest never adds its but i do see it in the desired manifest?

@martynd
Copy link
Contributor Author

martynd commented May 20, 2024

@phyzical Im not sure actually as the use cases we have get the restart always init containers added at pod creation via mutation webhooks (istio and gcsfuse csi)

It may be possible to manually edit the live rollout object to have the correct restart policy (similar to other cases where argo doesnt want to change things due to managed fields) but I suspect you will get a validation error there too

I recall from [https://github.com//issues/3130](this bug) that the deployment part of the rollout spec is validated against the k8s libraries deployment validation which would be fine since its a valid 1.28+ spec deployment spec except that rollouts is built against the 1.26 k8s apis

That said, it looks like the 1.7 branch is built against 1.29 as of this commit 2dc71e3 which should at least get it to validate

@phyzical
Copy link

phyzical commented May 23, 2024

I gave "release-1.7" tag a go, didn't help sadl.

turns out i missed part of the error from before its actually is invalid: spec.template.spec.initContainers[1].readinessProbe: Forbidden: may not be set for init containers without restartPolicy=Always

So i wondered if the restart policy for normal containers propagated, and it turns out it doesnt for that either.

that makes me think this is by design and couuuld explain the side effects your experiencing ? @martynd

Ill have another sniff at the source to see if theres any comments as to why it does this

edit: hmmm but i see this ref

"title": "RestartPolicy defines the restart behavior of individual containers in a pod.\nThis field may only be set for init containers, and the only allowed value is \"Always\".\nFor non-init containers or when this field is not specified,\nthe restart behavior is defined by the Pod's restart policy and the container type.\nSetting the RestartPolicy as \"Always\" for the init container will have the following effect:\nthis init container will be continually restarted on\nexit until all regular containers have terminated. Once all regular\ncontainers have completed, all init containers with restartPolicy \"Always\"\nwill be shut down. This lifecycle differs from normal init containers and\nis often referred to as a \"sidecar\" container. Although this init\ncontainer still starts in the init container sequence, it does not wait\nfor the container to complete before proceeding to the next init\ncontainer. Instead, the next init container starts immediately after this\ninit container is started, or after any startupProbe has successfully\ncompleted.\n+featureGate=SidecarContainers\n+optional"

Ah actually, the helm's crds still havn't been updated so pining the containing to 1.7 doesn't help the crd side ofc

After manually applying the updated crd all works as expected thanks 🦆

@Hronom
Copy link

Hronom commented Jun 10, 2024

Pretty actual, it's appear to be a blocker to enable native sidecars for istio as sidecars now can't work properly with Argo Rollouts.

This one is critical to release

@sherifabdlnaby
Copy link

This is very important! It's blocking us from using Argo Rollouts

@zachaller zachaller assigned zachaller and unassigned zachaller Jul 25, 2024
zachaller pushed a commit that referenced this issue Aug 13, 2024
…ixes #3366 (#3639)

* Update pod status logic to support native sidecars

Signed-off-by: Martyn Dale <[email protected]>

* Fix lint issues

Signed-off-by: Martyn Dale <[email protected]>

---------

Signed-off-by: Martyn Dale <[email protected]>
@nsaini-figma
Copy link

Is this fixed in 1.8.0? I see the commit that has the fix in the 1.8.0 code, but it looks like a subsequent commit deletes this code? I am still seeing this behavior in 1.8.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
6 participants