
Clarify behavior of parallel pod management policy of stateful sets #47085

Open · mittal-ishaan opened this issue Jul 4, 2024 · 7 comments

Labels: kind/bug, lifecycle/stale, needs-triage, sig/apps, sig/architecture, sig/scheduling

@mittal-ishaan

Problem:
I was facing the issue described in kubernetes/kubernetes#67250.

The workaround discussed by community users is to set the podManagementPolicy to Parallel, as suggested here.

I tried this and it works as expected: when I update the pod template to a good configuration, the StatefulSet terminates all pods in parallel and does not wait for each pod to be Running and Ready, or completely terminated, before launching or terminating another pod.
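
For reference, a minimal sketch of a StatefulSet spec with this setting (the name, labels, and nginx image are illustrative placeholders, not from my actual workload):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                       # placeholder name
spec:
  serviceName: web
  replicas: 3
  podManagementPolicy: Parallel   # default is OrderedReady
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25       # placeholder image
```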

All was good until I read the documentation for podManagementPolicy further and saw one more line, stated here:

> This option only affects the behaviour for scaling operations. Updates are not affected.

Setting it to Parallel worked for me even for updates: when I change the configuration, the rollout proceeds in parallel, contradicting what the above line in the docs says.

I went through the code and found:

https://github.com/kubernetes/kubernetes/blob/88313a445174e21ed326f40802429b854e5be9ba/pkg/controller/statefulset/stateful_set_control.go#L436-L440

When we set podManagementPolicy to Parallel, monotonic is set to false, so we never enter this if block; the reconciliation then falls through to the code that updates the pods:

https://github.com/kubernetes/kubernetes/blob/88313a445174e21ed326f40802429b854e5be9ba/pkg/controller/statefulset/stateful_set_control.go#L459
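
To make the control flow concrete, here is a toy Go model of that guard (not the actual controller code; the `pod` type, the `reconcile` function, and the messages are invented for illustration):

```go
package main

import "fmt"

// pod is a toy stand-in for a StatefulSet replica.
type pod struct {
	name            string
	runningAndReady bool
}

// reconcile mimics the early return guarded by `monotonic` in
// stateful_set_control.go: with OrderedReady (monotonic == true) the loop
// stops at the first pod that is not Running and Ready, so control never
// reaches the update logic further down the function.
func reconcile(pods []pod, monotonic bool) string {
	for _, p := range pods {
		if !p.runningAndReady && monotonic {
			return fmt.Sprintf("waiting for %s to be Running and Ready; update blocked", p.name)
		}
	}
	return "fell through to the update phase; old-revision pods get deleted"
}

func main() {
	pods := []pod{{"web-0", true}, {"web-1", false}} // web-1 is crash-looping

	fmt.Println("OrderedReady:", reconcile(pods, true))  // rollout stays blocked
	fmt.Println("Parallel:   ", reconcile(pods, false))  // rollout proceeds
}
```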

Proposed Solution:
This doc line was added for Kubernetes 1.11, and I suppose the controller code has changed since then. I have verified that updates are indeed affected by the Parallel pod management policy. We should update the docs to remove the line stating that updates are not affected.
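
Concretely, the proposed docs change would just delete that one sentence (sketched as a diff; the surrounding Markdown source of the page is not shown):

```diff
-This option only affects the behaviour for scaling operations. Updates are not affected.
```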

Page to Update:
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset

Kubernetes Version: v1.30.0

k8s-ci-robot added the needs-triage label Jul 4, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

SIG Docs takes a lead on issue triage for this website, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Ritikaa96
Contributor

Ritikaa96 commented Jul 4, 2024

/sig apps
/sig architecture
/sig scheduling
/kind bug

k8s-ci-robot added the sig/apps, sig/architecture, sig/scheduling, and kind/bug labels Jul 4, 2024
github-project-automation bot moved this to Needs Triage in SIG Scheduling Jul 4, 2024
github-project-automation bot moved this to Needs Triage in SIG Apps Jul 4, 2024
@mittal-ishaan
Author

Hey,
Wanted to know if there is any update on this.

@ayushpatil2122
Contributor

/assign

@tengqm
Contributor

tengqm commented Oct 11, 2024

/sig apps

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@EronWright

EronWright commented Feb 7, 2025

I would appreciate some clarifying remarks about how the parallel policy relates to revision changes. I would guess that is what the term 'update' means here. As discussed in kubernetes/kubernetes#67250, the parallel policy seems to unblock a stuck rollout, meaning that it does affect updates. Meanwhile, according to kubernetes/kubernetes#96218, it should not affect updates. Then there's the MaxUnavailable flag to consider, since it is said to have an additional effect (though I haven't observed any relation to this issue).

I think the documentation should be changed to say more (not less) about the behavior of updates (revision changes).

EronWright added a commit to pulumi/pulumi-kubernetes-operator that referenced this issue Feb 10, 2025
### Proposed changes

This PR seeks to address this issue ([k8s: "Forced rollback"](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#forced-rollback)) that occurs when the workspace pod is in a crashloop:

> When using [Rolling Updates](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#rolling-updates) with the default [Pod Management Policy](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-management-policies) (OrderedReady), it's possible to get into a broken state that requires manual intervention to repair.

The `parallel` policy seems to enable the statefulset controller to forcibly remove a pod when a new revision is available. The controller seems to obey the termination grace period, which is important, and I can't think of any other negatives. But there's a concern in the k8s community about this approach: kubernetes/website#47085

Note that a workspace consists of one replica and is rather like a singleton, with good behavior w.r.t. Pulumi state locking and compatibility with persistent volumes.

### Related issues (optional)


Closes #801