Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rhythm] Add rollout-operator and allow simultaneous update of block-builder pods #4660

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

mapno
Copy link
Member

@mapno mapno commented Feb 5, 2025

What this PR does:

Introduced the rollout-operator so the pods in the block-builder's statefulset can simultaneously update. Otherwise there is the risk of ownership conflicts in partitions during rollouts.

It solves the following scenario:

There are 6 partitions and 3 block-builders.

Partition assignment is as follows:

  • block-builder-0 has assigned partitions 0 and 3.
  • block-builder-1 has assigned partitions 1 and 4.
  • block-builder-2 has assigned partitions 2 and 5.

If a new block-builder is deployed, partition assignment will change to

  • block-builder-0 has assigned partitions 0 and 4.
  • block-builder-1 has assigned partitions 1 and 5.
  • block-builder-2 has assigned partitions 2.
  • block-builder-3 has assigned partitions 3.

As soon as block-builders are rolled out, they'll have the new config, while older replicas will have the old config—ie. at one point, block-builder-3 and block-builder-0 both will own partition 3.

Alternatives

A previous attempt was made in #4642, but in the end discarded. It required Kubernetes >=v1.24 and MaxUnavailableStatefulSet feature gate enabled, which is in alpha and not supported in many environments.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mapno mapno marked this pull request as ready for review February 5, 2025 12:55
'kubernetes.namespace': $._config.namespace,
},

rollout_operator_container::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a rollout-operator deployment in

local rollout_operator_enabled = $._config.multi_zone_ingester_enabled,

Should we merge both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think we have to, to at least avoid conflicting deployments. But since the other rollout operator looks last updated more than 2y ago, and we don't use it ourselves, I would defer to this new one where any differences are found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I had forgotten, thanks!

},

rollout_operator_container::
container.new('rollout-operator', $._images.rollout_operator) +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can bump the existing version wich is outdated:

rollout_operator: 'grafana/rollout-operator:v0.1.1',

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, latest is v0.23.0 🙈

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants