[Helm] Ingester rollout-group collides with Mimir #13168

Closed
lindeskar opened this issue Jun 7, 2024 · 6 comments · Fixed by #15063
Labels
area/helm type/bug Something is not working as expected

Comments

@lindeskar
Contributor

lindeskar commented Jun 7, 2024

Describe the bug
The loki Helm chart with deploymentMode: Distributed and zoneAwareReplication enabled (the default) generates Ingester StatefulSets with labels for rollout-operator. For example:

name: ingester-zone-a
rollout-group: ingester

The mimir-distributed Helm chart generates StatefulSets with the same label values. For example, mimir-ingester-zone-a-0:

name: ingester-zone-a
rollout-group: ingester

When both charts are deployed to the same Namespace, rollout-operator selects the Mimir and Loki StatefulSets as one rollout group and gets confused about the rollout status. In my case, one of the Mimir Ingester Pods is constantly being recreated.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy mimir-distributed chart with default values
  2. Deploy loki chart with distributed-values.yaml

Expected behavior
rollout-operator handles Mimir and Loki Ingesters as separate rollout-groups.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm
  • Chart version: 6.6.2

Screenshots, Promtail config, or terminal output
From rollout-operator Pod:

level=info ts=2024-06-07T08:17:49.880277442Z msg="StatefulSet status is reporting all pods ready, but the rollout operator has found some not-Ready pods" statefulset=loki-ingester-zone-a not_ready_pods=mimir-ingester-zone-a-0
@lindeskar
Contributor Author

lindeskar commented Jun 17, 2024

My suggested fix: use loki.ingesterFullname in the rollout-group label (#13170).
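
To illustrate the suggestion, the labels on the Loki zone-a Ingester StatefulSet might render roughly as below. This is only a sketch: it assumes a release named loki, so that loki.ingesterFullname resolves to loki-ingester (consistent with the loki-ingester-zone-a StatefulSet name in the log above). The exact output of the linked PR may differ.

```yaml
# Hypothetical rendered labels for the Loki zone-a Ingester StatefulSet.
# Assumption: loki.ingesterFullname resolves to "loki-ingester".
name: ingester-zone-a
rollout-group: loki-ingester   # no longer equal to Mimir's "ingester"
```

With distinct rollout-group values, a selector on rollout-group=ingester would match only the Mimir StatefulSets.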

@Sadzeih

Sadzeih commented Jul 10, 2024

I'm having the same issue. Would it be possible for a maintainer to review the PR @lindeskar opened?

@nvmforero

nvmforero commented Aug 28, 2024

I am also facing this issue. I don't see a workaround since ingester anti-affinity rules are ignored with zoneAwareReplication enabled. Disabling rollout_operator also does not remove the rollout-group: ingester labels from Loki ingester pods.

Edit: My workaround was to use kustomize to change the rollout-group label to loki-ingester on all relevant Loki resources.
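
A kustomize overlay along these lines could look like the sketch below. It is not the commenter's actual configuration: the file name loki-rendered.yaml is hypothetical (for instance, saved output of helm template for the loki chart), and the patch assumes the rollout-group label appears on the StatefulSet metadata, its selector, and its Pod template. Check the rendered manifests before relying on it.

```yaml
# kustomization.yaml (sketch; file name and label layout are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Hypothetical file holding only the rendered Loki manifests.
  - loki-rendered.yaml

patches:
  - target:
      kind: StatefulSet
      # Matches every StatefulSet in the resources above that carries the
      # colliding label; Mimir is not part of this kustomization.
      labelSelector: "rollout-group=ingester"
    patch: |-
      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: not-important   # placeholder; the target selector picks the resources
        labels:
          rollout-group: loki-ingester
      spec:
        selector:
          matchLabels:
            rollout-group: loki-ingester
        template:
          metadata:
            labels:
              rollout-group: loki-ingester
```

Note that a StatefulSet's selector is immutable, so if the label is part of the selector, existing Loki Ingester StatefulSets would have to be recreated for the change to apply. Any other Loki resources that select on rollout-group would need a matching patch.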

@cydergoth

This just cost us a few days and a lot of panic

@slim-bean
Collaborator

We are discussing this more internally, but generally we don't recommend running different databases in the same namespace, mainly because we don't run them this way ourselves, so we don't know where else problems like this might occur (service name collisions? memberlist joining between clusters?!)

@cydergoth

cydergoth commented Nov 27, 2024 via email
