Update docs for three data hall and move something to implemented #2206

Merged
merged 1 commit into from
Feb 5, 2025

johscheuer
Member

@johscheuer johscheuer added the documentation label (Improvements or additions to documentation) Jan 31, 2025
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2cc7eac
  • Duration 3:19:37
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this Feb 2, 2025
@johscheuer johscheuer reopened this Feb 2, 2025
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 2cc7eac
  • Duration 2:50:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Contributor

@nicmorales9 nicmorales9 left a comment

LGTM

Review thread on docs/manual/fault_domains.md (resolved)
@johscheuer johscheuer merged commit 92d516d into FoundationDB:main Feb 5, 2025
15 checks passed
@johscheuer johscheuer deleted the update-docs branch February 5, 2025 07:05
@simenl
Collaborator

simenl commented Feb 7, 2025

I like the simplicity of this approach!

I do have some concerns about using topologySpreadConstraints in production, as the FoundationDB operator does not control the placement of the pods.

  1. Recreating a pod may fail, as the pod could be attached to a disk (PVC) in a zone which has a positive skew. This could happen for several reasons, e.g. a zone being unavailable or lacking resources.

  2. Replacing pods can cause an uneven distribution of pods across zones. The new pods are added while old pods are removed, and they work against the same topologySpreadConstraint.
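
The two concerns above can be illustrated with a minimal sketch of the skew check. It is not the operator's or the scheduler's code. It assumes a single zone-level constraint with `maxSkew: 1` and `whenUnsatisfiable: DoNotSchedule`, that all zones are eligible, and that the pod's PVC pins it to one zone. The zone names, pod counts, and the helpers `skew_if_placed` and `can_schedule` are hypothetical.

```python
from typing import Dict


def skew_if_placed(counts: Dict[str, int], zone: str) -> int:
    """Skew `zone` would have if one more matching pod were placed in it.

    Mirrors the documented rule: pods in the candidate zone (including the
    incoming pod) minus the minimum count across all zones.
    """
    return counts[zone] + 1 - min(counts.values())


def can_schedule(counts: Dict[str, int], zone: str, max_skew: int = 1) -> bool:
    """True if placing a pod in `zone` keeps the skew within `max_skew`."""
    return skew_if_placed(counts, zone) <= max_skew


# Concern 2: the constraint is only evaluated when a pod is scheduled, so
# removing old pods during a replacement can leave the zones uneven.
counts = {"zone-a": 3, "zone-b": 3, "zone-c": 3}  # hypothetical counts
counts["zone-b"] -= 1  # one old pod removed
counts["zone-c"] -= 2  # two old pods removed

# Concern 1: a pod whose PVC lives in zone-a is deleted and has to come
# back in zone-a, which now has a positive skew.
counts["zone-a"] -= 1  # the pod being recreated
pinned_ok = can_schedule(counts, "zone-a")  # pod pinned by its PVC
free_ok = can_schedule(counts, "zone-c")    # pod without a zonal disk

print(pinned_ok, free_ok)
```

In this scenario the pinned pod stays pending until the other zones catch up, while a pod without a zonal disk could still be placed in `zone-c`.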

@johscheuer
Member Author

Both points are valid concerns. We have a plan to improve the "placement" of pods inside the operator with something we call "logical fault domains": https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/docs/design/bin_pack_fault_domains.md. We aim to work on this sometime this year.
