Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle etcd status condition when node is not ready and disable etcd #9084

Conversation

vitorsavian
Copy link
Member

Proposed Changes

  • Handle non ready nodes for etcd status condition
  • Handle node when disable etcd is set

Types of Changes

  • Feature

Verification

Create 2 or more servers in cluster mode
Take a snapshot to s3
Terminate all instances and bring up a new fresh instance
Restore s3 snapshot to the fresh instance

Then use

 kubectl get node {NodeName} -o=jsonpath='{range .status.conditions[*]}{"\n"}{"type: "}{.type},{"\n"}{"status: "}{.status},{"\n"}{"lastHeartbeatTime: "}{.lastHeartbeatTime},{"\n"}{"lastTransitionTime: "}{.lastTransitionTime},{"\n"}{"reason: "}{.reason},{"\n"}{"message: "}{.message},{"\n"}{end}'

Testing

Linked Issues

User-Facing Change


Further Comments

@vitorsavian vitorsavian requested a review from a team as a code owner December 21, 2023 00:37
pkg/cluster/cluster.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch 2 times, most recently from 78a6df9 to 4f3d3e6 Compare December 21, 2023 11:28
Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple nits, looking good though!

pkg/etcd/metadata_controller.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@brandond
Copy link
Member

Note that there is also SetNodeCondition that could probably save a fair bit of code here as well.

Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple new nits, think we're close though.

pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@brandond
Copy link
Member

Another ask from the issue was

I wonder if it's possible to do periodic checking from a healthy node to determine what is the actual state and update based on that.

Can we also use the etcd member health information to update the status if the node is a member of the cluster, but not healthy? For example, if it is a member of the cluster, but the pod is not running?

@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from 087a9e1 to 201f619 Compare December 26, 2023 12:41
@vitorsavian vitorsavian changed the title Handle etcd status condition when node is not ready and disable etcd [WIP] Handle etcd status condition when node is not ready and disable etcd Dec 26, 2023
@vitorsavian
Copy link
Member Author

Another ask from the issue was

I wonder if it's possible to do periodic checking from a healthy node to determine what is the actual state and update based on that.

Can we also use the etcd member health information to update the status if the node is a member of the cluster, but not healthy? For example, if it is a member of the cluster, but the pod is not running?

I was searching for a method to get the etcd member health the same way etcdctl does, it just tries to get a key and then shows if it is healthy or not, but I'm not sure if it's the best approach in this case. What do you think @brandond ?

pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from 201f619 to c9d991c Compare January 2, 2024 23:35
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from c9d991c to 14ad3b5 Compare January 3, 2024 00:34
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from 14ad3b5 to 70076c1 Compare January 3, 2024 15:39
@vitorsavian vitorsavian changed the title [WIP] Handle etcd status condition when node is not ready and disable etcd Handle etcd status condition when node is not ready and disable etcd Jan 3, 2024
@vitorsavian vitorsavian requested a review from brandond January 3, 2024 15:40
pkg/etcd/etcd.go Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from 70076c1 to 55e21cf Compare January 3, 2024 23:20
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch 3 times, most recently from b189867 to f1d33ce Compare January 3, 2024 23:37
@vitorsavian vitorsavian requested a review from brandond January 4, 2024 18:48
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@dereknola
Copy link
Member

Drone is failing because its trying to run the broken secrets encryption test. You need to rebase off master to get the latest version that doesn't.

@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from f1d33ce to bf01c0b Compare January 5, 2024 05:19
pkg/etcd/etcd.go Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jan 5, 2024

Codecov Report

Attention: 74 lines in your changes are missing coverage. Please review.

Comparison is base (eae221f) 41.46% compared to head (1f2aacb) 22.27%.
Report is 1 commits behind head on master.

Files Patch % Lines
pkg/etcd/etcd.go 0.00% 71 Missing ⚠️
pkg/etcd/metadata_controller.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #9084       +/-   ##
===========================================
- Coverage   41.46%   22.27%   -19.19%     
===========================================
  Files         151      148        -3     
  Lines       16048    16030       -18     
===========================================
- Hits         6654     3571     -3083     
- Misses       8259    11720     +3461     
+ Partials     1135      739      -396     
Flag Coverage Δ
inttests 22.27% <0.00%> (-16.41%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from bf01c0b to a94d7da Compare January 5, 2024 16:32
Signed-off-by: Vitor Savian <[email protected]>

Set condition if node is unhealthy

Signed-off-by: Vitor Savian <[email protected]>
@vitorsavian vitorsavian force-pushed the etcd-status-condition-reset-disable-etcd branch from a94d7da to 1f2aacb Compare January 5, 2024 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants