Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement 'StoppableCheck' Flag to Ensure Maintenance Mode is Avoided After Stoppable Instances are Shutdown #2736
Implement 'StoppableCheck' Flag to Ensure Maintenance Mode is Avoided After Stoppable Instances are Shutdown #2736
Changes from 1 commit
dc32bd4
50cc90a
9a6188f
77cac6e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should skip check when continueOnFailure is false as a perf improvement. Instead of filter it when consolidate result..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say our maxAllowedOffline = 3, and there are 5 instances. The problem is when we do batchGetStoppable for these 5 instances, we would have to process all 5 in parallel. However, if we process them in parallel, then there is no easy way for the individual instance to know the stop of itself will violate the maxAllowedOffline = 3 constraint unless we put a lock on everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to handle this is to process instances by the amount of maxAllowedOffline. Say our maxAllowedOffline = 3, and there are 10 instances in the same zone. In the first iteration, we process instance1-3. If the cumulative stoppable instances count doesn't exceed maxAllowedOffline, we do the next iteration of instances4-6 and so on. But I'm worried about the performance in this design because now our API can only batchProcess
maxAllowedOffline
number of instances in parallel. If the instanceList is super long, our check could take many iteration to be finished.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2cents: first comes correctness and then comes performance optimization. i am not completely plugged into your work, but correctness is very important to focus on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep -1 and have special handling when MaxOfflineInstances <0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a even more reasonable solution is to not allow user do stoppableCheck if they didn't provide
maxOfflineInstancesAllowed
in their cluster config. What do you think?