Added experimental script to kill old marathon deployments #249
I don't know why this has started happening, and I don't consider it a normal condition, but we somehow end up with Marathon deployments that are left behind for a long time, and they seem to harm bounces.
This seems like a good start at remediation: just kill deployments that have been around for a long time.
Almost all of the time, what is left behind are remnants of the "kill and scale" procedures we do, so I think this is safe. I'll start by running this manually in a few clusters to kick them, and then maybe automate it.
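
For reviewers who want the idea at a glance, here is a minimal sketch of the "kill deployments older than a threshold" approach. It is not the script in this PR. It assumes deployments look like the entries returned by Marathon's `GET /v2/deployments` (an `id` plus a `version` timestamp such as `2016-01-31T09:00:00.000Z`), and that a stuck deployment can be cancelled with `DELETE /v2/deployments/<id>?force=true`. The function names, the one-hour threshold, and the `base_url` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.request import Request, urlopen


def parse_version(version):
    """Parse a Marathon-style UTC timestamp, e.g. '2016-01-31T09:00:00.000Z'."""
    parsed = datetime.strptime(version, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def find_stale_deployments(deployments, now, max_age):
    """Return the deployments whose 'version' timestamp is older than max_age."""
    return [d for d in deployments if now - parse_version(d["version"]) > max_age]


def kill_deployment(base_url, deployment_id):
    """Cancel one deployment (assumed endpoint; force=true skips the rollback)."""
    url = "{}/v2/deployments/{}?force=true".format(base_url, deployment_id)
    with urlopen(Request(url, method="DELETE")) as response:
        return response.status


if __name__ == "__main__":
    # Illustrative data only; a real run would fetch this list from Marathon.
    now = datetime.now(timezone.utc)
    sample = [{"id": "example-id", "version": "2016-01-31T09:00:00.000Z"}]
    for deployment in find_stale_deployments(sample, now, timedelta(hours=1)):
        # Print instead of killing, so a manual run can be reviewed first.
        print("would kill", deployment["id"])
```

Keeping the age filter separate from the HTTP call makes a dry run easy: list what would be killed in a cluster, check it by eye, and only then call `kill_deployment`.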