[artifacts] Improve cloud deployment error handling #137775
08b6a37 to a5ee12b
Pinging @elastic/kibana-operations (Team:Operations)
One nit, but LGTM
.buildkite/pipelines/artifacts.yml
Outdated
agents:
  queue: n2-2
timeout_in_minutes: 30
if: "build.env('RELEASE_BUILD') == null || build.env('RELEASE_BUILD') == '' || build.env('RELEASE_BUILD') == 'false'"
retry:
  automatic:
    # Matches buildkite forced agent shutdown (timeout_in_minutes) and ecctl create failures
Note that you'll only get a 255 if the job times out AND the job gracefully stops. If the job has to be hard-killed because it's taking too long to exit after the timeout, you'll get a -1.
Hmm, -1 could also be a pre-emption hard-kill, right?
Yep, anything where the job or agent doesn't shut down gracefully: a problem with the GCP instance, an OOM kill, etc.
oh this isn't using spot instances
pushed b774d16
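For reference, a retry rule covering both exit codes discussed in this thread could look roughly like the sketch below. It is not copied from the pushed commit: the `limit` values are assumptions, and the per-code comments summarize the thread above rather than the actual file.

```yaml
retry:
  automatic:
    # 255: job timed out (timeout_in_minutes) and stopped gracefully;
    # per the existing comment, this also matches ecctl create failures
    - exit_status: 255
      limit: 1   # assumed value
    # -1: job or agent did not shut down gracefully
    # (hard-kill after timeout, instance problem, OOM kill)
    - exit_status: -1
      limit: 1   # assumed value
```

Each entry under `automatic` is matched against the step's exit status independently, so the two cases can be given different limits if one turns out to be noisier than the other.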
💚 Build Succeeded
💔 All backports failed
Manual backport: To create the backport manually run:
Questions? Please refer to the Backport tool documentation
* [artifacts] Improve cloud deployment error handling
* Update .buildkite/scripts/steps/artifacts/cloud.sh
  Co-authored-by: Spencer <[email protected]>
* update retry codes
  Co-authored-by: Spencer <[email protected]>

* [artifacts] Improve cloud deployment error handling
* Update .buildkite/scripts/steps/artifacts/cloud.sh
  Co-authored-by: Spencer <[email protected]>
* update retry codes
  Co-authored-by: Spencer <[email protected]>
  Co-authored-by: Spencer <[email protected]>
This updates Cloud tests to:
I plan to follow up by updating Slack notifications to distinguish a soft failure. Open to ideas on how to improve the UX further.
Failure: https://buildkite.com/elastic/kibana-artifacts-snapshot/builds/574
Success: https://buildkite.com/elastic/kibana-artifacts-snapshot/builds/575
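As a rough illustration of what a soft failure means at the pipeline level, Buildkite lets a step declare exit statuses that should not fail the build via `soft_fail`. The fragment below is only a sketch: the step label and the exit status `2` are hypothetical and not taken from this PR; only the script path appears in the commits above.

```yaml
steps:
  - label: "Cloud deployment"   # hypothetical label
    command: .buildkite/scripts/steps/artifacts/cloud.sh
    soft_fail:
      # hypothetical exit code the script could use for a non-blocking failure
      - exit_status: 2
```

With a dedicated exit status, a notification step could tell a soft failure apart from a hard one by checking how the step finished.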