upgrade testing: make script error handling more robust #25152

Merged (1 commit) on Feb 20, 2025

@tgross tgross commented Feb 18, 2025

We're using set -eo pipefail everywhere in the Enos scripts, but several of the scripts used for checking assertions didn't take advantage of pipefail in a way that let us avoid early exits on transient errors. As a result, if a server was slightly late to come back up, we'd hit an error and exit the whole script instead of polling as expected.
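
As a rough illustration of the failure mode described above (not code from the Enos scripts), the sketch below shows how an unguarded pipeline ends a script under set -eo pipefail, while the same pipeline used as an `if` condition only reports its exit status. The `flaky_query` function and the `status` variable are hypothetical stand-ins for a CLI call against a server that is not up yet.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Hypothetical stand-in for a query against a server that is still restarting.
flaky_query() {
    echo "connection refused" >&2
    return 1
}

# Unguarded form (left commented out): with pipefail the pipeline's status is
# non-zero, so `set -e` would terminate the whole script on this line.
#   count=$(flaky_query | wc -l)

# Guarded form: the assignment is the condition of `if`, so the failure is
# surfaced as an exit status and the script keeps running.
if count=$(flaky_query 2>/dev/null | wc -l); then
    status="ok"
else
    status="retry"
fi
echo "status=$status"
```

The guarded form is what lets a caller decide to poll again rather than abort on the first transient error.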

While fixing this, I've made a number of other improvements to the shell scripts:

  • I've changed the design of the polling loops so that each loop calls a function that returns an exit code and sets a last_error value, along with any global variables required by downstream functions. This makes the loops more readable by reducing the number of global variables, and it helped identify some places where we were exiting instead of returning into the loop.
  • Using shellcheck -s bash, I fixed some unused and undefined variables that we had missed because they were only used on the error paths.
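
A minimal sketch of the polling-loop shape described in the first bullet, assuming a simple timeout budget. Apart from last_error, every name here (`check_servers_ready`, `servers_ready`, `MAX_WAIT_TIME`, `POLL_INTERVAL`, `attempts`) is hypothetical, and the check is simulated rather than taken from the actual scripts.

```shell
#!/usr/bin/env bash
set -eo pipefail

MAX_WAIT_TIME=5   # hypothetical timeout budget, in seconds
POLL_INTERVAL=1   # hypothetical delay between attempts, in seconds
elapsed_time=0
last_error=
attempts=0        # only used to simulate a transient failure

# Simulated check: fails twice, then succeeds. On failure it records the
# reason in last_error and returns non-zero instead of exiting, so control
# goes back to the loop. On success it sets a global for downstream code.
check_servers_ready() {
    attempts=$((attempts + 1))
    if [ "$attempts" -lt 3 ]; then
        last_error="attempt $attempts: server not ready"
        return 1
    fi
    servers_ready=3
    return 0
}

while true; do
    # The `&&` list keeps `set -e` from aborting when the check fails.
    check_servers_ready && break
    if [ "$elapsed_time" -ge "$MAX_WAIT_TIME" ]; then
        echo "timed out: $last_error" >&2
        exit 1
    fi
    echo "retrying: $last_error"
    sleep "$POLL_INTERVAL"
    elapsed_time=$((elapsed_time + POLL_INTERVAL))
done
echo "ready after $attempts attempts"
```

Keeping the exit decision in the loop, and only a return code plus last_error in the function, means a transient failure can never end the script before the timeout is reached.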

Ref: https://hashicorp.atlassian.net/browse/NET-11546

@tgross tgross added the theme/testing Test related issues label Feb 18, 2025
@tgross tgross added this to the 1.10.0 milestone Feb 18, 2025
@tgross tgross force-pushed the enos-script-error-handling branch from 32f1580 to d0a749f Compare February 19, 2025 19:16
@tgross tgross force-pushed the enos-script-error-handling branch from d0a749f to 1891229 Compare February 19, 2025 19:47
@tgross tgross marked this pull request as ready for review February 19, 2025 19:49
@tgross tgross requested review from a team as code owners February 19, 2025 19:49
@Juanadelacuesta Juanadelacuesta left a comment

LGTM :)

@tgross tgross merged commit 73cd934 into main Feb 20, 2025
31 checks passed
@tgross tgross deleted the enos-script-error-handling branch February 20, 2025 13:44
tgross added a commit that referenced this pull request Feb 21, 2025
While testing #25172 I found a few spots where #25152 wasn't capturing the
errors from transient failures correctly, or was exiting early instead of
retrying.

Ref: https://hashicorp.atlassian.net/browse/NET-11546