release-21.1: roachprod: improve the way cockroach is run #64641
Merged: tbg merged 3 commits into `cockroachdb:release-21.1` from `tbg:backport21.1-64177-64436-64560` on May 14, 2021.
+147 −56
While running some stress tests with TPC-H, I observed two big problems with roachprod:

- The out-of-memory behavior is very bad: instead of the process being killed, the system enters a thrashing mode where everything in the VM slows to a crawl (to the point where just sshing in can take minutes).
- When the cockroach process exits, the exit code is not recorded anywhere, which in some cases makes it impossible to figure out why it stopped. In my particular case, we were exiting with exit code 8 (which is `exit.TimeoutAfterFatalError()`) because writing to the logs was unacceptably slow.

This commit attempts to improve things on both fronts. Instead of running with `--background`, we use `systemd-run` to run cockroach as a service unit. This has several advantages:

- We have much better monitoring infrastructure via `systemctl status cockroach`.
- We can now run code after the process exits, which lets us record the exit code in various logs.
- We can set a strict cgroups memory limit (set to `95%`) so that the process gets OOM-killed before the system starts to thrash.

As part of the commit, we also print out information about the status of cockroach when logging in.

Fixes cockroachdb#64176.

Release note: None
Upgrading the VM image for AWS roachprod VMs from Ubuntu 16.04 to 20.04. This fixes problems recently introduced in the start command.

Steps, recorded here in case someone in the future wants to do the same and looks at git history:

1. Found a new image using the AWS web console, under AMIs.
2. Modified the `image_name` in `vm/aws/terraform/aws-region/main.tf`.
3. Installed terraform 0.11; inside `vm/aws/terraform` I ran:
   - `terraform init`
   - `terraform apply`
   - `terraform output --json > ../config.json`
4. Regenerated `embedded.go` using `make generate PKG=./pkg/cmd/roachprod/...`

Release note: None
Fixes cockroachdb#64457, likely among many other nightly failures.

Release note: None
@RaduBerinde @nvanbenschoten @andreimatei Are we aware of any problems caused by the systemd setup? I would like to merge this next week after spot checking the health of the
I'm not aware of any new problems.

Neither am I.

otan approved these changes on May 14, 2021.

RaduBerinde approved these changes on May 14, 2021.

YOLO it is then. I'll keep an eye out.
To be merged only post-21.1.0.
Closes #64967.
Backport:
Please see individual PRs for details.
/cc @cockroachdb/release