Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: improve the way cockroach is run #64177

Merged
merged 1 commit into from
Apr 29, 2021

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Apr 24, 2021

While running some stress tests with TPCH, I observed two big problems
with roachprod:

  • the out-of-memory behavior is very bad: instead of the process
    being killed, the system enters a thrashing mode where everything
    in the VM slows to a crawl (to the point where just sshing in can
    take minutes).
  • when the cockroach process exits, the exit code is not recorded
    anywhere, making it impossible in some cases to figure out why it
    stopped. In my particular case, we were exiting with exit code 8
    (which is exit.TimeoutAfterFatalError()) because writing to the
    logs was unacceptably slow.

This commit attempts to improve things on both these fronts. Instead
of running with --background, we use systemd-run to run cockroach
as a service unit. This has several advantages:

  • we have much better monitoring infrastructure via
    systemctl status cockroach
  • we can now run code after the exit, allowing us to record it in
    various logs.
  • we can set a strict cgroups memory limit (set to 95%) so that the
    process gets oom-killed before the system starts to thrash.

As part of the commit, we also print out information about the status
of cockroach when logging in.

Fixes #64176.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member Author

Hm, it looks like tmux is not available in the VMs in roachtest, even though it was there when I used roachprod create.

@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch from 56a5c29 to 95b0fda Compare April 24, 2021 17:09
@RaduBerinde
Copy link
Member Author

I think a better way to do this would be to use systemd-run to run as a service, which would allow for better monitoring (including exit code). We can also apply a memory limit so that the process gets killed reliably when OOMing instead of the entire VM thrashing. I will play around with that.

@tbg
Copy link
Member

tbg commented Apr 26, 2021

I will play around with that.

Cool, really appreciate it. I agree that it makes sense to productionize the ad-hoc starting/stopping we have right now. Let me know how I can help.

@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch from 95b0fda to 97bdb9d Compare April 26, 2021 23:59
@RaduBerinde RaduBerinde changed the title roachprod: record exit code from cockroach process roachprod: improve the way cockroach is run Apr 27, 2021
@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch from 97bdb9d to 2fae05a Compare April 27, 2021 00:02
@RaduBerinde
Copy link
Member Author

Updated - we now run with systemd-run and apply a strict 95% memory limit.

@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch 3 times, most recently from 4347aa8 to e6297b2 Compare April 27, 2021 04:05
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: neat

Reviewed 2 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @cucaroach, @irfansharif, and @tbg)

cat > ${HOME}/run-cockroach.sh <<'EOF' && \
sudo systemd-run --unit cockroach --uid $uid --gid $gid -p MemoryMax=%[9]s bash ${HOME}/run-cockroach.sh
#!/bin/bash
ulimit -c unlimited
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no set -euo pipefail?

# Output a message visible when checking the status (via sudo systemctl status).
echo "Starting cockroach (output redirected to %[1]s)"
export ROACHPROD=%[3]d%[4]s;
set -x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want -e and +e here? x prints the commands the shell executes but that doesn't seem what you'd want here.

code=$?
set +x
echo "cockroach exited with exit code $code" >> %[1]s/roachprod.log
echo "cockroach exited with exit code $code" >> %[1]s/cockroach.stderr.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider throwing a $(date) in these so that we can figure out the exact time it stopped

uid=$(id -u)
gid=$(id -g)
cat > ${HOME}/run-cockroach.sh <<'EOF' && \
sudo systemd-run --unit cockroach --uid $uid --gid $gid -p MemoryMax=%[9]s bash ${HOME}/run-cockroach.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more useful flags that we may want to add
-r, --remain-after-exit
--working-directory=

I also wonder, don't we need some probe to determine when to return? --background returns when the ports are bound (so that you don't get spurious connection refused when connecting right after c.Start()). What prevents that flake here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--user

    Talk to the service manager of the calling user, rather than the service manager of the system.

Maybe this avoids the need for going through root?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, the memory limit doesn't work with --user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question about the probe.. Are you sure it's not the same currently? AFAICT with --background we fork at the very beginning: https://github.com/cockroachdb/cockroach/blob/master/pkg/cli/start.go#L283

// NB: this is awkward as when the process fails, the test runner will show an
// unhelpful empty error (since everything has been redirected away). This is
// unfortunately equally awkward to address.
if h.c.IsLocal() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't let these diverge as much. Of course we can't ask for systemd to be installed locally, but can we share the run-cockroach.sh script and make the only difference that in local we invoke it with a --background arg directly?

Copy link
Contributor

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @irfansharif, and @RaduBerinde)


pkg/cmd/roachprod/install/cockroach.go, line 427 at r1 (raw file):

	// strict memory limit and provides tools to monitor the process.
	cmd := fmt.Sprintf(`
sudo systemctl reset-failed cockroach

Is sudo guaranteed to work in all envs roachprod supports, is this a new requirement?


pkg/cmd/roachprod/install/cockroach.go, line 454 at r1 (raw file):

gid=$(id -g)
cat > ${HOME}/run-cockroach.sh <<'EOF' && \
sudo systemd-run --unit cockroach --uid $uid --gid $gid -p MemoryMax=%[9]s bash ${HOME}/run-cockroach.sh

I think we'll probably want the ability to play with MemoryMax % and also specify MemoryHigh which appears to be a "soft" limit (https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#MemoryHigh=bytes)

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:LGTM: thanks!

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @irfansharif, @RaduBerinde, and @tbg)


pkg/cmd/roachprod/install/cluster_synced.go, line 159 at r1 (raw file):

}

// Stop TODO(peter): document

would you mind taking the opportunity and document the wait?


pkg/cmd/roachprod/install/cluster_synced.go, line 176 at r1 (raw file):

			var resetServiceCmd string
			if !c.IsLocal() {
				// Since we're waiting (the default), clean up the service.

consider saying a few more words about what this reset-failed does.


pkg/cmd/roachprod/install/cockroach.go, line 451 at r1 (raw file):

fi

uid=$(id -u)

please write something about how --uid and --gid relate to --user, and how we can't use --user.


pkg/cmd/roachprod/install/cockroach.go, line 461 at r1 (raw file):

echo ">>> roachprod start: $starttime" >> %[1]s/roachprod.log
ps axeww -o pid -o command >> %[1]s/roachprod.log
[ -x /usr/bin/lslocks ] && /usr/bin/lslocks >> %[1]s/roachprod.log

If you know why this is here, consider putting a comment.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! I'll work more on it.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @andreimatei, @cucaroach, @irfansharif, @RaduBerinde, and @tbg)


pkg/cmd/roachprod/install/cockroach.go, line 399 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

We shouldn't let these diverge as much. Of course we can't ask for systemd to be installed locally, but can we share the run-cockroach.sh script and make the only difference that in local we invoke it with a --background arg directly?

I don't want to create a script on your machine.. But I can extract the common stuff in the code here


pkg/cmd/roachprod/install/cockroach.go, line 427 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Is sudo guaranteed to work in all envs roachprod supports, is this a new requirement?

It's a new requirement, but it should be available in all VMs that we set up. We won't use it for "local" clusters.


pkg/cmd/roachprod/install/cockroach.go, line 454 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

I think we'll probably want the ability to play with MemoryMax % and also specify MemoryHigh which appears to be a "soft" limit (https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#MemoryHigh=bytes)

We can play with that and add a config if we conclude it's useful. I am hesitant to use the soft limit (which slows down the program). It would be very hard to debug that slowdown, and it could be especially problematic with benchmarks.


pkg/cmd/roachprod/install/cockroach.go, line 461 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

If you know why this is here, consider putting a comment.

No idea, does anyone else?


pkg/cmd/roachprod/install/cockroach.go, line 468 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

do you want -e and +e here? x prints the commands the shell executes but that doesn't seem what you'd want here.

We definitely don't want -e, we want to be able to run the code after it when it fails. I wanted to show the command so you can see it if you do systemctl status cockroach.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @andreimatei, @cucaroach, @irfansharif, @RaduBerinde, and @tbg)


pkg/cmd/roachprod/install/cockroach.go, line 427 at r1 (raw file):

Previously, RaduBerinde wrote…

It's a new requirement, but it should be available in all VMs that we set up. We won't use it for "local" clusters.

We use sudo in SyncedCluster.Wipe and some other SyncedCluster operations.

@RaduBerinde
Copy link
Member Author


pkg/cmd/roachprod/install/cockroach.go, line 460 at r1 (raw file):

starttime=$(date)
echo ">>> roachprod start: $starttime" >> %[1]s/roachprod.log
ps axeww -o pid -o command >> %[1]s/roachprod.log

Do we need these ps axewws in the roachprod log? They are very unwieldy

@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch from e6297b2 to 9b6f4b0 Compare April 28, 2021 02:11
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner, @andreimatei, @cucaroach, @irfansharif, and @tbg)


pkg/cmd/roachprod/install/cluster_synced.go, line 159 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

would you mind taking the opportunity and document the wait?

Done.


pkg/cmd/roachprod/install/cluster_synced.go, line 176 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider saying a few more words about what this reset-failed does.

Removed.


pkg/cmd/roachprod/install/cockroach.go, line 451 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please write something about how --uid and --gid relate to --user, and how we can't use --user.

Done.


pkg/cmd/roachprod/install/cockroach.go, line 456 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

no set -euo pipefail?

Done.


pkg/cmd/roachprod/install/cockroach.go, line 475 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

consider throwing a $(date) in these so that we can figure out the exact time it stopped

Done.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner, @andreimatei, @cucaroach, @irfansharif, and @RaduBerinde)


pkg/cmd/roachprod/install/cluster_synced.go, line 159 at r1 (raw file):

Previously, RaduBerinde wrote…

Done.

But this takes the signal and wait, so there's no default here. Did you intend to put this comment elsewhere?


pkg/cmd/roachprod/install/cockroach.go, line 399 at r1 (raw file):

Previously, RaduBerinde wrote…

I don't want to create a script on your machine.. But I can extract the common stuff in the code here

Why not? I appreciate that you've introduced reuse here but... this stuff is just so hard to maintain or even read. I thought we would have a single script that simply knows whether it's invoked in local or remote mode. I have this PR checked out locally and it's so hard. I think we lost GOTRACEBACK=crash for the non-local case? We need to clean this up avoid the divergence between the paths or it'll be so hard to maintain and get right.

I spent some time on this - pushed a commit - pretty happy with how it turned out. We are of course pushing the boundaries here of what is reasonable but this is a lot less terrible than what we had before this PR.


pkg/cmd/roachprod/install/cockroach.go, line 454 at r1 (raw file):
--background uses sdnotify.Exec:

return true, sdnotify.Exec(cmd)

In other words, the parent waits until the child calls sdnotify.Ready, which it only does after binding the ports:

cockroach/pkg/cli/start.go

Lines 422 to 425 in 6b1b69a

// ReadyFn will be called when the server has started listening on
// its network sockets, but perhaps before it has done bootstrapping
// and thus before Start() completes.
serverCfg.ReadyFn = func(waitForInit bool) {

But I think this is... then already set up for success? We are running it under systemd now. The only question is, does systemd know that cockroach will signal systemd? It would have to decide to wait for the signal if things were to work out. I think that means using a service type of "Notify"?

https://askubuntu.com/questions/1120023/how-to-use-systemd-notify

If you set up a service using Type=notify, systemd will automatically set up a communication socket back to systemd and will export its path to the service under $NOTIFY_SOCKET.


pkg/cmd/roachprod/install/cockroach.go, line 454 at r1 (raw file):

Previously, RaduBerinde wrote…

We can play with that and add a config if we conclude it's useful. I am hesitant to use the soft limit (which slows down the program). It would be very hard to debug that slowdown, and it could be especially problematic with benchmarks.

It's easy enough to play with them on a branch, I don't think we need to add knobs at this point.


pkg/cmd/roachprod/install/cockroach.go, line 460 at r1 (raw file):

Previously, RaduBerinde wrote…

Do we need these ps axewws in the roachprod log? They are very unwieldy

No, they similarly come from past debugging. Let's remove.


pkg/cmd/roachprod/install/cockroach.go, line 461 at r1 (raw file):

Previously, RaduBerinde wrote…

No idea, does anyone else?

We sometimes run into issues in which cockroach start fails because .... well, we don't know, but one suspicion was that we were holding on to some RocksDB lock. This stuff hasn't been useful, let's remove. (Done, in my commit)


pkg/cmd/roachprod/install/cockroach.go, line 475 at r1 (raw file):

Previously, RaduBerinde wrote…

Done.

That'll be in UTC, right?
Yep, just checked:

ubuntu@tobias-1619593141-08-n5cpu16-0001:~$ date
Wed Apr 28 07:29:57 UTC 2021

@RaduBerinde
Copy link
Member Author


pkg/cmd/roachprod/install/cockroach.go, line 399 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Why not? I appreciate that you've introduced reuse here but... this stuff is just so hard to maintain or even read. I thought we would have a single script that simply knows whether it's invoked in local or remote mode. I have this PR checked out locally and it's so hard. I think we lost GOTRACEBACK=crash for the non-local case? We need to clean this up avoid the divergence between the paths or it'll be so hard to maintain and get right.

I spent some time on this - pushed a commit - pretty happy with how it turned out. We are of course pushing the boundaries here of what is reasonable but this is a lot less terrible than what we had before this PR.

Very nice!!

@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch from b1f4d48 to 31bf2d9 Compare April 28, 2021 21:28
@RaduBerinde RaduBerinde requested a review from a team as a code owner April 28, 2021 21:28
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbg this looks good to me. If this looks good to you, I will squash and merge.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner, @andreimatei, @cucaroach, @irfansharif, @RaduBerinde, and @tbg)


pkg/cmd/roachprod/install/cluster_synced.go, line 159 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

But this takes the signal and wait, so there's no default here. Did you intend to put this comment elsewhere?

Rephrased.


pkg/cmd/roachprod/install/cockroach.go, line 454 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

--background uses sdnotify.Exec:

return true, sdnotify.Exec(cmd)

In other words, the parent waits until the child calls sdnotify.Ready, which it only does after binding the ports:

cockroach/pkg/cli/start.go

Lines 422 to 425 in 6b1b69a

// ReadyFn will be called when the server has started listening on
// its network sockets, but perhaps before it has done bootstrapping
// and thus before Start() completes.
serverCfg.ReadyFn = func(waitForInit bool) {

But I think this is... then already set up for success? We are running it under systemd now. The only question is, does systemd know that cockroach will signal systemd? It would have to decide to wait for the signal if things were to work out. I think that means using a service type of "Notify"?

https://askubuntu.com/questions/1120023/how-to-use-systemd-notify

If you set up a service using Type=notify, systemd will automatically set up a communication socket back to systemd and will export its path to the service under $NOTIFY_SOCKET.

Very cool! Yes, it works. I had to add NotifyAccess=all because the cockroach is a different PID than the one started by systemd (bash). I verified that it waits by adding a time.Sleep before calling sdnotify in the cockroach binary. In the meantime systemctl status cockroach shows it like this, very useful: Active: activating (start) since Wed 2021-04-28 21:30:40 UTC; 8s ago.


pkg/cmd/roachprod/install/cockroach.go, line 391 at r4 (raw file):

# \EOF ensures that no parameter expansion occurs in the heredoc. See:
# https://unix.stackexchange.com/questions/376103/here-document-without-interpreting-escape-sequences-but-with-interpolation
cat > "${helper}" << \EOF && chmod +x "${helper}" && "${helper}" "${verb}"

We are using 'EOF' below which disables everything, made it the same.


pkg/cmd/roachprod/install/cockroach.go, line 407 at r4 (raw file):

Previously, RaduBerinde wrote…

I don't think this is needed (because of the ||)

Removed and it still works (verified that the error code makes it to roachprod.log)

@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch 2 times, most recently from 416e59d to af7df9a Compare April 29, 2021 03:19
While running some stress tests with TPCH, I observed two big problems
with roachprod:
 - the out-of-memory behavior is very bad: instead of the process
   being killed, the system enters a thrashing mode where everything
   in the VM slows to a crawl (to the point where just sshing in can
   take minutes).
 - when the cockroach process exits, the exit code is not recorded
   anywhere, making it impossible in some cases to figure out why it
   stopped. In my particular case, we were exiting with exit code 8
   (which is `exit.TimeoutAfterFatalError()`) because writing to the
   logs was unacceptably slow.

This commit attempts to improve things on both these fronts. Instead
of running with `--background`, we use `systemd-run` to run cockroach
as a service unit. This has several advantages:
 - we have much better monitoring infrastructure via
   `systemctl status cockroach`
 - we can now run code after the exit, allowing us to record it in
   various logs.
 - we can set a strict cgroups memory limit (set to `95%`) so that the
   process gets oom-killed before the system starts to thrash.

As part of the commit, we also print out information about the status
of cockroach when logging in.

Fixes cockroachdb#64176.

Release note: None
@RaduBerinde RaduBerinde force-pushed the roachprod-exit-code branch from af7df9a to b937cc8 Compare April 29, 2021 03:20
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

🤞 for the nightly run, I expect to see a few more OOMs on clusters that were previously just scraping by.

Reviewed 1 of 3 files at r1, 2 of 3 files at r5, 1 of 1 files at r6.
Dismissed @cucaroach, and @cucaroach from 5 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei, @irfansharif, and @RaduBerinde)

@craig
Copy link
Contributor

craig bot commented Apr 29, 2021

Build succeeded:

@craig craig bot merged commit 3834668 into cockroachdb:master Apr 29, 2021
@RaduBerinde
Copy link
Member Author

for the nightly run, I expect to see a few more OOMs on clusters that were previously just scraping by.

Same here. I almost want to start a nightly run with a lower MemoryMax and adjust all the tests so they use less than 80-85%. If we let them scrape by, any small change can drive them over the limit.

@RaduBerinde RaduBerinde deleted the roachprod-exit-code branch May 3, 2021 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachprod: very poor VM behavior when running out of memory
7 participants