roachtest: tpccbench: handle overload vm crash in last search iter #64205

tbg · 2021-04-26T12:46:35Z

tpccbench is set up to "handle" (ignore) crashes during its line
search on the assumption that these are due to pushing CRDB into
overload territory, which at the time of writing it does not handle
gracefully.
This broke in one special case: the line search ending on a final
step that crashed. In that case, the cluster would be left running
with one node down, which roachtest detects and reports as an error.

Unconditionally restart the cluster after the line search (assuming
it found a passing warehouse count, i.e. didn't error out itself)
to make roachtest happy.
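
The control flow described above can be sketched roughly as follows. This is a minimal, simplified illustration and not the code in `tpcc.go`: the `cluster` type, `runStep`, `lineSearch`, `runBench`, and the warehouse numbers are all hypothetical, and the search is a plain linear probe rather than roachtest's actual line searcher.

```go
package main

import (
	"errors"
	"fmt"
)

// cluster is a hypothetical stand-in for a roachtest cluster handle.
type cluster struct {
	nodesDown int
}

// restart brings every node back up (hypothetical helper).
func (c *cluster) restart() { c.nodesDown = 0 }

var errOverload = errors.New("node crashed under overload")

// runStep simulates one benchmark run at the given warehouse count.
// Counts above maxOK overload the cluster and take one node down.
func (c *cluster) runStep(warehouses, maxOK int) error {
	if warehouses > maxOK {
		c.nodesDown++
		return errOverload
	}
	return nil
}

// lineSearch probes increasing warehouse counts and returns the highest
// passing one. A crash is treated as a failed step, not as a test failure.
func lineSearch(c *cluster, start, step, limit, maxOK int) (int, error) {
	best := 0
	for w := start; w <= limit; w += step {
		c.restart() // every iteration starts from a healthy cluster
		if err := c.runStep(w, maxOK); err != nil {
			break // the final step may leave a node down
		}
		best = w
	}
	if best == 0 {
		return 0, errors.New("no passing warehouse count found")
	}
	return best, nil
}

// runBench runs the search and then restarts unconditionally, so that a
// crash in the last step does not leave a dead node behind.
func runBench(c *cluster) (int, error) {
	best, err := lineSearch(c, 100, 100, 1000, 450)
	if err != nil {
		return 0, err
	}
	c.restart()
	return best, nil
}

func main() {
	c := &cluster{}
	best, err := runBench(c)
	fmt.Println(best, err, c.nodesDown)
}
```

Without the final `c.restart()` in `runBench`, the simulated cluster would finish with `nodesDown == 1`, which corresponds to the situation roachtest flags as an error.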

cc @cockroachdb/kv

Closes #64187.

Release note: None

erikgrinaker

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

pkg/cmd/roachtest/tpcc.go, line 844 at r1 (raw file):

		restart()

		time.Sleep(restartWait)

Will we need to sleep after the final restart() call? If so, maybe consider moving this into restart().

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker)

pkg/cmd/roachtest/tpcc.go, line 844 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Will we need to sleep after the final restart() call? If so, maybe consider moving this into restart().

No, there shouldn't be any reason to - we usually don't randomly sleep after starting the cluster. I guess there also shouldn't be any reason for this to be necessary here, but I don't want to ruffle any feathers :-)

erikgrinaker

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

tbg · 2021-04-26T14:07:40Z

bors r=erikgrinaker

craig · 2021-04-26T15:03:16Z

Build failed (retrying...):

GitHub CI (Cockroach)

andreimatei

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

pkg/cmd/roachtest/tpcc.go, line 812 at r1 (raw file):

			}
			shortCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
			if err := c.StopE(shortCtx, roachNodes); err != nil {

Tobi, what's the point of this Stop call after the previous c.Reset()? What's there to stop? Are we simply using it as a sanity check that we can talk to the machines?

tbg · 2021-04-26T19:14:55Z

@andreimatei yes pretty much. I hope all of this Reset business can go away when we've done something about #64177.

bors r=erikgrinaker

tbg requested a review from erikgrinaker April 26, 2021 12:46

erikgrinaker approved these changes Apr 26, 2021

tbg requested a review from erikgrinaker April 26, 2021 12:54

erikgrinaker approved these changes Apr 26, 2021

andreimatei reviewed Apr 26, 2021

craig bot merged commit d72b855 into cockroachdb:master Apr 26, 2021
