roachtest: scrub/all-checks/tpcc-1000 failed #33149

Closed
cockroach-teamcity opened this issue Dec 13, 2018 · 30 comments · Fixed by #34548
Labels: C-test-failure (Broken test, automatically or manually discovered), O-roachtest, O-robot (Originated from a bot)

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/859214b81838a4ba33048b81497442ce5774baa7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1053297&tab=buildLog

The test failed on master:
	test.go:630,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.232.175.103:26257: connect: connection refused
	test.go:630,cluster.go:1486,tpcc.go:120,scrub.go:71: Goexit() was called

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Dec 13, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 13, 2018
@thoszhang thoszhang self-assigned this Dec 13, 2018
@thoszhang
Contributor

The cluster was deadlocked and not making any progress, and I accidentally killed it while debugging. This would have been a real test failure regardless. As a short-term fix, I'm going to add AS OF SYSTEM TIME '-5s' to the SCRUB statements in the test (I ran the test with that change earlier today and didn't see the deadlock). I'm also looking into the deadlock itself.
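
For illustration only, here is a rough sketch of the kind of statement this would produce. It assumes the `EXPERIMENTAL SCRUB TABLE ... AS OF SYSTEM TIME ...` form; the helper name `scrubStatement` and the table name are hypothetical and are not taken from the roachtest code.

```go
package main

import (
	"fmt"
	"strings"
)

// scrubStatement builds a SCRUB statement that reads at a historical
// timestamp, which is intended to avoid contending with live writes.
// Hypothetical helper: the statement shape is an assumption, and the
// inputs are not sanitized, so only pass trusted values.
func scrubStatement(table, asOf string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "EXPERIMENTAL SCRUB TABLE %s", table)
	if asOf != "" {
		// A negative interval such as '-5s' means "five seconds ago".
		fmt.Fprintf(&b, " AS OF SYSTEM TIME '%s'", asOf)
	}
	return b.String()
}

func main() {
	fmt.Println(scrubStatement("tpcc.warehouse", "-5s"))
}
```

Reading slightly in the past means the scan should not have to wait on in-flight TPC-C transactions, which is why it may reduce contention without addressing the underlying deadlock.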

thoszhang pushed a commit to thoszhang/cockroach that referenced this issue Dec 14, 2018
Add `AS OF SYSTEM TIME` to the `SCRUB` roachtest to reduce contention. I ran
this version of the test and didn't get the deadlock that showed up in cockroachdb#33149.

Eventually we'll want to have `AS OF SYSTEM TIME` be present by default in the
`SELECT` query that is run during SCRUB instead of having it be manually added.

Release note: None
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0c87b11cb99ba5c677c95ded55dcba385928474e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1054703&tab=buildLog

The test failed on release-2.1:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.192.64.27:26257: connect: connection refused
	test.go:628,cluster.go:1139,tpcc.go:110,cluster.go:1465,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1054703-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=3h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		079.2 103079.2 newOrder
		 1h35m4s    51320            1.0            0.3     29.4     29.4     29.4     29.4 orderStatus
		 1h35m4s    51320            6.0            1.7  42949.7 103079.2 103079.2 103079.2 payment
		 1h35m4s    51320            0.0            0.3      0.0      0.0      0.0      0.0 stockLevel
		E181214 09:24:15.064069 1 workload/cli/run.go:402  error in payment: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		 1h35m5s    59631            0.0            0.3      0.0      0.0      0.0      0.0 delivery
		 1h35m5s    59631            5.0            2.4    113.2 103079.2 103079.2 103079.2 newOrder
		 1h35m5s    59631            0.0            0.3      0.0      0.0      0.0      0.0 orderStatus
		 1h35m5s    59631            4.0            1.7  40802.2 103079.2 103079.2 103079.2 payment
		 1h35m5s    59631            0.0            0.3      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7efc92a4dec689efc855ecd382a6f6b6065b98ec

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1055192&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.188.81.195:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f524717e66973da0c11655c860d4b131f82409b9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1056506&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.224.93.74:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/334cce5d61b32b0bb4a300668522c38fb9d6b96d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1056651&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.238.102.138:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e7dc507fa0ecc7dc5ed597ca5c6cdeb48086428c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1057350&tab=buildLog

The test failed on master:
	test.go:628,schemachange.go:329,scrub.go:75,cluster.go:1465,errgroup.go:57: dial tcp 35.238.253.72:26257: connect: connection refused
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: signal: interrupt

thoszhang pushed a commit to thoszhang/cockroach that referenced this issue Dec 17, 2018
Add `AS OF SYSTEM TIME` to the `SCRUB` roachtest to reduce contention. I ran
this version of the test and didn't get the deadlock that showed up in cockroachdb#33149.

Eventually we'll want to have `AS OF SYSTEM TIME` be present by default in the
`SELECT` query that is run during SCRUB instead of having it be manually added.

Release note: None
craig bot pushed a commit that referenced this issue Dec 17, 2018
33152: roachtest: add AS OF SYSTEM TIME to SCRUB test r=lucy-zhang a=lucy-zhang

Add `AS OF SYSTEM TIME` to the `SCRUB` roachtest to reduce contention. I ran
this version of the test and didn't get the deadlock that showed up in #33149.

Eventually we'll want to have `AS OF SYSTEM TIME` be present by default in the
`SELECT` query that is run during SCRUB instead of having it be manually added.

Release note: None

Co-authored-by: Lucy Zhang
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8367c883c5db0f4b5aea949530e41a068f25530d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1058060&tab=buildLog

The test failed on master:
	test.go:628,cluster.go:1139,tpcc.go:110,cluster.go:1465,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1058060-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=3h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		Error: read tcp 10.128.0.11:53102->10.128.0.2:26257: read: connection reset by peer
		Error:  exit status 1
		: exit status 1
	test.go:628,cluster.go:1486,tpcc.go:120,scrub.go:71: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3c80c5e06c4aa5ed25e1cc02b78037f6ec121939

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1072755&tab=buildLog

	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1072755-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		           0.7      0.0      0.0      0.0      0.0 delivery
		    6m7s      798            0.0            3.8      0.0      0.0      0.0      0.0 newOrder
		    6m7s      798            0.0            0.7      0.0      0.0      0.0      0.0 orderStatus
		    6m7s      798            0.0            1.1      0.0      0.0      0.0      0.0 payment
		    6m7s      798            0.0            0.6      0.0      0.0      0.0      0.0 stockLevel
		E181229 14:09:38.875391 1 workload/cli/run.go:402  error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		    6m8s      804            0.0            0.7      0.0      0.0      0.0      0.0 delivery
		    6m8s      804            0.0            3.8      0.0      0.0      0.0      0.0 newOrder
		    6m8s      804            0.0            0.7      0.0      0.0      0.0      0.0 orderStatus
		    6m8s      804            0.0            1.1      0.0      0.0      0.0      0.0 payment
		    6m8s      804            0.0            0.6      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,scrub.go:56: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b024b461265a7ca3cc1d156fef459818d127b065

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1074018&tab=buildLog

	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1074018-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		 1232            0.0            0.2      0.0      0.0      0.0      0.0 orderStatus
		   7m35s     1232            0.0            0.2      0.0      0.0      0.0      0.0 payment
		   7m35s     1232            0.0            0.2      0.0      0.0      0.0      0.0 stockLevel
		E181231 14:46:42.138647 1 workload/cli/run.go:402  error in newOrder: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		   7m36s     1234            0.0            0.2      0.0      0.0      0.0      0.0 delivery
		   7m36s     1234            0.0            0.7      0.0      0.0      0.0      0.0 newOrder
		   7m36s     1234            0.0            0.2      0.0      0.0      0.0      0.0 orderStatus
		   7m36s     1234            0.0            0.2      0.0      0.0      0.0      0.0 payment
		   7m36s     1234            0.0            0.2      0.0      0.0      0.0      0.0 stockLevel
		E181231 14:46:43.376830 1 workload/cli/run.go:402  error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,scrub.go:56: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f3cf71db94327abb6a164ad67383c35a696ec7d8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1077687&tab=buildLog

	test.go:696,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1077687-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     0.0            2.3      0.0      0.0      0.0      0.0 newOrder
		1h42m44s    20419            0.0            0.3      0.0      0.0      0.0      0.0 orderStatus
		1h42m44s    20419            0.0            1.3      0.0      0.0      0.0      0.0 payment
		1h42m44s    20419            0.0            0.3      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h42m45s    20419            0.0            0.4      0.0      0.0      0.0      0.0 delivery
		1h42m45s    20419            0.0            2.3      0.0      0.0      0.0      0.0 newOrder
		1h42m45s    20419            0.0            0.3      0.0      0.0      0.0      0.0 orderStatus
		1h42m45s    20419            0.0            1.3      0.0      0.0      0.0      0.0 payment
		1h42m45s    20419            0.0            0.3      0.0      0.0      0.0      0.0 stockLevel
		E190103 16:40:11.552241 1 workload/cli/run.go:402  error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		: signal: killed
	test.go:696,cluster.go:1484,tpcc.go:120,scrub.go:58: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/431b1846249fd2d110706ad221504706014e8b70

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1079927&tab=buildLog

The test failed on release-2.1:
	test.go:696,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1079927-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		Error: read tcp 10.128.0.24:58032->10.128.0.33:26257: read: connection reset by peer
		Error:  exit status 1
		: exit status 1
	test.go:696,cluster.go:1221,cluster.go:1240,cluster.go:1335,scrub.go:72,cluster.go:1463,errgroup.go:57: context canceled
	test.go:696,cluster.go:1484,tpcc.go:120,scrub.go:58: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8f446e3a82c6d10965fa86a268ec96c94ac093ec

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1083973&tab=buildLog

	test.go:696,cluster.go:1164,tpcc.go:110,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1083973-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		79.2 103079.2 newOrder
		1h50m40s    22995            1.0            0.3     32.5     32.5     32.5     32.5 orderStatus
		1h50m40s    22995            0.0            1.0      0.0      0.0      0.0      0.0 payment
		1h50m40s    22995            2.0            0.3    121.6    167.8    167.8    167.8 stockLevel
		E190108 15:18:53.374079 1 workload/cli/run.go:402  error in newOrder: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h50m41s    22998            0.0            0.3      0.0      0.0      0.0      0.0 delivery
		1h50m41s    22998           13.0            1.9  23622.3 103079.2 103079.2 103079.2 newOrder
		1h50m41s    22998            0.0            0.3      0.0      0.0      0.0      0.0 orderStatus
		1h50m41s    22998            0.0            1.0      0.0      0.0      0.0      0.0 payment
		1h50m41s    22998            0.0            0.3      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:696,cluster.go:1511,tpcc.go:120,scrub.go:58: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/12e28159b1d8b63b56d6a48f22ebbb5c75e8ee5c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1085451&tab=buildLog

The test failed on provisional_201901081731_v2.2.0-alpha-20190114:
	test.go:696,cluster.go:1164,tpcc.go:122,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1085451-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190109 08:02:36.863665 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 5.146890956s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1
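
For readers unfamiliar with the check name: assuming `3.3.2.1` refers to TPC-C consistency condition 1 (clause 3.3.2.1 of the specification), it requires each warehouse's `W_YTD` to equal the sum of `D_YTD` over that warehouse's districts, and any returned row is a violating warehouse. The sketch below shows that condition over in-memory data. The type and function names are hypothetical; the workload itself runs the check as a SQL query.

```go
package main

import (
	"fmt"
	"sort"
)

// district holds the two DISTRICT columns the condition needs.
// Amounts are kept in integer cents to avoid floating-point comparison.
type district struct {
	warehouseID int
	ytdCents    int64
}

// inconsistentWarehouses returns the sorted IDs of warehouses whose
// year-to-date total differs from the sum of their districts' totals.
// Sketch only: districts that reference a warehouse missing from
// warehouseYTD are not reported here.
func inconsistentWarehouses(warehouseYTD map[int]int64, districts []district) []int {
	sums := make(map[int]int64)
	for _, d := range districts {
		sums[d.warehouseID] += d.ytdCents
	}
	var bad []int
	for id, ytd := range warehouseYTD {
		if sums[id] != ytd {
			bad = append(bad, id)
		}
	}
	sort.Ints(bad) // map iteration order is random, so sort for stable output
	return bad
}

func main() {
	warehouses := map[int]int64{1: 3000, 2: 5000}
	districts := []district{{1, 1000}, {1, 2000}, {2, 2500}, {2, 2000}}
	bad := inconsistentWarehouses(warehouses, districts)
	fmt.Printf("%d rows returned, expected zero: %v\n", len(bad), bad)
}
```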

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fe6fbbb99f51f414804daaeb704635ee0ff17b28

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1091924&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:132,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1091924-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190114 15:46:39.411443 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 101.180345ms
		Error: check failed: 3.3.2.1: 7 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6885730c58f9a45511f92be95e94129005d6b875

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1099559&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1099559-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190118 16:12:17.832381 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 5.117299574s
		Error: check failed: 3.3.2.1: 40 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b52b7d1382b454ce1bb43f2187088aef9c557ed5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1101227&tab=buildLog

The test failed on master:
	test.go:727,test.go:739: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod create teamcity-1101227-scrub-all-checks-tpcc-1000 -n 5 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b --local-ssd-no-ext4-barrier returned:
		stderr:
		
		stdout:
		101244-schemachange-indexrollback-tpcc-1000-0005",
		    "serviceAccounts": [
		      {
		        "email": "[email protected]",
		        "scopes": [
		          "https://www.googleapis.com/auth/devstorage.read_only",
		          "https://www.googleapis.com/auth/devstorage.read_write",
		          "https://www.googleapis.com/auth/logging.write",
		          "https://www.googleapis.com/auth/monitoring.write",
		          "https://www.googleapis.com/auth/pubsub",
		          "https://www.googleapis.com/auth/service.management.readonly",
		          "https://www.googleapis.com/auth/servicecontrol",
		          "https://www.googleapis.com/auth/trace.append"
		        ]
		      }
		    ],
		    "startRestricted": false,
		    "status": "RUNNING",
		    "tags": {
		      "fingerprint": "42WmSpB8rSM="
		    },
		    "zone": "https://www.googleapis.com/compute/v1/projects/cockroach-ephemeral/zones/us-central1-b"
		  }
		]
		stderr: ERROR: (gcloud.compute.instances.list) Some requests did not succeed:
		 - Code: '-2770068671450108765'
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/798304879367166c8954825f40c404ba100cea0a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1103521&tab=buildLog

The test failed on master:
	test.go:727,cluster.go:1203,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1103521-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190122 16:17:21.622476 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 5.015238894s
		Error: check failed: 3.3.2.1: 74 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f37d09cd3cdd32f4d4894611cfd60caf25c10fff

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1105096&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1105096-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190123 16:11:24.153252 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 5.243434976s
		Error: check failed: 3.3.2.1: 94 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/295b6ae3142518f04a5771c79ce171043697ee1f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1107301&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1107301-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190124 16:37:05.626982 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 5.330197645s
		Error: check failed: 3.3.2.1: 66 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2952f08ba7260967d7dfd10addbfe80b51d2b8ed

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1109027&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1109027-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190125 16:36:37.784518 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 5.346791125s
		Error: check failed: 3.3.2.1: 51 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dc2fbcdc0dccb8cc676fc67370375bab36b3cff0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1110068&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1110068-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190126 17:17:37.210650 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.570505065s
		Error: check failed: 3.3.2.1: 6 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8cbeb534432b81c57564956ed7d645b854b426be

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1111300&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1111300-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190127 16:35:47.859200 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 282.093428ms
		Error: check failed: 3.3.2.1: 3 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9f084ad576e85756c5c5a7e41335d9aa2d3eee30

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1112101&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1112101-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190128 16:52:39.480560 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.577846245s
		Error: check failed: 3.3.2.1: 2 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e10fb557b11b5ff1b8609aa963da23c37a1143c8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1113854&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1113854-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190129 16:16:46.732914 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.662471175s
		Error: check failed: 3.3.2.1: 3 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/395d842feb97c5bd8cad2b32b71a5156c03061eb

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1115923&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:118,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1115923-scrub-all-checks-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		2 103079.2 newOrder
		   4m14s    58207            0.0            8.1      0.0      0.0      0.0      0.0 orderStatus
		   4m14s    58207            7.0           64.7  10200.5 103079.2 103079.2 103079.2 payment
		   4m14s    58207            1.0            8.1   1744.8   1744.8   1744.8   1744.8 stockLevel
		E190130 15:02:26.667367 1 workload/cli/run.go:402  error in newOrder: dial tcp 10.128.0.25:26257: connect: connection refused
		   4m15s   102271            3.0            7.8  12348.0  32212.3  32212.3  32212.3 delivery
		   4m15s   102271           95.8           70.5  12348.0 103079.2 103079.2 103079.2 newOrder
		   4m15s   102271            4.9            8.1    469.8   2080.4   2080.4   2080.4 orderStatus
		   4m15s   102271           29.6           64.5  11811.2 103079.2 103079.2 103079.2 payment
		   4m15s   102271            4.9            8.1   2684.4  10737.4  10737.4  10737.4 stockLevel
		E190130 15:02:27.667487 1 workload/cli/run.go:402  error in newOrder: dial tcp 10.128.0.25:26257: connect: connection refused
		: signal: killed
	test.go:743,cluster.go:1302,cluster.go:1321,cluster.go:1425,scrub.go:72,cluster.go:1564,errgroup.go:57: context canceled
	test.go:743,cluster.go:1585,tpcc.go:128,scrub.go:58: unexpected node event: 3: dead

@nvanbenschoten
Member

Previous failure was

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x190242d]

goroutine 719576 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0001c5830, 0x3898dc0, 0xc00f811050)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:183 +0x11f
panic(0x2d24dc0, 0x546ff60)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/cockroachdb/cockroach/pkg/storage.(*truncateDecision).raftSnapshotsForIndex(0xc0060625f0, 0x0, 0xc0054edb00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_log_queue.go:226 +0x3d

Fixed by #34399.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/407932a95f3ad53d61481e5a7493fc4ed468faa9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1117776&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1117776-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190131 16:43:23.108360 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.415188909s
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fc3ea118c87ae1a9d2ed6f4974f2296766607666

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1119860&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1119860-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190201 16:36:56.011932 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 166.995924ms
		Error: check failed: 3.3.2.1: 27 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1b8689c0b4df102e1bf4e271913c4bb096ca8ffe

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1121356&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1121356-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190202 15:53:24.047052 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.678838673s
		Error: check failed: 3.3.2.1: 2 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b9bd958fccddc699d47eccbbec80db75c10eab46

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1122796&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,scrub.go:58: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1122796-scrub-all-checks-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190204 16:53:23.173288 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 3.894402003s
		Error: check failed: 3.3.2.1: 5 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 5, 2019
Fixes cockroachdb#34025.
Fixes cockroachdb#33624.
Fixes cockroachdb#33335.
Fixes cockroachdb#33151.
Fixes cockroachdb#33149.
Fixes cockroachdb#34159.
Fixes cockroachdb#34293.
Fixes cockroachdb#32813.
Fixes cockroachdb#30886.
Fixes cockroachdb#34228.
Fixes cockroachdb#34321.

It is rare but possible for a replica to become a leaseholder but not
learn about this until it applies a snapshot. Immediately upon the
snapshot application's `ReplicaState` update, the replica will begin
operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger
in-memory side-effects to be performed. This could result in a regression
in the new leaseholder's timestamp cache compared to the previous
leaseholder, allowing write-skew like we saw in cockroachdb#34025. This could
presumably result in other anomalies as well, because all of the
steps in `leasePostApply` were skipped.

This PR fixes this bug by detecting lease updates when applying
snapshots and making sure to react correctly to them. It also likely
fixes the referenced issue. The new test demonstrated that without
this fix, the serializable violation speculated about in the issue
was possible.

Release note (bug fix): Fix bug where lease transfers passed through
snapshots could forget to update in-memory state on the new leaseholder,
allowing write-skew between read-modify-write operations.
craig bot pushed a commit that referenced this issue Feb 5, 2019
34548: storage: apply lease change side-effects on snapshot recipients r=nvanbenschoten a=nvanbenschoten

Fixes #34025.
Fixes #33624.
Fixes #33335.
Fixes #33151.
Fixes #33149.
Fixes #34159.
Fixes #34293.
Fixes #32813.
Fixes #30886.
Fixes #34228.
Fixes #34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder's cache, allowing write-skew like we saw in #34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped (as theorized by #34025 (comment)).

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrates that without this fix, the serializable violation speculated about in the issue was possible.
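
To make the failure mode described above concrete, here is a minimal toy sketch in Go. It is not CockroachDB's code: the `Lease` and `Replica` types, the `LowWater` field, the `runSideEffects` switch, and the `canWriteAt` helper are all hypothetical names invented for illustration. Only `leasePostApply` borrows its name from the commit message, and the single side effect modeled here (raising a timestamp-cache floor to the new lease's start) is an assumption about what one of those steps does. The sketch shows how skipping the side effects when a lease arrives through a snapshot leaves the new leaseholder with a stale floor, so a write below reads that the old leaseholder may have served is not pushed.

```go
package main

import "fmt"

// Lease is a simplified stand-in for a range lease (hypothetical type).
type Lease struct {
	Holder int   // ID of the replica that holds the lease
	Start  int64 // logical timestamp at which the lease begins
}

// Replica is a toy model, not CockroachDB's storage.Replica.
type Replica struct {
	ID       int
	Lease    Lease
	LowWater int64 // assumed timestamp-cache floor: writes at or below it must be pushed
}

// leasePostApply models one in-memory side effect of a lease change:
// a replica that becomes leaseholder raises its floor to the lease start,
// because it cannot know which reads the previous leaseholder served.
func (r *Replica) leasePostApply(newLease Lease) {
	if newLease.Holder == r.ID && newLease.Start > r.LowWater {
		r.LowWater = newLease.Start
	}
}

// applySnapshot installs the lease carried by a snapshot. With
// runSideEffects=false it mimics the behavior before the fix (state is
// updated, side effects are skipped); with true it detects the lease
// change and reacts to it.
func (r *Replica) applySnapshot(snapLease Lease, runSideEffects bool) {
	prev := r.Lease
	r.Lease = snapLease
	if runSideEffects && prev != snapLease {
		r.leasePostApply(snapLease)
	}
}

// canWriteAt reports whether a write at ts would be accepted without a push.
func (r *Replica) canWriteAt(ts int64) bool {
	return ts > r.LowWater
}

func main() {
	// Replica 2 learns from a snapshot that it holds the lease from ts=100.
	snap := Lease{Holder: 2, Start: 100}

	buggy := &Replica{ID: 2, Lease: Lease{Holder: 1, Start: 10}}
	buggy.applySnapshot(snap, false)
	fmt.Println(buggy.canWriteAt(50)) // true: a write below earlier reads slips through

	fixed := &Replica{ID: 2, Lease: Lease{Holder: 1, Start: 10}}
	fixed.applySnapshot(snap, true)
	fmt.Println(fixed.canWriteAt(50)) // false: the write must be pushed above ts=100
}
```

In this toy model the difference between the two runs is only whether the lease change is noticed during snapshot application, which mirrors the shape of the fix described in the commit message rather than its actual implementation.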

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #34548 Feb 5, 2019