roachtest: drain-and-decommission/nodes=9 failed #82396

Closed
cockroach-teamcity opened this issue Jun 3, 2022 · 2 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). no-test-failure-activity O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team X-stale

Comments

@cockroach-teamcity

cockroach-teamcity commented Jun 3, 2022

roachtest.drain-and-decommission/nodes=9 failed with artifacts on master @ 2181204e9c7ac6b316573073b6b8010f43920f8b:

		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... 
		  | ERROR: drain timeout, consider adjusting --drain-wait, especially under custom server.shutdown.{drain,query,connection,lease_transfer}_wait cluster settings
		  | Failed running "node drain"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 8. Command with error:
		  | ```
		  | ./cockroach node drain --insecure
		  | ```
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError
Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-16350
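
The stderr above shows the `remaining:` counter hovering at 12 or 13 until the drain times out. A minimal sketch for pulling those counters out of a saved copy of the command output and measuring how long the plateau lasted follows. The helper names `remaining_counts` and `stalled_tail` and the `SAMPLE` text are hypothetical and are not part of roachtest or the `cockroach` CLI.

```python
import re

# Matches the counter printed by "node is draining... remaining: N".
REMAINING_RE = re.compile(r"remaining: (\d+)")


def remaining_counts(stderr_text):
    """Return every 'remaining: N' value in the order it was printed."""
    return [int(n) for n in REMAINING_RE.findall(stderr_text)]


def stalled_tail(counts):
    """Return (floor, reports) for the tail of the run.

    floor is the lowest value ever reported; reports is the number of
    progress lines printed from the first time that floor was reached
    through the end of the output. Returns None for empty input.
    """
    if not counts:
        return None
    floor = min(counts)
    first = counts.index(floor)
    return floor, len(counts) - first


# Abbreviated, hypothetical sample in the same shape as the output above.
SAMPLE = """\
node is draining... remaining: 24
node is draining... remaining: 12
node is draining... remaining: 13
node is draining... remaining: 12
node is draining...
ERROR: drain timeout
"""

if __name__ == "__main__":
    print(stalled_tail(remaining_counts(SAMPLE)))
```

A large `reports` value with a non-zero `floor` suggests the drain stopped making progress rather than simply running out of time, which is the pattern in this failure.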

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 3, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jun 3, 2022
@lidorcarmel

@AlexTalks do you want to take a look?

test.log says:

08:28:55 decommission.go:187: test status: draining node 9
08:28:55 decommission.go:187: test status: draining node 7
08:28:55 decommission.go:187: test status: draining node 8
08:29:25 decommission.go:201: test status: decommissioning node 6
08:38:56 test_impl.go:318: test failure: 	decommission.go:207,decommission.go:66,test_runner.go:884: output in run_082855.012944794_n8_cockroach_node_drain: ./cockroach node drain --insecure returned: COMMAND_PROBLEM: exit status 1
		(1) attached stack trace
		  -- stack trace:
		  | main.(*clusterImpl).RunE
		  | 	main/pkg/cmd/roachtest/cluster.go:1948
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runDrainAndDecommission.func3.1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/decommission.go:188
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runDrainAndDecommission.func3
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/decommission.go:190
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_082855.012944794_n8_cockroach_node_drain
		Wraps: (3) ./cockroach node drain --insecure returned
		  | stderr:
		  | warning: draining a node without node ID or passing --self explicitly is deprecated.
		  | node is draining... remaining: 248
		  | node is draining... remaining: 96
		  | node is draining... remaining: 96
		  | node is draining... remaining: 96
		  | node is draining... remaining: 96
		  | node is draining... remaining: 96
		  | node is draining... remaining: 85
		  | node is draining... remaining: 84
		  | node is draining... remaining: 84
		  | node is draining... remaining: 91
		  | node is draining... remaining: 82
		  | node is draining... remaining: 72
		  | node is draining... remaining: 72
		  | node is draining... remaining: 72
		  | node is draining... remaining: 61
		  | node is draining... remaining: 60
		  | node is draining... remaining: 65
		  | node is draining... remaining: 60
		  | node is draining... remaining: 48
		  | node is draining... remaining: 48
		  | node is draining... remaining: 48
		  | node is draining... remaining: 46
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 36
		  | node is draining... remaining: 25
		  | node is draining... remaining: 24
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... W220603 08:32:59.208536 41 2@rpc/pkg/rpc/clock_offset.go:216  [rnode=63,raddr=localhost:26257,class=system,heartbeat] 1  latency jump (prev avg 0.54ms, current 3.51ms)
		  | remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 12
		  | node is draining... remaining: 13
		  | node is draining... 
		  | ERROR: drain timeout, consider adjusting --drain-wait, especially under custom server.shutdown.{drain,query,connection,lease_transfer}_wait cluster settings
		  | Failed running "node drain"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 8. Command with error:
		  | ```
		  | ./cockroach node drain --insecure
		  | ```
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError
08:38:56 test_runner.go:895: tearing down after failure; see teardown.log

and node 8 has logs such as:

I220603 08:38:41.684276 587099 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r756/4:‹/Table/106/1/-2{11004…-09161…}›] 484210  not moving out
I220603 08:38:41.684283 587095 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r104/6:‹/Table/106/1/90{20660…-39088…}›] 484211  not moving out
I220603 08:38:41.684305 587086 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r320/4:‹/Table/106/1/-12{2548…-0705…}›] 484212  not moving out
I220603 08:38:41.684309 587123 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r905/2:‹/Table/106/1/-3{96208…-77780…}›] 484213  not moving out
I220603 08:38:41.684333 587081 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r233/5:‹/Table/106/1/65{14409…-32837…}›] 484214  not moving out
I220603 08:38:41.684335 587101 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r866/1:‹/Table/106/1/-5{50085…-48242…}›] 484215  not moving out
I220603 08:38:41.684285 587080 kv/kvserver/pkg/kv/kvserver/store.go:1529 ⋮ [drain,n8,s8,r105/2:‹/Table/106/1/66{43407…-61836…}›] 484216  attempting to transfer lease repl=(n8,s8):2 seq=5 start=1654244846.300010485,0 epo=1 pro=1654244846.299684213,0 for range r105:‹/Table/106/1/66{43407830741551104-61836146499504128}› [(n9,s9):1, (n8,s8):2, (n7,s7):3, next=4, gen=30, sticky=9223372036.854775807,2147483647]
I220603 08:38:41.684363 587082 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r785/2:‹/Table/106/1/-6{20112…-18269…}›] 484217  not moving out
I220603 08:38:41.684362 587102 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r661/1:‹/Table/106/1/79{33389…-51818…}›] 484218  not moving out
I220603 08:38:41.684390 587083 kv/kvserver/pkg/kv/kvserver/store.go:1519 ⋮ [drain,n8,s8,r664/1:‹/Table/106/1/-57{2199…-0356…}›] 484219  not moving out
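
To see which ranges the drain keeps retrying, the node log lines above can be grouped by range ID and message. The sketch below assumes the `[drain,nN,sN,rID/replica:...]` tag layout seen in these lines and only recognizes the two messages quoted here; it does not interpret what either message means. The function name `drain_actions` and the abbreviated `SAMPLE` lines are hypothetical.

```python
import re

# Extracts the range ID from a tag such as "[drain,n8,s8,r756/4:...".
RANGE_RE = re.compile(r"\[drain,n\d+,s\d+,r(\d+)/\d+:")


def drain_actions(log_text):
    """Map range ID -> last drain message seen for that range."""
    actions = {}
    for line in log_text.splitlines():
        match = RANGE_RE.search(line)
        if match is None:
            continue
        range_id = int(match.group(1))
        if "attempting to transfer lease" in line:
            actions[range_id] = "attempting to transfer lease"
        elif "not moving out" in line:
            actions[range_id] = "not moving out"
    return actions


# Two lines shortened from the node 8 log quoted above.
SAMPLE = (
    "I220603 08:38:41.684276 587099 store.go:1519 "
    "[drain,n8,s8,r756/4:/Table/106/1/-2] 484210  not moving out\n"
    "I220603 08:38:41.684285 587080 store.go:1529 "
    "[drain,n8,s8,r105/2:/Table/106/1/66] 484216  "
    "attempting to transfer lease repl=(n8,s8):2 seq=5\n"
)

if __name__ == "__main__":
    for range_id, action in sorted(drain_actions(SAMPLE).items()):
        print(f"r{range_id}: {action}")
```

Running this over the full node log and comparing the set of ranges still reporting `attempting to transfer lease` against the stuck `remaining:` count may help narrow down which leases are not moving.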

@github-actions

github-actions bot commented Jul 5, 2022

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.
