roachtest: deflake+improve replicagc-changed-peers test #51394

Merged

irfansharif
Contributor

Fixes #51097. Fixes #51367.

This is fallout from #50329: this test previously attempted to
recommission a fully decommissioned node. It seems we relied on the
decomm/recomm subsystems to trigger the replica GC operations that the
test was then asserting on. It suffices to simply mark the nodes as
decommissioning instead of fully decommissioning them. While here, I've
rewritten this test in the more stateful style of the
decommission-recommission roachtest.

Release note: None

@irfansharif irfansharif requested a review from tbg July 13, 2020 21:39
Member

@tbg tbg left a comment

Thank you!

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/cmd/roachtest/replicagc.go, line 38 at r1 (raw file):
While you have this paged in, could you add a comment with a synopsis here? Something like a (better version of)

// Checks that when a node has all of its replicas taken away in absentia and then restarts without being able to talk to any of its old peers, it will still replicaGC its replicas quickly.


pkg/cmd/roachtest/replicagc.go, line 69 at r1 (raw file):

	}

	t.Status("waiting for zero replicas on n1 and n2")

remove "and n2"


pkg/cmd/roachtest/replicagc.go, line 81 at r1 (raw file):

	// attribute. We'll later start n3 using this attribute to test GC replica
	// count.
	h.isolateDeadNodes(ctx, 4)

// run this on n4 (it's live, that's all that matters)

@irfansharif irfansharif force-pushed the 200713.replicagc-roachtest-decomm branch from 207e79c to d1521dd Compare July 14, 2020 14:00
@irfansharif
Contributor Author

TFTR! bors r=tbg

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Jul 14, 2020

Build failed

@irfansharif
Contributor Author

irfansharif commented Jul 14, 2020

Flaked on #51263.

@irfansharif
Contributor Author

bors retry

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Jul 15, 2020

Build failed

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Jul 15, 2020

Build failed

@irfansharif
Contributor Author

This is getting a bit ridiculous. Flaked on #51331.

bors r+

@craig
Contributor

craig bot commented Jul 15, 2020

Build failed

@irfansharif
Contributor Author

Flaked on #51263.

bors r+

@craig
Contributor

craig bot commented Jul 15, 2020

Build succeeded

@craig craig bot merged commit 5eb3908 into cockroachdb:master Jul 15, 2020
@irfansharif irfansharif deleted the 200713.replicagc-roachtest-decomm branch July 15, 2020 13:22
@irfansharif
Contributor Author

Only took 15 hours to get merged, woot.

Successfully merging this pull request may close these issues.

roachtest: replicagc-changed-peers/noRestart failed
roachtest: replicagc-changed-peers/withRestart failed