roachtest: follower-reads/mixed-version/single-region failed [should stop after beta1] #70350

Closed
cockroach-teamcity opened this issue Sep 17, 2021 · 2 comments · Fixed by #70432
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-roachtest O-robot Originated from a bot.

Comments

@cockroach-teamcity
Member

roachtest.follower-reads/mixed-version/single-region failed with artifacts on master @ 1f0dfa35126af18c678226a77e1848ca489a1bb2:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/follower-reads/mixed-version/single-region/run_1
	follower_reads.go:244,follower_reads.go:778,follower_reads.go:81,test_runner.go:777: failed to get follower read counts: Get "http://34.75.225.57:26258/_status/vars": dial tcp 34.75.225.57:26258: connect: connection refused

	cluster.go:1253,context.go:89,cluster.go:1241,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3457472-1631859726-44-n3cpu2 --oneshot --ignore-empty-nodes: exit status 1 3: 11882
		2: dead (exit status 7)
		1: 12545
		Error: UNCLASSIFIED_PROBLEM: 2: dead (exit status 7)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1173
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2107
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:225
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (3) 2: dead (exit status 7)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

See: roachtest README

/cc @cockroachdb/kv-triage


@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 17, 2021
@nvanbenschoten
Member

Node 2 failed with the following fatal:

F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129  on-disk and in-memory state diverged: [UsingAppliedStateKey: true != false]
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !goroutine 1757 [running]:
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x8794801, 0x4a88e7, 0x7a77e6c, 0x7fe743b26500)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb9
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0xc000ec35a0, 0xc00175ba10, 0x24, 0x2, 0x0, 0x0, 0x0, 0x16a593db549ddefd, 0x400000000, 0x0, ...)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:274 +0xbd2
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/util/log.logfDepth(0x5aab160, 0xc000bce360, 0x1, 0x4, 0x4ceff6f, 0x28, 0xc001bf0030, 0x1, 0x1)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/channels.go:58 +0x198
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log_channels_generated.go:834
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).assertStateRaftMuLockedReplicaMuRLocked(0xc002162a80, 0x5aab160, 0xc000bce360, 0x5b173a0, 0xc000ba56c0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica.go:1230 +0x7c8
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).applySnapshot(0xc002162a80, 0x5aab160, 0xc000bce360, 0x5d4563ab57b62e99, 0xf9acb4b1174bbb89, 0xc001f7c200, 0x0, 0x0, 0x0, 0xc001f57440, ...)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raftstorage.go:1066 +0x11aa
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked(0xc002162a80, 0x5aab160, 0xc000bce360, 0x5d4563ab57b62e99, 0xf9acb4b1174bbb89, 0xc001f7c200, 0x0, 0x0, 0x0, 0xc001f57440, ...)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:573 +0x1df8
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processRaftSnapshotRequest.func1(0x5aab160, 0xc000d43830, 0xc002162a80, 0x0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:358 +0x329
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).withReplicaForRequest(0xc0010fca00, 0x5aab160, 0xc000d43830, 0xc002395d98, 0xc001bf1388, 0x0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:219 +0x128
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processRaftSnapshotRequest(0xc0010fca00, 0x5aab160, 0xc000d43830, 0xc002395d40, 0x5d4563ab57b62e99, 0xf9acb4b1174bbb89, 0xc001f7c200, 0x0, 0x0, 0x0, ...)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:280 +0x15e
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).receiveSnapshot(0xc0010fca00, 0x5aab160, 0xc000d43830, 0xc002395d40, 0x7fe743acd840, 0xc00253ab40, 0x0, 0x0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_snapshot.go:810 +0x57d
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).HandleSnapshot.func1(0x5aab160, 0xc000d43830, 0x4a12151c9, 0xc04937dc89eced0f)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:83 +0x18d
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc000b8a700, 0x5aab160, 0xc000d43830, 0x4cb7016, 0x1e, 0xc0013bdc38, 0x0, 0x0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:328 +0xb2
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).HandleSnapshot(0xc0010fca00, 0xc002395d40, 0x7fe743acd810, 0xc00253ab40, 0xc00253ab40, 0x0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:73 +0xe5
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).RaftSnapshot.func1.1(0x5b0fa20, 0xc00253ab40, 0xc00143efc0, 0x5aab160, 0xc000d43770, 0xc00003f740, 0x0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/raft_transport.go:412 +0x13d
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).RaftSnapshot.func1(0x5aab160, 0xc000d43770)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/raft_transport.go:413 +0x5d
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc000b8a700, 0x5aab160, 0xc000d43770, 0x0, 0x0, 0xc000d437d0)
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:446 +0xf3
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
F210917 10:03:30.218426 1757 kv/kvserver/replica.go:1230 ⋮ [n2,s2,r4/5:‹/System{/tsd-tse}›] 129 !	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:438 +0x22b

Given the timing of this failure and the subject of the divergence, this is very likely fallout from github.com//pull/69887. cc @irfansharif, could you take a look when you get a chance?

@nvanbenschoten
Member

We're also seeing this in #70252. I'm going to give this the same treatment regarding GA-blocker status, just to be safe.

@nvanbenschoten nvanbenschoten added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 20, 2021
@tbg tbg changed the title roachtest: follower-reads/mixed-version/single-region failed roachtest: follower-reads/mixed-version/single-region failed [should stop after beta1] Sep 20, 2021
@irfansharif irfansharif self-assigned this Sep 20, 2021
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 20, 2021
This reverts commit 6464de2. That PR
broke a few of our roachtests since we haven't released the 21.2 beta yet.
For our roachtests that exercised the upgrade path, we were effectively
upgrading from 21.1 to 22.1 code (as of that PR) that asserted on the
completion of the long running migration removing the legacy raft
truncated state -- something that would only happen when going through
21.2. Given that, we temporarily revert cockroachdb#69887 while our beta gets
prepared. cockroachdb#69887 (or rather, the revert of this commit) will be
re-introduced to master once cockroachdb#69826 lands.

Fixes cockroachdb#70244.
Fixes cockroachdb#70252.
Fixes cockroachdb#70253.
Fixes cockroachdb#70283.
Fixes cockroachdb#70350.
Fixes cockroachdb#70390.

Release note: None
craig bot pushed a commit that referenced this issue Sep 20, 2021
70325: vendor: Add dependency on prometheus r=dhartunian a=rimadeodhar

This PR adds an external dependency on prometheus. We need
the promql library in order to enforce validity of promql
expressions which will be contained in upcoming alerting
and aggregation rules. These rule implementations are
upcoming as a part of the new metrics upgrade.

Resolves #69796

Release note: None

70347: pgcode: use XC instead of CDB r=knz a=otan

Release note (sql change): Change the pgerror code to use XC instead of CD
for CockroachDB-specific errors. This is because the "C" class is
reserved for the SQL standard. The pgcode `CDB00` used for
unsatisfiable bounded staleness is now `XCUBS`.

70374: ui: updates the jobs table styling r=maryliag a=maryliag

This commit updates the style of the table on the Jobs page
and adds tooltips to its columns.

Resolves #70149

Before
<img width="924" alt="Screen Shot 2021-09-17 at 3 35 04 PM" src="https://user-images.githubusercontent.com/1017486/133844066-3168bec7-db52-4194-9f97-c7b10628d98e.png">

After
<img width="880" alt="Screen Shot 2021-09-17 at 3 35 52 PM" src="https://user-images.githubusercontent.com/1017486/133844146-86e94611-ca99-4764-a97e-c8ca4e09f269.png">


Release note (ui change): Updating job table style to
match all other tables on the console.

70432: Revert "kv,migration: rm code handling legacy raft truncated state" r=irfansharif a=irfansharif

This reverts commit 6464de2. That PR
broke a few of our roachtests since we haven't released the 21.2 beta yet.
For our roachtests that exercised the upgrade path, we were effectively
upgrading from 21.1 to 22.1 code (as of that PR) that asserted on the
completion of the long running migration removing the legacy raft
truncated state -- something that would only happen when going through
21.2. Given that, we temporarily revert #69887 while our beta gets
prepared. #69887 (or rather, the revert of _this_ commit) will be
re-introduced to master once #69826 lands.

Fixes #70244.
Fixes #70252.
Fixes #70253.
Fixes #70283.
Fixes #70350.
Fixes #70390.

Release note: None

70436: spanconfig: fix an erroneous usage of timeutil.Timer r=irfansharif a=irfansharif

The contract for timeutil.Timer indicates that we should only be
setting .Read when reading from the timer channel, not unconditionally
before a call to .Reset().

Release note: None

Co-authored-by: rimadeodhar
Co-authored-by: Oliver Tan
Co-authored-by: Marylia Gutierrez
Co-authored-by: irfan sharif
@craig craig bot closed this as completed in ef1dd6f Sep 20, 2021