forked from cockroachdb/cockroach
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: clean up preemptive snapshot when receiving replica ID as learner #3
Closed
ajwerner
wants to merge
52
commits into
ajwerner/prevent-replica-id-changes-pr
from
ajwerner/deal-with-upgrade-from-preemptive-snapshots
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
4 times, most recently
from
September 19, 2019 15:32
683c35c
to
b45f805
Compare
Release note: Release justification: test only
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 19, 2019 17:46
b45f805
to
6194436
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
3 times, most recently
from
September 20, 2019 01:27
9cd9bee
to
18937a5
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 20, 2019 01:44
6194436
to
ecfb690
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
from
September 20, 2019 01:44
18937a5
to
52436cc
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 20, 2019 01:50
ecfb690
to
df0cc78
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
from
September 20, 2019 01:51
52436cc
to
b11044b
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
2 times, most recently
from
September 20, 2019 03:29
06c84d5
to
40b4b30
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
from
September 20, 2019 14:42
b11044b
to
3c642a7
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 20, 2019 15:24
40b4b30
to
0fb88ee
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
2 times, most recently
from
September 20, 2019 17:20
aacc284
to
65f9e93
Compare
Previously, syntax for the special oid wrapper types (like REGCLASS, REGTYPE, and so on) was not available, despite the backend being able to handle these types just fine. This commit adds syntax for this (e.g. `select t::regclass[]`) to the parser and hooks it up to the type system. Release note (sql change): support the syntax for oid wrapper arrays, like REGCLASS[]. Release justification: low risk, cosmetic compatibility improvement.
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
from
September 20, 2019 19:55
65f9e93
to
7fd2f9a
Compare
Release note: None Release justification: low-risk, cosmetic compatibility improvement
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
from
September 23, 2019 05:00
7fd2f9a
to
3935353
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 23, 2019 05:04
0fb88ee
to
611a624
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
from
September 23, 2019 11:22
3935353
to
0d8438b
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
2 times, most recently
from
September 23, 2019 13:01
11abe9a
to
1dbee4c
Compare
ajwerner
force-pushed
the
ajwerner/prevent-replica-id-changes-pr
branch
2 times, most recently
from
September 23, 2019 13:25
4f89404
to
95084f0
Compare
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 23, 2019 13:25
1dbee4c
to
63c7882
Compare
40790: sql/sem/tree: run TestEval with vectorize=experimental_on r=jordanlewis,rafiss a=asubiotto Release note: None Release justification: Category 1 non-production code change to increase test coverage of the vectorized execution engine. Close cockroachdb#40635 Co-authored-by: Alfonso Subiotto Marqués <[email protected]>
See the discussion on cockroachdb#41045: running a 9-node demo cluster under the race detector results in large CPU and ram usage that exceeds the CI budget. Since the various moving pieces in the code are race-tested in other places, we are solely interested here to unit test functionality, and it's not necessary to run in testrace. Release justification: removes test flakiness Release note: None
41044: cli: avoid running Example_demo_locality in the race detector r=knz a=tbg Fixes cockroachdb#41045. See the discussion on cockroachdb#41045: running a 9-node demo cluster under the race detector results in large CPU and ram usage that exceeds the CI budget. Since the various moving pieces in the code are race-tested in other places, we are solely interested here to unit test functionality, and it's not necessary to run in `testrace`. Release justification: removes test flakiness Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
Fixes cockroachdb#40846. Release justification: Low risk bug fix. Release note (bug fix): Fix panic in cockroach workload movr run.
This commit cleans up a few things: - removes Int8 and Float32 methods from Vec interface (these types were recently removed) - removes special casing for CFetcher when catching a vectorized panic (CFetcher has been recently moved into colexec package) - adds leaktest to all tests in colexec package. Release justification: Category 1: Non-production code changes. Release note: None
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 25, 2019 15:01
c1e450a
to
c4767ba
Compare
40949: sql: remove last 2 blockers for pgcli support r=jordanlewis a=jordanlewis ``` sql: add syntax for arrays of oid wrapper types Previously, syntax for the special oid wrapper types (like REGCLASS, REGTYPE, and so on) was not available, despite the backend being able to handle these types just fine. This commit adds syntax for this (e.g. `select t::regclass[]`) to the parser and hooks it up to the type system. ``` ``` tree: add support for REGTYPE casts of the trigger type ``` Release note (sql change): support the syntax for oid wrapper arrays, like REGCLASS[], and REGTYPE casts for the unsupported `trigger` type. Release justification: low risk, cosmetic compatibility improvement. Co-authored-by: Jordan Lewis <[email protected]>
Found by ajwerner's `pkg/storage` stressing. `TestReplicaGCRace` was occasionally failing with the crash: ``` F190924 21:08:14.747519 710 vendor/go.etcd.io/etcd/raft/log.go:203 [s3,r1/3:{-}] tocommit(20) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost? goroutine 710 [running]: github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000432301, 0xc000432300, 0x0, 0xc00030ae48) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1 github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x63cdc60, 0xc000000004, 0x5dedf9a, 0x22, 0xcb, 0xc00135e1c0, 0x6a) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x93a github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x40609e0, 0xc000d05b60, 0x4, 0x3, 0x382af56, 0x5d, 0xc000c81140, 0x2, 0x2) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x40609e0, 0xc000d05b60, 0x2, 0xc000000004, 0x382af56, 0x5d, 0xc000c81140, 0x2, 0x2) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c github.com/cockroachdb/cockroach/pkg/util/log.FatalfDepth(...) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:199 github.com/cockroachdb/cockroach/pkg/storage.(*raftLogger).Panicf(0xc000429f60, 0x382af56, 0x5d, 0xc000c81140, 0x2, 0x2) /go/src/github.com/cockroachdb/cockroach/pkg/storage/raft.go:102 +0xf1 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*raftLog).commitTo(0xc0005fb7a0, 0x14) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/log.go:203 +0x131 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*raft).handleHeartbeat(0xc0013cb680, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/raft.go:1396 +0x54 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.stepFollower(0xc0013cb680, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/raft.go:1341 +0x480 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*raft).Step(0xc0013cb680, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/raft.go:984 +0x1113 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*RawNode).Step(0xc000d64ac0, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/rawnode.go:114 +0xcc github.com/cockroachdb/cockroach/pkg/storage.(*Replica).stepRaftGroup.func1(0xc000d64ac0, 0x0, 0x0, 0x0) /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:396 +0x108 github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked.func1(...) /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1362 github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked(0xc000588800, 0x38d2900, 0xc000c15778, 0x0, 0x0) /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1363 +0x79 ``` This was because a call to `tryGetOrCreateReplica` was racing with a snapshot. The call to `tryGetOrCreateReplica` would check the raft tombstone and then stall. The snapshot would be processed and the test would quickly remove the replica, leaving a raft tombstone. The call to `tryGetOrCreateReplica` would then continue without ever looking for the raft tombstone again. It would then find an unexpected HardState and fatal. This can be made much more frequent with the patch: ``` diff --git a/pkg/storage/store.go b/pkg/storage/store.go index 99fa192..0f6e47e 100644 --- a/pkg/storage/store.go +++ b/pkg/storage/store.go @@ -4014,6 +4014,7 @@ func (s *Store) tryGetOrCreateReplica( return nil, false, &roachpb.RaftGroupDeletedError{} } } + time.Sleep(100 * time.Millisecond) // Create a new replica and lock it for raft processing. repl := newReplica(rangeID, s) ``` The fix for this is to check the raft tombstone again after inserting into the Store's Range map. This double-checked locking ensures that we make sure to synchronize with any goroutine that wrote a tombstone and then removed an old replica from the Range map. Release justification: Fixes a crash due to a race between setting and retrieving a Range's RaftTombstone. Release note: None
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 25, 2019 15:19
c4767ba
to
41a8f44
Compare
The vectorized engine currently has support for AndExpr, OrExpr, CaseExpr, Like and NotLike, InExpr, and all of these things are now added to sqlsmith when vectorizable is true. Release justification: Category 1: Non-production code changes. Release note: None
40999: colexec: miscellaneous cleanups r=jordanlewis a=yuzefovich This commit cleans up a few things: - removes Int8 and Float32 methods from Vec interface (these types were recently removed) - removes special casing for CFetcher when catching a vectorized panic (CFetcher has been recently moved into colexec package) - adds leaktest to all tests in colexec package. Release justification: Category 1: Non-production code changes. Release note: None 41072: movr: Initialize movr's faker if it hasn't been already in workload r=danhhz a=rohany Fixes cockroachdb#40846. Release justification: Low risk bug fix. Release note (bug fix): Fix panic in cockroach workload movr run. Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Rohan Yadav <[email protected]>
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 25, 2019 15:42
41a8f44
to
31b9e11
Compare
41075: sqlsmith: tell sqlsmith about more things that vectorized supports r=mjibson a=yuzefovich The vectorized engine currently has support for AndExpr, OrExpr, CaseExpr, Like and NotLike, InExpr, and all of these things are now added to sqlsmith when vectorizable is true. Release justification: Category 1: Non-production code changes. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>
Closes cockroachdb#41016. Closes cockroachdb#40864. Closes cockroachdb#40578. In all of the referenced issues, we were seeing uploading cockroach binaries fail (which should be idempotent). We could see this in the log: ``` 05:11:12 test.go:182: test status: uploading binary 05:11:12 cluster.go:315: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod put teamcity-1569301790-07-n4cpu4 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 ./cockroach teamcity-1569301790-07-n4cpu4: putting (dist) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 ./cockroach ................ 1: done 2: ~ scp -r -C -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa -i /root/.ssh/google_compute_engine /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 [email protected]:./cockroach Warning: Permanently added '35.222.255.152' (ECDSA) to the list of known hosts. packet_write_wait: Connection to 35.222.255.152 port 22: Broken pipe lost connection : exit status 1 3: done 4: done I190924 05:11:29.022222 1 cluster_synced.go:1088 put /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 failed ``` The test would then ignore the failure and proceed to get tripped up when starting cockroach: ``` 05:11:34 cluster.go:315: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --racks=1 [email protected] teamcity-1569301790-07-n4cpu4:2 teamcity-1569301790-07-n4cpu4: starting 0: exit status 255 ~ ./cockroach version github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.getCockroachVersion /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:88 github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:149 github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1535 runtime.goexit /usr/local/go/src/runtime/asm_amd64.s:1337: ``` The problem was that we were ignoring the Put error accidentally, so the tests got very confused. This commit fixes this by properly handling the Put error. This doesn't actually fix the referenced issues entirely, but it gets us a step closer to doing so, so I'm going to use it as an opportunity to close them. Release justification: Testing only. Release note: None
41050: storage: check for RaftTombstone again once synchronized with Store r=nvanbenschoten a=nvanbenschoten Found by @ajwerner's `pkg/storage` stressing. `TestReplicaGCRace` was occasionally failing with the crash: ``` F190924 21:08:14.747519 710 vendor/go.etcd.io/etcd/raft/log.go:203 [s3,r1/3:{-}] tocommit(20) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost? goroutine 710 [running]: github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000432301, 0xc000432300, 0x0, 0xc00030ae48) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1 github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x63cdc60, 0xc000000004, 0x5dedf9a, 0x22, 0xcb, 0xc00135e1c0, 0x6a) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x93a github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x40609e0, 0xc000d05b60, 0x4, 0x3, 0x382af56, 0x5d, 0xc000c81140, 0x2, 0x2) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x40609e0, 0xc000d05b60, 0x2, 0xc000000004, 0x382af56, 0x5d, 0xc000c81140, 0x2, 0x2) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c github.com/cockroachdb/cockroach/pkg/util/log.FatalfDepth(...) /go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:199 github.com/cockroachdb/cockroach/pkg/storage.(*raftLogger).Panicf(0xc000429f60, 0x382af56, 0x5d, 0xc000c81140, 0x2, 0x2) /go/src/github.com/cockroachdb/cockroach/pkg/storage/raft.go:102 +0xf1 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*raftLog).commitTo(0xc0005fb7a0, 0x14) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/log.go:203 +0x131 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*raft).handleHeartbeat(0xc0013cb680, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/raft.go:1396 +0x54 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.stepFollower(0xc0013cb680, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/raft.go:1341 +0x480 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*raft).Step(0xc0013cb680, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/raft.go:984 +0x1113 github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.(*RawNode).Step(0xc000d64ac0, 0x8, 0x3, 0x2, 0x6, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/rawnode.go:114 +0xcc github.com/cockroachdb/cockroach/pkg/storage.(*Replica).stepRaftGroup.func1(0xc000d64ac0, 0x0, 0x0, 0x0) /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:396 +0x108 github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked.func1(...) /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1362 github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked(0xc000588800, 0x38d2900, 0xc000c15778, 0x0, 0x0) /go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raft.go:1363 +0x79 ``` This was because a call to `tryGetOrCreateReplica` was racing with a snapshot. The call to `tryGetOrCreateReplica` would check the raft tombstone and then stall. The snapshot would be processed and the test would quickly remove the replica, leaving a raft tombstone. The call to `tryGetOrCreateReplica` would then continue without ever looking for the raft tombstone again. It would then find an unexpected HardState and fatal. This can be made much more frequent with the patch: ``` diff --git a/pkg/storage/store.go b/pkg/storage/store.go index 99fa192..0f6e47e 100644 --- a/pkg/storage/store.go +++ b/pkg/storage/store.go @@ -4014,6 +4014,7 @@ func (s *Store) tryGetOrCreateReplica( return nil, false, &roachpb.RaftGroupDeletedError{} } } + time.Sleep(100 * time.Millisecond) // Create a new replica and lock it for raft processing. repl := newReplica(rangeID, s) ``` The fix for this is to check the raft tombstone again after inserting into the Store's Range map. This double-checked locking ensures that we make sure to synchronize with any goroutine that wrote a tombstone and then removed an old replica from the Range map. Release justification: Fixes a crash due to a race between setting and retrieving a Range's RaftTombstone. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
41083: roachtest: properly fail when uploading binaries fails r=nvanbenschoten a=nvanbenschoten Closes cockroachdb#41016. Closes cockroachdb#40864. Closes cockroachdb#40578. In all of the referenced issues, we were seeing uploading cockroach binaries fail (which should be idempotent). We could see this in the log: ``` 05:11:12 test.go:182: test status: uploading binary 05:11:12 cluster.go:315: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod put teamcity-1569301790-07-n4cpu4 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 ./cockroach teamcity-1569301790-07-n4cpu4: putting (dist) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 ./cockroach ................ 1: done 2: ~ scp -r -C -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa -i /root/.ssh/google_compute_engine /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 [email protected]:./cockroach Warning: Permanently added '35.222.255.152' (ECDSA) to the list of known hosts. packet_write_wait: Connection to 35.222.255.152 port 22: Broken pipe lost connection : exit status 1 3: done 4: done I190924 05:11:29.022222 1 cluster_synced.go:1088 put /home/agent/work/.go/src/github.com/cockroachdb/cockroach/cockroach.linux-2.6.32-gnu-amd64 failed ``` The test would then ignore the failure and proceed to get tripped up when starting cockroach: ``` 05:11:34 cluster.go:315: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --racks=1 [email protected] teamcity-1569301790-07-n4cpu4:2 teamcity-1569301790-07-n4cpu4: starting 0: exit status 255 ~ ./cockroach version github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.getCockroachVersion /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:88 github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:149 github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1 /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1535 runtime.goexit /usr/local/go/src/runtime/asm_amd64.s:1337: ``` The problem was that we were ignoring the Put error accidentally, so the tests got very confused. This commit fixes this by properly handling the Put error. This doesn't actually fix the referenced issues entirely, but it gets us a step closer to doing so, so I'm going to use it as an opportunity to close them. Release justification: Testing only. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
`schemachange/mixed/tpcc` uses `CREATE TABLE AS` in 19.2. This PR will have the test correctly create a similar table in 19.1 without using `CREATE TABLE AS`. Release justification: Fixes a test. Release note: None
41079: roachtest: fix schemachange/mixed/tpcc for 19.1 r=lucy-zhang a=lucy-zhang `schemachange/mixed/tpcc` uses `CREATE TABLE AS` in 19.2. This PR will have the test correctly create a similar table in 19.1 without using `CREATE TABLE AS`. See cockroachdb#40935 (comment). Release justification: Fixes a test. Release note: None Co-authored-by: Lucy Zhang <[email protected]>
``` panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x61498e6] goroutine 67 [running]: github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0000e9560, 0x7942500, 0xc0005beb10) /Users/lucy/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:181 +0x121 panic(0x6ce8ee0, 0x9b12c10) /usr/local/go/src/runtime/panic.go:522 +0x1b5 github.com/cockroachdb/cockroach/pkg/util/quotapool.(*IntAlloc).Freeze(0x0) ``` Release justification: Not production code. Release note: None
41086: roachtest: fix panic when tests fail using an existing cluster r=andreimatei a=ajwerner ``` panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x61498e6] goroutine 67 [running]: github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0000e9560, 0x7942500, 0xc0005beb10) /Users/lucy/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:181 +0x121 panic(0x6ce8ee0, 0x9b12c10) /usr/local/go/src/runtime/panic.go:522 +0x1b5 github.com/cockroachdb/cockroach/pkg/util/quotapool.(*IntAlloc).Freeze(0x0) ``` Release justification: Not production code. Release note: None Co-authored-by: Andrew Werner <[email protected]>
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
4 times, most recently
from
September 25, 2019 21:54
90d8173
to
17511f9
Compare
…arner This commit adds an annotation to raft request messages to indicate that the sender believes the current replica is a learner. If the current replica on the recipient was created as a preemptive snapshot (it's initialized but not in the range) then we should remove that replica immediately. Release Justification: Release blocker, required for backwards compatibility. Release note: None
ajwerner
force-pushed
the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
from
September 25, 2019 22:10
17511f9
to
99a031d
Compare
ajwerner
pushed a commit
that referenced
this pull request
Apr 29, 2022
…db#80762 cockroachdb#80773 79911: opt: refactor and test lookup join key column and expr generation r=mgartner a=mgartner #### opt: simplify fetching outer column in CustomFuncs.findComputedColJoinEquality Previously, `CustomFuncs.findComputedColJoinEquality` used `CustomFuncs.OuterCols` to retrieve the outer columns of computed column expressions. `CustomFuncs.OuterCols` returns the cached outer columns in the expression if it is a `memo.ScalarPropsExpr`, and falls back to calculating the outer columns with `memo.BuildSharedProps` otherwise. Computed column expressions are never `memo.ScalarPropsExpr`s, so we use just use `memo.BuildSharedProps` directly. Release note: None #### opt: make RemapCols a method on Factory instead of CustomFuncs Release note: None #### opt: use partial-index-reduced filters when building lookup expressions This commit makes a minor change to `generateLookupJoinsImpl`. Previously, equality filters were extracted from the original `ON` filters. Now they are extracted from filters that have been reduced by partial index implication. This has no effect on behavior because equality filters that reference columns in two tables cannot exist in partial index predicates, so they will never be eliminated during partial index implication. Release note: None #### opt: moves some lookup join generation logic to lookup join package This commit adds a new `lookupjoin` package. Logic for determining the key columns and lookup expressions for lookup joins has been moved to `lookupJoin.ConstraintBuilder`. The code was moved with as few changes as possible, and the behavior does not change in any way. This move will make it easier to test this code in isolation in the future, and allow for further refactoring. Release note: None #### opt: generalize lookupjoin.ConstraintBuilder API This commit makes the lookupjoin.ConstraintBuilder API more general to make unit testing easier in a future commit. Release note: None #### opt: add data-driven tests for lookupjoin.ConstraintBuilder Release note: None #### opt: add lookupjoin.Constraint struct The `lookupjoin.Constraint` struct has been added to encapsulate multiple data structures that represent a strategy for constraining a lookup join. Release note: None 80511: pkg/cloud/azure: Support specifying Azure environments in storage URLs r=adityamaru a=nlowe-sx The Azure Storage cloud provider learned a new parameter, AZURE_ENVIRONMENT, which specifies which azure environment the storage account in question belongs to. This allows cockroach to backup and restore data to Azure Storage Accounts outside the main Azure Public Cloud. For backwards compatibility, this defaults to "AzurePublicCloud" if AZURE_ENVIRONMENT is not specified. Fixes cockroachdb#47163 ## Verification Evidence I spun up a single node cluster: ``` nlowe@nlowe-z4l:~/projects/github/cockroachdb/cockroach [feat/47163-azure-storage-support-multiple-environments L|✚ 2] [🗓 2022-04-22 08:25:49] $ bazel run //pkg/cmd/cockroach:cockroach -- start-single-node --insecure WARNING: Option 'host_javabase' is deprecated WARNING: Option 'javabase' is deprecated WARNING: Option 'host_java_toolchain' is deprecated WARNING: Option 'java_toolchain' is deprecated INFO: Invocation ID: 11504a98-f767-413a-8994-8f92793c2ecf INFO: Analyzed target //pkg/cmd/cockroach:cockroach (0 packages loaded, 0 targets configured). INFO: Found 1 target... Target //pkg/cmd/cockroach:cockroach up-to-date: _bazel/bin/pkg/cmd/cockroach/cockroach_/cockroach INFO: Elapsed time: 0.358s, Critical Path: 0.00s INFO: 1 process: 1 internal. INFO: Build completed successfully, 1 total action INFO: Build completed successfully, 1 total action * * WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED! * * This mode is intended for non-production testing only. * * In this mode: * - Your cluster is open to any client that can access any of your IP addresses. * - Intruders with access to your machine or network can observe client-server traffic. * - Intruders can log in without password and read or write any data in the cluster. * - Intruders can consume all your server's resources and cause unavailability. * * * INFO: To start a secure server without mandating TLS for clients, * consider --accept-sql-without-tls instead. For other options, see: * * - https://go.crdb.dev/issue-v/53404/dev * - https://www.cockroachlabs.com/docs/dev/secure-a-cluster.html * * * WARNING: neither --listen-addr nor --advertise-addr was specified. * The server will advertise "nlowe-z4l" to other nodes, is this routable? * * Consider using: * - for local-only servers: --listen-addr=localhost * - for multi-node clusters: --advertise-addr=<host/IP addr> * * CockroachDB node starting at 2022-04-22 15:25:55.461315977 +0000 UTC (took 2.1s) build: CCL unknown @ (go1.17.6) webui: http://nlowe-z4l:8080/ sql: postgresql://root@nlowe-z4l:26257/defaultdb?sslmode=disable sql (JDBC): jdbc:postgresql://nlowe-z4l:26257/defaultdb?sslmode=disable&user=root RPC client flags: /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach <client cmd> --host=nlowe-z4l:26257 --insecure logs: /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data/logs temp dir: /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data/cockroach-temp4100501952 external I/O path: /home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data/extern store[0]: path=/home/nlowe/.cache/bazel/_bazel_nlowe/cf6ed4d0d14c8e474a5c30d572846d8a/execroot/cockroach/bazel-out/k8-fastbuild/bin/pkg/cmd/cockroach/cockroach_/cockroach.runfiles/cockroach/cockroach-data storage engine: pebble clusterID: bb3942d7-f241-4d26-aa4a-1bd0d6556e4d status: initialized new cluster nodeID: 1 ``` I was then able to view the contents of a backup hosted in an azure government storage account: ``` root@:26257/defaultdb> SELECT DISTINCT object_name FROM [SHOW BACKUP 'azure://container/path/to/backup?AZURE_ACCOUNT_NAME=account&AZURE_ACCOUNT_KEY=***&AZURE_ENVIRONMENT=AzureUSGovernmentCloud'] WHERE object_type = 'database'; object_name ------------------------------------------ example_database ... (17 rows) Time: 5.859632889s ``` Omitting the `AZURE_ENVIRONMENT` parameter, we can see cockroach defaults to the public cloud where my storage account does not exist: ``` root@:26257/defaultdb> SELECT DISTINCT object_name FROM [SHOW BACKUP 'azure://container/path/to/backup?AZURE_ACCOUNT_NAME=account&AZURE_ACCOUNT_KEY=***'] WHERE object_type = 'database'; ERROR: reading previous backup layers: unable to list files for specified blob: Get "https://account.blob.core.windows.net/container?comp=list&delimiter=path%2Fto%2Fbackup&restype=container&timeout=61": dial tcp: lookup account.blob.core.windows.net on 8.8.8.8:53: no such host ``` ## Tests Two new tests are added to verify that the storage account URL is correctly built from the provided Azure Environment name, and that the Environment defaults to the Public Cloud if unspecified for backwards compatibility. I verified the existing tests pass against a government storage account after specifying `AZURE_ENVIRONMENT` as `AzureUSGovernmentCloud` in the backup URL query parameters: ``` nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓 2022-04-22 17:38:26] $ export AZURE_ACCOUNT_NAME=account nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓 2022-04-22 17:38:42] $ export AZURE_ACCOUNT_KEY=*** nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓 2022-04-22 17:39:25] $ export AZURE_CONTAINER=container nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓 2022-04-22 17:39:48] $ export AZURE_ENVIRONMENT=AzureUSGovernmentCloud nlowe@nlowe-mbp:~/projects/github/cockroachdb/cockroachdb [feat/47163-azure-storage-support-multiple-environments| …3] [🗓 2022-04-22 17:40:15] $ bazel test --test_output=streamed --test_arg=-test.v --action_env=AZURE_ACCOUNT_NAME --action_env=AZURE_ACCOUNT_KEY --action_env=AZURE_CONTAINER --action_env=AZURE_ENVIRONMENT //pkg/cloud/azure:azure_test INFO: Invocation ID: aa88a942-f3c7-4df6-bade-8f5f0e18041f WARNING: Streamed test output requested. All tests will be run locally, without sharding, one at a time INFO: Build option --action_env has changed, discarding analysis cache. INFO: Analyzed target //pkg/cloud/azure:azure_test (468 packages loaded, 16382 targets configured). INFO: Found 1 test target... initialized metamorphic constant "span-reuse-rate" with value 28 === RUN TestAzure === RUN TestAzure/simple_round_trip === RUN TestAzure/exceeds-4mb-chunk === RUN TestAzure/exceeds-4mb-chunk/rand-readats === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#00 cloud_test_helpers.go:226: read 3345 of file at 4778744 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#1 cloud_test_helpers.go:226: read 7228 of file at 226589 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#2 cloud_test_helpers.go:226: read 634 of file at 256284 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#3 cloud_test_helpers.go:226: read 7546 of file at 3546208 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#4 cloud_test_helpers.go:226: read 24123 of file at 4821795 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#5 cloud_test_helpers.go:226: read 16899 of file at 403428 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#6 cloud_test_helpers.go:226: read 29467 of file at 4886370 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#7 cloud_test_helpers.go:226: read 11700 of file at 1876920 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/#8 cloud_test_helpers.go:226: read 2928 of file at 489781 === RUN TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#9 cloud_test_helpers.go:226: read 19933 of file at 1483342 === RUN TestAzure/read-single-file-by-uri === RUN TestAzure/write-single-file-by-uri === RUN TestAzure/file-does-not-exist === RUN TestAzure/List === RUN TestAzure/List/root === RUN TestAzure/List/file-slash-numbers-slash === RUN TestAzure/List/root-slash === RUN TestAzure/List/file === RUN TestAzure/List/file-slash === RUN TestAzure/List/slash-f === RUN TestAzure/List/nothing === RUN TestAzure/List/delim-slash-file-slash === RUN TestAzure/List/delim-data --- PASS: TestAzure (34.81s) --- PASS: TestAzure/simple_round_trip (9.66s) --- PASS: TestAzure/exceeds-4mb-chunk (16.45s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats (6.41s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#00 (0.15s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#1 (0.64s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#2 (0.65s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#3 (0.60s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#4 (0.75s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#5 (0.80s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#6 (0.75s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#7 (0.65s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/#8 (0.65s) --- PASS: TestAzure/exceeds-4mb-chunk/rand-readats/cockroachdb#9 (0.77s) --- PASS: TestAzure/read-single-file-by-uri (0.60s) --- PASS: TestAzure/write-single-file-by-uri (0.60s) --- PASS: TestAzure/file-does-not-exist (1.05s) --- PASS: TestAzure/List (2.40s) --- PASS: TestAzure/List/root (0.30s) --- PASS: TestAzure/List/file-slash-numbers-slash (0.30s) --- PASS: TestAzure/List/root-slash (0.30s) --- PASS: TestAzure/List/file (0.30s) --- PASS: TestAzure/List/file-slash (0.30s) --- PASS: TestAzure/List/slash-f (0.30s) --- PASS: TestAzure/List/nothing (0.15s) --- PASS: TestAzure/List/delim-slash-file-slash (0.15s) --- PASS: TestAzure/List/delim-data (0.30s) === RUN TestAntagonisticAzureRead --- PASS: TestAntagonisticAzureRead (103.90s) === RUN TestParseAzureURL === RUN TestParseAzureURL/Defaults_to_Public_Cloud_when_AZURE_ENVIRONEMNT_unset === RUN TestParseAzureURL/Can_Override_AZURE_ENVIRONMENT --- PASS: TestParseAzureURL (0.00s) --- PASS: TestParseAzureURL/Defaults_to_Public_Cloud_when_AZURE_ENVIRONEMNT_unset (0.00s) --- PASS: TestParseAzureURL/Can_Override_AZURE_ENVIRONMENT (0.00s) === RUN TestMakeAzureStorageURLFromEnvironment === RUN TestMakeAzureStorageURLFromEnvironment/AzurePublicCloud === RUN TestMakeAzureStorageURLFromEnvironment/AzureUSGovernmentCloud --- PASS: TestMakeAzureStorageURLFromEnvironment (0.00s) --- PASS: TestMakeAzureStorageURLFromEnvironment/AzurePublicCloud (0.00s) --- PASS: TestMakeAzureStorageURLFromEnvironment/AzureUSGovernmentCloud (0.00s) PASS Target //pkg/cloud/azure:azure_test up-to-date: _bazel/bin/pkg/cloud/azure/azure_test_/azure_test INFO: Elapsed time: 159.865s, Critical Path: 152.35s INFO: 66 processes: 2 internal, 64 darwin-sandbox. INFO: Build completed successfully, 66 total actions //pkg/cloud/azure:azure_test PASSED in 139.9s INFO: Build completed successfully, 66 total actions ``` 80705: kvclient: fix gRPC stream leak in rangefeed client r=tbg,srosenberg a=erikgrinaker When the DistSender rangefeed client received a `RangeFeedError` message and propagated a retryable error up the stack, it would fail to close the existing gRPC stream, causing stream/goroutine leaks. Release note (bug fix): Fixed a goroutine leak when internal rangefeed clients received certain kinds of retriable errors. 80762: joberror: add ConnectionReset/ConnectionRefused to retryable err allow list r=miretskiy a=adityamaru Bulk jobs will no longer treat `sysutil.IsErrConnectionReset` and `sysutil.IsErrConnectionRefused` as permanent errors. IMPORT, RESTORE and BACKUP will treat this error as transient and retry. Release note: None 80773: backupccl: break dependency to testcluster r=irfansharif a=irfansharif Noticed we were building testing library packages when building CRDB binaries. $ bazel query "somepath(//pkg/cmd/cockroach-short, //pkg/testutils/testcluster)" //pkg/cmd/cockroach-short:cockroach-short //pkg/cmd/cockroach-short:cockroach-short_lib //pkg/ccl:ccl //pkg/ccl/backupccl:backupccl //pkg/testutils/testcluster:testcluster Release note: None Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Nathan Lowe <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: irfan sharif <[email protected]>
Preemptive snapshots are long gone. |
ajwerner
deleted the
ajwerner/deal-with-upgrade-from-preemptive-snapshots
branch
March 1, 2024 20:19
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This commit adds an annotation to raft request messages to
indicate that the sender believes the current replica is a
learner. If the current replica on the recipient was created as a
preemptive snapshot (it's initialized but not in the range) then
we should remove that replica immediately.
Release note: None