kv: panic in metrics #63218
Comments
I hit this again today doing a live demo:
demo@127.0.0.1:26257/movr> *
* ERROR: [n7] a panic has occurred!
* runtime error: invalid memory address or nil pointer dereference
* (1) attached stack trace
* -- stack trace:
* | runtime.gopanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969
* | runtime.panicmem
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:212
* | runtime.sigpanic
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/signal_unix.go:742
* | go.etcd.io/etcd/raft/v3.(*raft).hardState
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/raft.go:374
* | go.etcd.io/etcd/raft/v3.getBasicStatus
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/status.go:61
* | go.etcd.io/etcd/raft/v3.getStatus
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/status.go:70
* | go.etcd.io/etcd/raft/v3.(*RawNode).Status
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/rawnode.go:184
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).raftStatusRLocked
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica.go:1109
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Metrics
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_metrics.go:56
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).updateReplicationGauges.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2543
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*storeReplicaVisitor).Visit
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:376
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).updateReplicationGauges
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2542
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetrics
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2642
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).computePeriodicMetrics.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:676
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/stores.go:148
* | github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:352
* | github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/stores.go:147
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).computePeriodicMetrics
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:675
* | github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:662
* | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1
* | /Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351
* | runtime.goexit
* | /usr/local/Cellar/go/1.15.4/libexec/src/runtime/asm_amd64.s:1374
* Wraps: (2) runtime error: invalid memory address or nil pointer dereference
* Error types: (1) *withstack.withStack (2) runtime.errorString
*
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x59dbbef]
goroutine 2436 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc00372c800, 0x9535560, 0xc0039185d0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:233 +0x126
panic(0x802a6c0, 0xbe9a370)
/usr/local/Cellar/go/1.15.4/libexec/src/runtime/panic.go:969 +0x1b9
go.etcd.io/etcd/raft/v3.(*raft).hardState(...)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/raft.go:374
go.etcd.io/etcd/raft/v3.getBasicStatus(...)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/status.go:61
go.etcd.io/etcd/raft/v3.getStatus(0xc02aa1b900, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/status.go:70 +0xaf
go.etcd.io/etcd/raft/v3.(*RawNode).Status(...)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft/v3/rawnode.go:184
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).raftStatusRLocked(0xc0033d5600, 0xc05ff068d0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica.go:1109 +0xa5
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Metrics(0xc0033d5600, 0x9535560, 0xc05ff068d0, 0x16758555d06ffd60, 0x0, 0xc05fbd5f80, 0x9, 0x0, 0x0, 0x0, ...)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_metrics.go:56 +0x7c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).updateReplicationGauges.func1(0xc0033d5600, 0xc05fbd5f01)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2543 +0x148
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*storeReplicaVisitor).Visit(0xc05fbd5fb0, 0xc0234739e8)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:376 +0x151
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).updateReplicationGauges(0xc008149000, 0x9535560, 0xc05ff068d0, 0x0, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2542 +0x32c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetrics(0xc008149000, 0x9535560, 0xc0039185d0, 0xef, 0x0, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2642 +0xd8
github.com/cockroachdb/cockroach/pkg/server.(*Node).computePeriodicMetrics.func1(0xc008149000, 0x4014a45, 0xc0a784fd48)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:676 +0x57
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1(0x7, 0xc008149000, 0xc0a784fd48)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/stores.go:148 +0x38
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range(0xc000f9eab0, 0xc023473dd8)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:352 +0x130
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores(0xc000f9ea80, 0xc0a784fe20, 0x0, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/stores.go:147 +0x75
github.com/cockroachdb/cockroach/pkg/server.(*Node).computePeriodicMetrics(0xc003b20000, 0x9535560, 0xc0039185d0, 0xef, 0x1, 0x0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:675 +0x77
github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics.func1(0x9535560, 0xc0039185d0)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:662 +0x165
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc00372c800, 0x9535560, 0xc0039185d0, 0x0, 0xc00802d820)
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:351 +0xb9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
/Users/andrewwoods/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0xfc
This is a different panic. It's interesting that you can hit these at will. Is there anything particular you're doing? Does this happen "early" in the lifetime of the cluster or randomly in the middle? What version?
@irfansharif could you look into this (doesn't have to be during breather week, but next week would be good)?
I have been running
The stack trace says that one of the metric objects added to the registry is a
The two stack traces above appear unrelated
The second failure is perplexing. We start here: cockroach/pkg/kv/kvserver/replica.go, lines 1108 to 1111 in 1416a4e
so cockroach/pkg/kv/kvserver/replica_raft.go, lines 1434 to 1440 in 28be982
and if you look inside it's quite clear that its
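For readers without the linked snippets at hand, here is a minimal sketch of the general shape of the problem: an accessor that guards the outer pointer can still fault if a pointer nested below it is nil. All types and names below (`replica`, `rawNode`, `raftState`, `raftLog`, `safeStatus`) are hypothetical stand-ins, not the etcd/raft or kvserver code, and the sketch does not claim to reproduce the actual cause of this panic.

```go
package main

import "fmt"

// Hypothetical stand-ins; these are NOT the etcd/raft or kvserver types.
type raftLog struct {
	committed uint64
}

type raftState struct {
	term    uint64
	raftLog *raftLog // assumed to be set at construction time
}

type rawNode struct {
	raft *raftState
}

type replica struct {
	group *rawNode
}

// status guards the outer pointer only and trusts everything below it.
func (r *replica) status() (committed uint64, ok bool) {
	if r.group == nil {
		return 0, false
	}
	// Panics with a nil pointer dereference if raft or raftLog is nil.
	return r.group.raft.raftLog.committed, true
}

// safeStatus turns a panic inside status into an error value.
func safeStatus(r *replica) (committed uint64, ok bool, err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("recovered: %v", p)
		}
	}()
	committed, ok = r.status()
	return committed, ok, nil
}

func main() {
	// The outer guard handles a missing group without panicking.
	fmt.Println(safeStatus(&replica{}))

	// A group whose nested raftLog pointer is nil passes the guard and faults.
	broken := &replica{group: &rawNode{raft: &raftState{}}}
	fmt.Println(safeStatus(broken))
}
```

The point of the sketch is only that a nil check at the call site says nothing about the state of the object behind the pointer, which is why a fault below a guarded call is surprising in a sane environment.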
As for the metrics crash, there is some nil handling here: cockroach/pkg/util/metric/registry.go, lines 97 to 110 in e013e6d
I don't know if that would save us. But also, given that we never see this outside of this issue, and that the raft nil thing is basically impossible given a sane env, I am putting my money on this being something weird about Andy's setup, though it is unclear what.
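To make "nil handling" concrete, here is a rough sketch of one way a registry-style helper could detect nil metric fields before using them, via reflection. This is an illustration only: `gauge`, `storeMetrics`, and `nilFields` are hypothetical names, and the code is not taken from pkg/util/metric/registry.go.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical metric types; NOT the pkg/util/metric types.
type gauge struct{ value int64 }

type storeMetrics struct {
	ReplicaCount *gauge
	LeaderCount  *gauge
}

// nilFields returns the names of pointer-typed struct fields that are nil.
// It accepts a struct or a pointer to a struct; anything else yields nil.
func nilFields(s interface{}) []string {
	v := reflect.ValueOf(s)
	if v.Kind() == reflect.Ptr {
		if v.IsNil() {
			return nil
		}
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return nil
	}
	var names []string
	t := v.Type()
	for i := 0; i < v.NumField(); i++ {
		f := v.Field(i)
		if f.Kind() == reflect.Ptr && f.IsNil() {
			names = append(names, t.Field(i).Name)
		}
	}
	return names
}

func main() {
	m := &storeMetrics{ReplicaCount: &gauge{}}
	// Report fields that were never initialized instead of dereferencing them.
	fmt.Println("nil metric fields:", nilFields(m))
}
```

A check like this only catches fields that are nil when it runs; it would not help if a field were cleared or corrupted afterwards.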
The first step I'd like to suggest is for @awoods187 to upgrade Go to 1.15.10 or later. The version used here (1.15.4) has known bugs that we need to rule out before going further with this analysis.
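A quick way to confirm which toolchain built a binary is to compare `runtime.Version()` against the suggested minimum. The sketch below is only an illustration; `parseGoVersion` and `atLeast` are hypothetical helpers written for this comment, not part of CockroachDB.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// parseGoVersion turns a release string such as "go1.15.4" into {1, 15, 4}.
// Non-release strings (for example development builds) return ok=false.
func parseGoVersion(v string) (parts [3]int, ok bool) {
	if !strings.HasPrefix(v, "go") {
		return parts, false
	}
	fields := strings.SplitN(strings.TrimPrefix(v, "go"), ".", 3)
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return parts, false
		}
		parts[i] = n
	}
	return parts, true
}

// atLeast reports whether version v is greater than or equal to want.
func atLeast(v, want [3]int) bool {
	for i := range v {
		if v[i] != want[i] {
			return v[i] > want[i]
		}
	}
	return true
}

func main() {
	want := [3]int{1, 15, 10} // minimum suggested in this thread
	got, ok := parseGoVersion(runtime.Version())
	if !ok {
		fmt.Println("unrecognized Go version:", runtime.Version())
		return
	}
	fmt.Printf("built with %s, at least go1.15.10: %v\n", runtime.Version(), atLeast(got, want))
}
```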
@knz: Are you thinking of golang/go#44614?
yes
That does sound plausible for the second panic; it looks like in go1.15.4 we can accidentally GC memory referenced in the same manner.
Ok I'm going to mark this issue as resolved when #63837 merges.
Yesterday, in demo, I hit a panic in metrics collection.