Commit dc0bb66

storage: emit health alert when >100 Raft snapshots are queued
This is a dangerous condition. Adding this to the health checker has the additional benefit of logging it during the nightly restore/import tests, which can in turn help diagnose whether a particular run is affected by cockroachdb#31409.

Release note: None
1 parent ef0423b commit dc0bb66

1 file changed: +4 -0 lines changed

pkg/server/status/health_check.go

@@ -77,6 +77,10 @@ var trackedMetrics = map[string]threshold{
 	"queue.raftsnapshot.process.failure": counterZero,
 	"queue.tsmaintenance.process.failure": counterZero,
 	"queue.consistency.process.failure": counterZero,
+
+	// When there are more than 100 pending items in the Raft snapshot queue,
+	// this is certainly worth pointing out.
+	"queue.raftsnapshot.pending": {gauge: true, min: 100},
 }
 
 type metricsMap map[roachpb.StoreID]map[string]float64
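
For readers unfamiliar with the health checker, the following is a minimal, self-contained sketch of how a threshold entry such as {gauge: true, min: 100} might be evaluated. It is not the code in health_check.go. The threshold struct is inferred from the fields used in the diff, and the exceeds helper, its signature, and the strict "greater than" comparison (taken from the "more than 100" comment) are all assumptions made for illustration.

```go
package main

import "fmt"

// threshold is inferred from the literal in the diff ({gauge: true, min: 100}).
// The real type in health_check.go may have a different shape.
type threshold struct {
	gauge bool  // true: compare the current value; false: compare the increase
	min   int64 // alert only when the compared value is strictly above min (assumed)
}

// counterZero is assumed to be the zero value: a counter that alerts on any increase.
var counterZero = threshold{}

var trackedMetrics = map[string]threshold{
	"queue.raftsnapshot.process.failure": counterZero,
	"queue.raftsnapshot.pending":         {gauge: true, min: 100},
}

// exceeds is a hypothetical helper. It reports whether the metric named name
// crosses its threshold, given the previous and current samples.
func exceeds(name string, prev, cur float64) bool {
	t, ok := trackedMetrics[name]
	if !ok {
		return false // untracked metrics never alert
	}
	v := cur
	if !t.gauge {
		v = cur - prev // counters are judged by their growth since the last check
	}
	return v > float64(t.min)
}

func main() {
	fmt.Println(exceeds("queue.raftsnapshot.pending", 0, 100))       // exactly 100 pending
	fmt.Println(exceeds("queue.raftsnapshot.pending", 0, 101))       // 101 pending
	fmt.Println(exceeds("queue.raftsnapshot.process.failure", 3, 4)) // one new failure
}
```

Under these assumptions, a gauge is judged by its current value and a counter by its growth since the previous sample, so the new entry would stay silent at exactly 100 pending snapshots and alert from 101 onward.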
