storage: de-flake TestRefreshPendingCommands
The test ran a for loop without preemption points. The loop checked a
condition that would only become true after another goroutine had been
scheduled and carried out its job.

If, with only a few cores (four in my case), GC kicked in before that other
goroutine got scheduled, the loop would just run hot until the test timed
out, and the resulting stack dump was quite unhelpful.

Add a small sleep so the runtime can preempt the goroutine.

The issue was harder to run into when stressing only this test, since less
garbage had accumulated at that point. By adding some print statements, I
accidentally made it much more likely to occur.

Previously, the test flaked (got stuck) within 500 iterations; it now runs
past 1,500 without problems.

Fixes cockroachdb#19397.
Fixes cockroachdb#19388.
Touches cockroachdb#19367.
Fixes cockroachdb#18554.
tbg committed Oct 20, 2017
1 parent d41bab2 commit 1b44843
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions pkg/storage/client_raft_test.go
@@ -1090,6 +1090,14 @@ func TestRefreshPendingCommands(t *testing.T) {
for i := 0; i < 2; i++ {
draining = draining && mtc.stores[i].IsDraining()
}
// If this goroutine can't be preempted, and we don't have other threads
// available (for example since you only have four and the other three
// are in GC, or you even only have one to begin with!), it can end up
// spinning forever. In the case that prompted adding this, there are
// four cores and "gc assist" got us stuck.
//
// See #18554.
time.Sleep(time.Nanosecond)
}
mtc.advanceClock(context.Background())

