runtime: skip work recheck for non-spinning Ms
When an M transitions from spinning to non-spinning state, it must
recheck most sources of work to avoid missing work submitted between its
initial check and decrementing sched.nmspinning (see "delicate dance"
comment).
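
To make the "delicate dance" concrete, here is a minimal sketch of the
ordering it enforces. This is a paraphrase with simplified, hypothetical
stand-ins (a single channel models the run queues; submit, stopSpinning,
and wakeM are not real runtime functions), not the runtime's actual code:

```go
package sketch

import "sync/atomic"

var (
	nmspinning int32                    // analogous to sched.nmspinning
	runq       = make(chan func(), 128) // stand-in for the run queues findrunnable checks
)

// submit publishes work, then wakes a thread only if no M is spinning:
// an already-spinning M is trusted to find the work on its own.
func submit(w func()) {
	runq <- w
	if atomic.LoadInt32(&nmspinning) == 0 {
		wakeM()
	}
}

// stopSpinning is the M-side transition from spinning to non-spinning.
// The recheck *after* the decrement is the crux: work submitted while
// nmspinning was still nonzero triggered no wakeup, so this M must find
// that work itself before it is allowed to park.
func stopSpinning() (w func(), ok bool) {
	atomic.AddInt32(&nmspinning, -1)
	select {
	case w = <-runq:
		return w, true // caught work submitted during the window
	default:
		return nil, false // safe to park: later submits see nmspinning == 0
	}
}

func wakeM() { /* start or wake a worker thread; elided */ }
```

In this simplified model, either the submitter observes nmspinning == 0
and wakes a thread, or the transitioning M's post-decrement recheck
observes the submitted work; work can never fall between the two.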

Ever since the scheduler rewrite in Go 1.1 (golang.org/cl/7314062), we
have performed this recheck on all Ms before stopping, regardless of
whether or not they were spinning.

Unfortunately, there is a problem with this approach: non-spinning Ms
are not eligible to steal work (note the skip over the stealWork block),
but they can still detect work during the recheck. If work is available,
a non-spinning M will jump to top, skip stealing, land in the recheck
again, and repeat; i.e., it will spin uselessly.
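
In heavily abbreviated form, the control flow in question looks roughly
like this (an assumed sketch with stub helpers, not the real
findrunnable in src/runtime/proc.go):

```go
package sketch

type g struct{}
type m struct{ spinning bool }

// Stubs standing in for runtime internals (hypothetical signatures).
func stealWork() *g       { return nil }
func releaseP(mp *m)      {}
func checkRunqsNoP() bool { return false }
func stopm()              {}

// findrunnableShape paraphrases the structure that causes the useless spin.
func findrunnableShape(mp *m) *g {
top:
	// ... check local and global run queues, netpoll, etc. ...

	if mp.spinning {
		// Only spinning Ms are eligible to steal.
		if gp := stealWork(); gp != nil {
			return gp
		}
	}

	releaseP(mp) // also decrements sched.nmspinning if we were spinning

	// Before this CL, this recheck ran even on non-spinning Ms. A
	// non-spinning M that sees work here jumps to top, is still
	// ineligible to steal it, reaches this recheck again, and loops.
	if checkRunqsNoP() {
		goto top
	}

	stopm() // park this M until new work arrives
	goto top
}
```

The diff below makes the recheck conditional on wasSpinning, so a
non-spinning M falls through to stopm instead of looping.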

The spin is bounded: it can only occur while there is another spinning
M, which will either take the work (allowing this M to stop) or take
some other work (allowing this M to upgrade to spinning). Either way,
the bounded spin is still ultimately just a fancy spin-wait.

golang.org/issue/43997 discusses several ways to address this. This CL
takes the simplest approach: skipping the recheck on non-spinning Ms
entirely and letting them proceed directly to stop.

Results for scheduler-relevant runtime and time benchmarks can be found
at https://perf.golang.org/search?q=upload:20210420.5.

The new BenchmarkCreateGoroutinesSingle is a characteristic example
workload that hits this issue hard. A single M readies lots of work
without itself parking. Other Ms must spin to steal work, which is very
short-lived, forcing those Ms to spin again. Some of the Ms will be
non-spinning and hit the above bug.

With this fixed, that benchmark drops in CPU usage by a massive 68% and
in wall time by 24%. BenchmarkNetpollBreak shows similar drops because
it is unintentionally almost the same benchmark (creating short-lived Gs
in a loop). Typical well-behaved programs show little change.

We also measure scheduling latency (time from goready to execute). Note
that many of these benchmarks are very noisy because they don't involve
much scheduling. Those that do, like CreateGoroutinesSingle, are
expected to show increased latency, since we are replacing unintentional
spin-waiting with a real park.
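
For intuition, "time from goready to execute" can be approximated from
user code by timing the gap between a go statement (where the new
goroutine becomes runnable) and its first instruction. A minimal
illustrative sketch, not the benchmarks behind the linked results:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const iters = 100000
	var total time.Duration
	done := make(chan struct{})
	for i := 0; i < iters; i++ {
		ready := time.Now() // the new goroutine is runnable from here
		go func() {
			total += time.Since(ready) // time from ready to execute
			done <- struct{}{}
		}()
		<-done
	}
	fmt.Println("mean scheduling latency:", total/iters)
}
```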

Fixes #43997

Change-Id: Ie1d1e1800f393cee1792455412caaa5865d13562
Reviewed-on: https://go-review.googlesource.com/c/go/+/310850
Trust: Michael Pratt <[email protected]>
Run-TryBot: Michael Pratt <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
prattmic committed Apr 22, 2021
1 parent b6ff3c6 commit ecfce58
Showing 2 changed files with 79 additions and 28 deletions.
src/runtime/proc.go: 58 changes (30 additions, 28 deletions)
```diff
@@ -2841,44 +2841,46 @@ top:
 		if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
 			throw("findrunnable: negative nmspinning")
 		}
-	}
 
-	// Check all runqueues once again.
-	_p_ = checkRunqsNoP(allpSnapshot, idlepMaskSnapshot)
-	if _p_ != nil {
-		acquirep(_p_)
-		if wasSpinning {
+		// Note that for correctness, only the last M transitioning from
+		// spinning to non-spinning must perform these rechecks to
+		// ensure no missed work. We are performing it on every M that
+		// transitions as a conservative change to monitor effects on
+		// latency. See golang.org/issue/43997.
+
+		// Check all runqueues once again.
+		_p_ = checkRunqsNoP(allpSnapshot, idlepMaskSnapshot)
+		if _p_ != nil {
+			acquirep(_p_)
 			_g_.m.spinning = true
 			atomic.Xadd(&sched.nmspinning, 1)
+			goto top
 		}
-		goto top
-	}
 
-	// Check for idle-priority GC work again.
-	_p_, gp = checkIdleGCNoP()
-	if _p_ != nil {
-		acquirep(_p_)
-		if wasSpinning {
+		// Check for idle-priority GC work again.
+		_p_, gp = checkIdleGCNoP()
+		if _p_ != nil {
+			acquirep(_p_)
 			_g_.m.spinning = true
 			atomic.Xadd(&sched.nmspinning, 1)
-		}
 
-		// Run the idle worker.
-		_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
-		casgstatus(gp, _Gwaiting, _Grunnable)
-		if trace.enabled {
-			traceGoUnpark(gp, 0)
+			// Run the idle worker.
+			_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
+			casgstatus(gp, _Gwaiting, _Grunnable)
+			if trace.enabled {
+				traceGoUnpark(gp, 0)
+			}
+			return gp, false
 		}
-		return gp, false
-	}
 
-	// Finally, check for timer creation or expiry concurrently with
-	// transitioning from spinning to non-spinning.
-	//
-	// Note that we cannot use checkTimers here because it calls
-	// adjusttimers which may need to allocate memory, and that isn't
-	// allowed when we don't have an active P.
-	pollUntil = checkTimersNoP(allpSnapshot, timerpMaskSnapshot, pollUntil)
+		// Finally, check for timer creation or expiry concurrently with
+		// transitioning from spinning to non-spinning.
+		//
+		// Note that we cannot use checkTimers here because it calls
+		// adjusttimers which may need to allocate memory, and that isn't
+		// allowed when we don't have an active P.
+		pollUntil = checkTimersNoP(allpSnapshot, timerpMaskSnapshot, pollUntil)
+	}
 
 	// Poll network until next timer.
 	if netpollinited() && (atomic.Load(&netpollWaiters) > 0 || pollUntil != 0) && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
```
src/runtime/proc_test.go: 49 changes (49 additions, 0 deletions)
```diff
@@ -692,6 +692,55 @@ func BenchmarkCreateGoroutinesCapture(b *testing.B) {
 	}
 }
 
+// warmupScheduler ensures the scheduler has at least targetThreadCount threads
+// in its thread pool.
+func warmupScheduler(targetThreadCount int) {
+	var wg sync.WaitGroup
+	var count int32
+	for i := 0; i < targetThreadCount; i++ {
+		wg.Add(1)
+		go func() {
+			atomic.AddInt32(&count, 1)
+			for atomic.LoadInt32(&count) < int32(targetThreadCount) {
+				// spin until all threads started
+			}
+
+			// spin a bit more to ensure they are all running on separate CPUs.
+			doWork(time.Millisecond)
+			wg.Done()
+		}()
+	}
+	wg.Wait()
+}
+
+func doWork(dur time.Duration) {
+	start := time.Now()
+	for time.Since(start) < dur {
+	}
+}
+
+// BenchmarkCreateGoroutinesSingle creates many goroutines, all from a single
+// producer (the main benchmark goroutine).
+//
+// Compared to BenchmarkCreateGoroutines, this causes different behavior in the
+// scheduler because Ms are much more likely to need to steal work from the
+// main P rather than having work in the local run queue.
+func BenchmarkCreateGoroutinesSingle(b *testing.B) {
+	// Since we are interested in stealing behavior, warm the scheduler to
+	// get all the Ps running first.
+	warmupScheduler(runtime.GOMAXPROCS(0))
+	b.ResetTimer()
+
+	var wg sync.WaitGroup
+	wg.Add(b.N)
+	for i := 0; i < b.N; i++ {
+		go func() {
+			wg.Done()
+		}()
+	}
+	wg.Wait()
+}
+
 func BenchmarkClosureCall(b *testing.B) {
 	sum := 0
 	off1 := 1
```
