runtime: let sysmon sleep in netpoll if possible #51651

hzzb · 2022-03-14T11:54:28Z

Currently, the scheduler picks up next to-run g in the following
delicate order:

1 locked g.
2 trace reader g.
3 GC backgroud mark worker g.
4 global runq if schedtick%61 == 0.
5 local runq including timers.
6 findrunnable loop.
6.1 finalizer g.
6.2 local runq including timers.
6.3 global runq, no schedtick checked.
6.4 non-blocking netpoll(0) if netpollWaiters > 0 and no other M polling.
6.5 steal any timers and runnable g from other P. spinning state.
6.6 GC idle mark worker g.
6.7 recheck all other P's local runq, idle GC marker and timers.
non-spinning state, P is idle.
6.8 blocking netpoll(delay) until next earliest timer or IO ready.
if any other M already polling, no blocking netpoll. M parked.

If found a runnable g in any step, all steps following it will be skipped.

Suppose there are many runnable gs, scheduler would almost find a g before
trying findrunnable, so netpoll left untouched. Normal Ms have no opportunity
to unpark netpoll waiters g even though IO readied. The special M sysmon polls
network if not polled for more than 10ms, which is the only opportunity to
ready any netpoll waiters g into the global runq. This causes a false positive
timeout problem.

I did a simple test to demostrate this problem. Dialing a localhost TCP
listening port costs about 10ms. This is obviously not credible. See file
runtime/sysmon_test.go for detail test code.

Normal Ms will sleep in netpoll waiting for timer and IO ready if no g
found at last. The special M sysmon simply sleeps 20us~10ms and does
a non-blocking netpoll if needed. This CL let sysmon sleep in netpoll,
do a blocking netpoll like what normal Ms do. So netpoll waiters g will
be readied into runq as soon as IO readied.

With this change, the above mentioned test shows that dialing to localhost
TCP listening port costs mostly less than 1ms. Its much more reasonable than
10ms. Detailed test results for the old and new versions are as follows:

$ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP
=== RUN TestSysmonReadyNetpollWaitersASAP
sysmon_test.go:105: dialed 85 times within 1.000029211s
sysmon_test.go:106: timeBucket count percent
sysmon_test.go:108: [ 0, 1)ms 1 1.18%
sysmon_test.go:108: [ 1, 2)ms 0 0.00%
sysmon_test.go:108: [ 2, 3)ms 0 0.00%
sysmon_test.go:108: [ 3, 4)ms 0 0.00%
sysmon_test.go:108: [ 4, 5)ms 0 0.00%
sysmon_test.go:108: [ 5, 6)ms 0 0.00%
sysmon_test.go:108: [ 6, 7)ms 0 0.00%
sysmon_test.go:108: [ 7, 8)ms 0 0.00%
sysmon_test.go:108: [ 8, 9)ms 1 1.18%
sysmon_test.go:108: [ 9,10)ms 2 2.35%
sysmon_test.go:108: [10,11)ms 32 37.65%
sysmon_test.go:108: [11,12)ms 49 57.65%
--- FAIL: TestSysmonReadyNetpollWaitersASAP (1.11s)

$ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP
=== RUN TestSysmonReadyNetpollWaitersASAP
sysmon_test.go:105: dialed 2369 times within 1.000021368s
sysmon_test.go:106: timeBucket count percent
sysmon_test.go:108: [ 0, 1)ms 2209 93.25%
sysmon_test.go:108: [ 1, 2)ms 13 0.55%
sysmon_test.go:108: [ 2, 3)ms 33 1.39%
sysmon_test.go:108: [ 3, 4)ms 68 2.87%
sysmon_test.go:108: [ 4, 5)ms 8 0.34%
sysmon_test.go:108: [ 5, 6)ms 9 0.38%
sysmon_test.go:108: [ 6, 7)ms 6 0.25%
sysmon_test.go:108: [ 7, 8)ms 4 0.17%
sysmon_test.go:108: [ 8, 9)ms 7 0.30%
sysmon_test.go:108: [ 9,10)ms 4 0.17%
sysmon_test.go:108: [10,11)ms 5 0.21%
sysmon_test.go:108: [11,12)ms 3 0.13%
--- PASS: TestSysmonReadyNetpollWaitersASAP (1.11s)

Currently, the scheduler picks up next to-run g in the following delicate order: 1 locked g. 2 trace reader g. 3 GC backgroud mark worker g. 4 global runq if schedtick%61 == 0. 5 local runq including timers. 6 findrunnable loop. 6.1 finalizer g. 6.2 local runq including timers. 6.3 global runq, no schedtick checked. 6.4 non-blocking netpoll(0) if netpollWaiters > 0 and no other M polling. 6.5 steal any timers and runnable g from other P. spinning state. 6.6 GC idle mark worker g. 6.7 recheck all other P's local runq, idle GC marker and timers. non-spinning state, P is idle. 6.8 blocking netpoll(delay) until next earliest timer or IO ready. if any other M already polling, no blocking netpoll. M parked. If found a runnable g in any step, all steps following it will be skipped. Suppose there are many runnable gs, scheduler would almost find a g before trying findrunnable, so netpoll left untouched. Normal Ms have no opportunity to unpark netpoll waiters g even though IO readied. The special M sysmon polls network if not polled for more than 10ms, which is the only opportunity to ready any netpoll waiters g into the global runq. This causes a false positive timeout problem. I did a simple test to demostrate this problem. Dialing a localhost TCP listening port costs about 10ms. This is obviously not credible. See file runtime/sysmon_test.go for detail test code. Normal Ms will sleep in netpoll waiting for timer and IO ready if no g found at last. The special M sysmon simply sleeps 20us~10ms and does a non-blocking netpoll if needed. This CL let sysmon sleep in netpoll, do a blocking netpoll like what normal Ms do. So netpoll waiters g will be readied into runq as soon as IO readied. With this change, the above mentioned test shows that dialing to localhost TCP listening port costs mostly less than 1ms. Its much more reasonable than 10ms. Detailed test results for the old and new versions are as follows: $ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP === RUN TestSysmonReadyNetpollWaitersASAP sysmon_test.go:105: dialed 85 times within 1.000029211s sysmon_test.go:106: timeBucket count percent sysmon_test.go:108: [ 0, 1)ms 1 1.18% sysmon_test.go:108: [ 1, 2)ms 0 0.00% sysmon_test.go:108: [ 2, 3)ms 0 0.00% sysmon_test.go:108: [ 3, 4)ms 0 0.00% sysmon_test.go:108: [ 4, 5)ms 0 0.00% sysmon_test.go:108: [ 5, 6)ms 0 0.00% sysmon_test.go:108: [ 6, 7)ms 0 0.00% sysmon_test.go:108: [ 7, 8)ms 0 0.00% sysmon_test.go:108: [ 8, 9)ms 1 1.18% sysmon_test.go:108: [ 9,10)ms 2 2.35% sysmon_test.go:108: [10,11)ms 32 37.65% sysmon_test.go:108: [11,12)ms 49 57.65% --- FAIL: TestSysmonReadyNetpollWaitersASAP (1.11s) $ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP === RUN TestSysmonReadyNetpollWaitersASAP sysmon_test.go:105: dialed 2369 times within 1.000021368s sysmon_test.go:106: timeBucket count percent sysmon_test.go:108: [ 0, 1)ms 2209 93.25% sysmon_test.go:108: [ 1, 2)ms 13 0.55% sysmon_test.go:108: [ 2, 3)ms 33 1.39% sysmon_test.go:108: [ 3, 4)ms 68 2.87% sysmon_test.go:108: [ 4, 5)ms 8 0.34% sysmon_test.go:108: [ 5, 6)ms 9 0.38% sysmon_test.go:108: [ 6, 7)ms 6 0.25% sysmon_test.go:108: [ 7, 8)ms 4 0.17% sysmon_test.go:108: [ 8, 9)ms 7 0.30% sysmon_test.go:108: [ 9,10)ms 4 0.17% sysmon_test.go:108: [10,11)ms 5 0.21% sysmon_test.go:108: [11,12)ms 3 0.13% --- PASS: TestSysmonReadyNetpollWaitersASAP (1.11s)

gopherbot · 2022-03-14T11:57:25Z

This PR (HEAD: d38fc4e) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/392314 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

gopherbot · 2022-03-14T16:36:14Z

Message from Michael Pratt:

Patch Set 1: Run-TryBot+1 Trust+1

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-14T16:36:15Z

Message from Gopher Robot:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-14T16:36:15Z

Message from Gopher Robot:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-14T16:36:16Z

Message from Gopher Robot:

Patch Set 1: TryBot-Result-1

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-14T16:36:17Z

Message from Michael Pratt:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

skip sysmon test on wasm. high system load may starve sysmon thead, skip test.

gopherbot · 2022-03-15T11:57:49Z

This PR (HEAD: 38aaf26) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/392314 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

gopherbot · 2022-03-15T12:25:45Z

Message from hzzb:

Patch Set 2:

(3 comments)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-15T15:36:35Z

Message from Michael Pratt:

Patch Set 2: Run-TryBot+1 Trust+1

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-15T15:42:08Z

Message from Gopher Robot:

Patch Set 2:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-15T15:42:09Z

Message from Gopher Robot:

Patch Set 2:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-15T16:01:17Z

Message from Gopher Robot:

Patch Set 2: TryBot-Result-1

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

android emulator may be slow. last time bucket looks misleading.

gopherbot · 2022-03-22T02:49:44Z

This PR (HEAD: 0cd8c92) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/392314 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

gopherbot · 2022-03-22T03:22:31Z

Message from hzzb:

Patch Set 2:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-22T03:22:31Z

Message from hzzb:

Patch Set 3:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-03-23T10:51:09Z

Message from hzzb:

Patch Set 3:

(4 comments)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-05-20T13:47:37Z

Message from Michael Pratt:

Patch Set 3: Run-TryBot+1

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-05-20T13:51:26Z

Message from Gopher Robot:

Patch Set 3:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2022-05-20T14:03:36Z

Message from Gopher Robot:

Patch Set 3: TryBot-Result+1

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

netpollGenericInit's callee may need a P on some platforms.

38aaf26

skip sysmon test on wasm. high system load may starve sysmon thead, skip test.

sysmon usleep(20us) will actually sleep 20ms on openbsd.

0cd8c92

android emulator may be slow. last time bucket looks misleading.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: let sysmon sleep in netpoll if possible #51651

runtime: let sysmon sleep in netpoll if possible #51651

hzzb commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 22, 2022

gopherbot commented Mar 22, 2022

gopherbot commented Mar 22, 2022

gopherbot commented Mar 23, 2022

gopherbot commented May 20, 2022

gopherbot commented May 20, 2022

gopherbot commented May 20, 2022

runtime: let sysmon sleep in netpoll if possible #51651

Are you sure you want to change the base?

runtime: let sysmon sleep in netpoll if possible #51651

Conversation

hzzb commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 14, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 15, 2022

gopherbot commented Mar 22, 2022

gopherbot commented Mar 22, 2022

gopherbot commented Mar 22, 2022

gopherbot commented Mar 23, 2022

gopherbot commented May 20, 2022

gopherbot commented May 20, 2022

gopherbot commented May 20, 2022