Skip to content

runtime: let sysmon sleep in netpoll if possible #51651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

hzzb
Copy link
Contributor

@hzzb hzzb commented Mar 14, 2022

Currently, the scheduler picks up next to-run g in the following
delicate order:

1 locked g.
2 trace reader g.
3 GC backgroud mark worker g.
4 global runq if schedtick%61 == 0.
5 local runq including timers.
6 findrunnable loop.
6.1 finalizer g.
6.2 local runq including timers.
6.3 global runq, no schedtick checked.
6.4 non-blocking netpoll(0) if netpollWaiters > 0 and no other M polling.
6.5 steal any timers and runnable g from other P. spinning state.
6.6 GC idle mark worker g.
6.7 recheck all other P's local runq, idle GC marker and timers.
non-spinning state, P is idle.
6.8 blocking netpoll(delay) until next earliest timer or IO ready.
if any other M already polling, no blocking netpoll. M parked.

If found a runnable g in any step, all steps following it will be skipped.

Suppose there are many runnable gs, scheduler would almost find a g before
trying findrunnable, so netpoll left untouched. Normal Ms have no opportunity
to unpark netpoll waiters g even though IO readied. The special M sysmon polls
network if not polled for more than 10ms, which is the only opportunity to
ready any netpoll waiters g into the global runq. This causes a false positive
timeout problem.

I did a simple test to demostrate this problem. Dialing a localhost TCP
listening port costs about 10ms. This is obviously not credible. See file
runtime/sysmon_test.go for detail test code.

Normal Ms will sleep in netpoll waiting for timer and IO ready if no g
found at last. The special M sysmon simply sleeps 20us~10ms and does
a non-blocking netpoll if needed. This CL let sysmon sleep in netpoll,
do a blocking netpoll like what normal Ms do. So netpoll waiters g will
be readied into runq as soon as IO readied.

With this change, the above mentioned test shows that dialing to localhost
TCP listening port costs mostly less than 1ms. Its much more reasonable than
10ms. Detailed test results for the old and new versions are as follows:

$ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP
=== RUN TestSysmonReadyNetpollWaitersASAP
sysmon_test.go:105: dialed 85 times within 1.000029211s
sysmon_test.go:106: timeBucket count percent
sysmon_test.go:108: [ 0, 1)ms 1 1.18%
sysmon_test.go:108: [ 1, 2)ms 0 0.00%
sysmon_test.go:108: [ 2, 3)ms 0 0.00%
sysmon_test.go:108: [ 3, 4)ms 0 0.00%
sysmon_test.go:108: [ 4, 5)ms 0 0.00%
sysmon_test.go:108: [ 5, 6)ms 0 0.00%
sysmon_test.go:108: [ 6, 7)ms 0 0.00%
sysmon_test.go:108: [ 7, 8)ms 0 0.00%
sysmon_test.go:108: [ 8, 9)ms 1 1.18%
sysmon_test.go:108: [ 9,10)ms 2 2.35%
sysmon_test.go:108: [10,11)ms 32 37.65%
sysmon_test.go:108: [11,12)ms 49 57.65%
--- FAIL: TestSysmonReadyNetpollWaitersASAP (1.11s)

$ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP
=== RUN TestSysmonReadyNetpollWaitersASAP
sysmon_test.go:105: dialed 2369 times within 1.000021368s
sysmon_test.go:106: timeBucket count percent
sysmon_test.go:108: [ 0, 1)ms 2209 93.25%
sysmon_test.go:108: [ 1, 2)ms 13 0.55%
sysmon_test.go:108: [ 2, 3)ms 33 1.39%
sysmon_test.go:108: [ 3, 4)ms 68 2.87%
sysmon_test.go:108: [ 4, 5)ms 8 0.34%
sysmon_test.go:108: [ 5, 6)ms 9 0.38%
sysmon_test.go:108: [ 6, 7)ms 6 0.25%
sysmon_test.go:108: [ 7, 8)ms 4 0.17%
sysmon_test.go:108: [ 8, 9)ms 7 0.30%
sysmon_test.go:108: [ 9,10)ms 4 0.17%
sysmon_test.go:108: [10,11)ms 5 0.21%
sysmon_test.go:108: [11,12)ms 3 0.13%
--- PASS: TestSysmonReadyNetpollWaitersASAP (1.11s)

Currently, the scheduler picks up next to-run g in the following
delicate order:

1 locked g.
2 trace reader g.
3 GC backgroud mark worker g.
4 global runq if schedtick%61 == 0.
5 local runq including timers.
6 findrunnable loop.
  6.1 finalizer g.
  6.2 local runq including timers.
  6.3 global runq, no schedtick checked.
  6.4 non-blocking netpoll(0) if netpollWaiters > 0 and no other M polling.
  6.5 steal any timers and runnable g from other P. spinning state.
  6.6 GC idle mark worker g.
  6.7 recheck all other P's local runq, idle GC marker and timers.
      non-spinning state, P is idle.
  6.8 blocking netpoll(delay) until next earliest timer or IO ready.
      if any other M already polling, no blocking netpoll. M parked.

If found a runnable g in any step, all steps following it will be skipped.

Suppose there are many runnable gs, scheduler would almost find a g before
trying findrunnable, so netpoll left untouched. Normal Ms have no opportunity
to unpark netpoll waiters g even though IO readied. The special M sysmon polls
network if not polled for more than 10ms, which is the only opportunity to
ready any netpoll waiters g into the global runq. This causes a false positive
timeout problem.

I did a simple test to demostrate this problem. Dialing a localhost TCP
listening port costs about 10ms. This is obviously not credible. See file
runtime/sysmon_test.go for detail test code.

Normal Ms will sleep in netpoll waiting for timer and IO ready if no g
found at last. The special M sysmon simply sleeps 20us~10ms and does
a non-blocking netpoll if needed. This CL let sysmon sleep in netpoll,
do a blocking netpoll like what normal Ms do. So netpoll waiters g will
be readied into runq as soon as IO readied.

With this change, the above mentioned test shows that dialing to localhost
TCP listening port costs mostly less than 1ms. Its much more reasonable than
10ms. Detailed test results for the old and new versions are as follows:

$ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP
=== RUN   TestSysmonReadyNetpollWaitersASAP
    sysmon_test.go:105: dialed 85 times within 1.000029211s
    sysmon_test.go:106: timeBucket      count   percent
    sysmon_test.go:108: [ 0, 1)ms       1       1.18%
    sysmon_test.go:108: [ 1, 2)ms       0       0.00%
    sysmon_test.go:108: [ 2, 3)ms       0       0.00%
    sysmon_test.go:108: [ 3, 4)ms       0       0.00%
    sysmon_test.go:108: [ 4, 5)ms       0       0.00%
    sysmon_test.go:108: [ 5, 6)ms       0       0.00%
    sysmon_test.go:108: [ 6, 7)ms       0       0.00%
    sysmon_test.go:108: [ 7, 8)ms       0       0.00%
    sysmon_test.go:108: [ 8, 9)ms       1       1.18%
    sysmon_test.go:108: [ 9,10)ms       2       2.35%
    sysmon_test.go:108: [10,11)ms       32      37.65%
    sysmon_test.go:108: [11,12)ms       49      57.65%
--- FAIL: TestSysmonReadyNetpollWaitersASAP (1.11s)

$ ../../bin/go test -v -run TestSysmonReadyNetpollWaitersASAP
=== RUN   TestSysmonReadyNetpollWaitersASAP
    sysmon_test.go:105: dialed 2369 times within 1.000021368s
    sysmon_test.go:106: timeBucket      count   percent
    sysmon_test.go:108: [ 0, 1)ms       2209    93.25%
    sysmon_test.go:108: [ 1, 2)ms       13      0.55%
    sysmon_test.go:108: [ 2, 3)ms       33      1.39%
    sysmon_test.go:108: [ 3, 4)ms       68      2.87%
    sysmon_test.go:108: [ 4, 5)ms       8       0.34%
    sysmon_test.go:108: [ 5, 6)ms       9       0.38%
    sysmon_test.go:108: [ 6, 7)ms       6       0.25%
    sysmon_test.go:108: [ 7, 8)ms       4       0.17%
    sysmon_test.go:108: [ 8, 9)ms       7       0.30%
    sysmon_test.go:108: [ 9,10)ms       4       0.17%
    sysmon_test.go:108: [10,11)ms       5       0.21%
    sysmon_test.go:108: [11,12)ms       3       0.13%
--- PASS: TestSysmonReadyNetpollWaitersASAP (1.11s)
@gopherbot
Copy link
Contributor

This PR (HEAD: d38fc4e) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/392314 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from Michael Pratt:

Patch Set 1: Run-TryBot+1 Trust+1

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 1: TryBot-Result-1

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Michael Pratt:

Patch Set 1:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

skip sysmon test on wasm.
high system load may starve sysmon thead, skip test.
@gopherbot
Copy link
Contributor

This PR (HEAD: 38aaf26) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/392314 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from hzzb:

Patch Set 2:

(3 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Michael Pratt:

Patch Set 2: Run-TryBot+1 Trust+1


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 2:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 2:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 2: TryBot-Result-1

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

android emulator may be slow.
last time bucket looks misleading.
@gopherbot
Copy link
Contributor

This PR (HEAD: 0cd8c92) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/392314 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link
Contributor

Message from hzzb:

Patch Set 2:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from hzzb:

Patch Set 3:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from hzzb:

Patch Set 3:

(4 comments)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Michael Pratt:

Patch Set 3: Run-TryBot+1


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 3:

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link
Contributor

Message from Gopher Robot:

Patch Set 3: TryBot-Result+1

(1 comment)


Please don’t reply on this GitHub thread. Visit golang.org/cl/392314.
After addressing review feedback, remember to publish your drafts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants