net: apparent deadlock in TestCloseWrite on darwin-arm64-corellium #34837

Closed
bcmills opened this issue Oct 11, 2019 · 12 comments
Labels: FrozenDueToAge, help wanted, NeedsInvestigation, OS-Darwin

@bcmills
Contributor

bcmills commented Oct 11, 2019

From the darwin-arm64-corellium builder (https://build.golang.org/log/0f26cd7aadb20043bcb06081b5b9c0a633bcb9fe):

panic: test timed out after 3m0s

goroutine 601 [running]:
testing.(*M).startAlarm.func1()
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/testing/testing.go:1377 +0xc0
created by time.goFunc
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/time/sleep.go:168 +0x38

[…]

goroutine 599 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x10578ce98, 0x72, 0x102f20380)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/runtime/netpoll.go:184 +0x3c
internal/poll.(*pollDesc).wait(0x130356718, 0x72, 0x0, 0x1, 0xffffffffffffffff)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/internal/poll/fd_poll_runtime.go:87 +0x30
internal/poll.(*pollDesc).waitRead(...)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0x130356700, 0x13011000f, 0x1, 0x1, 0x0, 0x0, 0x0)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/internal/poll/fd_unix.go:169 +0x1b8
net.(*netFD).Read(0x130356700, 0x13011000f, 0x1, 0x1, 0x130356700, 0x102e44fd4, 0x1)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/net/fd_unix.go:202 +0x3c
net.(*conn).Read(0x130018010, 0x13011000f, 0x1, 0x1, 0x0, 0x0, 0x0)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/net/net.go:184 +0x68
net.TestCloseWrite(0x13017c600)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/net/net_test.go:151 +0x3cc
testing.tRunner(0x13017c600, 0x102f1ee80)
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/testing/testing.go:909 +0xb0
created by testing.(*T).Run
	/tmp/workdir-host-darwin-arm64-corellium-ios/go/src/testing/testing.go:960 +0x29c

[…]

FAIL	net	180.181s

CC @mikioh @bradfitz @ianlancetaylor

@bcmills bcmills added the NeedsInvestigation label Oct 11, 2019
@bcmills bcmills added this to the Go1.14 milestone Oct 11, 2019
@bcmills
Contributor Author

bcmills commented Oct 11, 2019

Another one on darwin-arm64-corellium. Is it possible that this is a recent regression?

https://build.golang.org/log/175d78eebd4aa686657b7faf57755ff9ee52d02e

@ianlancetaylor
Member

I don't really see how but it's conceivable that this is a recent regression due to https://golang.org/cl/197938.

@ianlancetaylor
Member

The test is both fairly straightforward and not all that important. If someone wants to debug it, great, but I would be inclined to just skip it on darwin/arm64.

@ianlancetaylor
Member

Hasn't happened since November 7. I'm calling this fixed.

@bcmills
Contributor Author

bcmills commented Mar 16, 2020

2020-03-14T04:12:41-70dc28f/darwin-arm64-corellium
2020-03-03T19:53:02-24343cb/darwin-arm64-corellium
2020-02-27T21:24:58-1c4e515/darwin-arm64-corellium

Given the apparent slowness of the network stack on this builder (#37322, #35498, and others), I wonder if this test is deadlocking due to a race that the other builders just aren't slow enough to trigger.

@bcmills
Contributor Author

bcmills commented Mar 16, 2020

Or, perhaps the test is timing out somewhere and written in such a way that timeouts manifest as deadlocks?

@bcmills bcmills modified the milestones: Go1.14, Unplanned Apr 8, 2020
@gopherbot
Contributor

Change https://golang.org/cl/227588 mentions this issue: net: convert many Close tests to use parallel subtests

gopherbot pushed a commit that referenced this issue Apr 9, 2020
Also set a deadline in TestCloseWrite so that we can more easily
determine which kind of connection is getting stuck on the
darwin-arm64-corellium builder (#34837).

Change-Id: I8ccacbf436e8e493fb2298a79b17e0af8fc6eb81
Reviewed-on: https://go-review.googlesource.com/c/go/+/227588
Run-TryBot: Bryan C. Mills <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@bcmills
Contributor Author

bcmills commented Apr 14, 2020

Looks like at least the tcp stack is affected (2020-04-13T21:56:15-1b15c7f/darwin-arm64-corellium):

--- FAIL: TestCloseWrite (0.00s)
    --- FAIL: TestCloseWrite/tcp (158.92s)
        net_test.go:172: got (0, read tcp 127.0.0.1:58175->127.0.0.1:58174: i/o timeout); want (0, io.EOF)
        net_test.go:112: got (0, read tcp4 127.0.0.1:58174->127.0.0.1:58175: i/o timeout); want (0, io.EOF)
FAIL
FAIL	net	162.420s
