x/net/http2: random read deadlock #39812
Comments
Have you built and deployed a race-enabled version of your application?
Thanks for this link, @davecheney. I've just deployed a race-enabled binary on our primary proxy. No race detection so far. Side question: are race-enabled binaries supposed to be noticeably slower?
Yes, race detection does add overhead. I always recommend that people use
cc @fraenkel
If it's of any help: with Go 1.14.3 and a race-enabled build, 24 days later the issue popped up again. Same symptoms, same stack trace, no race condition reported. It's an unusually long interval (the average time between outages is around 10 days); I can't tell whether that's related to the Go upgrade, the race-enabled build, or a change in our load over this period. Or just pure randomness.
@Xfennec please follow up on one of the related issues. Thank you.
Hi!
What version of Go are you using (go version)?
(seen from at least go1.10.4)

Does this issue reproduce with the latest release?
Currently testing with 1.14.3

What operating system and processor architecture are you using (go env)?
go env Output

What did you do?
We use a chain of two HTTP proxies written in Go to route incoming traffic to our virtual machines:
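As a rough illustration only (not the actual proxy code), a minimal two-hop chain built on net/http/httputil could look like the sketch below; the listen addresses and internal hostnames are made up, and the real deployment speaks TLS and HTTP/2 between the two hops.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newProxy returns a reverse proxy that forwards every request to target.
func newProxy(target string) *httputil.ReverseProxy {
	u, err := url.Parse(target)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(u)
}

func main() {
	// Primary proxy: accepts incoming traffic, forwards to the secondary proxy.
	go func() {
		log.Fatal(http.ListenAndServe(":8080", newProxy("http://secondary.internal:8081")))
	}()

	// Secondary proxy: forwards to the backend VM.
	log.Fatal(http.ListenAndServe(":8081", newProxy("http://vm.internal:80")))
}
```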
What did you expect to see?
Stable service over time.
What did you see instead?
Every ~10 days, we detect a total stall of requests between our front-facing ("primary") proxy and one of the secondary proxies. New requests are accepted, sent to the secondary, then to the VM, but the response stalls somewhere between the secondary and the primary proxy.
All requests to this secondary proxy then wait forever (since all requests are multiplexed over the same HTTP/2 connection).
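(As an aside, the wait can at least be bounded on the caller side with a per-request deadline; a rough sketch, where the URL and the timeout value are purely illustrative:)

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func main() {
	// Give the request a hard deadline so a stalled HTTP/2 stream surfaces
	// as a timeout error instead of hanging forever.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://secondary.internal:8081/health", nil)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Printf("request failed (possibly a stalled connection): %v", err)
		return
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```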
Restarting one of the proxies, or even killing the "faulty" HTTP/2 connection between our proxies (using ss -K), makes the service work perfectly again.
We've had this issue for months now, and I still fail to see any common pattern (time of day, server load, requests, …) that would be the trigger. I've tried to write a few tests/repros, without success either.
Here's a sample of the stack on the primary proxy during one of the outages:
At the same time, on the secondary proxy, I see this:
Transport level to mitigate this issue?
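For reference, golang.org/x/net/http2 does expose a Transport-level connection health check (ReadIdleTimeout and PingTimeout) that can detect and close a connection whose reads have gone silent, so new requests get a fresh connection instead of queuing behind a stuck one. A minimal sketch, with the timeout values chosen arbitrarily:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func main() {
	// Start from a standard transport.
	t1 := &http.Transport{}

	// ConfigureTransports wires HTTP/2 support into t1 and returns the
	// underlying http2.Transport so its settings can be tuned.
	t2, err := http2.ConfigureTransports(t1)
	if err != nil {
		log.Fatal(err)
	}

	// If no frame is received for ReadIdleTimeout, a PING is sent; if the
	// PING is not answered within PingTimeout, the connection is closed.
	t2.ReadIdleTimeout = 30 * time.Second
	t2.PingTimeout = 15 * time.Second

	client := &http.Client{Transport: t1}
	_ = client // use this client (or transport) in the proxy
}
```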