Client connection phase should optionally wait for SETTINGS frame and set deadlines #1444
Comments
We will see if we can temporarily mask the timed-out endpoint in the application layer.
It looks like clients only wait for a connection to be made and for the client preface and a SETTINGS frame to be sent to the server -- never for the server to send a valid SETTINGS frame back -- before attempting to use the connection. It may make sense to wait for that SETTINGS frame before using the connection; we'd need to do this through a new option. Further, we noticed there are no deadlines on the reads/writes happening during connection initialization, which is problematic -- we should set these to the deadline of the context during this phase.
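As an illustration of the deadline half of that proposal, here is a minimal Go sketch; the `handshake` function and its shape are hypothetical, not grpc-go's actual internals:

```go
package sketch

import (
	"context"
	"net"
	"time"
)

// handshake performs connection initialization on conn, bounding all
// reads/writes by the dial context's deadline, as proposed above.
func handshake(ctx context.Context, conn net.Conn) error {
	if deadline, ok := ctx.Deadline(); ok {
		if err := conn.SetDeadline(deadline); err != nil {
			return err
		}
		// Clear the deadline once the handshake completes so normal
		// traffic is unaffected.
		defer conn.SetDeadline(time.Time{})
	}
	// ... write the client preface and SETTINGS frame, then block until
	// the server's SETTINGS frame arrives (or the deadline fires) ...
	return nil
}
```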
cc @vtubati. The changes to implement this are not significant, but we have higher-priority things in flight right now. I expect this to be done within a month.
Any update on this bug? We regularly run into crazy busy-loop situations and I have to manually patch it.
Thanks for the ping. We should hopefully be able to have this done by the end of next week.
We haven't made much progress on this, but it's at the top of our priority list. Also, we have a slightly different plan.
Any update on this?
Any update, please? This is a problem with the client busy-looping when connecting to a TCP reverse proxy like haproxy, which accepts the connection and then has no choice but to close it if no backend is healthy.
This should be in this week. Sorry for the delay; I got distracted by something else.
Note: this is PR #1648, if you are curious.
Just to be clear (and for the casual pedestrian stumbling on this issue and seeing it closed): the issue isn't actually fixed unless we use the new option.
If I understand your concerns correctly, then I believe it should be fixed for everyone. We will not consider a connection "successful" (from a backoff perspective) if the server never sent the HTTP/2 preface to the client. The option is there to prevent RPCs from being assigned to the channel until after the handshake has been received. It can be set if you want extra-stable behavior, so that RPCs don't fail due to a connection that fails in this way.
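For readers landing here later, usage would look roughly like the sketch below, assuming the option added by PR #1648 is the `grpc.WithWaitForHandshake()` dial option (check the current godoc, since option names may have changed since); the target address is hypothetical:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Opt in to the extra-stable behavior described above: RPCs are not
	// assigned to a connection until the server's handshake is received.
	conn, err := grpc.Dial(
		"example.local:50051",       // hypothetical target
		grpc.WithInsecure(),         // transport security omitted for brevity
		grpc.WithWaitForHandshake(), // assumed name of the new option
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
}
```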
What version of gRPC are you using?
Master branch as of today (bfaf042).
What version of Go are you using (`go version`)?
What operating system (Linux, Windows, …) and version?
MacOS
What did you do?
cf. etcd-io/etcd#8258
What did you expect to see?
We want to use `keepalive` for HTTP/2 ping health checking. We expect an endpoint switch when one endpoint times out on `keepalive`.
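For context, the keepalive configuration in question looks roughly like this sketch; the endpoint address is hypothetical, and the 1-second values match the repro steps below:

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Dial with a 1-second HTTP/2 ping interval and timeout.
	conn, err := grpc.Dial(
		"ep1.example.local:2379", // hypothetical endpoint
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    1 * time.Second, // ping after 1s of inactivity
			Timeout: 1 * time.Second, // wait 1s for the ping ack
		}),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
}
```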
What did you see instead?
A `keepalive` time-out triggers an address connection state update to transient failure, and `resetTransport` retries this endpoint: the balancer keeps calling `Up` on the timed-out endpoint. If the endpoint never comes back, the balancer gets stuck retrying.
Is there any other way to stop those retries on the timed-out endpoint, and try others? We have our own balancer interface implementation, but the `keepalive` time-out error is not distinguishable on the client side, so there is not much we can do.
Here's the code path for reference:
- `grpc.Balancer(ep1, ep2)` with `keepalive` 1-second
- `Blackhole(ep1)`
- `keepalive(ep1)` times out in 1 second, which is expected
- `*http2Client` (`grpc-go/transport/http2_client.go`) calls `(*http2Client).Close` on `ep1`
- `ep1` has `transportState` `reachable` at the moment
- `close(t.errorChan)`
- `<-t.Error()` in `(*addrConn).transportMonitor()` (`grpc-go/clientconn.go`)
- `ep1`'s `*addrConn.(connectivity.State)` is `connectivity.Ready`
- `ep1`'s `*addrConn.(connectivity.State)` is set to `connectivity.TransientFailure`
- `resetTransport(drain=false)` on `ep1`
- `ep1`'s `down` is called with `grpc: failed with network I/O error`
- `resetTransport(drain=false)` retries on `ep1` (`for retries := 0; ; retries++ {`) while `*addrConn.(connectivity.State) != connectivity.Shutdown`
- `ep1`'s `*addrConn.(connectivity.State) == connectivity.TransientFailure`
- `ac.cc.dopts.balancer.Up(ep1)` is called, bringing the timed-out `ep1` back up
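To make the loop concrete, here is a much-simplified, self-contained sketch of the behavior described above; it is not grpc-go source, and `dial`/`keepaliveWait` only stand in for the transport-level reconnect and HTTP/2 ping wait:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type endpoint string

// dial stands in for the transport-level reconnect; with a blackholed
// endpoint the TCP connect may still succeed, so this returns nil.
func dial(ep endpoint) error { return nil }

// keepaliveWait stands in for waiting on the HTTP/2 ping ack, which a
// blackholed endpoint never sends.
func keepaliveWait(ep endpoint) error {
	time.Sleep(1 * time.Second) // keepalive timeout
	return errors.New("grpc: failed with network I/O error")
}

func main() {
	ep1 := endpoint("ep1")
	for retries := 0; retries < 3; retries++ { // the real loop is unbounded
		if err := dial(ep1); err != nil {
			continue // TransientFailure -> back off -> same endpoint
		}
		fmt.Println("balancer.Up called on", ep1) // endpoint looks healthy
		err := keepaliveWait(ep1)
		fmt.Println("keepalive:", err) // ...until the ping times out again
	}
}
```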
Thanks.