detect and close dead TCP connection #3206

mechpen · 2019-11-22T18:48:18Z

What version of gRPC are you using?

v1.23.0

What version of Go are you using (`go version`)?

1.13

What operating system (Linux, Windows, …) and version?

ubuntu 18.04

What did you do?

A "dead" TCP connection is one on which no packets are received from the TCP peer any more. This can happen when the peer's kernel panics or when packets from the peer are dropped by iptables (e.g. in Kubernetes, when a node is removed, some CNI plugins may start dropping all packets from that node).

By default, grpc-go cannot detect "dead" TCP connections. All RPC calls return "DEADLINE_EXCEEDED":

rpc error: code = DeadlineExceeded desc = context deadline exceeded

This error persists for about 15 minutes, until the kernel gives up on TCP retransmissions and closes the connection.

One solution is to enable gRPC "keepalive" pings, but they are not enabled by default.

To reproduce the problem, run the following command on a gRPC client host:

iptables -I INPUT -s <server-ip> -p tcp --sport <server-port> -j DROP

What did you expect to see?

gRPC should enable keepalive by default to detect dead TCP connections.

What did you see instead?


gotwarlost · 2019-11-22T18:59:37Z

To add to @mechpen's comments, we have seen this behavior in various situations, all on Kubernetes:

gRPC client to an Envoy proxy (using Istio) where the Envoy upstream process is dead. This could be related to how the Envoy proxy deals with broken upstream connections.
gRPC client to services running on Kubernetes masters, as used by the kiam project, when the server is abruptly killed (example of symptoms).

/cc @kyessenov @mandarjog @duderino

menghanl · 2019-11-22T21:21:22Z

As mentioned in the original post, keepalive is the solution here.

To enable it, see the doc and the example.
The parameters and default values can also be found in the godoc.

There's no plan to change the default behavior for keepalive. We don't want to change default behavior unless there's a strong reason to. Please try enabling it and see if it solves all the problems.

mechpen · 2019-11-22T22:31:26Z

Yes, enabling keepalive does fix the issue.

Do you recommend enabling keepalive in general? If so, could you please add this to your guides, so that more gRPC users enable keepalive in their applications?

mechpen added the Type: Bug label Nov 22, 2019

mechpen mentioned this issue Nov 22, 2019

enable gRPC keepalive to detect dead TCP connections (#217) uswitch/kiam#331

Closed

menghanl closed this as completed Nov 22, 2019

mechpen mentioned this issue Nov 22, 2019

kiam server HA. Context deadline exceeded while rotating servers. uswitch/kiam#217

Closed

mehstg mentioned this issue Nov 25, 2019

Issue #217 - Add GRPC keepalive to server/client uswitch/kiam#333

Closed

lock bot locked as resolved and limited conversation to collaborators May 20, 2020
