
dropped internal Raft message since sending buffer is full (overloaded network) #19635

Open
4 tasks done
rahulbapumore opened this issue Mar 21, 2025 · 2 comments
Bug report criteria

What happened?

logs.txt

We are seeing the error messages below, and etcd is restarting because it could not maintain quorum.
2025-03-18T12:21:38.350+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.451+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.551+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.651+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.751+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.850+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.951+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:38.972+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.041+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.151+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.251+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.351+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.451+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.550+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:39.850+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.275+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.277+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.351+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.651+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.751+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.851+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:40.950+01:00 dropped internal Raft message since sending buffer is full (overloaded network)
2025-03-18T12:21:41.050+01:00 dropped internal Raft message since sending buffer is full (overloaded network)

From the etcd documentation, we found that this happens because too many client requests cause network congestion, delaying peer communication.
https://etcd.io/docs/v3.5/tuning/

The documentation gives a few manual steps to set traffic priority (see the sketch below), but we need an internal solution or workaround, for example a parameter that, when set, would prevent these etcd restarts.
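
For reference, the manual traffic-priority workaround from the tuning guide looks roughly like the sketch below. This is only a sketch: it assumes the peer-facing interface is eth0 and the default etcd ports 2380 (peer) and 2379 (client); adjust for the actual environment.

# Create a priority queueing discipline on the peer-facing interface (assumed eth0)
tc qdisc add dev eth0 root handle 1: prio bands 3

# Send peer traffic (port 2380) to the highest-priority band
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip sport 2380 0xffff flowid 1:1
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 2380 0xffff flowid 1:1

# Send client traffic (port 2379) to a lower-priority band
tc filter add dev eth0 parent 1: protocol ip prio 2 u32 match ip sport 2379 0xffff flowid 1:2
tc filter add dev eth0 parent 1: protocol ip prio 2 u32 match ip dport 2379 0xffff flowid 1:2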

Could you please help with this query?

Thanks in advance

What did you expect to happen?

No restarts in etcd.

How can we reproduce it (as minimally and precisely as possible)?

etcd is deployed as a container controlled by a StatefulSet, with 3 replicas.
We upgrade our chart by changing the certificates for etcd; during the upgrade we set the PEER_AUTO_TLS_ENABLED variable from true to false.
When pod-2 is restarted by the upgrade and starts trusting the siptls cert, it cannot join the old cluster because pod-0 and pod-1 still trust the self-signed certs. So pod-2 is out of the cluster, and pod-0/pod-1 continuously flood it with peer connection requests in order to bring pod-2 back into the existing cluster. This is the expected behavior from DCED, but due to the high traffic during the upgrade and the flood of peer requests, the sending buffer inside DCED pod-1 fills up, which restarts the etcd process inside pod-1.
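
For illustration only, the toggle that triggers this looks roughly like the commands below. The StatefulSet name (etcd), namespace placeholder, and label are hypothetical stand-ins for our chart; in practice the change is applied through a helm chart upgrade rather than kubectl.

# Hypothetical sketch: flip the chart's peer auto-TLS flag on the StatefulSet;
# with the default RollingUpdate strategy the highest ordinal (pod-2) restarts first
kubectl -n <namespace> set env statefulset/etcd PEER_AUTO_TLS_ENABLED=false

# Watch the rollout: pod-2 comes back trusting the siptls cert and cannot rejoin
# pod-0/pod-1, which still trust the self-signed peer certs
kubectl -n <namespace> rollout status statefulset/etcd
kubectl -n <namespace> get pods -l app=etcd -w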

Anything else we need to know?

No response

Etcd version (please run commands below)

bash-4.4$ etcd --version
etcd Version: 3.5.15
Git SHA: 9a55333
Go Version: go1.21.12
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.5.15
API version: 3.5
bash-4.4$

Etcd configuration (command line flags or environment variables)

No response

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

@rahulbapumore
Contributor Author

rahulbapumore commented Mar 25, 2025

Hi @kumarlokesh @ahrtr @jmhbnz
We are already using etcd v3.5.12, and in that version pipelineBufSize is already set to 64, but we are still seeing the error above.
So do you mean that once this parameter is made dynamically configurable through the etcd config (via https://github.com//pull/19663), we would need to increase pipelineBufSize even further?
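
As a side note, a rough way we check whether the peer link is still failing sends is sketched below; it assumes the metrics endpoint is reachable over plain HTTP on localhost:2379 inside the pod (adjust for TLS), and the pod/container names are placeholders.

# Peer network health: round-trip time histogram and send/receive failure counters
curl -s http://localhost:2379/metrics \
  | grep -E 'etcd_network_peer_(round_trip_time_seconds|sent_failures_total|received_failures_total)'

# Count how often the drop message appears in the member's current logs
kubectl logs <etcd-pod-1> -c <etcd-container> \
  | grep -c 'dropped internal Raft message since sending buffer is full'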

Thanks

@rahulbapumore
Contributor Author

Hi @ahrtr @jmhbnz @kumarlokesh
Any updates?
