
Etcd loses track of leases #14019

Closed
cjbottaro opened this issue May 8, 2022 · 4 comments

@cjbottaro

What happened?

Etcd thinks these leases still exist...

%{
   header: %{
     cluster_id: 8855822249941472443,
     member_id: 16980875726527512478,
     raft_term: 252,
     revision: 16834726
   },
   leases: [
     %{ID: 3492964341253159182},
     %{ID: 4584242825964808783},
     %{ID: 362681125277541271},
     %{ID: 3492964341253159049},
     %{ID: 362681125277541277},
     %{ID: 362681125277541327},
     %{ID: 362681125277541281},
     %{ID: 362681125277541309},
     %{ID: 3492964341253159261},
     %{ID: 362681125277541235},
     %{ID: 362681125277541359},
     %{ID: 362681125277541357},
     %{ID: 362681125277541263},
     %{ID: 362681125277541267},
     %{ID: 362681125277541319},
     %{ID: 3492964341253159252},
     %{ID: 4584242825964808733},
...

Each lease has a TTL of 5s. We shut down our system (so no new leases were being created), and these leases still existed over 24h later.

Trying to revoke any of these leases results in an error saying they don't exist.

:eetcd_lease.revoke(Etcd, 3492964341253159182)
{:error,
 {:grpc_error,
  %{"grpc-message": "etcdserver: requested lease not found", "grpc-status": 5}}}

We had to shut down our system because asking Etcd for a lease would result in a timeout. After our system had been down for a few hours, Etcd stopped timing out when we asked for a lease.

What did you expect to happen?

Etcd should continue to grant leases under heavy load and not lose track of them.

How can we reproduce it (as minimally and precisely as possible)?

Lock and unlock unique locks at a rate of about 80 per second, each using a lease with TTL=5s, and let that run for a day or so (a sketch of that workload follows).
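A minimal load-generator sketch of that workload, assuming eetcd exposes a lock API along the lines of :eetcd_lock.lock/3 and :eetcd_lock.unlock/2 (the grant call mirrors the revoke call shown above; verify all names and arities against your eetcd version):

# Hedged repro sketch: ~80 lock/unlock cycles per second, 5s leases.
# :eetcd_lease.grant/2 and the :eetcd_lock functions are assumptions;
# check them against the eetcd version you run.
defmodule LeaseChurn do
  def run(i \\ 0) do
    {:ok, %{ID: lease_id}} = :eetcd_lease.grant(Etcd, 5)
    {:ok, %{key: key}} = :eetcd_lock.lock(Etcd, "lock-#{i}", lease_id)
    {:ok, _} = :eetcd_lock.unlock(Etcd, key)
    Process.sleep(12)  # ~80 iterations per second, ignoring call latency
    run(i + 1)
  end
end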

Anything else we need to know?

We use Etcd for distributed locking and nothing else. Each lease TTL is 5s. We have a 3-node cluster running on a single Core i3-8100 machine. We are only requesting about 80 locks per second.

--auto-compaction-retention=5m

This setting seems reasonable given that our locks have a TTL of 5s.

Etcd version (please run commands below)

quay.io/coreos/etcd:v3.5.3

Etcd configuration (command line flags or environment variables)

--auto-compaction-retention=5m

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ kubectl exec etcd-0-6776dd499d-k92mn -- etcdctl member list -w table
+------------------+---------+--------+--------------------+--------------------+------------+
|        ID        | STATUS  |  NAME  |     PEER ADDRS     |    CLIENT ADDRS    | IS LEARNER |
+------------------+---------+--------+--------------------+--------------------+------------+
| 554ff50439e23079 | started | etcd-0 | http://etcd-0:2380 | http://etcd-0:2379 |      false |
| 5ba2a9fe8d840508 | started | etcd-2 | http://etcd-2:2380 | http://etcd-2:2379 |      false |
| eba8308136b9bf9e | started | etcd-1 | http://etcd-1:2380 | http://etcd-1:2379 |      false |
+------------------+---------+--------+--------------------+--------------------+------------+

Relevant log output

No response

@ahrtr (Member) commented May 8, 2022

You are probably running into #13205. Could you try to reproduce this issue using the latest code on the release-3.5 or main branch?

Please also provide the following info so that others can take a closer look.

  1. The complete etcd command-line configuration;
  2. The detailed steps to reproduce this issue.

@ahrtr (Member) commented Sep 8, 2022

@cjbottaro any update on this? Have you enabled auth in the cluster?

@cjbottaro (Author)

Ahh, I stopped using Etcd for locks and went back to using single-node Redis for now. I'll come back to Etcd if the need for HA ever arises, though.

stale bot commented Dec 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Dec 31, 2022
stale bot closed this as completed Apr 2, 2023