issue etcd-io/etcd#14370

Background

This is a legacy issue that affects all releases, including 3.4, 3.5 and main. We will resolve it in 3.4.21, 3.5.5 and main, respectively.

The issue can only happen in a one-member cluster. A client may receive a success response to a write request while the data is actually lost: this happens when the etcd instance crashes right after it responds to the client but before it has persisted the data in both the WAL file and the BoltDB database.

How to reproduce this issue

Steps:

Add the gofail runtime package as a dependency of both etcdserver and etcdutl. Specifically, run the command below in both the server and etcdutl directories:

$ go get go.etcd.io/gofail/runtime   # execute this command for both server and etcdutl

Build etcd with failpoints enabled and start etcd

$ FAILPOINTS=enable make
$ GOFAIL_HTTP="127.0.0.1:22381" ./bin/etcd

Activate the failpoints
Note: run this step and the following steps in a separate terminal:

$ curl http://127.0.0.1:22381/etcdserver/raftBeforeLeaderSend -XPUT -d'sleep(100)'
$ curl http://127.0.0.1:22381/etcdserver/raftBeforeSave -XPUT -d'panic'
$ curl http://127.0.0.1:22381/backend/beforeCommit -XPUT -d'sleep(200)'

Send a put request to etcdserver

$ ./bin/etcdctl put k1 v1

Note that the client (etcdctl) gets an "OK" response, and then etcd crashes.

Start etcd again and read the key back

$ ./bin/etcd
$ ./bin/etcdctl get k1   ## no data because the data was lost

You will get nothing, because the acknowledged write has already been lost.
Note:

You can also follow the original reporter's steps to reproduce this issue.
Please read gofail/design for more details on the gofail project.

Root cause

In a one-member cluster, the leader updates pr.Match and marks an entry as committed immediately after appending it to the unstable log.
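The commit rule behind this can be illustrated with a simplified quorum calculation. The sketch below is hypothetical: committedIndex is not the raft package's code, and it ignores details such as the restriction that only entries from the leader's current term are committed by counting replicas. It assumes the commit index is the largest index that a majority of voters report as their Match. With a single voter, that majority is the leader alone, so an entry counts as committed as soon as the leader has appended it to its unstable log, before any fsync.

```go
package main

import (
	"fmt"
	"sort"
)

// committedIndex returns the highest log index that at least a majority of
// voters have in their Match index (simplified; hypothetical helper).
func committedIndex(match []uint64) uint64 {
	if len(match) == 0 {
		return 0
	}
	// Copy so the caller's slice is not reordered.
	sorted := append([]uint64(nil), match...)
	// Sort in descending order of Match.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// The value at position quorum-1 is held by at least quorum voters.
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}

func main() {
	// One-member cluster: the leader's own Match becomes 6 as soon as entry 6
	// is appended to the unstable log, so entry 6 is committed immediately.
	fmt.Println(committedIndex([]uint64{6}))

	// Three-member cluster: the leader has appended entry 6, but both
	// followers have only acknowledged up to 5, so the commit index stays 5.
	fmt.Println(committedIndex([]uint64{6, 5, 5}))

	// Once one follower acknowledges entry 6, a majority holds it.
	fmt.Println(committedIndex([]uint64{6, 6, 5}))
}
```

In the three-member case, assuming followers acknowledge an entry only after persisting it, the leader's own append is not enough to advance the commit index.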

It then sends the same entries to etcdserver as both Entries and CommittedEntries in one Ready, via the Ready channel. After receiving the Ready, etcdserver persists the Entries to the WAL file concurrently with applying the CommittedEntries, and responds to the client as soon as the applying workflow finishes.

Unfortunately, when the client receives the response, the data has not necessarily been saved in both the WAL file and the BoltDB database, for the reasons below. If the etcd instance crashes after the client gets the response but before etcdserver has persisted the Entries in the WAL file and the CommittedEntries in BoltDB, the data is lost.

etcdserver persists the Entries concurrently with the applying workflow.
etcdserver commits the BoltDB transaction periodically instead of on each request.
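The two reasons above can be illustrated with a deterministic toy model. This is a sketch under stated assumptions, not etcd code: the types and methods (node, apply, saveWAL, crashAndRecover) are hypothetical, a WAL append is assumed to be durable immediately, and a restart is modeled as dropping the uncommitted BoltDB batch and replaying the WAL. The first scenario is the interleaving in which applying (and acknowledging) wins the race against the WAL write, and the process crashes before either the WAL write or the periodic batch commit completes.

```go
package main

import "fmt"

// Entry is a hypothetical, simplified log entry carrying a single put.
type Entry struct {
	Key, Value string
}

// node is a toy model of a one-member etcd instance. Only wal and
// boltCommitted survive a crash; boltPending stands in for the BoltDB batch
// transaction that is committed periodically rather than per request.
type node struct {
	wal           []Entry           // assumed durable as soon as appended
	boltCommitted map[string]string // durable
	boltPending   map[string]string // lost on crash
}

func newNode() *node {
	return &node{boltCommitted: map[string]string{}, boltPending: map[string]string{}}
}

// saveWAL persists the entry to the toy write-ahead log.
func (n *node) saveWAL(e Entry) { n.wal = append(n.wal, e) }

// apply writes the entry into the pending batch and acknowledges the client.
// It does not wait for saveWAL or for the batch to be committed.
func (n *node) apply(e Entry) string {
	n.boltPending[e.Key] = e.Value
	return "OK"
}

// crashAndRecover drops volatile state and replays the WAL, as a restart would.
func (n *node) crashAndRecover() {
	n.boltPending = map[string]string{}
	for _, e := range n.wal {
		n.boltCommitted[e.Key] = e.Value
	}
}

// get reads a key, preferring writes still in the pending batch.
func (n *node) get(key string) (string, bool) {
	if v, ok := n.boltPending[key]; ok {
		return v, true
	}
	v, ok := n.boltCommitted[key]
	return v, ok
}

func main() {
	e := Entry{Key: "k1", Value: "v1"}

	// Apply and acknowledge first; crash before saveWAL and the batch commit.
	lost := newNode()
	fmt.Println("client sees:", lost.apply(e))
	lost.crashAndRecover()
	_, ok := lost.get("k1")
	fmt.Println("k1 present after restart:", ok)

	// For contrast: the WAL write completes before the crash, so replaying
	// the WAL restores k1 even though the batch was never committed.
	kept := newNode()
	kept.saveWAL(e)
	fmt.Println("client sees:", kept.apply(e))
	kept.crashAndRecover()
	v, ok := kept.get("k1")
	fmt.Println("k1 present after restart:", ok, v)
}
```

In both scenarios the client sees "OK"; only the relative order of the WAL write and the crash decides whether the write survives.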

Why a multi-member cluster doesn't have this issue

It's a little complicated. In short, the leader may mark an entry as committed before the entry has been persisted in the WAL files of a majority of members, but it has no chance to broadcast the new commit index to any follower before the entry has been persisted in its local WAL file and, of course, in the WAL files of a majority of members.

For a deeper understanding, please also refer to 14370#issuecomment-1232584558.

How to fix this issue

I delivered four solutions (PRs) for this issue.

etcd/pull/14411 is the best of the four solutions, but for safety I suggested fixing the issue in release-3.5 and release-3.4 using etcd/pull/14400. This isn't finalized yet. Please refer to issues/14370#issuecomment-1232560340 for details.

Piotr Tabor (ptabor@) and Tobias Grieger (tbg@) also raised two draft PRs for this issue.

etcd/pull/14406: ptabor's PR, which has already been closed.
etcd/pull/14413: tbg's draft PR, which is still under development.

Please see 14370#issuecomment-1235091312 for my personal analysis and comparison of etcd/pull/14411 and etcd/pull/14413. We are still waiting for feedback from Tobias Grieger and Ben Darnell, the maintainers of the etcd raft package. Please also refer to 14370#issuecomment-1235790496.

Update (2022-09-06): The second solution has been merged into the main branch and cherry-picked to release-3.5 and release-3.4. Tobias Grieger is still working on the raft-side change against the main branch. Please refer to 14370#issuecomment-1237574729.

How to workaround this issue

Please set up a multi-member cluster in production environments.
So far, we haven't received any report of this issue from a single-member cluster in production. If you have to run a one-member cluster, upgrade to 3.5.5 (to be released) or 3.4.21 (to be released).
