RATIS-2245. Ratis should wait for all apply transaction futures before taking snapshot and group remove #1218

swamirishi · 2025-02-07T17:03:03Z

What changes were proposed in this pull request?

On Ratis snapshot and group removal, the state machine only waits for the apply-transaction futures collected in a single iteration. If no more transactions are added to the state machine while all of the apply-transaction futures are still in progress, the state machine does not wait for the updater thread; it calls notifyGroupRemove and deletes the Raft group directory. As a result, after a restart some nodes may be unable to apply transactions that were still in flight.
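The gap described above can be illustrated with a small, self-contained sketch. It is not Ratis code: the class and method names are hypothetical, and the per-iteration list is only a simplified stand-in for what the updater tracked before this change. It shows that an empty per-iteration list looks "done" even though a future from an earlier iteration is still pending.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class InFlightApplyDemo {
  /** Simplified old-style check: only futures gathered in the current iteration are inspected. */
  static boolean currentIterationDone(List<CompletableFuture<String>> currentIteration) {
    // allMatch on an empty list is true, so "nothing new" looks like "everything finished".
    return currentIteration.stream().allMatch(CompletableFuture::isDone);
  }

  /** Stricter check: every future handed out so far must be done. */
  static boolean everythingDone(List<CompletableFuture<String>> allFutures) {
    return allFutures.stream().allMatch(CompletableFuture::isDone);
  }

  public static void main(String[] args) {
    final List<CompletableFuture<String>> allFutures = new ArrayList<>();

    // Iteration 1: a transaction is submitted and is still being applied.
    final CompletableFuture<String> slow = new CompletableFuture<>();
    allFutures.add(slow);

    // Iteration 2: no new transactions arrived, so the per-iteration list is empty.
    final List<CompletableFuture<String>> iteration2 = Collections.emptyList();

    System.out.println("current iteration done: " + currentIterationDone(iteration2));
    System.out.println("everything done: " + everythingDone(allFutures));

    slow.complete("applied");
    System.out.println("everything done after completion: " + everythingDone(allFutures));
  }
}
```

In this sketch the first check reports true while the second reports false until the pending future completes, which is the window in which a group directory could be removed too early.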

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/RATIS-2245

How was this patch tested?

Adding unit tests.

szetszwo

@swamirishi , thanks a lot for working on this!

Please see the comments inlined.

szetszwo · 2025-02-07T18:24:34Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

@@ -263,7 +261,7 @@ private MemoizedSupplier<List<CompletableFuture<Message>>> applyLog() throws Raf
        final long incremented = appliedIndex.incrementAndGet(debugIndexChange);
        Preconditions.assertTrue(incremented == nextIndex);
        if (f != null) {
-          futures.get().add(f);
+          applyLogFutures = applyLogFutures.thenCombine(f, (previous, current) -> previous);


It should return current, i.e. (previous, current) -> current.

current is the returned Message, and previous is a null value of type Void.

We don't need the Message returned after applying the transaction log.

You are right. Then, let's return null, i.e. (v, message) -> null.
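A minimal sketch of the pattern agreed on in this thread, assuming the accumulated future has type CompletableFuture<Void>. The Reply class is a hypothetical stand-in for the Message type, and append is an illustrative helper, not a Ratis method.

```java
import java.util.concurrent.CompletableFuture;

class ChainApplyFutures {
  /** Hypothetical stand-in for the Message returned by an apply call. */
  static final class Reply {
    final String content;

    Reply(String content) {
      this.content = content;
    }
  }

  /** Fold one more apply future into the accumulated Void future, discarding the reply. */
  static CompletableFuture<Void> append(CompletableFuture<Void> accumulated, CompletableFuture<Reply> f) {
    // (v, message) -> message would not compile here, because the result type must be Void.
    return accumulated.thenCombine(f, (v, message) -> null);
  }

  public static void main(String[] args) {
    CompletableFuture<Void> applyLogFutures = CompletableFuture.completedFuture(null);
    final CompletableFuture<Reply> first = new CompletableFuture<>();
    final CompletableFuture<Reply> second = new CompletableFuture<>();
    applyLogFutures = append(applyLogFutures, first);
    applyLogFutures = append(applyLogFutures, second);

    first.complete(new Reply("a"));
    System.out.println("done after first: " + applyLogFutures.isDone());  // second is still pending

    second.complete(new Reply("b"));
    System.out.println("done after second: " + applyLogFutures.isDone());
  }
}
```

The accumulated future completes only after every appended future has completed, so a single get() on it waits for all of them.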

@swamirishi , any update for this?

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

szetszwo

@swamirishi , thanks for the update! Please see the comments inlined.

szetszwo · 2025-02-08T17:06:11Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

@@ -263,7 +261,7 @@ private MemoizedSupplier<List<CompletableFuture<Message>>> applyLog() throws Raf
        final long incremented = appliedIndex.incrementAndGet(debugIndexChange);
        Preconditions.assertTrue(incremented == nextIndex);
        if (f != null) {
-          futures.get().add(f);
+          applyLogFutures = applyLogFutures.thenCombine(f, (previous, current) -> previous);


@swamirishi , any update for this?

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

szetszwo · 2025-02-08T17:08:17Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

@@ -190,11 +189,10 @@ public void run() {
          reload();
        }

-        final MemoizedSupplier<List<CompletableFuture<Message>>> futures = applyLog();
-        checkAndTakeSnapshot(futures);
+        applyLogFutures = applyLog(applyLogFutures);


Let's check if applyLogFutures.isCompletedExceptionally() before applying logs.

But in this case we have already submitted the next set of tasks, right? What would be the point of checking for the exception?

The point is to fail fast. Otherwise, it will keep applying logs even if the state machine has failed.

The state machine can make that call itself if it has failed, right? From what I understand, the state machine should receive all the transactions, and it is up to the state machine to have a guardrail that decides whether to apply a transaction.

If we want to fail fast we should fail the group remove as well.
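For reference, the fail-fast check suggested above could look roughly like the following. This is only a sketch of the reviewer's suggestion, not what was merged; shouldKeepApplying is a hypothetical helper name.

```java
import java.util.concurrent.CompletableFuture;

class FailFastCheck {
  /** Returns true if the loop should go on applying log entries. */
  static boolean shouldKeepApplying(CompletableFuture<Void> applyLogFutures) {
    // isCompletedExceptionally() never blocks; it is false while the future is still pending.
    return !applyLogFutures.isCompletedExceptionally();
  }

  public static void main(String[] args) {
    final CompletableFuture<Void> pending = new CompletableFuture<>();
    final CompletableFuture<Void> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("apply failed"));

    System.out.println("pending -> keep applying: " + shouldKeepApplying(pending));
    System.out.println("failed  -> keep applying: " + shouldKeepApplying(failed));
  }
}
```

Because the check is non-blocking, it only detects failures that have already surfaced; a transaction that fails later would be caught on a subsequent iteration.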

sumitagrawl

@swamirishi I have a few concerns about the implementation; please refer to the comments. Do we need this behavior only in the stop() case or in all cases? ....

sumitagrawl · 2025-02-10T13:35:28Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

    for(; state != State.STOP; ) {
      try {
-        waitForCommit();
+        waitForCommit(applyLogFutures);


Trying to understand the changes and their impact:

1. waitForCommit(applyLogFutures) --> calls takeSnapshot(applyLogFutures) --> future.get() ==> no-op

2. applyLog(applyLogFutures): can add some future objects to wait on

3. checkAndTakeSnapshot(applyLogFutures): this is moved before STOP(), and does future.get() through the internal takeSnapshot()

4. In the stop() check, just do future.get() to wait for task completion, which was not there earlier

Q1: Does the step 3 change, which moves checkAndTakeSnapshot() out of stop() and forces the snapshot check, have a performance impact?
Q2: Suppose no snapshot is to be taken. In this case future.get() is never called. Is this intended?

The wait on the future is called if a snapshot needs to be taken in checkAndTakeSnapshot()

The wait on the future is called if stop()

Otherwise it is not called

Q3: We can refactor the code to get the above behavior, so that:

applyLog() can return the existing future set

check future.get() before stop()

move checkAndTakeSnapshot() out before stop()

This has a similar impact without passing the future deep inside takeSnapshot().

ratis/ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

Line 196 in e25ba08

applyLogFutures.get();

future.get() will be called regardless of whether the snapshot is taken.

We need to do a future.get() just before taking a snapshot. We cannot have transactions still being applied in the background, since that could cause inconsistency. It makes more sense to wait just before taking a snapshot, which is why I moved the future.get() inside the takeSnapshot method.

ratis/ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

Line 286 in e25ba08

applyLogFutures.get();
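The ordering argued for here, waiting on the accumulated future immediately before the snapshot, can be sketched as follows. The class is a toy model with hypothetical names (applyAsync, takeSnapshot returning a long); it is not the StateMachineUpdater implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.atomic.AtomicLong;

class SnapshotAfterApply {
  /** Highest index the toy state machine has finished applying. */
  private final AtomicLong lastApplied = new AtomicLong(-1);

  /** Apply an entry asynchronously on the common pool. */
  CompletableFuture<Void> applyAsync(long index) {
    return CompletableFuture.runAsync(() -> lastApplied.accumulateAndGet(index, Math::max));
  }

  /** Wait for every in-flight apply, then read the index the snapshot would cover. */
  long takeSnapshot(CompletableFuture<Void> applyLogFutures)
      throws InterruptedException, ExecutionException {
    applyLogFutures.get();
    return lastApplied.get();
  }

  public static void main(String[] args) throws Exception {
    final SnapshotAfterApply sm = new SnapshotAfterApply();
    CompletableFuture<Void> applyLogFutures = CompletableFuture.completedFuture(null);
    for (long i = 0; i < 10; i++) {
      applyLogFutures = applyLogFutures.thenCombine(sm.applyAsync(i), (v, w) -> null);
    }
    System.out.println("snapshot index = " + sm.takeSnapshot(applyLogFutures));
  }
}
```

Without the get() call, the value read for the snapshot could lag behind entries that were already submitted but not yet applied.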

sumitagrawl · 2025-02-10T13:49:02Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

-        final MemoizedSupplier<List<CompletableFuture<Message>>> futures = applyLog();
-        checkAndTakeSnapshot(futures);
+        applyLogFutures = applyLog(applyLogFutures);
+        checkAndTakeSnapshot(applyLogFutures);


In the case where no snapshot is to be taken and there is no stop(), future.get() will not be called. Will this fix the issue as expected, so that applyTransaction finishes? I think not; it should wait in all cases.

Regarding the performance impact if it always needs to wait: is this call made only in the stop() flow or in all flows? This part is not clear.

As discussed, this is not a problem.

sumitagrawl

LGTM

swamirishi · 2025-02-11T15:17:58Z

@szetszwo there are some flaky test failures here; could you please retrigger the CI?

szetszwo · 2025-02-11T18:50:42Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

+        try {
+          applyLogFutures.get();
+        } catch (InterruptedException | ExecutionException e) {
+          LOG.info("{}: interrupted while waiting for apply transactions", this, t);
+        }


@swamirishi , why call applyLogFutures.get() when there is already an exception/error? It could cause problems such as:

1. The future may never complete.

2. The thread may already be interrupted.

3. It may throw another runtime exception/error (such as OOM).

The test failures could be caused by this change.

I have updated the catch block to not wait when the exception is an ExecutionException or an InterruptedException. We should wait for apply transactions in all other cases.

@swamirishi , why call applyLogFutures.get() when there is already an exception/error? It could cause problems such as:

1. The future may never complete.

It is guaranteed that it will always complete, right?

2. The thread may already be interrupted.

3. It may throw another runtime exception/error (such as OOM).

It won't be thrown by the future, right? If the future throws an exception, we handle it anyway and mark the future as completed.
https://github.com/apache/ratis/blob/b635b137329f011e715c9083e93c7bb7f22b221f/ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java#L271C11-L274C25

It is guaranteed that it will always complete, right?

@swamirishi , who will guarantee that? Ozone? Any doc to support this statement? It seems impossible in an erroneous condition.

It won't be thrown by the future, right? ...

It definitely could throw OOM.

szetszwo · 2025-02-12T00:06:44Z

... We should wait for apply transactions in all other cases

The question is why?

1. The future may never complete.

2. The thread may already be interrupted.

3. It may throw another runtime exception/error (such as OOM).

How about (1) and (3) above?
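One generic way to guard against a future that never completes or a thread that is already interrupted is a bounded wait that restores the interrupt flag. The sketch below is an illustration of that general technique only; it is not what this PR adopted, and awaitQuietly and its timeout parameters are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWait {
  /** Returns true only if the future completed normally within the timeout. */
  static boolean awaitQuietly(CompletableFuture<Void> future, long timeout, TimeUnit unit) {
    try {
      future.get(timeout, unit);
      return true;
    } catch (InterruptedException e) {
      // Preserve the interrupt for callers further up the stack.
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException | TimeoutException e) {
      // The future failed, or did not finish in time.
      return false;
    }
  }

  public static void main(String[] args) {
    final CompletableFuture<Void> neverCompletes = new CompletableFuture<>();
    final CompletableFuture<Void> completed = CompletableFuture.completedFuture(null);

    System.out.println("never completes: " + awaitQuietly(neverCompletes, 50, TimeUnit.MILLISECONDS));
    System.out.println("completed: " + awaitQuietly(completed, 50, TimeUnit.MILLISECONDS));
  }
}
```

A bounded wait does not address errors such as OOM raised while waiting, so it covers only part of the concerns listed in this thread.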

szetszwo

@swamirishi , thanks for the update! Please see the comments inlined.

szetszwo · 2025-02-19T19:07:48Z

ratis-server-api/src/main/java/org/apache/ratis/statemachine/StateMachine.java

+    default void notifyGroupRemove(boolean applyTransactionFailure) {
+      notifyGroupRemove();
+    }


Please don't add notifyGroupRemove(boolean applyTransactionFailure) here. Instead, it should fail the groupRemove request.

BTW, let's change groupRemove in a separate JIRA instead of changing it here.

Will create a jira for this. Removed it from here.

szetszwo · 2025-02-19T19:11:29Z

ratis-server/src/main/java/org/apache/ratis/server/impl/StateMachineUpdater.java

+          applyLogFutures = applyLogFutures.thenCombine(f.exceptionally(ex -> {
+            LOG.error("Exception while {}: applying txn index={}, nextLog={}", this, nextIndex,
+                    LogProtoUtils.toLogEntryString(entry));
+            return null;
+          }), (v, message) -> null);


Move f.exceptionally out, i.e.

          f.exceptionally(ex -> {
            LOG.error("Exception while {}: applying txn index={}, nextLog={}", this, nextIndex,
                LogProtoUtils.toLogEntryString(entry));
            return null;
          });
          applyLogFutures = applyLogFutures.thenCombine(f, (v, message) -> null);

thenCombine will fail if one apply transaction fails, right?

Both of them are the same in our case. Separating them makes it more readable.
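As background for the question above, the two placements differ in plain CompletableFuture semantics: exceptionally returns a new future and leaves the original unchanged. The sketch below uses hypothetical method names (inline, separate) and does not model how Ratis treats a failed applyLogFutures, so it says nothing about whether the difference matters there.

```java
import java.util.concurrent.CompletableFuture;

class ExceptionallyPlacement {
  /** Combine with the recovered future: a failure in f is swallowed. */
  static CompletableFuture<Void> inline(CompletableFuture<String> f) {
    final CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
    return done.thenCombine(f.exceptionally(ex -> null), (v, message) -> null);
  }

  /** Attach the handler separately, then combine with f itself: a failure in f propagates. */
  static CompletableFuture<Void> separate(CompletableFuture<String> f) {
    final CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
    f.exceptionally(ex -> null);  // the returned future is discarded; f is unchanged
    return done.thenCombine(f, (v, message) -> null);
  }

  public static void main(String[] args) {
    final CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("apply failed"));

    System.out.println("inline failed: " + inline(failed).isCompletedExceptionally());
    System.out.println("separate failed: " + separate(failed).isCompletedExceptionally());
  }
}
```

In this sketch the inline form yields a normally completed future, while the separate form yields one that completed exceptionally.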

szetszwo · 2025-02-19T19:13:41Z

ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java

+      deleteDirectory = false;
+      renameDirectory = true;


It seems wrong to change the request silently. We need some design on this.

Removed it. Will create a follow-up JIRA for this.

szetszwo

+1 the change looks good.

RATIS-2245. Ratis should wait for all apply transaction futures before taking snapshot and group remove

858aa1c

swamirishi force-pushed the RATIS-2245 branch from ed7c066 to 858aa1c Compare February 7, 2025 17:04

szetszwo reviewed Feb 7, 2025

RATIS-2245. Address review comments

ef35f59

szetszwo reviewed Feb 8, 2025

swamirishi added 2 commits February 8, 2025 09:18

RATIS-2245. Add futures.get() on stop

7626c79

RATIS-2245. Wait for apply log futures before taking snapshot

e25ba08

sumitagrawl reviewed Feb 10, 2025

RATIS-2245. Add test case

e76749e

sumitagrawl approved these changes Feb 11, 2025

swamirishi marked this pull request as ready for review February 11, 2025 15:15

szetszwo reviewed Feb 11, 2025

RATIS-2245. Fix exception handling

b635b13

swamirishi added 2 commits February 17, 2025 18:24

RATIS-2245. Fix exception handling and group removal

fcbf213

RATIS-2245. Fix checkstyle

f956ea6

szetszwo requested changes Feb 19, 2025

swamirishi added 3 commits February 19, 2025 14:00

Merge remote-tracking branch 'origin/master' into HEAD

4d281b5

RATIS-2245. Address review comments

c6f5fe3

RATIS-2245. Add log for debuggability

17be393

szetszwo approved these changes Feb 19, 2025

szetszwo merged commit 663a44b into apache:master Feb 19, 2025
13 checks passed

szetszwo pushed a commit to szetszwo/ratis that referenced this pull request Mar 30, 2025

RATIS-2245. Ratis should wait for all apply transaction futures before taking snapshot and group remove (apache#1218)

904a75e
