Return # of state transition when generating last replication tasks #4352

wxing1292 · 2023-05-17T17:27:51Z

What changed?

Return the number of state transitions when generating the last replication tasks.
When force-replicating workflows, rate limit by state transition count.

Why?
The caller gets a better idea of how many state transitions will happen when applying these replication tasks (including catch-up).

How did you test it?
N/A

Potential risks
N/A

Is hotfix candidate?
N/A

yux0 · 2023-05-17T18:31:28Z

service/worker/migration/activities.go

-		// ignore NotFound error
+	switch err.(type) {
+	case nil:
+		_ = rateLimiter.ReserveN(time.Now(), int(resp.StateTransitionCount))


If the ST count exceeds the burst size, it won't reserve the tokens.

"// The returned Reservation’s OK() method returns false if n exceeds the Limiter's burst size."

You are definitely right, but what should the behavior be?
Returning an error is not a valid option (I assume).
Log an error?

hehaifengcn · 2023-05-17T22:19:59Z

service/history/workflow/task_generator.go

 	}

 	if r.mutableState.GetExecutionState().State == enumsspb.WORKFLOW_EXECUTION_STATE_COMPLETED {
 		return &tasks.SyncWorkflowStateTask{
 			// TaskID, VisibilityTimestamp is set by shard
 			WorkflowKey: r.mutableState.GetWorkflowKey(),
 			Version:     lastItem.GetVersion(),
-		}, nil
+		}, 1, nil
 	} else {
 		return &tasks.HistoryReplicationTask{
 			// TaskID, VisibilityTimestamp is set by shard
 			WorkflowKey:  r.mutableState.GetWorkflowKey(),
 			FirstEventID: executionInfo.LastFirstEventId,
 			NextEventID:  lastItem.GetEventId() + 1,
 			Version:      lastItem.GetVersion(),


does it make sense to include StateTransitionCount as part of HistoryReplicationTask?

No, the state transition count is basically "how many LWTs / Txs has this workflow caused so far".

This metric can be useful for predicting the # of LWTs.

Do you have any use case for this metric?

Cleaner, IMO: it avoids the function's return types growing into (a, b, c, ...). I'll leave it up to you.

this is a go feature, not a bug ...

hehaifengcn · 2023-05-17T22:28:32Z

service/worker/migration/activities.go

+	switch err.(type) {
+	case nil:
+		stateTransitionCount := resp.StateTransitionCount
+		for stateTransitionCount > 0 {


Puzzled by the need for the for loop here. Not knowing how rateLimiter is implemented: would ReserveN(, stateTransitionCount) fail if stateTransitionCount > rateLimiter.Burst()?

Yes, the returned reservation's OK() will return false.

Do we know the upper bound of ratio between stateTransitionCount and Replication Tasks? If we know, say X, we could probably set burst to X*RPS instead.

The upper limit on the number of history events is 50k.
On average, I would say 2 history events per LWT, so the upper limit of # of ST per WF is around 25K.

using 25K as burst is meaningless

hehaifengcn · 2023-05-17T22:40:57Z

service/worker/migration/activities.go

@@ -252,24 +254,37 @@ func (a *activities) checkHandoverOnce(ctx context.Context, waitRequest waitHand
 	return readyShardCount == len(resp.Shards), nil
 }

-func (a *activities) generateWorkflowReplicationTask(ctx context.Context, wKey definition.WorkflowKey) error {
+func (a *activities) generateWorkflowReplicationTask(ctx context.Context, rateLimiter quotas.RateLimiter, wKey definition.WorkflowKey) error {
+	if err := rateLimiter.WaitN(ctx, 1); err != nil {


rateLimiter.WaitN(ctx, 1) -> rateLimiter.Wait(ctx)?

hehaifengcn · 2023-05-17T22:42:00Z

service/worker/migration/activities.go

+		for stateTransitionCount > 0 {
+			token := util.Min(int(stateTransitionCount), rateLimiter.Burst())
+			stateTransitionCount -= int64(token)
+			_ = rateLimiter.ReserveN(time.Now(), token)


do you still intend to wait on the token? I think ReserveN only returns reservation. You still need to call time.Sleep(r.Delay()) to wait according to the doc.

We need to call the API to learn the # of LWTs (tokens to consume), so the logic here should just reserve those tokens; the next call (the WaitN function) will then be blocked.

I see. Can you add a comment to help understanding? Are you suggesting WaitN(, 1) at line 258 will wait on all reserved tokens? The WaitN comment says "WaitN blocks until lim permits n events to happen." so I assume it will just consume 1 token?

"Are you suggesting WaitN(, 1) at line 258 will wait on all reserved tokens?"

Please take a look at the rate limiter doc: https://pkg.go.dev/golang.org/x/time/rate#Limiter.ReserveN

hehaifengcn · 2023-05-17T22:48:52Z

proto/internal/temporal/server/api/historyservice/v1/request_response.proto

@@ -574,6 +574,7 @@ message GenerateLastHistoryReplicationTasksRequest {
 }

 message GenerateLastHistoryReplicationTasksResponse {
+    int64 state_transition_count = 1;


any upgrade/downgrade concern?

An old caller will get 0 here,

and the system still uses WaitN(ctx, 1) (before making the API call),
so for the old logic this will be a noop.

For the new logic, tokens consumed will be 1 per generate-task call + n state transitions from the call result.

"old caller will get 0 here"
You mean new history client calls old history service?

wxing1292 requested review from meiliang86 and yux0 May 17, 2023 17:27

wxing1292 requested a review from a team as a code owner May 17, 2023 17:27

wxing1292 and others added 2 commits May 17, 2023 10:50

Return # of state transition count when generating last replication t…

1bccba8

…asks

Merge branch 'master' into generate-replicaion-task-hint

2a36086

yux0 reviewed May 17, 2023

address comments

04c5ddb

yux0 approved these changes May 17, 2023

hehaifengcn reviewed May 17, 2023

meiliang86 approved these changes May 19, 2023

wxing1292 merged commit ed49604 into temporalio:master May 19, 2023

wxing1292 deleted the generate-replicaion-task-hint branch May 19, 2023 08:14
