Improve State restart time on failure caused by server restart, by using activity heartbeat, when startToClose is large #551

longquanzheng · 2025-03-11T21:10:05Z

Currently, when the iWF server restarts, an in-flight State API call fails, and the next attempt is not scheduled until the startToClose timeout expires plus the backoff retry interval.
If the startToClose timeout is large (e.g. more than 10 minutes), the State can be stuck for that long. To avoid this unnecessary wait, Temporal/Cadence provides "activity heartbeat", which lets the worker tell the Temporal/Cadence server that it is still alive. If no heartbeat is received within the heartbeat timeout, Temporal/Cadence treats the attempt as failed and schedules the next attempt right away, subject to the retry policy's backoff.

Note: this is also because Temporal/Cadence activity tasks/workers are "polling based", so the server only learns that a worker died when a timeout expires. iWF tasks/workers are "push based", so they don't have this issue.
