partitioned activity taskQueue based on workflowId, for better caching performance in activity workers #427

Open
dhiaayachi opened this issue Sep 5, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Tasks on an activity task queue are fetched more or less randomly by the competing activity workers polling the same queue.
When subsequent activity calls for the same workflow land on different activity workers, per-pod caching does not work effectively.

For workflow workers, Temporal takes care of cache performance through sticky execution. For activity workers, a similar concept does not exist without custom code.

The typical workflows I write consist of multiple activities that all operate on one entity (e.g. an order workflow whose activities operate on one e-commerce order, a user workflow on one user, an Uber-driver workflow on one driver entity).

The problem exists in the following scenario:

  • there is a set of activities on the same task queue, all operating on the same entity (e.g. an Order)
  • the state of the entity is not fully owned by the workflow history alone (but lives e.g. in some database, and Temporal only passes identifiers around)
  • there is a node-(/pod-)specific entity cache in place (which is probably not the case for smaller services), for example Ehcache; a sketch of this situation follows this list
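To make the scenario concrete, here is a minimal sketch (the Order record, the OrderActivities interface, and the plain ConcurrentHashMap standing in for Ehcache are all hypothetical, for illustration only): each activity first looks the entity up in a node-local cache, so the cache only pays off when consecutive activity tasks for the same order land on the same pod.

```java
import io.temporal.activity.ActivityInterface;

import java.util.concurrent.ConcurrentHashMap;

// Hypothetical entity and activity interface, for illustration only.
record Order(String id) {}

@ActivityInterface
interface OrderActivities {
  void validateOrder(String orderId);
  void chargeOrder(String orderId);
}

class OrderActivitiesImpl implements OrderActivities {

  // Node-local cache: visible only to activity executions running in this worker process.
  private static final ConcurrentHashMap<String, Order> ORDER_CACHE = new ConcurrentHashMap<>();

  @Override
  public void validateOrder(String orderId) {
    Order order = ORDER_CACHE.computeIfAbsent(orderId, OrderActivitiesImpl::loadOrderFromDb);
    // ... validation logic on the cached entity ...
  }

  @Override
  public void chargeOrder(String orderId) {
    // If this task is dispatched to a different pod than validateOrder was,
    // the lookup misses and the entity is loaded from the database again.
    Order order = ORDER_CACHE.computeIfAbsent(orderId, OrderActivitiesImpl::loadOrderFromDb);
    // ... charging logic ...
  }

  private static Order loadOrderFromDb(String orderId) {
    return new Order(orderId); // placeholder for the real database read
  }
}
```

With random dispatching across, say, 10 activity worker pods, the second activity of a workflow only has roughly a 1-in-10 chance of landing on the pod that already cached the order.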

Describe the solution you'd like
Partition activity task queues based on workflowId, and assign partitions to specific worker instances (similar to partition assignment within a Kafka consumer group).

The number of partitions could be either fixed or dynamic.
Limiting the number of workers to at most the number of queue partitions is not strictly required: even if the number of consumers exceeds the number of partitions, caching still improves when, say, only two worker instances compete for tasks on the same partition, as opposed to all worker instances reading from all partitions.

Such a task queue feature should be opt-in rather than the default, as only systems with a cache would benefit from it; otherwise, random or round-robin dispatching leads to better load balancing across the workers. A conceptual sketch of the partition selection follows.
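This is only an illustrative sketch of the idea, not an existing Temporal API; PARTITION_COUNT, partitionedTaskQueue, and the hash-based mapping are assumptions about how such a feature could work:

```java
// Conceptual sketch only: map the workflowId onto one of N partitions, much like Kafka
// maps a record key onto a topic partition. PARTITION_COUNT and partitionedTaskQueue are
// hypothetical names, not part of Temporal today.
public final class ActivityTaskQueuePartitioner {

  private static final int PARTITION_COUNT = 8;

  // Every activity of a given workflow maps to the same partition, so the subset of
  // worker instances assigned to that partition keeps a warm cache for the workflow's entity.
  static String partitionedTaskQueue(String baseTaskQueue, String workflowId) {
    int partition = Math.floorMod(workflowId.hashCode(), PARTITION_COUNT);
    return baseTaskQueue + "-p" + partition; // e.g. "OrderActivities-p3"
  }
}
```

The server (or SDK) would then assign each partition to a subset of the polling worker instances, Kafka-consumer-group style, instead of letting every worker poll every task.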

Describe alternatives you've considered
Set custom task queue names in workflow code using activity options, as in the fileprocessing sample. This requires custom logic in three places (see the sketch after this list):

  1. worker setup (to start an additional activity worker on a unique task queue)
  2. workflow code (to set the ActivityOptions dynamically inside the workflow method, instead of once when creating the workflow worker)
  3. activity code (to return the worker's unique task queue)
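A minimal sketch of steps 2 and 3, loosely following the fileprocessing sample (EntityActivities, EntityWorkflow, and pickHostTaskQueue are hypothetical names; step 1 would additionally register the activity implementation on a task queue whose name is unique per worker instance, e.g. containing a UUID or the hostname):

```java
import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityOptions;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

// Hypothetical interfaces, for illustration only.
@ActivityInterface
interface EntityActivities {
  String pickHostTaskQueue(); // step 3: returns the host-unique task queue it runs on
  void validateEntity(String entityId);
  void chargeEntity(String entityId);
}

@WorkflowInterface
interface EntityWorkflow {
  @WorkflowMethod
  void processEntity(String entityId);
}

class EntityWorkflowImpl implements EntityWorkflow {
  @Override
  public void processEntity(String entityId) {
    // Step 2: build the ActivityOptions dynamically inside the workflow method.
    EntityActivities router = Workflow.newActivityStub(
        EntityActivities.class,
        ActivityOptions.newBuilder()
            .setTaskQueue("EntityActivities") // shared task queue
            .setStartToCloseTimeout(Duration.ofMinutes(1))
            .build());
    String hostQueue = router.pickHostTaskQueue();

    // All subsequent activities are pinned to the host-specific queue returned above,
    // so they run on the same pod and can reuse its local entity cache.
    EntityActivities pinned = Workflow.newActivityStub(
        EntityActivities.class,
        ActivityOptions.newBuilder()
            .setTaskQueue(hostQueue)
            .setStartToCloseTimeout(Duration.ofMinutes(1))
            .build());
    pinned.validateEntity(entityId);
    pinned.chargeEntity(entityId);
  }
}
```

This works, but every team has to re-implement the routing activity and the dynamic ActivityOptions wiring, which is exactly the custom code a partitioned activity task queue would make unnecessary.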