Store explicit TaskList partition data #6591

Merged
1 commit merged into cadence-workflow:master on Jan 8, 2025

Conversation

natemort (Member) commented on Jan 3, 2025

What changed?
Replace num_read_partitions and num_write_partitions with an explicit map of partition ids to partition configuration. This enables assigning isolation groups to partitions in the future.

This change is backwards compatible: it populates both the legacy numeric fields and the new maps when writing data to/from persistence and when mapping to/from thrift/proto. When draining partitions we continue to maintain a contiguous block of partition ids.

Once this change has been deployed broadly we can remove the fields from IDL. We could also consider removing the fields from persistence in the future as well.
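
For orientation, the new configuration shape might look roughly like the following sketch. It is based only on fields referenced elsewhere in this PR (ReadPartitions, IsolationGroups); the exact names and types in the codebase may differ.

// TaskListPartition describes a single partition. An empty IsolationGroups
// slice is treated as "all isolation groups allowed".
type TaskListPartition struct {
    IsolationGroups []string
}

// TaskListPartitionConfig replaces the NumReadPartitions/NumWritePartitions
// counters with explicit maps keyed by partition id.
type TaskListPartitionConfig struct {
    ReadPartitions  map[int]*TaskListPartition
    WritePartitions map[int]*TaskListPartition
}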

Why?

  • To support assigning isolation groups to partitions in the future

How did you test it?

  • Unit / Integration tests

Potential risks

  • Bugs in this functionality could result in partition autoscaling not working correctly or potentially impact partitioned task lists.

Detailed Description

  • Changed the in-memory representation of a TaskListConfig from the number of partitions to a map[int]TaskListPartition.
  • Updated the proto/thrift mappers to convert the partition numbers to default values (all isolation groups in each partition) any time the numeric values don't match the maps (a sketch of this defaulting follows this list).
  • Updated the persistence logic in the same way: partition numbers are converted to default values whenever the numeric values don't match the maps.
  • Updated AutoScaler to use DescribeTaskList and operate in terms of partitions rather than just numbers of partitions.
  • Updated logic that depended on the numeric values to depend on the length of the maps. Subsequent PRs will actually use the values and populate the IsolationGroups.
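
As referenced in the mapper and persistence items above, the defaulting could look roughly like this sketch, reusing the types sketched earlier; the function names are illustrative, not the PR's actual helpers.

// defaultPartitions builds a contiguous block of partition ids 0..n-1, each
// with an empty IsolationGroups slice, i.e. all isolation groups allowed.
func defaultPartitions(n int) map[int]*TaskListPartition {
    result := make(map[int]*TaskListPartition, n)
    for i := 0; i < n; i++ {
        result[i] = &TaskListPartition{}
    }
    return result
}

// reconcileReadPartitions prefers the explicit map but falls back to defaults
// derived from the legacy counter whenever the two disagree.
func reconcileReadPartitions(numRead int32, read map[int]*TaskListPartition) map[int]*TaskListPartition {
    if int(numRead) != len(read) {
        return defaultPartitions(int(numRead))
    }
    return read
}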

Impact Analysis

  • Backward Compatibility: Since we're writing both the legacy fields and the new maps, we maintain compatibility with existing data and clients (see the sketch below).
  • Forward Compatibility: This new schema is flexible enough that we should be able to freely change it.
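
A sketch of what "writing both fields" means when mapping outward. WirePartitionConfig is a hypothetical stand-in for the generated thrift/proto struct, not the actual IDL type, and the real mappers in the PR differ in detail.

// WirePartitionConfig stands in for a generated type that still carries the
// deprecated numeric fields alongside the new maps.
type WirePartitionConfig struct {
    NumReadPartitions  int32
    NumWritePartitions int32
    ReadPartitions     map[int]*TaskListPartition
    WritePartitions    map[int]*TaskListPartition
}

// toWirePartitionConfig keeps the deprecated counters in sync with the explicit
// maps so that readers that predate this change keep working during rollout.
func toWirePartitionConfig(c *TaskListPartitionConfig) *WirePartitionConfig {
    if c == nil {
        return nil
    }
    return &WirePartitionConfig{
        NumReadPartitions:  int32(len(c.ReadPartitions)),
        NumWritePartitions: int32(len(c.WritePartitions)),
        ReadPartitions:     c.ReadPartitions,
        WritePartitions:    c.WritePartitions,
    }
}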

Testing Plan

  • Unit Tests: Fully covered by unit tests, including old data
  • Persistence Tests: Persistence tests round-trip the data.
  • Integration Tests: Partially covered by integration tests by virtue of partitioned task lists.
  • Compatibility Tests: No explicit compatibility tests

Rollout Plan

  • What is the rollout plan? This can be rolled out in any order. In order to enable actually storing partition data we need to copy it to the DB from the dynamic config.
  • Does the order of deployment matter? Order does not matter
  • Is it safe to rollback? Does the order of rollback matter? This can be safely rolled back as it writes the old fields.
  • Is there a kill switch to mitigate the impact immediately? This isn't behind a flag so rolling back is the only mechanism.

Release notes

Documentation Changes

return result, changed
}

func (a *adaptiveScalerImpl) collectPartitionMetrics(config *types.TaskListPartitionConfig) (*aggregatePartitionMetrics, error) {

Member:
This might increase the load significantly. Originally, describeTaskList was only called when the number of write partitions was less than the number of read partitions, but after this change it will be called periodically. Maybe we can add a metric to track how much extra load this causes.

Member:
Good point. The failure-case behavior is also not clear to me: if one partition fails to respond (e.g. timeout), what do we do?
Maybe we only activate this new logic if the task list has isolation groups enabled. This way we can iterate and optimize as needed without impacting the normal behavior of the adaptive scaler.

natemort (Member, Author):
The failure case is a no-op, which seems reasonable as a starting point; in the future we can explore more sophisticated approaches such as maintaining more state. I've updated this logic to only perform the RPC to all child partitions when isolation is enabled.
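
The resolution described here might look roughly like the following sketch. All identifiers are illustrative, not the PR's actual adaptiveScalerImpl code.

// partitionDescriber abstracts the per-partition DescribeTaskList call.
type partitionDescriber interface {
    describePartition(id int) (backlog int64, err error)
}

// collectBacklog fans out to child partitions only when isolation is enabled,
// leaving the default adaptive-scaler behavior unchanged. A failed call is a
// no-op for that partition.
func collectBacklog(d partitionDescriber, isolationEnabled bool, partitionIDs []int) int64 {
    if !isolationEnabled {
        return 0 // caller keeps using the existing count-based logic
    }
    var total int64
    for _, id := range partitionIDs {
        backlog, err := d.describePartition(id)
        if err != nil {
            continue // failure is a no-op for this partition
        }
        total += backlog
    }
    return total
}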

if values == nil {
    return nil
}
partitions := values.(map[int]map[string]any)

Member:
nit: to avoid future misuse causing panics, let's check the result of the cast:

partitions, ok := values.(map[int]map[string]any)
if !ok { return nil }

natemort (Member, Author):
Done.

Comment on lines +94 to +95
// If they're out of sync, go with the value of num_*_partitions. This is necessary only while support for
// read_partitions and write_partitions rolls out

Member:
Until read_partitions/write_partitions are written back to the DB we will be resetting the partition mappings. I guess it doesn't matter for now as the isolation group feature is disabled.

natemort (Member, Author):
Yeah, that should be fine for now. Once we start populating the mapping, the values will be persisted and we can eventually even remove this logic.

@@ -654,3 +630,69 @@ func lockTaskList(ctx context.Context, tx sqlplugin.Tx, shardID int, domainID se
func stickyTaskListExpiry() time.Time {
    return time.Now().Add(stickyTasksListsTTL)
}

func toSerializationTaskListPartitionConfig(c *persistence.TaskListPartitionConfig) *serialization.TaskListPartitionConfig {

Member:
thanks for moving these to helper funcs

if e != nil {
    a.logger.Warn("failed to get partition metrics", tag.WorkflowTaskListName(a.taskListID.GetPartition(partitionID)), tag.Error(e))
}
if result != nil {

Member:
If e is not nil, we shouldn't care about result.

natemort (Member, Author):
Done.
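
One way the resolved snippet could read, checking the error before ever looking at the result (illustrative; the actual change may differ):

if e != nil {
    // Log and skip this partition; its result is irrelevant when the call failed.
    a.logger.Warn("failed to get partition metrics", tag.WorkflowTaskListName(a.taskListID.GetPartition(partitionID)), tag.Error(e))
} else if result != nil {
    // aggregate the partition's metrics here
}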

@@ -240,7 +240,7 @@ func NewManager(
     partitionConfig := tlMgr.TaskListPartitionConfig()
     r := 1
     if partitionConfig != nil {
-        r = int(partitionConfig.NumReadPartitions)
+        r = len(partitionConfig.ReadPartitions)

Member:
This len() logic is reused in multiple places; let's consider creating a helper method.
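
A helper along those lines could centralize the fallback; a sketch using the config shape from earlier (the method name is an assumption):

// ReadPartitionCount returns the number of read partitions, defaulting to 1
// when no partition config has been stored for the task list yet.
func (c *TaskListPartitionConfig) ReadPartitionCount() int {
    if c == nil || len(c.ReadPartitions) == 0 {
        return 1
    }
    return len(c.ReadPartitions)
}

Call sites like the one above would then collapse to r := partitionConfig.ReadPartitionCount().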

taskListType := types.TaskListTypeDecision.Ptr()
if c.taskListID.GetType() == persistence.TaskListTypeActivity {
    taskListType = types.TaskListTypeActivity.Ptr()
}
// TODO: Do we want to notify partitions that were removed?

Member:
We need to notify partitions that were removed from the write set.

natemort (Member, Author):
Done.
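
Finding the partitions to notify is a set difference between the old and new write maps; a generic sketch (names are illustrative):

// removedWritePartitions returns the ids present in oldWrite but absent from
// newWrite; these partitions should be told they no longer receive writes.
func removedWritePartitions[T any](oldWrite, newWrite map[int]T) []int {
    var removed []int
    for id := range oldWrite {
        if _, ok := newWrite[id]; !ok {
            removed = append(removed, id)
        }
    }
    return removed
}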

Replace num_read_partitions and num_write_partitions with an explicit map of partition ids to partition configuration. This enables assigning isolation groups to partitions in the future.

This change is backwards compatible as it populates both values when writing data to Cassandra and when returning it via the API. When draining partitions we continue to maintain a contiguous block of partition ids.
natemort merged commit eebf656 into cadence-workflow:master on Jan 8, 2025
14 of 18 checks passed
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Jan 15, 2025
Shaddoll added a commit to Shaddoll/cadence that referenced this pull request Jan 15, 2025
Shaddoll added a commit that referenced this pull request Jan 15, 2025
natemort added a commit to natemort/cadence that referenced this pull request Jan 24, 2025
…" (cadence-workflow#6625)

Address issues in GRPC -> types mapper and add additional tests.

This reverts commit bf9f526.
natemort added a commit to natemort/cadence that referenced this pull request Jan 28, 2025
…" (cadence-workflow#6625)

Address issues in GRPC -> types mapper and add additional tests.

Additionally address issues in serialization <-> sqlblobs mapper and add tests.

This reverts commit bf9f526.

3 participants