[DBZ-PGYB][yugabyte/yugabyte-db#24555] Add task ID to PostgresPartition (#163)

## Problem

With the introduction of the parallel snapshot model, the connector can run multiple tasks when `snapshot.mode` is set to `parallel`. This introduces a problem at the underlying layer, where the connector stores the `sourceInfo` for its partitions (i.e. the `PostgresPartition` objects) in Kafka. A `PostgresPartition` is identified by a map with the structure `{"server", topicPrefix}`. Currently this map is identical for every `PostgresPartition` created by the tasks when `snapshot.mode` is `parallel`, so all of them refer to the same source partition in the Kafka topic.

As a result (assuming two tasks, task_0 and task_1):

1. One task (task_0) completes its snapshot while the other has yet to start.
   a. After completion, task_0 updates the `sourceInfo` to record that its snapshot is completed.
2. When task_1 starts up, it reads the same `sourceInfo` object, concludes that the snapshot is completed, and skips its own snapshot.

This causes data loss, since task_1 never actually takes a snapshot.

## Solution

This PR implements a short-term solution: the task ID is added to the partition so that each `PostgresPartition` identifies a source partition uniquely. The identifying map now becomes `{"server", topicPrefix_taskId}`.

**Note:** This is a quick fix that assumes the number of tasks in the connector remains the same.

This partially fixes yugabyte/yugabyte-db#24555
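The failure mode and the fix can be illustrated with a minimal, self-contained sketch. It does not use the real Debezium or Kafka Connect classes: the offset store is modeled as an in-memory `HashMap` keyed by the partition map, and the class `PartitionKeySketch`, its methods, the `snapshot_completed` offset key and the `dbserver1` prefix are all hypothetical. Only the `"server"` key and the `topicPrefix_taskId` format come from this commit message.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionKeySketch {

    // Hypothetical stand-in for the Kafka offset store:
    // source partition map -> source offset map.
    static final Map<Map<String, String>, Map<String, Object>> OFFSETS = new HashMap<>();

    // Old scheme: every task builds the same map {"server": topicPrefix}.
    static Map<String, String> partitionWithoutTaskId(String topicPrefix) {
        return Map.of("server", topicPrefix);
    }

    // New scheme: the task ID is appended, giving {"server": topicPrefix_taskId}.
    static Map<String, String> partitionWithTaskId(String topicPrefix, int taskId) {
        return Map.of("server", topicPrefix + "_" + taskId);
    }

    // Hypothetical startup check: has a snapshot already been recorded for this partition?
    static boolean snapshotCompleted(Map<String, String> partition) {
        Map<String, Object> offset = OFFSETS.get(partition);
        return offset != null && Boolean.TRUE.equals(offset.get("snapshot_completed"));
    }

    // Hypothetical write a task performs after finishing its snapshot.
    static void markSnapshotCompleted(Map<String, String> partition) {
        OFFSETS.put(partition, Map.of("snapshot_completed", true));
    }

    public static void main(String[] args) {
        String prefix = "dbserver1"; // hypothetical topic prefix

        // Old scheme: task_0 finishes, and task_1 reads the same entry.
        markSnapshotCompleted(partitionWithoutTaskId(prefix));
        System.out.println("old scheme, task_1 skips snapshot: "
                + snapshotCompleted(partitionWithoutTaskId(prefix)));

        OFFSETS.clear();

        // New scheme: task_0 and task_1 each read and write their own entry.
        markSnapshotCompleted(partitionWithTaskId(prefix, 0));
        System.out.println("new scheme, task_1 skips snapshot: "
                + snapshotCompleted(partitionWithTaskId(prefix, 1)));
    }
}
```

Running `main` should print `true` for the old scheme and `false` for the new one. Because the key is derived from the task ID, it only stays stable while the number of tasks is unchanged, which is the limitation called out in the Note.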