[yugabyte/yugabyte-db#26107] Parellel streaming changes #172

vaibhav-yb · 2025-02-17T04:42:51Z

This PR adds support for streaming changes from a single table in parallel across multiple connector tasks, provided the user supplies the hash_code ranges that each task should stream. It introduces the following changes:

1. New configurations:
   a. streaming.mode: accepts either default or parallel and determines whether parallel streaming mode is used.
   b. slot.names: a comma-separated list of replication slot names, one for each task.
   c. publication.names: a comma-separated list of publication names, one for each task.
   d. slot.ranges: a semicolon-separated list of hash ranges, one for each task, in the format a,b;b,c;c,d.
2. Validations have been introduced in the class YBValidate:
   a. To verify that the user-provided ranges cover the complete hash range, with nothing missing.
   b. To verify that the number of slot names equals both the number of publication names and the number of slot ranges.
   c. To ensure that table.include.list contains only one table, since parallel streaming does not work with multiple tables.
3. Support for snapshot with streaming.mode set to parallel.
   a. This requires providing the hash part of the primary key columns in the configuration parameter primary.key.hash.columns.
4. The PostgresPartition object now also uses the slot name to uniquely identify the source partition.
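The three YBValidate checks listed above can be sketched in plain Java, for illustration only. This is not the YBValidate code from the PR: the class and method names are hypothetical, and the hash space is assumed to be [0, 65536) with exclusive upper bounds, inferred from the 0,32768;32768,65536 values in the usage example. The coverage check sorts a copy of the ranges first, so on its own it does not enforce the ordering that the Note asks users to keep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the checks listed above; not the actual YBValidate code.
final class ParallelStreamingValidation {
    // Assumption: hash codes span [0, 65536), upper bound exclusive.
    static final int HASH_SPACE_END = 65536;

    // Counts non-blank entries in a comma-separated list.
    static int countEntries(String csv) {
        return (int) Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .count();
    }

    // Parses "a,b;b,c" into {start, end} pairs, in the order given.
    static List<int[]> parseRanges(String slotRanges) {
        List<int[]> ranges = new ArrayList<>();
        for (String part : slotRanges.split(";")) {
            String[] bounds = part.trim().split(",");
            if (bounds.length != 2) {
                throw new IllegalArgumentException("Malformed range: " + part);
            }
            ranges.add(new int[] {Integer.parseInt(bounds[0].trim()), Integer.parseInt(bounds[1].trim())});
        }
        return ranges;
    }

    // Check (a): sorted by start, the ranges must tile [0, HASH_SPACE_END) with no gaps or overlaps.
    static boolean coversFullHashRange(List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(r -> r[0]));
        int expectedStart = 0;
        for (int[] r : sorted) {
            if (r[0] != expectedStart || r[1] <= r[0]) {
                return false;
            }
            expectedStart = r[1];
        }
        return expectedStart == HASH_SPACE_END;
    }

    // Check (b): as many slot names as publication names and slot ranges.
    static boolean countsMatch(String slotNames, String publicationNames, String slotRanges) {
        int slots = countEntries(slotNames);
        return slots == countEntries(publicationNames) && slots == parseRanges(slotRanges).size();
    }

    // Check (c): table.include.list must name exactly one table.
    static boolean singleTable(String tableIncludeList) {
        return countEntries(tableIncludeList) == 1;
    }

    public static void main(String[] args) {
        System.out.println(coversFullHashRange(parseRanges("0,32768;32768,65536"))); // true
        System.out.println(coversFullHashRange(parseRanges("0,30000;32768,65536"))); // false (gap)
        System.out.println(countsMatch("rs1,rs2", "pb1,pb2", "0,32768;32768,65536")); // true
        System.out.println(singleTable("public.orders,public.customers"));            // false
    }
}
```

A gap (30000 to 32768 above) or an overlap both fail check (a), because each range must start exactly where the previous one ended.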

Usage example

If the connector configuration contains the following properties:

{
  ...
  "streaming.mode":"parallel",
  "slot.names":"rs1,rs2",
  "publication.names":"pb1,pb2",
  "slot.ranges":"0,32768;32768,65536"
  ...
}

then two tasks are created:

task 0: slot=rs1 publication=pb1 hash_range=0,32768
task 1: slot=rs2 publication=pb2 hash_range=32768,65536
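A minimal sketch of the pairing shown above, assuming the i-th entry of slot.names, publication.names and slot.ranges goes to task i, as the example and the Note suggest. ParallelTaskSplitter and TaskAssignment are hypothetical names. In Kafka Connect, per-task settings are normally returned from SourceConnector.taskConfigs(int maxTasks); this sketch only illustrates the index-based split, not how the PR wires it into the connector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pairs the i-th slot, publication and range into task i.
final class ParallelTaskSplitter {

    // One task's share of the configuration.
    static final class TaskAssignment {
        final int taskId;
        final String slotName;
        final String publicationName;
        final String hashRange;

        TaskAssignment(int taskId, String slotName, String publicationName, String hashRange) {
            this.taskId = taskId;
            this.slotName = slotName;
            this.publicationName = publicationName;
            this.hashRange = hashRange;
        }

        @Override
        public String toString() {
            return "task " + taskId + ": slot=" + slotName
                    + " publication=" + publicationName + " hash_range=" + hashRange;
        }
    }

    // Splits the three lists positionally; throws if their lengths differ.
    static List<TaskAssignment> split(String slotNames, String publicationNames, String slotRanges) {
        String[] slots = slotNames.split(",");
        String[] publications = publicationNames.split(",");
        String[] ranges = slotRanges.split(";");
        if (slots.length != publications.length || slots.length != ranges.length) {
            throw new IllegalArgumentException(
                    "slot.names, publication.names and slot.ranges must have the same number of entries");
        }
        List<TaskAssignment> tasks = new ArrayList<>();
        for (int i = 0; i < slots.length; i++) {
            tasks.add(new TaskAssignment(i, slots[i].trim(), publications[i].trim(), ranges[i].trim()));
        }
        return tasks;
    }

    public static void main(String[] args) {
        for (TaskAssignment task : split("rs1,rs2", "pb1,pb2", "0,32768;32768,65536")) {
            System.out.println(task);
        }
    }
}
```

Because the split is purely positional, reordering any one of the three lists changes which slot streams which hash range, without any error being raised.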

Note:

It is currently the user's responsibility to provide ranges that cover the full hash range and to keep the entries of slot.names, publication.names and slot.ranges in a consistent order: the values are picked sequentially, and the i-th entry of each list is assigned to the same task. To ensure that the task using a given slot receives the same hash_range every time the tasks are created, the user must keep that order unchanged.

This closes yugabyte/yugabyte-db#26107.

suranjan · 2025-02-17T11:35:54Z

d. slot.ranges: A list of semi-colon separated values for slot ranges in the format a,b;b,c;c,d.
Let's just call it ranges: with hash sharding the values would be hash codes, and with range sharding they could be range column values.
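To illustrate the reviewer's suggestion, a range type that is generic over its bound type can carry either hash codes (hash sharding) or column values (range sharding). KeyRange and its inclusive-start, exclusive-end semantics are hypothetical and not part of the PR; Java's String ordering stands in for real column-value ordering only for illustration.

```java
// Hypothetical range type illustrating the reviewer's suggestion; not from the PR.
final class KeyRange<T extends Comparable<T>> {
    private final T start; // inclusive lower bound
    private final T end;   // exclusive upper bound

    KeyRange(T start, T end) {
        if (start.compareTo(end) >= 0) {
            throw new IllegalArgumentException("start must be less than end");
        }
        this.start = start;
        this.end = end;
    }

    // True if start <= value < end under T's natural ordering.
    boolean contains(T value) {
        return start.compareTo(value) <= 0 && value.compareTo(end) < 0;
    }

    public static void main(String[] args) {
        // Hash sharding: bounds are hash codes.
        KeyRange<Integer> hashRange = new KeyRange<>(0, 32768);
        // Range sharding: bounds are column values (Java String order, for illustration only).
        KeyRange<String> columnRange = new KeyRange<>("a", "m");
        System.out.println(hashRange.contains(1000));      // true
        System.out.println(hashRange.contains(32768));     // false
        System.out.println(columnRange.contains("zebra")); // false
    }
}
```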

...nnector-postgres/src/main/java/io/debezium/connector/postgresql/PostgresConnectorConfig.java

vaibhav-yb added 2 commits February 17, 2025 10:12

parellel streaming changes

33823cd

added validation

d87ede8

vaibhav-yb requested review from suranjan and Sumukh-Phalgaonkar February 17, 2025 11:28

suranjan requested changes Feb 17, 2025


...nnector-postgres/src/main/java/io/debezium/connector/postgresql/PostgresConnectorConfig.java (outdated)

addressed review comments

784bf07

suranjan approved these changes Feb 17, 2025


vaibhav-yb changed the title from "[wip] parellel streaming changes" to "[yugabyte/yugabyte-db#26107] parellel streaming changes" Feb 19, 2025

vaibhav-yb added 7 commits February 19, 2025 17:20

merge parallel snapshot logic with the streaming one

e5258fa

renamed ranges to slot.ranges

4de739e

changes for isolation level

d632900

resolved merge conflicts

25ee8d6

resolved merge conflicts

788530b

fixed validation errors

70d4f1d

removed null check as it is not needed

c58e3b1

vaibhav-yb changed the title from "[yugabyte/yugabyte-db#26107] parellel streaming changes" to "[yugabyte/yugabyte-db#26107] Parellel streaming changes" Feb 25, 2025

vaibhav-yb merged commit 13c3b13 into yugabyte:ybdb-debezium-2.5.2 Feb 25, 2025
1 check passed
