[yugabyte/yugabyte-db#26107] Parallel streaming changes #172
This PR adds support for streaming the changes of a table in parallel using multiple tasks, given that the user provides the hash_code ranges each task should stream. The following changes have been introduced in this PR:

1. New configuration properties:
   a. `streaming.mode`: Takes the value `default` or `parallel`, which decides whether parallel streaming mode is used.
   b. `slot.names`: A comma-separated list of the slot names to be used, one per task.
   c. `publication.names`: A comma-separated list of the publication names to be used, one per task.
   d. `slot.ranges`: A semicolon-separated list of slot ranges in the format `a,b;b,c;c,d`.
2. Validations in `YBValidate` have been introduced:
   a. To validate that the complete hash range is provided by the user and nothing is missing.
   b. To validate that the number of slot names equals the number of publication names as well as the number of slot ranges.
   c. To ensure that only one table is provided in `table.include.list`, as parallel streaming will not work with multiple tables.
3. With `streaming.mode` set to `parallel`:
   a. This requires providing the hash part of the primary key columns in the configuration parameter `primary.key.hash.columns`.
4. The `PostgresPartition` object will now also use the slot name to uniquely identify the source partition.

Usage example
If the connector configuration contains `streaming.mode=parallel`, `slot.names=rs1,rs2`, `publication.names=pb1,pb2` and `slot.ranges=0,32768;32768,65536`, then we will have 2 tasks created:
- `task 0`: `slot=rs1 publication=pb1 hash_range=0,32768`
- `task 1`: `slot=rs2 publication=pb2 hash_range=32768,65536`
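
The positional pairing behind this task list can be sketched in a few lines. This is a minimal Python illustration, not the connector's Java implementation: the function name `build_task_configs` is hypothetical, and the property values are assumptions reconstructed from the two tasks listed above.

```python
def build_task_configs(props):
    """Pair the i-th slot, publication and hash range into task i."""
    slots = [s.strip() for s in props["slot.names"].split(",")]
    publications = [p.strip() for p in props["publication.names"].split(",")]
    # slot.ranges uses ';' between ranges and ',' inside a single range.
    ranges = [r.strip() for r in props["slot.ranges"].split(";")]

    if not (len(slots) == len(publications) == len(ranges)):
        raise ValueError(
            "slot.names, publication.names and slot.ranges "
            "must have the same number of entries"
        )

    return [
        {"task": i, "slot": slot, "publication": pub, "hash_range": rng}
        for i, (slot, pub, rng) in enumerate(zip(slots, publications, ranges))
    ]


# Hypothetical property values, chosen to match the two tasks listed above.
example_props = {
    "streaming.mode": "parallel",
    "slot.names": "rs1,rs2",
    "publication.names": "pb1,pb2",
    "slot.ranges": "0,32768;32768,65536",
}

for task in build_task_configs(example_props):
    print(
        f"task {task['task']}: slot={task['slot']} "
        f"publication={task['publication']} hash_range={task['hash_range']}"
    )
```

Because the pairing is purely positional, reordering the entries in only one of the three lists would hand a slot a different hash range without any error being raised.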
Note: It is currently the user's responsibility to provide the full hash range and to keep the order of the values consistent across `slot.names`, `publication.names` and `slot.ranges`, because the values are picked sequentially and divided into tasks. To ensure that the task with a given slot gets the same hash_range every time, the user needs to keep the order provided unchanged.

This closes yugabyte/yugabyte-db#26107.
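
Since supplying complete ranges is left to the user, a pre-flight check along the lines of the first `YBValidate` validation may be useful before deploying. The sketch below is a Python illustration only; how `YBValidate` implements the check is not shown in this description. It assumes that each range is half-open (`start` inclusive, `end` exclusive) and that the full hash space is `0` to `65536`, both inferred from the usage example. The function names are hypothetical.

```python
HASH_RANGE_START = 0
HASH_RANGE_END = 65536  # assumed exclusive upper bound, taken from the usage example


def parse_slot_ranges(slot_ranges):
    """Parse 'a,b;b,c' into [(a, b), (b, c)]."""
    parsed = []
    for chunk in slot_ranges.split(";"):
        # Raises ValueError if a chunk does not hold exactly two integers.
        start, end = (int(part) for part in chunk.split(","))
        parsed.append((start, end))
    return parsed


def validate_full_coverage(slot_ranges):
    """Raise ValueError unless the ranges tile [0, 65536) with no gap or overlap."""
    expected_start = HASH_RANGE_START
    for start, end in sorted(parse_slot_ranges(slot_ranges)):
        if start != expected_start:
            raise ValueError(
                f"gap or overlap at {expected_start}: next range starts at {start}"
            )
        if end <= start:
            raise ValueError(f"empty or inverted range {start},{end}")
        expected_start = end
    if expected_start != HASH_RANGE_END:
        raise ValueError(
            f"ranges stop at {expected_start}, expected {HASH_RANGE_END}"
        )


validate_full_coverage("0,32768;32768,65536")  # passes silently
```

The check sorts the ranges before walking them, so it verifies coverage only; it does not confirm that the order matches `slot.names` and `publication.names`.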