[DBZ-PGYB] Implement parallel snapshotting using multiple tasks #24200

Closed
yugabyte-ci opened this issue Sep 30, 2024 · 0 comments
Assignees
Labels
area/cdcsdk CDC SDK jira-originated kind/new-feature This is a request for a completely new feature priority/low Low priority

Comments

yugabyte-ci commented Sep 30, 2024

Jira Link: DB-13087

@yugabyte-ci yugabyte-ci added area/cdcsdk CDC SDK jira-originated kind/new-feature This is a request for a completely new feature priority/low Low priority labels Sep 30, 2024
vaibhav-yb added a commit to yugabyte/debezium that referenced this issue Oct 17, 2024
## Problem

For very large tables, the default `SELECT *` snapshot query can take a
long time to complete, which makes the snapshot itself take longer.

## Solution

This PR implements parallel snapshotting of a table: each task uses the
built-in function `yb_hash_code` to run the snapshot query only for a
given hash range. The PR introduces the following two configuration
properties:
1. A new `snapshot.mode` called `parallel` - this behaves exactly like
`initial_only`, but allows multiple tasks to be launched.
2. `primary.key.hash.columns` - a comma-separated list of the columns
that form the hash component of the table's primary key.
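
To make the hash-range idea concrete, here is a minimal sketch of how a hash space could be divided among tasks and turned into per-task queries. It is not the connector's actual code: the helper names `split_hash_ranges` and `build_snapshot_query` are hypothetical, and the sketch assumes that `yb_hash_code` returns values in the range 0 to 65535.

```python
from typing import List, Tuple

# Assumption: yb_hash_code() yields values in the inclusive range [0, 65535].
HASH_SPACE_SIZE = 65536


def split_hash_ranges(num_tasks: int) -> List[Tuple[int, int]]:
    """Split the hash space into num_tasks contiguous, inclusive ranges."""
    if not 1 <= num_tasks <= HASH_SPACE_SIZE:
        raise ValueError("num_tasks must be between 1 and 65536")
    base, extra = divmod(HASH_SPACE_SIZE, num_tasks)
    ranges = []
    start = 0
    for i in range(num_tasks):
        # The first `extra` tasks take one additional hash value each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_snapshot_query(table: str, hash_columns: str, lower: int, upper: int) -> str:
    """Build a SELECT restricted to one hash range (hypothetical helper).

    hash_columns is a comma-separated string, mirroring the format of
    the `primary.key.hash.columns` property.
    """
    cols = ", ".join(c.strip() for c in hash_columns.split(","))
    return (
        f"SELECT * FROM {table} "
        f"WHERE yb_hash_code({cols}) >= {lower} AND yb_hash_code({cols}) <= {upper}"
    )


if __name__ == "__main__":
    # Example: four tasks snapshotting a hypothetical table public.orders.
    for lower, upper in split_hash_ranges(4):
        print(build_snapshot_query("public.orders", "id", lower, upper))
```

Because the ranges are contiguous and non-overlapping, every row falls into exactly one task's query, so the tasks together cover the table once.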

> **Note:** When `snapshot.mode` is set to `parallel`, regular
expressions are not supported in the `table.include.list` property; the
user must specify the full name of the table. Additionally,
`table.include.list` may contain only one table when `snapshot.mode` is
`parallel`.
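
The restrictions in the note could be checked up front along the following lines. This is only an illustrative sketch: `validate_parallel_config` is a hypothetical helper, not part of the connector, and it assumes that a "full name" means a plain `schema.table` identifier. The name pattern is a heuristic, since `.` is itself a regex metacharacter.

```python
import re

# Assumption: a full table name is a plain "schema.table" identifier pair.
_PLAIN_TABLE_NAME = re.compile(
    r"^[A-Za-z_][A-Za-z0-9_$]*\.[A-Za-z_][A-Za-z0-9_$]*$"
)


def validate_parallel_config(config: dict) -> None:
    """Raise ValueError if the config violates the parallel-mode restrictions."""
    if config.get("snapshot.mode") != "parallel":
        return  # The restrictions apply only to parallel snapshots.

    raw = config.get("table.include.list", "")
    entries = [entry.strip() for entry in raw.split(",") if entry.strip()]

    if len(entries) != 1:
        raise ValueError(
            "table.include.list must contain exactly one table "
            "when snapshot.mode is parallel"
        )
    if not _PLAIN_TABLE_NAME.match(entries[0]):
        raise ValueError(
            "table.include.list must be a full table name, not a regex, "
            "when snapshot.mode is parallel"
        )


if __name__ == "__main__":
    validate_parallel_config(
        {"snapshot.mode": "parallel", "table.include.list": "public.orders"}
    )
    print("configuration accepted")
```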