Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DBZ-PGYB] Support for consistent snapshot for an existing slot #113

Merged
merged 5 commits into from
Apr 30, 2024

Conversation

asrinivasanyb
Copy link

@asrinivasanyb asrinivasanyb commented Apr 24, 2024

Summary
This PR is to support consistent snapshot in the case of an existing slot.

In this case, the consistent_point hybrid time is determined from the pg_replication_slots view, specifically from the yb_restart_commit_ht column.

There is an assumption here that this slot has not been used for streaming till this point. If this holds, then the history retention barrier will be in place as of the consistent snapshot time (consistent_point). The snapshot query will be run as of the consistent_point and subsequent streaming will start from the consistent_point of the slot.

Test Plan
Added new test
mvn -Dtest=PostgresConnectorIT#initialSnapshotWithExistingSlot test

Copy link
Collaborator

@vaibhav-yb vaibhav-yb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm my understanding after this PR - if a user creates a replication slot and starts the connector with snapshot.mode=never and after streaming if the user restarts the connector in initial mode, he will only get the records snapshotted which would be there if the snapshot was taken in the first place itself.

And in this case, what would happen once the snapshot is completed? Will the records be streamed from the point they were streamed before connector restart?

@asrinivasanyb
Copy link
Author

asrinivasanyb commented Apr 25, 2024

Just to confirm my understanding after this PR - if a user creates a replication slot and starts the connector with snapshot.mode=never and after streaming if the user restarts the connector in initial mode, he will only get the records snapshotted which would be there if the snapshot was taken in the first place itself.

And in this case, what would happen once the snapshot is completed? Will the records be streamed from the point they were streamed before connector restart?

The OffsetState in Kafka would indicate that Streaming has started. So, even when user restarts the connector in INITIAL mode, the streaming will continue from the LSN corresponding to the offsetState stored in Kafka

In the case of the newly added test - initialSnapshotWithExistingSlot - I was relying on the fact that the OffsetState would not be saved in Kafka as I am stopping the connector almost immediately after starting it. In this case, as there is no previous offset in Kafka, a snapshot would be taken and the streaming will start from the consistent_point

@asrinivasanyb asrinivasanyb merged commit c8e3fa4 into ybdb-debezium-2.5.2 Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants