owner: fix a bug which lead to replication stopped and no error report (#1814) #1828
/lgtm
/merge
This pull request has been accepted and is ready to merge. Commit hash: ce58ce2
Codecov Report
@@ Coverage Diff @@
## release-5.0 #1828 +/- ##
================================================
Coverage ? 53.9316%
================================================
Files ? 153
Lines ? 16100
Branches ? 0
================================================
Hits ? 8683
Misses ? 6496
Partials ? 921
This is an automated cherry-pick of #1814
Bug phenomenon
The resolved TS and checkpoint TS of a changefeed stop advancing, but no error is reported.
How to confirm the bug
We can check the owner log, and find logs like this:
Trigger conditions
A capture goes offline while the owner is moving a table.
Affected versions
[4.0.0, 4.0.13], 5.0.0-rc, [5.0.0, 5.0.1]
Bug mechanism
When the owner moves a table, it tries to remove the table from the source capture and add it to the target capture.
If the target capture goes offline at the same time, the owner should add the table to the orphan tables and dispatch it in the next tick.
However, the owner forgets to remove the invalid move-table job, so it adds the table to the orphan tables again on every tick, which prevents the rest of the owner logic from working properly.
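The mechanism above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual TiCDC owner code: the names `moveJob`, `owner`, `tick`, `simulate`, and the `removeInvalidJob` flag are all hypothetical and exist only to show why a stale move-table job causes the table to be orphaned on every tick.

```go
package main

import "fmt"

// moveJob is a hypothetical stand-in for a pending "move table" record.
type moveJob struct {
	tableID int64
	target  string // ID of the capture that should receive the table
}

// owner is a toy model of the scheduling state, not TiCDC's real owner type.
type owner struct {
	online           map[string]bool   // capture ID -> is the capture online
	moveJobs         map[int64]moveJob // pending move-table jobs by table ID
	orphanedCount    int               // how many times a table was added to the orphan tables
	removeInvalidJob bool              // true models the fixed behavior (assumption)
}

// tick processes the pending move-table jobs once.
func (o *owner) tick() {
	for id, job := range o.moveJobs {
		if o.online[job.target] {
			// Target is alive: the move completes and the job is done.
			delete(o.moveJobs, id)
			continue
		}
		// Target is offline: hand the table back to the orphan tables.
		o.orphanedCount++
		if o.removeInvalidJob {
			// Fixed behavior: drop the job that can never complete.
			delete(o.moveJobs, id)
		}
		// Buggy behavior: the job stays and is processed again next tick.
	}
}

// simulate runs the given number of ticks with an offline target capture
// and returns how many times the table was orphaned.
func simulate(ticks int, fixed bool) int {
	o := &owner{
		online:           map[string]bool{"capture-1": true}, // "capture-2" is offline
		moveJobs:         map[int64]moveJob{1: {tableID: 1, target: "capture-2"}},
		removeInvalidJob: fixed,
	}
	for i := 0; i < ticks; i++ {
		o.tick()
	}
	return o.orphanedCount
}

func main() {
	fmt.Println("buggy:", simulate(3, false)) // prints "buggy: 3"
	fmt.Println("fixed:", simulate(3, true))  // prints "fixed: 1"
}
```

In this model the buggy variant orphans the table once per tick for as long as the job exists, while the fixed variant orphans it exactly once and then lets the normal dispatch path take over.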
Check List
Tests
Release note