changefeed gets stuck in test case "cdc_scale_sync" #4464
Comments
Reproduced again in v5.1.4.
Reproduced this issue in v6.0.
It should be fixed by tikv/tikv#12262. Let's keep it open and wait for another round of testing.
We have done the following analysis of the issue:
// flushSink emits all rows in rowBuffer to the backend sink and flushes
// the backend sink.
func (n *sinkNode) flushSink(ctx context.Context, resolvedTs model.Ts) (err error) {
	defer func() {
		if err != nil {
			n.status.Store(TableStatusStopped)
			return
		}
		if n.checkpointTs >= n.targetTs {
			err = n.stop(ctx)
		}
	}()
	if resolvedTs > n.barrierTs {
		resolvedTs = n.barrierTs
	}
	if resolvedTs > n.targetTs {
		resolvedTs = n.targetTs
	}
	if resolvedTs <= n.checkpointTs {
		// always return
		log.Warn("resolvedTs is less than or equal to checkpointTs",
			zap.Int64("tableID", n.tableID),
			zap.Uint64("resolvedTs", resolvedTs),
			zap.Uint64("checkpointTs", n.checkpointTs))
		return nil
	}
How did we locate the problem?
tiup cluster deploy issue-4464 nightly topology.yaml -p (a cluster with 3 TiCDC nodes)
tiup cluster start issue-4464
create database test; (in MySQL)
tiup cdc:nightly cli changefeed create --sink-uri="mysql://root:[email protected]:3306/" -c "test" --tz "Asia/Shanghai"
tiup cdc:nightly cli changefeed pause -c test
go-tpc tpcc -H 127.0.0.1 -P 4000 --warehouses 100 prepare -T 10
tiup cdc:nightly cli changefeed resume -c test
go-tpc tpcc -H 127.0.0.1 -P 4000 --warehouses 100 run -T 10
curl -X POST http://127.0.0.1:8300/capture/owner/move_table -d 'cf-id=test&target-cp-id=1f1c489f-c006-4f56-a50d-d509c4d140de&table-id=69'
curl -X POST http://127.0.0.1:8300/capture/owner/move_table -d 'cf-id=test&target-cp-id=81ce066d-5b78-44de-aa0c-ac596c3c83fd&table-id=69'
curl -X POST http://127.0.0.1:8300/capture/owner/move_table -d 'cf-id=test&target-cp-id=1f1c489f-c006-4f56-a50d-d509c4d140de&table-id=69'
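The three move_table calls above bounce table 69 back and forth between two captures. As a rough sketch of how that step could be scripted, the Go program below builds the same POST requests. The endpoint path, form field names, capture IDs, and table ID are taken from the curl commands; the helper names `moveTableForm` and `moveTableRequest` are hypothetical and not part of TiCDC. The program only builds and prints the requests and does not send them.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// moveTableForm (hypothetical helper) encodes the form fields that the curl
// commands pass via -d. url.Values.Encode sorts keys, so the field order
// differs from the curl commands.
func moveTableForm(changefeedID, captureID string, tableID int64) string {
	form := url.Values{}
	form.Set("cf-id", changefeedID)
	form.Set("target-cp-id", captureID)
	form.Set("table-id", strconv.FormatInt(tableID, 10))
	return form.Encode()
}

// moveTableRequest (hypothetical helper) builds, but does not send, the POST
// request against the owner's move_table endpoint.
func moveTableRequest(ownerAddr, changefeedID, captureID string, tableID int64) (*http.Request, error) {
	body := strings.NewReader(moveTableForm(changefeedID, captureID, tableID))
	req, err := http.NewRequest(http.MethodPost, ownerAddr+"/capture/owner/move_table", body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	// Capture IDs copied from the curl commands in this thread.
	captures := []string{
		"1f1c489f-c006-4f56-a50d-d509c4d140de",
		"81ce066d-5b78-44de-aa0c-ac596c3c83fd",
	}
	// Alternate the target capture, matching the order of the three curl calls.
	for i := 0; i < 3; i++ {
		target := captures[i%len(captures)]
		req, err := moveTableRequest("http://127.0.0.1:8300", "test", target, 69)
		if err != nil {
			fmt.Println("build request:", err)
			return
		}
		fmt.Println(req.Method, req.URL, "->", target)
		// To actually send the request: resp, err := http.DefaultClient.Do(req)
	}
}
```

To run it against a live cluster, pass each request to `http.DefaultClient.Do` and close the response body. Raising the loop count repeats the back-and-forth scheduling more times, which may make the stuck changefeed easier to trigger.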
Related questions that are not yet clear:
Thank you very much @liuzix and @overvenus for helping me solve this problem. Without them I don't believe I could have figured it out.
Reproduced this issue in v5.2.4.
Reproduced in v5.4.1 and v6.1.0-nightly 2022-05-05.
It happened again because of #5196. I will continue to investigate it.
@hi-rustin
In my view, the reason seems to be that when a table is being scheduled, the
It is indeed affected by this PR:
For now we just clean it up in the bufferSink. Since the bufferSink is about to be deleted, we don't need to add the drawback back in. Test result: https://tcms.pingcap.net/dashboard/executions/plan/786269 Thanks @asddongmen for discussing this with me!
/label affects-5.4
What did you do?
What did you expect to see?
No response
What did you see instead?
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client): main
TiCDC version (execute cdc version): [release-version=v5.5.0-nightly] [git-hash=7a227b421dbfcdafee02148e787138798edadf31] [git-branch=heads/refs/tags/v5.5.0-nightly] [utc-build-time="2022-01-24 18:10:05"] [go-version="go version go1.16.4 linux/amd64"] [failpoint-build=false]