CDC panic: sinkManager: sink upperbound should not less than checkpoint ts #10613
Comments
/severity major
I tried but could not reproduce this issue, and the root cause is hard to find by reading the code alone. More investigation is needed to resolve it.
CDC panics because the table upperbound is less than the checkpoint ts. From the log, this only happens when starting a table, right after the table finishes a two-phase schedule:
# Phase 1
[2024/03/07 14:11:37.254 +08:00] [INFO] [table.go:267] ["schedulerv3: table found new task"] [namespace=default] [changefeed=s3-sink] [tableSpan="\"{table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}\""] [task="{\"Span\":\"{table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}\",\"Checkpoint\":{\"checkpoint_ts\":448211686913474596,\"resolved_ts\":448211687070761065,\"last_synced_ts\":448192142990639116},\"IsRemove\":false,\"IsPrepare\":true,\"Epoch\":{}}"]
[2024/03/07 14:11:37.254 +08:00] [INFO] [manager.go:859] ["Add table sink"] [namespace=default] [changefeed=s3-sink] [span={table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}] [startTs=448211686913474596] [version=3]
[2024/03/07 14:11:37.345 +08:00] [INFO] [table.go:200] ["schedulerv3: table is prepared"] [namespace=default] [changefeed=s3-sink] [tableSpan="\"{table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}\""] [state=Prepared]
# Phase 2
[2024/03/07 14:11:37.655 +08:00] [INFO] [table.go:267] ["schedulerv3: table found new task"] [namespace=default] [changefeed=s3-sink] [tableSpan="\"{table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}\""] [task="{\"Span\":\"{table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}\",\"Checkpoint\":{\"checkpoint_ts\":448211687057915916,\"resolved_ts\":448211687136297053,\"last_synced_ts\":448192142990639116},\"IsRemove\":false,\"IsPrepare\":false,\"Epoch\":{}}"]
[2024/03/07 14:11:37.655 +08:00] [INFO] [manager.go:869] ["Start table sink"] [namespace=default] [changefeed=s3-sink] [span={table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}] [startTs=448211687057915916]
[2024/03/07 14:11:37.655 +08:00] [INFO] [table_sink_wrapper.go:153] ["Sink is started"] [namespace=default] [changefeed=s3-sink] [span={table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}] [startTs=448211687057915916] [replicateTs=448211687214940178]
# Panic after starting the table
[2024/03/07 14:11:37.655 +08:00] [PANIC] [manager.go:1038] ["sinkManager: sink upperbound should not less than checkpoint ts"] [namespace=default] [changefeed=s3-sink] [span={table_id:39731,start_key:748000000000009bff335f720000000000fa,end_key:748000000000009bff335f730000000000fa}] [upperbound=448211686913474596] [checkpointTs="{\"Mode\":0,\"Ts\":448211687057915916,\"BatchID\":18446744073709551615}"]
According to the start function, receivedSorterResolvedTs must have been updated to startTs once the table was started. So the smaller value, the one returned by getUpperBoundTs, must be equal to barrierTs.
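The reasoning above can be sketched in a few lines of Go. This is only an illustration, not the actual TiCDC code: `upperBound` and `checkBound` are hypothetical helpers, and the sketch assumes that the upperbound is the smaller of receivedSorterResolvedTs and barrierTs, as the comment describes. The two timestamps are taken from the log: 448211686913474596 is the Phase 1 startTs (and the upperbound reported in the panic), and 448211687057915916 is the Phase 2 startTs (and the checkpoint ts reported in the panic). Treating the stale value as barrierTs is the hypothesis being illustrated, not a confirmed root cause.

```go
package main

import "fmt"

// Ts is a TSO-style timestamp, like the values in the log lines.
type Ts = uint64

// upperBound is a hypothetical stand-in for getUpperBoundTs: it assumes the
// bound is the smaller of the sorter resolved ts and the barrier ts.
func upperBound(receivedSorterResolvedTs, barrierTs Ts) Ts {
	if barrierTs < receivedSorterResolvedTs {
		return barrierTs
	}
	return receivedSorterResolvedTs
}

// checkBound returns an error in the situation where the sink manager panics:
// the upperbound is less than the checkpoint ts.
func checkBound(upper, checkpointTs Ts) error {
	if upper < checkpointTs {
		return fmt.Errorf("sink upperbound %d is less than checkpoint ts %d", upper, checkpointTs)
	}
	return nil
}

func main() {
	const (
		phase1StartTs Ts = 448211686913474596 // startTs logged by "Add table sink"
		phase2StartTs Ts = 448211687057915916 // startTs logged by "Start table sink"
	)

	// Assumption: the sorter resolved ts has already advanced to the Phase 2
	// startTs, while the barrier ts still holds the Phase 1 value.
	upper := upperBound(phase2StartTs, phase1StartTs)

	fmt.Println("upperbound:", upper)
	fmt.Println("check:", checkBound(upper, phase2StartTs))
}
```

Under these assumptions the computed upperbound equals the Phase 1 startTs, which is the same value the panic log reports, so the check fails against the Phase 2 checkpoint ts.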
What did you do?
The panic occurs when 3 Kafka changefeeds are running to sync 4000+ tables.
What did you expect to see?
CDC does not panic.
What did you see instead?
CDC panicked.
Versions of the cluster
[release-version=v8.0.0-alpha] [git-hash=ed54e785188538740c69a74cdca9f4ae258bed06] [git-branch=heads/refs/tags/v8.0.0-alpha] [utc-build-time="2024-02-07 11:39:36"]