etcd worker tick stuck up to 2 minutes periodically when cdc can't connect to Kafka server #11340
Comments
The severity of this issue is set to major because, if cdc runs multiple changefeeds, the other changefeeds are delayed as well, resulting in a periodic lag of about 2 minutes. The root cause is that when a Kafka sink encounters an error, it retries within the sinkManager, at which point it calls tiflow/cdc/processor/sinkmanager/manager.go Line 262 in 1252979
This function holds m.sinkFactory.Lock() until it returns. When Kafka cannot be reached, the function blocks for about 2 minutes and only exits once the underlying call fails with "kafka: client has run out of available brokers to talk to: dial tcp 10.99.219.92:9092: i/o timeout".
Meanwhile, in another goroutine, the Processor calls tiflow/cdc/processor/processor.go Line 346 in 1252979
When the downstream of the changefeed is Kafka, that call in turn reaches tiflow/cdc/processor/sinkmanager/manager.go Line 318 in 1252979
This function tries to acquire m.sinkFactory.Lock(), but because the lock is already held by initSinkFactory, the Processor Tick blocks in needsStuckCheck until the lock is released. A relatively simple solution is therefore to make needsStuckCheck no longer acquire m.sinkFactory.Lock(), so that it can never block.
/found customer
What did you do?
What did you expect to see?
The changefeed may get stuck, but the processor should not be stuck.
What did you see instead?
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client): (paste TiDB cluster version here)
Upstream TiKV version (execute tikv-server --version): (paste TiKV version here)
TiCDC version (execute cdc version): master