mysql changefeed blocked by abnormal kafka changefeed #4241

Closed
Tammyxia opened this issue Jan 6, 2022 · 10 comments · Fixed by #5186
Assignees
Labels
affects-5.3 affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.0 area/ticdc Issues or PRs related to TiCDC. severity/minor type/bug The issue is confirmed as a bug.

Comments

Tammyxia commented Jan 6, 2022

What did you do?

  • 1 TiCDC, 1 Kafka broker, 1 MySQL; 3 Kafka changefeeds, 1 MySQL changefeed.
  • At first, all changefeeds work normally.
  • When the Kafka service is stopped, the 3 Kafka changefeeds stop syncing data, as expected, but the checkpoint of the MySQL changefeed also stops advancing.
  • [
      {
        "id": "kafka-task-1",
        "summary": {
          "state": "normal",
          "tso": 430296185518948355,
          "checkpoint": "2022-01-06 14:14:42.308",
          "error": null
        }
      },
      {
        "id": "kafka-task-3",
        "summary": {
          "state": "normal",
          "tso": 430296185518948355,
          "checkpoint": "2022-01-06 14:14:42.308",
          "error": null
        }
      },
      {
        "id": "kafka-task-4",
        "summary": {
          "state": "normal",
          "tso": 430296185518948355,
          "checkpoint": "2022-01-06 14:14:42.308",
          "error": null
        }
      },
      {
        "id": "mysql-task-1",
        "summary": {
          "state": "normal",
          "tso": 430296201260695559,
          "checkpoint": "2022-01-06 14:15:42.358",
          "error": null
        }
      }
    ]
    [root@CentOS76_VM log]# date
    Thu Jan 6 14:42:04 CST 2022

What did you expect to see?

The MySQL changefeed should keep working normally.

What did you see instead?

The cdc log contains the expected ERROR: [ERROR] [changefeed.go:118] ["an error occurred in Owner"] [changefeedID=kafka-task-3] [error="[CDC:ErrKafkaNewSaramaProducer]new sarama producer: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)"]
It also contains this WARN: [WARN] [client.go:226] ["etcd client outCh blocking too long, the etcdWorker may be stuck"] [duration=20m39.000245874s]

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

5.4.0

TiCDC version (execute cdc version):

5.4.0
@Tammyxia Tammyxia added type/bug The issue is confirmed as a bug. area/ticdc Issues or PRs related to TiCDC. labels Jan 6, 2022

Tammyxia commented Jan 6, 2022

In addition, in this situation the open file count keeps increasing:
[image]

@3AceShowHand 3AceShowHand self-assigned this Jan 6, 2022

3AceShowHand commented Jan 6, 2022

When the broker goes offline, cdc tries to close the producer.
[image: Screen Shot 2022-01-06 at 7 01 27 PM]

As shown in the picture above, closing the syncClient is quite time-consuming.

This would block the owner for about 1 minute.

3AceShowHand commented

Maybe we should close the syncClient asynchronously?

3AceShowHand commented

This is constrained by:
[image]

3AceShowHand commented

This happens for any protocol that sends the checkpoint ts.

At most, it blocks the owner for about 1 minute.

amyangfei commented

> When the broker goes offline, cdc tries to close the producer. [image: Screen Shot 2022-01-06 at 7 01 27 PM]
>
> As shown in the picture above, closing the syncClient is quite time-consuming.
>
> This would block the owner for about 1 minute.

Good catch. If there are multiple Kafka changefeeds and they are closed one by one, the total time cost will be even larger.

3AceShowHand commented

This problem causes the owner and processor to be blocked for a period of time, but it only happens when all Kafka brokers are shut down. In a real-world production Kafka cluster, it would be rare for all Kafka brokers to be shut down at the same time.

To solve this problem:

3AceShowHand commented

After #4359 was merged, the blocking no longer lasts long; it should be no more than 2 minutes, so the severity is changed to minor.

nongfushanquan commented

//label affects-5.3

nongfushanquan commented

/label affects-5.3

7 participants