Consumer stuck after coordinator goes down #3082
Comments
@edenhill Any idea what could be going on here?
Hello @edenhill
Hm, this looks similar to #2944. There I also reported Not Coordinator errors on JoinGroup requests after a rolling broker update (which of course includes the coordinator going down).
The upcoming v1.6.0 release has a lot of consumer fixes and I believe this is fixed.
Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ
Description
In our case, we bring up a consumer with the configuration listed in the checklist below and subscribe it to 5 different topics.
Later, the consumer gets stuck after multiple brokers (in particular the group coordinator) go down.
The consumer then keeps trying to join the consumer group, but every attempt fails.
A few seconds later, even when all the brokers are back up, the consumer still does not function properly.
We have to restart the service to restore normal operation.
This has happened multiple times since we recently upgraded to librdkafka 1.4.2.
2020-09-17 03:18:47.0105162 [ProcessId: 6108] Message: [thrd:main]: 25.66.192.253:9092/8: Joining group "Consumer.9c4699d2-5364-42c6-857f-e266d1133f75" with 5 subscribed topic(s)
2020-09-17 03:18:47.0307669 [ProcessId: 6108] Message: [thrd:main]: GroupCoordinator/8: JoinGroupRequest failed: Broker: Not coordinator: actions Refresh
Link to full consumer logs: client log.txt
Link to broker logs: server logs.txt
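When going through the full client log, it can help to count how often the join attempt fails and with which error. Below is a minimal sketch, assuming the remaining log lines have the same shape as the two excerpts above. The function name `count_join_failures` and the regular expression are illustrative only and are not part of librdkafka.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, e.g.
# "... JoinGroupRequest failed: Broker: Not coordinator: actions Refresh"
# The pattern is an assumption based on those two sample lines only.
JOIN_FAILED = re.compile(
    r"JoinGroupRequest failed: (?P<error>.+?): actions (?P<actions>\S+)"
)


def count_join_failures(lines):
    """Count JoinGroupRequest failures per error string (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = JOIN_FAILED.search(line)
        if match:
            counts[match.group("error")] += 1
    return counts


# The two log lines quoted in this report.
sample = [
    '2020-09-17 03:18:47.0105162 [ProcessId: 6108] Message: [thrd:main]: '
    '25.66.192.253:9092/8: Joining group '
    '"Consumer.9c4699d2-5364-42c6-857f-e266d1133f75" with 5 subscribed topic(s)',
    '2020-09-17 03:18:47.0307669 [ProcessId: 6108] Message: [thrd:main]: '
    'GroupCoordinator/8: JoinGroupRequest failed: Broker: Not coordinator: '
    'actions Refresh',
]

print(count_join_failures(sample))
```

To run it against the attached client log, pass an open file object instead of `sample`; a steadily growing count after the brokers are back up would match the stuck behaviour described here.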
How to reproduce
Run the consumer as usual, then bring down a few brokers, including the coordinator, for a few seconds and bring them up again.
IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
librdkafka version: 1.4.2
Apache Kafka version: 2.4
librdkafka client configuration: debug=all enable.auto.commit=false auto.offset.reset=beginning enable.partition.eof=true fetch.wait.max.ms=10 fetch.error.backoff.ms=10 statistics.interval.ms=600000
Operating system: Windows 10
Provide logs (with debug=.. as necessary) from librdkafka
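For anyone trying to reproduce with the same settings, the reported client configuration can be turned into a key/value mapping. This is a rough sketch in plain Python: `parse_config` is a hypothetical helper, and the values are kept as strings because librdkafka configuration properties are set as strings. Connection settings such as `bootstrap.servers` and `group.id` are not listed in the report and would have to be added.

```python
def parse_config(text):
    """Split space-separated key=value pairs into a dict (hypothetical helper)."""
    config = {}
    for pair in text.split():
        # partition() splits on the first "=" only, so values may contain "=".
        key, _, value = pair.partition("=")
        config[key] = value
    return config


# Configuration string exactly as given in the checklist.
reported = (
    "debug=all enable.auto.commit=false auto.offset.reset=beginning "
    "enable.partition.eof=true fetch.wait.max.ms=10 "
    "fetch.error.backoff.ms=10 statistics.interval.ms=600000"
)

config = parse_config(reported)
for key in sorted(config):
    print(f"{key}={config[key]}")
```

A language binding that accepts a dictionary of properties could take `config` directly once the missing connection settings are filled in.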