Consumer stuck after coordinator goes down #3082

Closed
7 tasks done
adinigam opened this issue Sep 21, 2020 · 4 comments

@adinigam
Contributor

adinigam commented Sep 21, 2020

Read the FAQ first: https://github.com/edenhill/librdkafka/wiki/FAQ

Description

We bring up a consumer with the configuration listed in the checklist below and subscribe it to 5 different topics.
The consumer later gets stuck after multiple brokers (in particular the group coordinator) go down.
From then on the consumer keeps trying to join the consumer group, and every attempt fails.
A few seconds later, even once all the brokers are back up, the consumer still does not function properly.
We have to restart the service to restore normal operation.
This has happened multiple times since we recently upgraded to librdkafka 1.4.2.

2020-09-17 03:18:47.0105162 [ProcessId: 6108] Message: [thrd:main]: 25.66.192.253:9092/8: Joining group "Consumer.9c4699d2-5364-42c6-857f-e266d1133f75" with 5 subscribed topic(s)

2020-09-17 03:18:47.0307669 [ProcessId: 6108] Message: [thrd:main]: GroupCoordinator/8: JoinGroupRequest failed: Broker: Not coordinator: actions Refresh

Link to full consumer logs: client log.txt
Link to broker logs: server logs.txt
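
For anyone checking whether their own client log shows the same pattern, here is a minimal sketch that counts join attempts and JoinGroup failures. It assumes the debug log lines have the same shape as the two excerpts above; the function name `summarize_join_attempts` and the regular expressions are illustrative and not part of librdkafka.

```python
import re
from collections import Counter

# Patterns are assumptions based only on the two log lines quoted in this issue.
JOIN_ATTEMPT = re.compile(r'Joining group "(?P<group>[^"]+)"')
JOIN_FAILED = re.compile(r"JoinGroupRequest failed: (?P<reason>.+?): actions")

# The two excerpts from the report, used as sample input.
SAMPLE_LINES = [
    '2020-09-17 03:18:47.0105162 [ProcessId: 6108] Message: [thrd:main]: '
    '25.66.192.253:9092/8: Joining group '
    '"Consumer.9c4699d2-5364-42c6-857f-e266d1133f75" with 5 subscribed topic(s)',
    '2020-09-17 03:18:47.0307669 [ProcessId: 6108] Message: [thrd:main]: '
    'GroupCoordinator/8: JoinGroupRequest failed: Broker: Not coordinator: '
    'actions Refresh',
]


def summarize_join_attempts(lines):
    """Return (number of join attempts, Counter of failure reasons)."""
    attempts = 0
    failures = Counter()
    for line in lines:
        if JOIN_ATTEMPT.search(line):
            attempts += 1
        match = JOIN_FAILED.search(line)
        if match:
            failures[match.group("reason")] += 1
    return attempts, failures


if __name__ == "__main__":
    attempts, failures = summarize_join_attempts(SAMPLE_LINES)
    print("join attempts:", attempts)
    for reason, count in failures.items():
        print("failed:", reason, "x", count)
```

To run it against a full log, pass the lines of an open file instead of `SAMPLE_LINES`. A failure count that keeps growing after the brokers are back would match the behaviour described here.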

How to reproduce

Run the consumer as usual, then bring down a few brokers and the coordinator for a few seconds and bring them back up again.

IMPORTANT: Always try to reproduce the issue on the latest released version (see https://github.com/edenhill/librdkafka/releases), if it can't be reproduced on the latest version the issue has been fixed.

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): 1.4.2
  • Apache Kafka version: 2.4
  • librdkafka client configuration: debug=all enable.auto.commit=false auto.offset.reset=beginning enable.partition.eof=true fetch.wait.max.ms=10 fetch.error.backoff.ms=10 statistics.interval.ms=600000
  • Operating system: Windows 10
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
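
For anyone trying to reproduce this, the one-line client configuration in the checklist can be split into individual properties. This is a minimal sketch, assuming the values are whitespace-separated `key=value` pairs exactly as pasted; `parse_config` is a hypothetical helper. The report does not give `bootstrap.servers` or the full `group.id` setting, so they are left out here.

```python
# Configuration string copied from the checklist above.
REPORTED_CONFIG = (
    "debug=all enable.auto.commit=false auto.offset.reset=beginning "
    "enable.partition.eof=true fetch.wait.max.ms=10 "
    "fetch.error.backoff.ms=10 statistics.interval.ms=600000"
)


def parse_config(text):
    """Split whitespace-separated key=value pairs into a dict of strings."""
    config = {}
    for pair in text.split():
        key, _, value = pair.partition("=")
        config[key] = value
    return config


if __name__ == "__main__":
    # Print one property per line for easier reading.
    for key, value in parse_config(REPORTED_CONFIG).items():
        print(f"{key} = {value}")
```

The resulting dictionary keeps every value as a string, which matches how the properties were written in the report.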
@adinigam
Contributor Author

@edenhill Any idea what could be going on here?

@stsojithomas

Hello @edenhill,
We are facing similar issues, but are not sure whether the cause is the same (we are still trying to narrow down the scenarios).
The consumer seems to lose connectivity or its subscription to topics, and does not recover until a restart.
This did not happen in the past; it started with v1.4.

@dimpavloff

Hm, this looks similar to #2944, where I also reported Not Coordinator errors on JoinGroup requests after a rolling broker update (which of course includes the coordinator going down).

@edenhill
Contributor

The upcoming v1.6.0 release has a lot of consumer fixes and I believe this is fixed.
Reopen if it still occurs on v1.6.0 (try v1.6.0-RC1).
