
Timed out OffsetCommitRequest in flight (after 60628ms, timeout #0) #118

Closed
chunaiarun opened this issue Aug 13, 2024 · 6 comments

@chunaiarun

Describe the bug
32 consumers started processing 7 million messages from 32 partitions of a Kafka topic.
We are doing manual async commits.

To Reproduce

  • Produce 7 million messages, segregated equally across 32 partitions
  • Configure the consumer code to do manual async commits
  • Start all 32 consumers at the same time
  • We then saw the timeout errors below in almost all consumer logs (a minimal setup sketch follows this list)
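
For reference, a minimal sketch of this setup in the kfk interface's config style, assuming the .kfk.Consumer/.kfk.Sub entry points and the .kfk.PARTITION_UA constant as used in the interface's examples; the broker, group and topic names are illustrative, not our actual values:

/ consumer config for manual (non-auto) commits, built as a symbol dictionary
kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);   / illustrative broker
  (`group.id;`lag_test);                     / illustrative consumer group
  (`fetch.wait.max.ms;`10);
  (`statistics.interval.ms;`10000);
  (`enable.auto.commit;`false);              / commits are done manually via .kfk.CommitOffsets
  (`enable.auto.offset.store;`false));
client:.kfk.Consumer[kfk_cfg];               / one such client per consumer process
.kfk.Sub[client;`messages;enlist .kfk.PARTITION_UA];   / group assigns the 32 partitions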

Observed errors
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #0)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #1)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #2)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #3)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #4)")"
"(4i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out 66792 in-flight, 0 retry-queued, 95763 out-queue, 0 partially-sent requests")"
"(3i;"FAIL";"[thrd:GroupCoordinator]: GroupCoordinator: :443: 162555 request(s) timed out

It looks like the library internally retries the commit and eventually succeeds, and I see the lag go down fine, but how can this be avoided? What is the maximum retry time, and which retry-related librdkafka configuration options apply (https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md)?
To be on the safe side, should I increase or adjust any configuration?

FYI
(fetch.wait.max.ms;10);
(statistics.interval.ms;10000);
(enable.auto.commit;false);
(enable.auto.offset.store;false);
(message.max.bytes;1000000000)
);

.kfk.CommitOffsets[x[client];x[topic];((enlist x[partition])!(enlist 1+x[offset]));1b];

We are using an async commit here.
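
On the retry/timeout question above, a hedged sketch: the ~60s in the REQTMOUT lines matches librdkafka's socket.timeout.ms default of 60000 ms (see CONFIGURATION.md), so that is the assumed knob here; the value below is purely illustrative, not a recommendation:

/ assumption: socket.timeout.ms is the setting behind the ~60s in-flight
/ request timeout seen in the logs above (librdkafka default 60000 ms)
kfk_cfg[`socket.timeout.ms]:`120000;   / extend the config dictionary from the sketch above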

Desktop (please complete the following information):

  • OS: Linux
  • 12 CPUs
@chunaiarun (Author)

A correction to what I said above about the library retrying the commit successfully: what I actually observe in the logs is that all of the async commit retries failed. Since we process messages sequentially from each partition, even when this error appears the subsequent async commits still commit the offsets, so consumption keeps moving ahead.

@sshanks-kx (Contributor)

This doc may be of relevance: https://github.com/confluentinc/librdkafka/wiki/FAQ (in particular the section https://github.com/confluentinc/librdkafka/wiki/FAQ#why-committing-each-message-is-slow). If you pay for Kafka support (e.g. via Confluent), they may be able to provide additional help with debugging/advising.
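
To illustrate the batching idea from that FAQ section, a rough sketch that async-commits once every N messages per partition instead of once per message; the handler name and the shape of x are hypothetical (they mirror the fields used in the .kfk.CommitOffsets call earlier in this issue), and only .kfk.CommitOffsets itself is taken from that code:

/ hypothetical per-message handler: commit only every N-th offset per partition;
/ committing offset o also covers all lower offsets, so the commit-request rate
/ drops by roughly a factor of N
N:1000;                                   / batch size, illustrative only
onMsg:{[x]
  if[0=(1+x[`offset]) mod N;
    .kfk.CommitOffsets[x[`client];x[`topic];(enlist x[`partition])!enlist 1+x[`offset];1b]];
 };

Depending on the app design, a periodic flush (e.g. on a timer or at shutdown) would still be needed so the last partial batch per partition does not stay uncommitted.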

@chunaiarun (Author)

Thank you very much

@chunaiarun (Author)

The referenced doc suggests using the offset store, but that approach works together with enable.auto.commit set to true, right?

@sshanks-kx (Contributor)

With enable.auto.commit=true the client batch-commits offsets synchronously on a timer, so the consumer will wait until the commits are done. As it is not committing asynchronously (depending on your app/design), it will have far fewer messages in flight while potentially consuming more slowly.
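
For reference, a hedged sketch of that combination in the same config-dictionary style; these are standard librdkafka properties (auto.commit.interval.ms defaults to 5000 ms), and with enable.auto.offset.store=false the application is expected to mark offsets as processed before the timer commits them:

/ auto commit on a timer, with the application controlling which offsets are
/ considered processed; the interval value shown is the librdkafka default
auto_cfg:(!) . flip(
  (`enable.auto.commit;`true);            / stored offsets are committed periodically
  (`auto.commit.interval.ms;`5000);       / commit interval
  (`enable.auto.offset.store;`false));    / app stores offsets explicitly once processed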

Ref:
https://docs.confluent.io/platform/current/clients/consumer.html#offset-management
https://medium.com/@rramiz.rraza/kafka-programming-different-ways-to-commit-offsets-7bcd179b225a
https://medium.com/apache-kafka-from-zero-to-hero/apache-kafka-guide-36-consumer-offset-commit-strategies-41ef6bf34fcd

@sshanks-kx (Contributor)

Please reopen if there's still an issue. Thanks.
