
Timed out OffsetCommitRequest in flight (after 60628ms, timeout #0) #118

Closed
chunaiarun opened this issue Aug 13, 2024 · 6 comments

@chunaiarun

Describe the bug
32 consumers started processing 7 million messages from 32 partitions of a Kafka topic.
We are doing manual async commits.

To Reproduce

  • Produce 7 million messages, segregated equally across 32 partitions
  • Configure the consumer code to do manual async commits
  • Start all 32 consumers at the same time
  • We then saw the timeout errors below in almost all consumer logs (a minimal setup sketch follows this list)
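
For reference, a minimal sketch of this setup in the kfk interface's config style, assuming the .kfk.Consumer/.kfk.Sub entry points and the .kfk.PARTITION_UA constant as used in the interface's examples; the broker, group and topic names are illustrative, not our actual values:

/ consumer config for manual (non-auto) commits, built as a symbol dictionary
kfk_cfg:(!) . flip(
  (`metadata.broker.list;`localhost:9092);   / illustrative broker
  (`group.id;`lag_test);                     / illustrative consumer group
  (`fetch.wait.max.ms;`10);
  (`statistics.interval.ms;`10000);
  (`enable.auto.commit;`false);              / commits are done manually via .kfk.CommitOffsets
  (`enable.auto.offset.store;`false));
client:.kfk.Consumer[kfk_cfg];               / one such client per consumer process
.kfk.Sub[client;`messages;enlist .kfk.PARTITION_UA];   / group assigns the 32 partitions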

Observed errors
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #0)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #1)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #2)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #3)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #4)")"
"(4i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out 66792 in-flight, 0 retry-queued, 95763 out-queue, 0 partially-sent requests")"
"(3i;"FAIL";"[thrd:GroupCoordinator]: GroupCoordinator: :443: 162555 request(s) timed out

It looks like the library internally retries the commit and eventually succeeds, and I see the lag go down fine, but how can this be avoided? What is the maximum retry time, and which retry-related librdkafka configuration options apply (https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md)?
To be on the safe side, should I increase or adjust any configuration?

FYI
(fetch.wait.max.ms;10);
(statistics.interval.ms;10000);
(enable.auto.commit;false);
(enable.auto.offset.store;false);
(message.max.bytes;1000000000)
);

.kfk.CommitOffsets[x[client];x[topic];((enlist x[partition])!(enlist 1+x[offset]));1b];

We are using an async commit here.
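
On the retry/timeout question above, a hedged sketch: the ~60s in the REQTMOUT lines matches librdkafka's socket.timeout.ms default of 60000 ms (see CONFIGURATION.md), so that is the assumed knob here; the value below is purely illustrative, not a recommendation:

/ assumption: socket.timeout.ms is the setting behind the ~60s in-flight
/ request timeout seen in the logs above (librdkafka default 60000 ms)
kfk_cfg[`socket.timeout.ms]:`120000;   / extend the config dictionary from the sketch above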

Desktop (please complete the following information):

  • OS: Linux
  • 12 CPUs
@chunaiarun (Author)

A correction to what I said above about the library retrying the commit successfully: what I actually observe in the logs is that all of the async commit retries failed. Since we process messages sequentially from each partition, even when this error appears the subsequent async commits still commit the offsets, so consumption keeps moving ahead.

@sshanks-kx (Contributor)

This doc may be of relevance: https://github.com/confluentinc/librdkafka/wiki/FAQ (in particular the section https://github.com/confluentinc/librdkafka/wiki/FAQ#why-committing-each-message-is-slow). If you pay for Kafka support (e.g. via Confluent), they may be able to provide additional help with debugging/advising.
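
To illustrate the batching idea from that FAQ section, a rough sketch that async-commits once every N messages per partition instead of once per message; the handler name and the shape of x are hypothetical (they mirror the fields used in the .kfk.CommitOffsets call earlier in this issue), and only .kfk.CommitOffsets itself is taken from that code:

/ hypothetical per-message handler: commit only every N-th offset per partition;
/ committing offset o also covers all lower offsets, so the commit-request rate
/ drops by roughly a factor of N
N:1000;                                   / batch size, illustrative only
onMsg:{[x]
  if[0=(1+x[`offset]) mod N;
    .kfk.CommitOffsets[x[`client];x[`topic];(enlist x[`partition])!enlist 1+x[`offset];1b]];
 };

Depending on the app design, a periodic flush (e.g. on a timer or at shutdown) would still be needed so the last partial batch per partition does not stay uncommitted.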

@chunaiarun (Author)

Thank you very much

@chunaiarun (Author)

The referenced doc suggests using the offset store, but that approach works together with enable.auto.commit set to true, right?

@sshanks-kx (Contributor)

With enable.auto.commit=true the client batch-commits offsets synchronously on a timer, so the consumer will wait until the commits are done. As it is not committing asynchronously (depending on your app/design), it will have far fewer messages in flight while potentially consuming more slowly.
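
For reference, a hedged sketch of that combination in the same config-dictionary style; these are standard librdkafka properties (auto.commit.interval.ms defaults to 5000 ms), and with enable.auto.offset.store=false the application is expected to mark offsets as processed before the timer commits them:

/ auto commit on a timer, with the application controlling which offsets are
/ considered processed; the interval value shown is the librdkafka default
auto_cfg:(!) . flip(
  (`enable.auto.commit;`true);            / stored offsets are committed periodically
  (`auto.commit.interval.ms;`5000);       / commit interval
  (`enable.auto.offset.store;`false));    / app stores offsets explicitly once processed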

Ref:
https://docs.confluent.io/platform/current/clients/consumer.html#offset-management
https://medium.com/@rramiz.rraza/kafka-programming-different-ways-to-commit-offsets-7bcd179b225a
https://medium.com/apache-kafka-from-zero-to-hero/apache-kafka-guide-36-consumer-offset-commit-strategies-41ef6bf34fcd

@sshanks-kx (Contributor)

Please reopen if there's still an issue. Thanks.
