Timed out OffsetCommitRequest in flight (after 60628ms, timeout #0)\")" #118
Comments
Correction to what I said above about commit retries ("Looks like internally the library retries to commit and does it successfully"): according to the log, all the async commit retries have failed. However, since we process messages sequentially from each partition, the subsequent async commits still commit the offset and the consumer keeps moving ahead, even when we see this error.
This doc may be relevant: https://github.com/confluentinc/librdkafka/wiki/FAQ (section: https://github.com/confluentinc/librdkafka/wiki/FAQ#why-committing-each-message-is-slow). If you pay for Kafka support (e.g. via Confluent), they may be of additional help in debugging/advising.
Thank you very much
The reference doc suggests using the message (offset) store, but that will work with auto commit set to true, right?
enable.auto.commit=true batch-commits messages synchronously on a timer, so the consumer will wait until the commits are done. As it's not committing asynchronously, it will (depending on your app/design) have far fewer messages in flight, while potentially consuming more slowly. Ref:
Please reopen if there's an issue. Thanks.
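To illustrate the batching idea discussed above, here is a minimal sketch of committing only the latest offset per partition, once every N messages or every few seconds, instead of once per message. It is written in Python purely to show the logic and is not part of the kfk or librdkafka API: `OffsetBatcher`, `commit_fn`, `max_pending` and `max_age_s` are hypothetical names, and `commit_fn` stands in for whatever commit call the client exposes (in this thread, `.kfk.CommitOffsets`).

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]  # (topic, partition)


class OffsetBatcher:
    """Hypothetical helper: remembers the next offset to commit per partition
    and hands them to commit_fn in batches rather than one call per message."""

    def __init__(self,
                 commit_fn: Callable[[Dict[Key, int]], None],
                 max_pending: int = 1000,
                 max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._commit_fn = commit_fn      # assumed wrapper around the real commit call
        self._max_pending = max_pending  # flush after this many processed messages
        self._max_age_s = max_age_s      # ...or after this many seconds
        self._clock = clock
        self._offsets: Dict[Key, int] = {}
        self._pending = 0
        self._last_flush = clock()

    def processed(self, topic: str, partition: int, offset: int) -> None:
        """Record a fully processed message; flush if a threshold is reached."""
        key = (topic, partition)
        # Kafka convention: the committed offset is the NEXT offset to read.
        self._offsets[key] = max(self._offsets.get(key, 0), offset + 1)
        self._pending += 1
        too_many = self._pending >= self._max_pending
        too_old = self._clock() - self._last_flush >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Commit the highest recorded offset for each partition, if any."""
        if self._offsets:
            self._commit_fn(dict(self._offsets))
            self._offsets.clear()
        self._pending = 0
        self._last_flush = self._clock()
```

Because a later commit for a partition supersedes earlier ones, far fewer commit requests are queued towards the group coordinator. The trade-off is that a crash between flushes can cause up to one batch of messages to be reprocessed, so `flush()` would also need to be called on shutdown or rebalance.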
Describe the bug
32 consumers were started to process 7 million messages from 32 partitions (from the Kafka queue).
We are doing manual async commits.
To Reproduce
Expected behavior
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #0)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #1)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #2)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #3)")"
"(5i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out OffsetCommitRequest in flight (after 60628ms, timeout #4)")"
"(4i;"REQTMOUT";"[thrd:GroupCoordinator]: GroupCoordinator/2: Timed out 66792 in-flight, 0 retry-queued, 95763 out-queue, 0 partially-sent requests")"
"(3i;"FAIL";"[thrd:GroupCoordinator]: GroupCoordinator: :443: 162555 request(s) timed out
It looks like the library internally retries the commit and succeeds, and I see the lag go down fine, but how can we avoid this? What is the maximum retry time (ms)? Which librdkafka configuration properties control retries (https://github.com/confluentinc/librdkafka/blob/master/CONFIGURATION.md)?
To be on the safe side, should I increase/adjust any configuration?
FYI
(fetch.wait.max.ms;10);
(statistics.interval.ms;10000);
(enable.auto.commit;false);
(enable.auto.offset.store;false);
(message.max.bytes;1000000000));
.kfk.CommitOffsets[x[client];x[topic];((enlist x[partition])!(enlist 1+x[offset]));1b];]
We are using async commit here.
Desktop (please complete the following information):