-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disconnect logs from kafka client #1089
Comments
Looking at the timestamp it seems that's the broker's idle connection reaper kicking in. Set log.connection.close=false to suppress these error messages. |
Hi @edenhill We had this issue happen today, luckily in our dev environment when our Kafka broker went down. However, it was NOT harmless at all. In fact, every instance in our datacenter pretty much DoS'd itself. When we spun up a new instance, we saw:
Those lines repeated a few hundred times a second. Its as if the errors from omkafka initiated another attempt and an endless loop began. All instances' connection tables filled up and queued forever. Our rsyslog config looks like:
Is there anything obvious you see that we did wrong in this configuration?? |
Also, i'm not sure if this is the right place for this, but this issue is the closest i've seen to what we experienced today. |
Reference: rsyslog/rsyslog#1464 |
@djenriquez attempting to produce new messages when the broker is down will not directly try to initiate a new connection, that's handled by its own retry interval ( Are you using multiple rd_kafka_t producer instances, or are all those logs from the same producer instance? |
Hi @edenhill, that makes sense. All the logs that are causing the instance to experience this behavior originate from the producer instance itself. We did some more investigation and we're pretty sure that new logs aren't generated, omkafka looks to just be retrying immediately and indefinitely when it receives dropped connections from the ELB that it's trying to connect through to get to Kafka. When we change the scenario such that it gets a reject/refuse rather than a drop, it behaves how we expect, it tries once then backs off and doesn't try again for a few minutes, maybe once per new log received. We can simulate this drop by changing the port the ELB is listening on. Also when DNS can't be resolved, it looks to try once per attempt and that's it. It's only when connections are dropped from the upstream that this behavior ensues, but when it does, it's destructive. It's very easy to reproduce:
|
I repro'd the problem again today with more diagnostic tools and dumped out the conntrack table. To my surprise, it held tens of thousands of these:
The conntrack table seems to fill up because every immediate retry from rsyslog instantiates a new DNS lookup. I implemented stateless rules in iptables to NOTRACK these, and the connection table is no longer filling up. HOWEVER, this is purely a mitigation practice. The problem still persists that rsyslog-omkafka continues to endlessly retry immediately. |
Yea....so Any ideas what I can do to mitigate this? |
I think I know what the issue is, the reconnect jitter delay might be cut short by internal librdkafka signaling. |
Yes! Absolutely! |
@djenriquez Please give latest master a try. |
You might want to increase reconnect.backoff.jitter.ms so it is more evident that a backoff is taking place, maybe 5000ms or so |
@edenhill @djenriquez I'm also using librdkafka with rsyslog in production. I haven't experienced this issue but I may have just not hit the right set of circumstances to trigger it. I'm working on new rsyslog builds for 8.25 for us now so I'm super interested in hearing the result of this fix - as I'd love to incorporate it in my build. Big thanks to both of you for digging into this. |
Hey guys, working on this now. I'm not sure where to actually increase I'll take a look, got a friend @alexunwin helping out 👍 |
That seems to have done the trick! I'm still getting some error logs but it DOES NOT burn the instances to the ground anymore:
As you can see, errors are still reporting, but there are only a handful now every 10 seconds, wheras before there were hundreds. Thanks @alexunwin for helping build everything! |
You can silence the disconnect logs by setting log.connection.close=false |
You will still see the "1/1 brokers are down" log message though. If you register an error_cb you can filter this error out manually. |
Sure, thanks @edenhill |
Description
This are the logs, also check the config parameters, I am using omkafka of rsyslog and facing this issue.
Mar 3 07:17:01 tbehra-r8-v4 rsyslogd: omkafka: kafka message 2/2 brokers are down [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 07:47:07 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 07:47:07 tbehra-r8-v4 rsyslogd: message repeated 2 times: [ omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]]
Mar 3 07:47:07 tbehra-r8-v4 rsyslogd: omkafka: kafka message 2/2 brokers are down [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:17:01 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:17:01 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:17:01 tbehra-r8-v4 rsyslogd: omkafka: kafka message 2/2 brokers are down [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:37:09 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:37:09 tbehra-r8-v4 rsyslogd: omkafka: kafka message localhost:9092/0: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:37:09 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:57:10 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 08:57:10 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 09:17:01 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 09:47:03 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 09:47:03 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 09:57:03 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 09:57:03 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 10:17:01 tbehra-r8-v4 rsyslogd: omkafka: kafka message localhost:9092/0: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 10:17:04 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 10:51:19 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 10:51:19 tbehra-r8-v4 rsyslogd: omkafka: kafka message 2/2 brokers are down [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 10:52:06 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
Mar 3 10:52:06 tbehra-r8-v4 rsyslogd: omkafka: kafka message kafka.dn-services.internal:9092/bootstrap: Receive failed: Disconnected [v8.25.0 try http://www.rsyslog.com/e/2422 ]
How to reproduce
Checklist
Please provide the following information:
librdkafka version (release number or git tag): 0.9.2
Apache Kafka version: 0.10
librdkafka client configuration:
"socket.keepalive.enable=true",
"queue.buffering.max.messages=1000",
"log.connection.close=true"
Operating system: Ubuntu 14
Using the legacy Consumer
Using the high-level KafkaConsumer
Provide logs (with
debug=..
as necessary) from librdkafkaProvide broker log excerpts
Critical issue
The text was updated successfully, but these errors were encountered: