Memory usage while consuming #81

Closed

winbatch opened this issue Feb 18, 2014 · 20 comments
Comments

@winbatch

Have you ever seen a case where, when consuming, if the consumer can't keep up (let's say it is writing each message to a slow disk), the amount of memory explodes? I'm seeing that. I've run valgrind and don't see any leaks, so I'm thinking librdkafka is caching messages, but I haven't changed any of the defaults such as fetch.message.max.bytes.

@winbatch
Author

Also - I see there is a 'queued.min.messages' but no 'queued.max.messages'?

@winbatch
Author

In case it wasn't obvious, I can reproduce this with sleep() rather than with a slow disk.

@edenhill
Contributor

rdkafka will fetch more messages as long as there are fewer than queued.min.messages in the local queue, and for each fetch it will request up to fetch.message.max.bytes worth of messages, so there is no need for a queued.max.messages.
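
For illustration, a minimal sketch of setting those two properties via librdkafka's rd_kafka_conf_set() API; the values shown are just examples, not recommendations:

    #include <stdio.h>
    #include <librdkafka/rdkafka.h>

    int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Fetch only while the local queue holds fewer than this many messages. */
        if (rd_kafka_conf_set(conf, "queued.min.messages", "100000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
            fprintf(stderr, "%% %s\n", errstr);

        /* Upper bound on the payload requested by each fetch. */
        if (rd_kafka_conf_set(conf, "fetch.message.max.bytes", "1048576",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
            fprintf(stderr, "%% %s\n", errstr);

        rd_kafka_conf_destroy(conf);
        return 0;
    }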

@winbatch
Author

Yes, but what if I want it to STOP fetching messages? My issue is that without a max, it will continue to fetch and use huge amounts of memory.


winbatch reopened this Feb 19, 2014
@edenhill
Contributor

It will stop when the queued.min.messages threshold is reached, and start fetching again whenever it drops below that threshold.

@winbatch
Author

OK - that worked. Although I wouldn't mind if you offered a config parameter saying that when the total bytes queued reach 'X', consuming should stop. That would let you control the amount of memory used when you don't know the size of the upcoming messages on the wire.

@edenhill
Contributor

The maximum memory consumption (per consumed toppar, i.e., topic+partition) should be close to queued.min.messages * message.max.bytes.
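
As a rough worked example, assuming the defaults of the time (queued.min.messages = 100000, message.max.bytes = 1000000):

    100000 msgs * ~1 MB/msg ≈ 100 GB worst case per toppar

so with max-sized messages the bound is enormous, while the same settings with ~1 KB messages bound the queue at roughly 100 MB.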

@edenhill
Contributor

Allowing some factor of overhead, does this match the memory consumption you're seeing in your application?

@winbatch
Author

That's hard for me to know since the message sizes coming in are random. That's why I'd like to be able to control the number of messages based on memory. This way, if incoming messages are small, I can ask for more of them; if messages are large, I don't run out of memory. I am not worried about getting a single huge message such that message.max.bytes really figures in. Let's say I only want to queue up 500 MB of memory: I'd like to be able to configure it such that librdkafka queues as many messages as possible without going over.

@edenhill
Contributor

Okay, a queued.min.bytes property.
So, if either of the queued.min.messages or queued.min.bytes thresholds is reached, it will pause fetching until both levels drop below their thresholds again. Okay with you?

@winbatch
Author

A queued.max.bytes property. It's a threshold I don't want to cross. I also find the queued.min.messages naming confusing; it feels like it should be named 'max' based on how you described the logic above. But the feature is more important to me than the name, so it's up to you ;)


@edenhill
Contributor

Myeah, it might seem a little odd.
queued.min.messages is the threshold that toggles whether rdkafka should fetch more messages or not; it does not really control a maximum number of queued messages, even though it has that effect as well, since fetching stops once the threshold is reached. But it will not queue exactly that number of messages; rather, up to queued.min.messages-1 plus the number of messages received in the last fetch batch (which depends on message.max.bytes).

If I added a queued.max.bytes property it would be a soft limit: it would still queue all messages received in a fetch reply, possibly overshooting the queued.max.bytes value by some amount (up to message.max.bytes). Otherwise rdkafka would have to drop received messages to satisfy the maximum queue limit, only to refetch them again shortly.

I'm rambling.
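
To make the toggle-plus-overshoot behaviour concrete, here is a hypothetical sketch of the decision just described; this is not librdkafka's actual code, and all the names are made up:

    #include <stdbool.h>
    #include <stddef.h>

    /* Fetching is (re)enabled only while both levels sit below their
     * thresholds. A fetch reply already in flight is queued in full,
     * so queued_bytes can overshoot max_bytes by up to one fetch
     * batch (each message bounded by message.max.bytes). */
    static bool want_fetch(size_t queued_msgs, size_t queued_bytes,
                           size_t min_msgs,   /* queued.min.messages */
                           size_t max_bytes)  /* proposed queued.max.bytes */
    {
        return queued_msgs < min_msgs && queued_bytes < max_bytes;
    }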

@winbatch
Author

Yeah, you're rambling. Bottom line: I want to safeguard against running out of memory without limiting myself to a certain number of messages. If you've got a cleaner way to do that, that's fine. I don't care if it gets overshot by a little bit.


@edenhill
Contributor

I'll add queued.max.messages.kbytes that does what is defined above (including the possible overshoot).

edenhill added a commit that referenced this issue Feb 22, 2014

Defaults to 1 gig.

This also adds "fetchq_size" (alongside "fetchq_cnt") to the stats output.
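
For later readers, a minimal sketch of capping the queue with the new property; the value is in kilobytes, so "512000" is roughly the 500 MB budget discussed above (overshoot by up to one fetch batch still applies):

    #include <stdio.h>
    #include <librdkafka/rdkafka.h>

    int main(void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* Soft cap on the local prefetch queue: ~500 MB, in kilobytes. */
        if (rd_kafka_conf_set(conf, "queued.max.messages.kbytes", "512000",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
            fprintf(stderr, "%% %s\n", errstr);

        rd_kafka_conf_destroy(conf);
        return 0;
    }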
edenhill modified the milestone: 0.8.4 Feb 22, 2014
@edenhill
Contributor

Have you had time to verify that queued.max.messages.kbytes solves your problem?

@winbatch
Author

Sorry, haven't had a chance yet.

@edenhill
Contributor

Okay, I'll close it anyway, reopen if you see the problem again (impossible!).

@kant111

kant111 commented Oct 31, 2017

@winbatch Going through this discussion, I am wondering why you ran out of memory. The default value for queued.min.messages is 100,000, and say your message size is 20 KB (which is a lot, but whatever); then 100K * 20KB = 2 GB. So what was the size of your messages, and how much memory did you see used?

@winbatch
Author

winbatch commented Oct 31, 2017

@kant111 - This was more than 3 1/2 years ago, and I'm sure many versions of Kafka and librdkafka ago.

@kant111

kant111 commented Oct 31, 2017

@winbatch I am sorry, I did not understand the latter part of your sentence.
