Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Reduce memory usage of multiline #14068

Merged
merged 4 commits into from
Oct 15, 2019

Conversation

adriansr
Copy link
Contributor

@adriansr adriansr commented Oct 15, 2019

The use of time.After when multiline is enabled and lines don't match the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally by the Go runtime when time.After(duration) is used. According to the docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on the input file don't match the pattern.

Reproduced with this config:

multiline.pattern: '^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.d{3}'
multiline.negate: true
multiline.match: after
multiline.max_lines: 50
max_bytes: 1048576

and a random input file generated by:

cat <<EOF > genrandom.py
import base64
import os
import sys

LINE_SIZE = 160
RAW_SIZE = int(LINE_SIZE * 3 / 4)

if len(sys.argv) != 2:
    print 'Usage:', sys.argv[0], '<SIZE IN MEGABYTES>'
    sys.exit(1)

limit = int(sys.argv[1]) * 1024 * 1024
size = 0
f = open('/dev/urandom', 'rb')

while size < limit:
    os.write(2, '> written {0:.2f} MiB\r'.format(size / (1024.0 * 1024.0)))
    raw = f.read(RAW_SIZE * 100)
    while len(raw) >= RAW_SIZE:
        part, raw = raw[:RAW_SIZE], raw[RAW_SIZE:]
        line = base64.b64encode(part)
        print line
        size += len(line)
os.write(2, 'Done.\n')
EOF
python genrandom.py 1024 > 1gb.log

The use of time.After when multiline is enabled and lines don't match
the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally
by the Go runtime when time.After(duration) is used. According to the
docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop
if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on
the input file don't match the pattern.
@adriansr
Copy link
Contributor Author

Here's how the heap looks before this PR:

Screenshot 2019-10-15 at 15 36 43

@adriansr adriansr force-pushed the fix_filebeat_timer_mem_sdh8362 branch from cd1d6cc to 7e1b650 Compare October 15, 2019 15:07
@faec faec self-requested a review October 15, 2019 16:21
Copy link
Contributor

@faec faec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It looks to me like multiline might still have some issues, Next's handling of r.running looks suspicious, but this is a good step :-)

timer := time.NewTimer(r.timeout)
defer func() {
timer.Stop()
select {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra select looks redundant, is it still needed to free memory after timer.Stop? If so please explain in a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Draining the channel is unnecessary unless the timer is going to be reused.

I took the opportunity to rewrite a little bit so that a single channel is used instead of allocating a new one every time.

if r.timer == nil {
r.timer = time.NewTimer(r.timeout)
} else {
r.timer.Reset(r.timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we're seeing noticeable memory / cpu overhead from it, I actually like the ephemeral version better (allocate a timer each time). Reusing timers is a lot more subtle -- because Reset can only be called on a timer that has stopped or expired, and <-timer.C can deadlock if it's called in the wrong state... it looks to me like you've handled all the cases correctly here, but it takes considerable thought to make sure.

If this is a significant performance fix over the previous version, then this looks ok, but please comment on these issues (e.g. pointing out that timer.Reset is safe because in every code path the timer is Stopped or expired before this function returns, and that <-r.timer.C can't deadlock because Stop can only be called once per Reset and thus a false return value means there is an expiration signal waiting...)

If this doesn't affect performance much, though, I'd rather just create and deallocate the value every time, to save the next person who reads this function the cognitive load of thinking about state invariants :-)

@@ -28,11 +28,12 @@ var (
errTimeout = errors.New("timeout")
)

// timeoutProcessor will signal some configurable timeout error if no
// TimeoutReader will signal some configurable timeout error if no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing comments!

@urso
Copy link

urso commented Oct 15, 2019

Great find. We should not use time.After in general, but always use a timer as shown here. Even if it is a little less convenient.

@adriansr adriansr merged commit ce651e0 into elastic:master Oct 15, 2019
adriansr added a commit to adriansr/beats that referenced this pull request Oct 15, 2019
The use of time.After when multiline is enabled and lines don't match
the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally
by the Go runtime when time.After(duration) is used. According to the
docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop
if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on
the input file don't match the pattern.

(cherry picked from commit ce651e0)
adriansr added a commit to adriansr/beats that referenced this pull request Oct 15, 2019
The use of time.After when multiline is enabled and lines don't match
the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally
by the Go runtime when time.After(duration) is used. According to the
docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop
if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on
the input file don't match the pattern.

(cherry picked from commit ce651e0)
adriansr added a commit that referenced this pull request Oct 15, 2019
The use of time.After when multiline is enabled and lines don't match
the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally
by the Go runtime when time.After(duration) is used. According to the
docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop
if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on
the input file don't match the pattern.

(cherry picked from commit ce651e0)
adriansr added a commit that referenced this pull request Oct 15, 2019
#14074)

The use of time.After when multiline is enabled and lines don't match
the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally
by the Go runtime when time.After(duration) is used. According to the
docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop
if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on
the input file don't match the pattern.

(cherry picked from commit ce651e0)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…14073)

The use of time.After when multiline is enabled and lines don't match
the multiline pattern increases the memory usage (from 30MB to 1GB).

This extra memory is attributed to unexpired timers allocated internally
by the Go runtime when time.After(duration) is used. According to the
docs: "If efficiency is a concern, use NewTimer instead and call Timer.Stop
if the timer is no longer needed.".

It's not clear to me why this problem only appears when many lines on
the input file don't match the pattern.

(cherry picked from commit 572ee79)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants