
Options for Mitigating Input Downtime #68

Closed
todd opened this issue Dec 10, 2015 · 12 comments

Comments

@todd

todd commented Dec 10, 2015

We recently experienced an issue where our Redis input for Logstash went down and our app became unresponsive, a scenario outlined in the README. As noted there, we can bump up the values for the buffer configuration, but it doesn't seem that will prevent this issue from recurring during another significant logging-infrastructure outage.

There's the sync option, but, based on the documentation, I'm unclear on whether this would have prevented this issue from occurring.

It would be great if there was a way for the logger to flush the buffer if it receives a connection error. We'd much rather lose logs than take downtime. Is this something that would be possible? I'd be more than willing to work on a patch and submit a PR if you thought it was possible and worthwhile and could give a little direction.

@johnnonolan

👍

@dwbutler
Owner

I think it's a great idea to make this behavior configurable. Most people would rather drop some logs than experience downtime! :)

In order for this to work really well, LogStashLogger would need to never block and never raise an exception. This would require a thorough review of the code to make sure this works consistently everywhere.

I can see a few different options for this:

- Don't buffer messages; drop them on connection failure.
- Buffer messages, but drop them if the buffer gets full.
- Buffer messages and never drop them (i.e. block until the connection is re-established).

@blysik

blysik commented Jan 7, 2016

I think we experienced the same thing recently: our Redis system went down, which caused long pauses in our Rails application because of timeouts while sending logs to it.

Is there a way to make a very short timeout for logging?

@dwbutler
Owner

dwbutler commented Jan 8, 2016

The Ruby Redis client defaults to a 5 second timeout. You can override it by passing a different value for timeout and/or connect_timeout in your Redis configuration. Let me know if that helps with the issue.
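A sketch of what that configuration might look like. The `timeout` and `connect_timeout` option names come from the Ruby `redis` gem; the specific values below, and passing them through the logger's options hash, are assumptions for illustration.

```ruby
# Hypothetical configuration lowering the redis gem's default 5s timeouts
# so a dead Redis endpoint fails fast instead of stalling the app.
redis_options = {
  type: :redis,
  host: 'localhost',
  port: 6379,
  timeout: 1.0,         # read/write timeout, in seconds
  connect_timeout: 0.5  # connection timeout, in seconds
}

# logger = LogStashLogger.new(redis_options)
```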

@todd
Author

todd commented Feb 29, 2016

So we're not going to work on this due to time and resource constraints, but I did want to report back with the solution we went with.

We ended up removing this gem entirely. Instead, we're logging to files that are being tailed with Filebeat and shipping events directly to our collectors.

I regret that I won't be able to work on this. I'm going to leave this issue open for now as it's still an issue that I believe should be solved at some point.

@dwbutler
Owner

I agree that it should be solved. LogStashLogger is essentially an in-process log shipper, and should act in a well-defined, reliable way that does not interfere with normal operation of the application.

@DaveCollinsJr

We were bitten by this in production today also. Our ELK stack went down over the weekend and eventually, I believe, the inability to log caused our Sidekiq workers to hang. I'd +1 the option to simply lose the data when the buffer is full rather than have the application become unresponsive.

sauliusgrigaitis added a commit to necolt/logstash-logger that referenced this issue Apr 4, 2016
@DaveCollinsJr

Thanks much @sauliusgrigaitis for that fix!

@lucke84

lucke84 commented Jul 5, 2016

Hello there, yesterday we experienced the same issue (our Redis endpoint went down and the application quickly became unresponsive). @dwbutler, what needs to be done to introduce the ability to drop the data if the connection times out? I'd be happy to help if you give me a few pointers on what code to review (as you suggested earlier in this very issue).

dwbutler added a commit that referenced this issue Jul 6, 2016
LogStashLogger currently uses `Stud::Buffer` to implement buffering for connectable devices (such as TCP, Redis, etc.) When the remote service goes down, an exception is raised when a buffer flush is attempted. By default `Stud::Buffer` will retry sending the messages forever. Since a flush is triggered when a message is received, or on a regular timer, this will cause logging calls to block. See jordansissel/ruby-stud#28

`Stud::Buffer` allows callbacks to be fired when it encounters a flush error. This ties into that mechanism to abort the flush and re-enqueue the failed messages. This behavior is now enabled by default. To instead drop messages when there is a flush error, pass the new `drop_messages_on_flush_error` option to the logger.

Most applications will want to buffer messages and only drop them when the buffer fills up. This behavior has been implemented by tying into `Stud::Buffer`'s callback for the buffer full event. By default, when the buffer is full, `Stud::Buffer` will block when any new message comes in, until there is room in the buffer. If you want to discard messages in the buffer when this happens, pass the new `drop_messages_on_full_buffer` option to the logger.
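The "drop messages on full buffer" behavior described above can be sketched in pure Ruby. This is not the real `Stud::Buffer` API or LogStashLogger's implementation; it is a minimal, self-contained model of the policy decision: when the buffer is at capacity, discard the new message instead of blocking the caller.

```ruby
# Hypothetical bounded buffer modeling the drop_messages_on_full_buffer
# policy: receive() never blocks; overflow messages are counted and dropped.
class DroppingBuffer
  attr_reader :items, :dropped

  def initialize(max_items:)
    @max_items = max_items
    @items = []
    @dropped = 0
  end

  # Returns true if the message was buffered, false if it was dropped.
  def receive(message)
    if @items.size >= @max_items
      @dropped += 1
      return false # a blocking variant would wait for a flush instead
    end
    @items << message
    true
  end
end
```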

Fixes #68
@dwbutler
Owner

dwbutler commented Jul 6, 2016

I finally found some time to work on this. Please try the patch in #81. By default, the logger will no longer block when there is a connection error. If you want to drop messages when the buffer is full, add this new configuration option to your logger:

logger = LogStashLogger.new(type: :redis, drop_messages_on_full_buffer: true)

@lucke84

lucke84 commented Jul 10, 2016

Thanks @dwbutler for getting this done! Any chance you'll release a new version of the gem anytime soon?

@dwbutler
Owner

Yes, my goal is to release sometime this week.
