Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop queuing up heartbeat threads #345

Merged
merged 3 commits into from
Mar 24, 2017
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 14 additions & 8 deletions lib/shoryuken/manager.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ class Manager
include Util

BATCH_LIMIT = 10
HEARTBEAT_INTERVAL = 0.1
HEARTBEAT_INTERVAL = 1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waynerobinson I'm testing HEARTBEAT_INTERVAL = 1, as I'm also calling dispatch in the processor_done callback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably going to cause issues with empty queues as it will take a least HEARTBEAT_INTERVAL before it will request new messages.

Instead of this you could just have dispatch run in a loop and rely on the TimerTask to restart it if it ever dies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this:

def dispatch_now
  while true
    if ready.zero?
      return unless (queue = @polling_strategy.next_queue)
  
      logger.debug { "Ready: #{ready}, Busy: #{busy}, Active Queues: #{@polling_strategy.active_queues}" }

      batched_queue?(queue) ? dispatch_batch(queue) : dispatch_single_messages(queue)
    else
      sleep MINIMUM_WAIT # 0.05 or something to prevent CPU pegging
    end
  end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although instead of ready.zero?, is there just some type of latch on the pool to await a free thread so you don't have to do the sleep?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although instead of ready.zero?, is there just some type of latch on the pool to await a free thread so you don't have to do the sleep?

@waynerobinson not I'm aware of 🐼


def initialize(fetcher, polling_strategy)
@count = Shoryuken.options.fetch(:concurrency, 25)
Expand All @@ -13,16 +13,18 @@ def initialize(fetcher, polling_strategy)
@queues = Shoryuken.queues.dup.uniq

@done = Concurrent::AtomicBoolean.new(false)
@dispatching = Concurrent::AtomicBoolean.new(false)

@fetcher = fetcher
@polling_strategy = polling_strategy

@heartbeat = Concurrent::TimerTask.new(run_now: true,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waynerobinson I'm doing the loop in a ensure block, and also calling dispatch_async when a processor done, just in case. So I don't think we need to the heartbeat anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the dispatch_async in processor_done does anything given the normal dispatcher re-runs at the end anyway.

But the ensure and re-run should keep the dispatch loop running I think. 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waynerobinson you are right, maybe I was too overcautious. I added it in there just in case. But I'm considering removing it, ensure should work.

execution_interval: HEARTBEAT_INTERVAL,
timeout_interval: 60) { dispatch }
Concurrent::TimerTask.new(execution_interval: 1) do
Shoryuken.logger.info "Threads: #{Thread.list.size}"
end.execute
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waynerobinson With concurrency 10: bundle exec bin/shoryuken -c 10 -q test -r ./test.rb, the thread size kept consistent at 16.


@pool = Concurrent::FixedThreadPool.new(@count, max_queue: @count)
@dispatcher_pool = Concurrent::SingleThreadExecutor.new
Copy link
Collaborator Author

@phstc phstc Mar 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the block finishes faster than the execution_timeout then this leak won't occur, even if the currently broken state of concurrent-ruby.

@waynerobinson now as I'm using this SingleThreadExecutor, it's no longer an issue, it supports only one thread at time and the fallback policy is discard. If we try to post while there's a thread running, it will just discard the post.

Now if only the team there could be convinced that there is a bug in the first place. 😜

I will try to reply on that issue as well. I think they should allow to configure a max_queue and fallback_policy for the TimerTask, too bad they don't allow it, and it ends up having a max_queue: infinity.

Copy link
Contributor

@waynerobinson waynerobinson Mar 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that TimerTask is the thing that's leaking threads because of the ones it creates to do the timeout. But I'm sure the scheduling of a dispatch task occurs faster than the heartbeat period.

Don't really need a max_queue in TimerTask because the way the class is designed it should be locking around the actual task and only letting one operate at a time.

But the lack of a correct implementation for the timeout causes thread leaks in the timeout monitor if the task takes longer than the execution interval (it should be the timeout interval at the very least, but it never refers to this… hence the bug) to complete.

The current design intention of TimerTask seems to have it limited to only ever using 2 threads.


@heartbeat = Concurrent::TimerTask.new(run_now: true, execution_interval: HEARTBEAT_INTERVAL) { dispatch }
end

def start
Expand All @@ -44,6 +46,7 @@ def stop(options = {})
logger.info { 'Shutting down workers' }

@heartbeat.kill
@dispatcher_pool.kill

if options[:shutdown]
hard_shutdown_in(options[:timeout])
Expand All @@ -54,22 +57,25 @@ def stop(options = {})

def processor_done(queue)
logger.debug { "Process done for '#{queue}'" }

dispatch
end

private

def dispatch
return if @done.true?
return unless @dispatching.make_true

@dispatcher_pool.post(&method(:dispatch_now))
end

def dispatch_now
return if ready.zero?
return unless (queue = @polling_strategy.next_queue)

logger.debug { "Ready: #{ready}, Busy: #{busy}, Active Queues: #{@polling_strategy.active_queues}" }

batched_queue?(queue) ? dispatch_batch(queue) : dispatch_single_messages(queue)
ensure
@dispatching.make_false
end

def busy
Expand Down