Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Did release 2.17.0 break call concurrency? #206

Closed
Abdullah-l opened this issue Feb 22, 2024 · 9 comments
Closed

Did release 2.17.0 break call concurrency? #206

Abdullah-l opened this issue Feb 22, 2024 · 9 comments

Comments

@Abdullah-l
Copy link
Contributor

Abdullah-l commented Feb 22, 2024

Please describe the issue

Hey guys @splittingred @leonid-shevtsov, check this out:
Create two RPCs

  1. Client streaming
  2. Anything else (I tried server streaming and unary)

Place a log in both RPCs
Call RPC 1 then 2

On gruf 2.16.2 it behaves normally by printing both logs

On gruf 2.17.0 it does not print the log in RPC 2. The server does not acknowledge the call at all, the client keeps loading indefinitely unless I cancel RPC 1.

This blocking behaviour I suspect was introduced in #178 as it's the only PR in that release that touches threading.

The issue is reproducible on the latest version too (2.19.0)

Anything else we should know?

  • gRPC version - 1.61
  • Ruby version - All, 2.7, 3.0, 3.1
  • Rails version 6.0 & 7.1
  • OS - macOS 14.1
@splittingred
Copy link
Member

Hi @Abdullah-l - I'm not able to replicate this; can you provide a repository/code sample of this occurring?

@splittingred
Copy link
Member

Further illustrated no apparent issue here: #207

@Abdullah-l
Copy link
Contributor Author

Hey @splittingred, I managed to reproduce it in your rake task :)

I created two branches with the exact same changes

  1. https://github.com/Abdullah-l/gruf/tree/after-2.17
  2. https://github.com/Abdullah-l/gruf/tree/before-2.17

The issue is easily seen in the logs
Before 2.17.0

I, [2024-02-23T02:24:05.015640 #20703]  INFO -- : Received get1 at 2024-02-23 02:24:05 +0300
I, [2024-02-23T02:24:06.221235 #20703]  INFO -- : Received get2 at 2024-02-23 02:24:06 +0300
I, [2024-02-23T02:24:06.325923 #20703]  INFO -- : Received get3 at 2024-02-23 02:24:06 +0300

After 2.17.0

I, [2024-02-23T02:25:00.553379 #21885]  INFO -- : Received get1 at 2024-02-23 02:25:00 +0300
I, [2024-02-23T02:25:30.554698 #21885]  INFO -- : Received get2 at 2024-02-23 02:25:30 +0300
I, [2024-02-23T02:25:30.554768 #21885]  INFO -- : Received get3 at 2024-02-23 02:25:30 +0300

As you can see after 2.17.0 get2 and get3 requests are blocked for a while there. They arrived 30 seconds later (after get1 timed out).

@splittingred
Copy link
Member

@Abdullah-l can you provide more information about your working environment:

  • Ruby version
  • Rubygems version
  • gRPC version installed
  • Anything else you feel relevant

@Abdullah-l
Copy link
Contributor Author

Ruby version - I tried 2.7, 3.0, 3.1
Rubygems version - 3.3.3
gRPC version installed - 1.61

I'm like 99.99% sure that 2.17.0 broke it

I hope you can reproduce it :)

You can just checkout the two branches I linked in the previous message and run:

./spec/demo_server
bundle exec rake gruf:demo:concurrent

on each one and see the logs

@leonid-shevtsov
Copy link
Contributor

And to clarify, is this under development mode or production mode? (What does Gruf.development? say?)

Basically the new code uses a read-write lock; any number of requests can concurrently "read", that is, access the controller - but when the controller is reloaded the lock is "write" locked and no concurrency is allowed.

I'd add some logging statements to see the order of operations around reload, the write lock acquire/release and the read lock acquire release.

@Abdullah-l
Copy link
Contributor Author

Using the rake task @splittingred wrote, Gruf.development? is true

However the issue still occurs when I used it with rails on a staging environment.

@leonid-shevtsov regarding adding logs, I didn't fully understand where to add them and what are we looking for 😅
is it here?

@leonid-shevtsov
Copy link
Contributor

leonid-shevtsov commented Mar 6, 2024

is it here?

yeah, here and also in lines 54-56; before the block (locking), at the beginning of the block (locked), and after it (unlocked); normally, you should not see a very long write locked period (that would be breaking concurrency indeed)

hmm... perhaps what's happening is the first request locks it for reading, and then the second wants to lock it for writing and hence has to wait until the first one completes? but this should only happen in a development environment.

could it be possible that you're running it in dev mode on staging? dev mode enables code reloading and possibly other things that you generally don't need or want outside of a dev environment.

and looks like the fallback environment is development:

def development?
environment == 'development'
end
private
##
# Automatically determine environment
#
# @return [String] The current Ruby environment
#
def environment
if defined?(::Rails)
::Rails.env.to_s
else
ENV.fetch('RACK_ENV') { ENV.fetch('RAILS_ENV', 'development') }.to_s
end
end

@Abdullah-l
Copy link
Contributor Author

Closing because I gave up on this 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants