Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client must not expect to hear back from server when establishing bidirectional stream #515

Open
vorot93 opened this issue Dec 28, 2020 · 20 comments
Labels
C-bug Category: Something isn't working E-help-wanted Call for participation: Help is requested to fix this issue.

Comments

@vorot93
Copy link
Contributor

vorot93 commented Dec 28, 2020

Currently it does expect server to write something back. This causes a hang.

Illustrated example:
https://github.com/hyperium/tonic/blob/master/interop/src/client.rs#L155

Move this line after full_duplex_call and observe the test fail with timeout.

@yusdacra
Copy link

This issue also affects one of my crates and causes tests to get stuck and fail. Would love to see this get fixed soon. I would like to take a look at this but I'm not sure where to start.

@LucioFranco
Copy link
Member

Do either of you know if this only happens with grpc-go or if happens with other h2 servers?

@LucioFranco LucioFranco added C-bug Category: Something isn't working E-help-wanted Call for participation: Help is requested to fix this issue. labels Jan 14, 2021
@yusdacra
Copy link

Do either of you know if this only happens with grpc-go or if happens with other h2 servers?

Not sure, as our server is only implemented with Go currently. I will try testing it with the Python examples on https://github.com/grpc/grpc/tree/master/examples/python today.

@yusdacra
Copy link

I have made a repo showcasing the bug happening with the Python examples I linked:
https://github.com/yusdacra/tonic-bug

@LucioFranco

@LucioFranco
Copy link
Member

Thanks, I do not have time this week to dig into it but will get to this ASAP.

@davidpdrsn
Copy link
Member

I've done some digging but unsure about the best way to fix this. The hang is initiated from Grpc::streaming which eventually ends up in Reconnect::call where it tries to setup the connection to the server. So if the server never responds then it hangs.

I suppose we could change Grpc::streaming to not block until the connection has been established by connecting in a background task instead and sending the connection back in a oneshot channel. I guess that would mean blocking for that task to complete in Stream::message instead and propagate any connection errors there.

@LucioFranco Do you think makes sense? If so I would like to take a stab at implementing it 😊

@sfackler
Copy link

The Java gRPC server implementation also appears to behave this way.

@behos
Copy link

behos commented Jan 5, 2023

Is there a workaround for this? What does the tonic server send back to the client which makes it work for rust and not for other implementations?

@r-ml
Copy link

r-ml commented Apr 30, 2023

.NET server implementation also triggers this issue.

I can workaround the issue, as I control both ends, by issuing a await responseStream.WriteAsync("ping"); before the usual while (await requestStream.MoveNext()) { ... } server-side, and discarding the first response right after stream construction client-side.

@jordy25519
Copy link

jordy25519 commented Sep 20, 2023

+1 seeing this issue and unable to use a streaming client as the server is not in my control (seems same issue #1233)

tried comparing python/c++ implementations, my hunch was that the other clients send an 'initial metadata' payload which triggers the server side to always respond, couldn't find similar thing within tonic implementation.

e.g. https://github.com/grpc/grpc/blob/fc159a690158ed089b19d3eb9f76e8399e3207ca/src/python/grpcio_tests/tests/unit/_cython/cygrpc_test.py#L361-L362

@LucioFranco
Copy link
Member

@jordy25519 do you have a reproduction I can look at?

@jordy25519
Copy link

jordy25519 commented Sep 20, 2023

I don't have one on hand. the relevant generated client code is:
hangs here: self.inner.server_streaming(req, path, codec).await

pub async fn stream_foo(
    &mut self,
    request: impl tonic::IntoRequest<super::StreamFooRequest>,
) -> std::result::Result<
    tonic::Response<tonic::codec::Streaming<super::StreamFooResponse>>,
    tonic::Status,
> {
    self.inner
        .ready()
        .await
        .map_err(|e| {
            tonic::Status::new(
                tonic::Code::Unknown,
                format!("Service was not ready: {}", e.into()),
            )
        })?;
    let codec = tonic::codec::ProstCodec::default();
    let path = http::uri::PathAndQuery::from_static(
        "/crate_foo_rpc.CrateFooRpc/StreamFoo",
    );
    let mut req = request.into_request();
    req.extensions_mut()
        .insert(
            GrpcMethod::new(
                "crate_foo_rpc.CrateFooRpc",
                "StreamFoo",
            ),
        );
    /// ---HANGS HERE---
    self.inner.server_streaming(req, path, codec).await
}

Additionally, I observe the hang resolves after ~2-5minutes I assume due to a close/timeout message from the server which unblocks the client.

like #515 (comment) I traced the hang to the Reconnect code but didn't dig much further

@jordy25519
Copy link

jordy25519 commented Sep 26, 2023

had another go at debugging this and in my particular case the server never responds with a http2 Header frame if it has no message data to stream.

http2 handshake/connection is ok but blocks after sending streaming request and waiting for response in h2: https://github.com/hyperium/h2/blob/da38b1c49c39ccf7e2e0dca51ebf8505d6905597/src/proto/streams/recv.rs#L318
(this condition is never satisfied until first Header frame sent along with first Data frame, technically they could be sent separately but seems like grpc servers are not doing this by default)
not sure what action tonic can take without knowing/receiving the response headers from the server.

zooming out, as a user I'd expect to get the Streaming handle/instance back whether there's immediate data available or not and be able to await on that for the messages rather than block on setup i.e. decouple the setup await from the messages await

possibly better to solve on server side but also feel like this should be a 'it just works' thing

@declark1
Copy link

declark1 commented Jun 27, 2024

Also facing this issue. Not sure how to resolve this. Strangely, it doesn't happen with a Rust server when attempting to create a reproducible example, in my case only with a Python server.

@Adphi
Copy link

Adphi commented Jun 27, 2024

@declark1 as far as I know, you can only solve this by making your server speak first, for example by sending headers server side.

@blinsay
Copy link

blinsay commented Aug 10, 2024

I'm reliably running into this as well with a Go server. I built a simple Go server with the protos from Tonic's streaming example to test. Eyeballing some wireshark dumps it seems jordy25519 is correct and the Go server isn't send an http2/ headers frame until it has data to respond with, even though an equivalent Tonic server does.

It looks like this is only an issue if the client-side code waits for the RPC method to complete to start sending messages on the outgoing stream. Changing the streaming example to the following hung on connect and and never printed connected.

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();

    let (tx, rx) = tokio::sync::mpsc::channel(10);

    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::wrappers::ReceiverStream::new(rx))
        .await
        .unwrap();
    println!("connected");

    for i in 0..10 {
        tx.send(EchoRequest {
            message: format!("msg {:02}", i),
        })
        .await
        .unwrap();
    }

    let mut resp_stream = response.into_inner();

    while let Some(received) = resp_stream.next().await {
        let received = received.unwrap();
        println!("\treceived message: `{}`", received.message);
    }
}

Changing the outgoing stream to tokio_stream::pending also hangs forever and never prints connected. For example, would expect the following to print connected and exit.

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = EchoClient::connect("http://127.0.0.1:50051").await.unwrap();

    println!("connecting");
    let response = client
        .bidirectional_streaming_echo(tokio_stream::pending())
        .await
        .unwrap();
    println!("connected");
}

I haven't dug deep enough yet to understand if this is an expectation mismatch between Tonic and Hyper, or Hyper and other http/2 implementations. Given the number of folks in this thread reporting issues with Java, Python, etc. servers, it seems like it's at least worth some documentation to point out that you MUST have a ready outgoing message on the request stream before the RPC call returns.

@blinsay
Copy link

blinsay commented Aug 10, 2024

Looking quickly at the Go grpc implementation, it seems like maybe this is a mismatch between Tonic and other GRPC APIs. Generated streaming clients in Go seem to return a BidiStreamingClient which wraps a ClientStream interface and doesn't guarantee Headers are available or that a stream is connected when calling SendMsg. Compare that to Tonic, where generated streaming clients still return a tonic::Response which guarantees Metadata exists.

@berwani
Copy link

berwani commented Sep 16, 2024

Faced the same issue with Rust client & C++ server (bidirectional gRPC streaming). The Rust client used to hang-up and not get a stream back. I had to apply the hack mentioned by @blinsay: on the client-side, add an empty message ready to be sent in the outgoing stream before making gRPC streaming method call.

I consider this as a Tonic bug as the Tonic generated gRPC client doesn't work with other language based servers. It should not be expected for the client to have an outgoing message ready while establishing the stream.

@montekki
Copy link

@LucioFranco is there any work planned on fixing this? This issue is making the code using tonic to practically not work with any piece of cloud infrastructure that is also not implemented with tonic (which is virtually everything).

@nikkolasg
Copy link

nikkolasg commented Dec 20, 2024

Same issue here - we had to make the work around of sending a dummy message first on a normal request and then create the bidirectional stream which took us hours to figure out and is making the code clunky.
Any fixes would be highly appreciated, or even pointers on how to fix this.

Thank you for your work 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Something isn't working E-help-wanted Call for participation: Help is requested to fix this issue.
Projects
None yet
Development

No branches or pull requests