pubsub: Many instances of "The StreamingPull stream closed for an expected reason and should be recreated ..." #9788
Comments
Hi, how long have you been experiencing this issue? Does it correspond to a recent version bump of the Pub/Sub library? Separately, in the last code block you have, I wasn't able to see where exactly `Receive` is called.
Hi @hongalex! My Honeycomb data stretches back 2 months and the issue has been going on at least since then. At that point we were using … I have amended my code block to include info on how `Receive` is called 🤗
StreamingPull streams are periodically closed every 30 minutes or so, which your graph above confirms. This is intentional, so that the server can reassign resources properly. While this behavior isn't documented specifically, StreamingPull streams being closed with a non-OK status is normal: https://cloud.google.com/pubsub/docs/pull-troubleshooting#troubleshooting_a_streamingpull_subscription Given that the error isn't tied to poor behavior, this is working as intended. If you have noticed your streams behaving poorly, though, please let us know so we can investigate further.
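To illustrate the behavior described above, here is a minimal plain-Go sketch of the recreate-on-expected-closure pattern that the client library applies internally. The error value and the string matching are assumptions for illustration (based on the error message in the issue title), not the actual pubsub internals:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errStreamReset is a hypothetical stand-in for the server-initiated
// closure that happens roughly every 30 minutes.
var errStreamReset = errors.New("rpc error: code = Unavailable desc = The StreamingPull stream closed for an expected reason and should be recreated")

// isExpectedClosure reports whether a stream error looks like one of
// the routine, retryable closures described in the troubleshooting docs.
func isExpectedClosure(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "should be recreated") ||
		strings.Contains(msg, "Unavailable")
}

func main() {
	// The client does the equivalent of this loop internally: an
	// expected closure is swallowed and the stream is reopened, so
	// Receive keeps running without surfacing an error to the caller.
	for attempt := 0; attempt < 3; attempt++ {
		err := errStreamReset // simulate the periodic closure
		if isExpectedClosure(err) {
			fmt.Println("stream closed as expected; recreating")
			continue
		}
		fmt.Println("fatal:", err)
		return
	}
}
```

The error still surfaces to interceptors and stat handlers (which is why the traces below are reported) even though `Receive` itself never returns it.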
Closing this issue since I think my previous comment answers this. If you're experiencing other kinds of unexpected behavior, please open another issue and I'll investigate there.
We've been experiencing this noise for several months and I finally got around to looking into it (which caused me to stumble upon this thread). We, like @HaraldNordgren, use otel & honeycomb. I think the root of this noise goes back to a change implementing a bridge between opencensus and otel in the google apis, although I'm not sure. My hunch is that these traces suddenly started being reported because of this. It's unfortunate that the StreamingPull interruption is always an error and the trace is reported accordingly; this really throws a wrench in a lot of our dashboards/queries. Right now, it seems the only real options we have are to either filter these out at some layer of our otel stack or to disable telemetry on the client in our code. Mostly just writing this small essay as additional information for anyone else who stumbles upon this, and maybe to hear if there are other options I didn't consider.
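For anyone considering the "filter at some layer of our otel stack" option above, here is a plain-Go sketch of the matching predicate such a filter would need. The span name and status message are assumptions based on the error in the issue title; in practice you would wrap this in a custom span processor or exporter in your otel pipeline:

```go
package main

import (
	"fmt"
	"strings"
)

// dropNoisySpan decides whether a span should be discarded before
// export. The rule keys off the StreamingPull method name and the
// known "expected reason" closure message (both assumptions here).
func dropNoisySpan(spanName, statusMsg string) bool {
	return strings.Contains(spanName, "StreamingPull") &&
		strings.Contains(statusMsg, "should be recreated")
}

func main() {
	// The routine closure span gets dropped...
	fmt.Println(dropNoisySpan(
		"google.pubsub.v1.Subscriber/StreamingPull",
		"The StreamingPull stream closed for an expected reason and should be recreated"))
	// ...while other spans pass through untouched.
	fmt.Println(dropNoisySpan("google.pubsub.v1.Subscriber/Pull", "ok"))
}
```

Matching on the message string is brittle if the server wording changes, which is one reason disabling client telemetry (or waiting for the native instrumentation mentioned below) may be the cleaner route.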
Yeah, so you're close, but it was actually added as part of googleapis/google-api-go-client#2127, which is in a different repo. The change you linked above was for an internal trace package that Pub/Sub currently doesn't rely on. The PR I linked adds a grpc otel stat handler to our underlying … I agree that it is unfortunate that StreamingPull returning a non-nil error affects your dashboards. If it helps, we are close to launching native instrumentation for Pub/Sub, tracked in #4665, which traces the message lifecycle more closely than gRPC events. Once this is supported, I imagine you could try using only this instrumentation, while disabling the underlying client telemetry with …
Ah sweet, there was no chance I was going to ever hunt that one down, so glad that you had it! Makes a ton of sense. I had already found the ability to disable, and I will keep an eye on your linked issue so we can get visibility back when it's ready! Appreciate the response! |
Client
cloud.google.com/go/pubsub v1.37.0
Environment
GKE
Go Environment
Code
then using it like
Expected behavior
No error messages.
Actual behavior
Screenshots