
fix(storage): fix stream termination in MRD. #11432

Merged
merged 24 commits into from
Mar 6, 2025

Conversation

@shubham-diwakar (Contributor) commented Jan 10, 2025

  1. Call CloseSend() before releasing resources.
  2. Drain inbound responses from the stream.

@product-auto-label product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Jan 10, 2025
@shubham-diwakar shubham-diwakar marked this pull request as ready for review January 10, 2025 13:54
@shubham-diwakar shubham-diwakar requested review from a team as code owners January 10, 2025 13:54
Comment on lines 1472 to 1474
if err := mr.stream.CloseSend(); err != nil {
return err
}
@BrennaEpp (Contributor) commented Jan 11, 2025

Should we also drain the stream after the CloseSend (as we do in drainInboundStream(), receiving from the stream until we get a non-nil error) to make sure its resources are released? See grpc/grpc-go@365770f

@shubham-diwakar (Contributor, Author)

Thanks for the best practice; applied it.
Using stream.Recv just to determine the error.

@shubham-diwakar (Contributor, Author)

Actually, I'm not sure we will get any output here if we call stream.Recv after CloseSend(). What if all the responses were already consumed by the streamReceiver goroutine?

That said, there are some cases where we close the stream while requests are still pending; I have drained responses there.

LMK your thoughts.

Contributor

I think Recv() continually returns the error (io.EOF or something else) once the stream is done? This should be easy to check with a toy program.

However, multiple concurrent calls to Recv() are not allowed, so if the streamReceiver goroutine may be calling Recv(), you have to be careful not to call Recv() elsewhere until streamReceiver is done. It's probably easiest for all Recv() calls to live on that one goroutine.

It may also be easiest to call CloseSend() on the same goroutine that calls Send(). Then you have one goroutine for Send/CloseSend and another for Recv, and user code can cancel the context and then call Close() to trigger a cancellation.

@shubham-diwakar (Contributor, Author)

Thanks, Chris, for the suggestion; it simplifies the structure too.

Now one goroutine has all the Recv() calls and the other has all the Send()/CloseSend() calls.

@shubham-diwakar shubham-diwakar changed the title feat(storage): fix mutex use and CloseSend before close feat(storage): fix stream termination in MRD. Jan 27, 2025
@shubham-diwakar shubham-diwakar changed the title feat(storage): fix stream termination in MRD. fix(storage): fix stream termination in MRD. Jan 29, 2025
@shubham-diwakar (Contributor, Author)

OK, I added some integration tests for context cancellation and abrupt close.
Here's the behavior:
If we do context cancellation, the user gets the context-canceled error back from the callback and we close things. Any further request after that returns "stream closed, can't add range" errors.

For abrupt close we just close the client-server connection; all the active ranges then get a "stream closed early" message. Any further request after that likewise returns "stream closed, can't add range" errors.

For the normal scenario we close the client-server connection, drain responses if we have an active range, and then tear down resources, which I believe should stop the canceled errors on the server side.

Comment on lines +1490 to +1491
mr.closeManager <- true
mr.closeReceiver <- true
@arjan-bal commented Feb 14, 2025

These seem to be buffered channels, so it's not guaranteed that the receivers will have read the message when execution resumes. Even if these were unbuffered channels, the sender would be unblocked as soon as the receiver gets the message; the sender and receiver would then race to lock the mutex, and the sender could win. This could result in the context being cancelled before the stream is drained.

@shubham-diwakar (Contributor, Author)

Made sure that we receive an EOF or other permanent error from the server on the inbound stream (Recv), and on the outbound stream made sure to call CloseSend before calling cancel().

Seeing OK responses now on server-side dashboards. Ref: https://screenshot.googleplex.com/8qAqijdXnohNeDi.

@tritone tritone requested a review from a team as a code owner March 6, 2025 21:54
@tritone tritone added the automerge Merge the pull request once unit tests and other checks pass. label Mar 6, 2025
Your PR has conflicts that you need to resolve before merge-on-green can automerge

@tritone (Contributor) commented Mar 6, 2025

This required some modifications on merge with the commit that changed callback semantics to bytes written.

@gcf-merge-on-green gcf-merge-on-green bot merged commit 3d4e62f into googleapis:main Mar 6, 2025
8 checks passed
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 6, 2025
tritone added a commit to tritone/google-cloud-go that referenced this pull request Mar 7, 2025
gcf-merge-on-green bot pushed a commit that referenced this pull request Mar 7, 2025
This reverts commit 3d4e62f.

We still need to do some debugging on an integration test failure for this.

Fixes #11769
Labels
api: storage Issues related to the Cloud Storage API.
5 participants