call Stream.Reset instead of Stream.Close #76
Conversation
tests fail...
Yeah, we expect a nice EOF.
we need the
The test failure seems insurmountable; we'll need to change the stream API for this to work :(
Not necessarily. The test failure is due to the assumption that
But I agree this sucks.
@@ -28,29 +29,53 @@ func (n *NetAddr) String() string {
	return fmt.Sprintf("relay[%s-%s]", n.Remote, n.Relay)
}

func (c *Conn) Close() error {
	return c.stream.Reset()
Maybe this is relevant here. This is how I ended up doing it in gostream:
https://github.com/hsanjuan/go-libp2p-gostream/blob/master/conn.go#L38
// Close closes the connection.
// Any blocked Read or Write operations will be unblocked and return errors.
func (c *conn) Close() error {
	if err := c.s.Close(); err != nil {
		// The clean close failed; reset the stream so the other side isn't left hanging.
		c.s.Reset()
		return err
	}
	// Wait in the background for the remote side to close its end (a clean EOF).
	go pnet.AwaitEOF(c.s)
	return nil
}
If I remember correctly, bluntly resetting libp2p streams on Close() caused errors on the other side in situations where closing the stream was supposed to be a clean operation.
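For context, the await-EOF step amounts to roughly the following. This is an illustrative sketch only, not the actual pnet/helpers implementation; the eofWaiter interface, the 30-second deadline, and the names are all assumptions.

package relaysketch

import (
	"errors"
	"io"
	"time"
)

// eofWaiter stands in for a libp2p stream: readable, resettable, with read deadlines.
type eofWaiter interface {
	io.Reader
	Reset() error
	SetReadDeadline(t time.Time) error
}

var errExpectedEOF = errors.New("expected a clean EOF")

// awaitEOF waits (bounded by a deadline) for the remote side to close its end.
// Anything other than an immediate, clean EOF gets the stream reset so it can't leak.
func awaitEOF(s eofWaiter) error {
	// Don't wait around forever for the remote to finish.
	_ = s.SetReadDeadline(time.Now().Add(30 * time.Second))

	buf := make([]byte, 1)
	if n, err := s.Read(buf); n > 0 || err != io.EOF {
		s.Reset() // unexpected data, an error, or a timeout
		return errExpectedEOF
	}
	return nil
}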
Yeah, but we need to be careful:
- Concurrent reads aren't safe. We need to interrupt any concurrent reads (somehow) before we can do this. We should be able to do this by setting a read deadline but... we'll have to make sure to fix any/all of our multiplexers/transports (deadlines currently don't interrupt).
- We probably want to set some timeouts/deadlines.
- We should probably call Close() in the background as well. That will mimic a normal close(tcpFileDescriptor) call (which will let the kernel flush in the background); a rough sketch of this pattern follows below.
IMO, the real issue here is that TCPConn.Close is actually a very fancy best-effort function that does a bunch of dirty work in the background. Ideally, even if we make Close close both directions, we wouldn't be that sloppy.
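A rough sketch of the kind of careful close described above, under the assumption that deadlines actually interrupt (which, as noted, they currently don't everywhere): try a clean Close in the background with a timeout, and fall back to Reset if it errors or stalls. The gracefulCloser interface, the closeWithTimeout name, and the timeout handling are hypothetical, not an existing API.

package relaysketch

import (
	"time"
)

// gracefulCloser stands in for a stream that supports Close, Reset and deadlines.
type gracefulCloser interface {
	Close() error // close our side and wait for the remote
	Reset() error // tear the stream down immediately
	SetDeadline(t time.Time) error
}

// closeWithTimeout attempts a clean close but never blocks the caller for longer
// than the given timeout; if the clean close errors or stalls, it resets instead.
func closeWithTimeout(s gracefulCloser, timeout time.Duration) error {
	// Interrupt concurrent reads/writes by setting a deadline.
	_ = s.SetDeadline(time.Now().Add(timeout))

	done := make(chan error, 1)
	go func() { done <- s.Close() }() // let the clean close run in the background

	select {
	case err := <-done:
		if err != nil {
			s.Reset() // clean close failed; hard-reset so nothing leaks
		}
		return err
	case <-time.After(timeout):
		return s.Reset() // clean close is stalling; don't block the caller
	}
}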
@vyzo, this is getting really critical, and both the TCP and the QUIC transports reset the underlying connection on close without flushing anything (I specifically modified the TCP transport to do this to avoid unnecessary work). This bug likely just crashed our gateways and has been causing problems for a ton of people. The only other solution I can think of is to:
But that requires fixing multiple annoying bugs. I've started on this path (with the yamux fix), but it feels like a waste given that we don't actually make any guarantees about Close flushing.
Sigh... If it has been identified as the critical bug, then we've got to merge it.
Let's fix the test and merge it, as it is a critical bug.
Well, Close is stalling. I'll try shipping a patch to some of our partners to make sure this is the bug.
This needs a rebase for the go mod debacle (go.sum conflicts).
May fix ipfs/kubo#6237. Basically:
1. We hang while closing a stream (because `Close` waits).
2. This blocks the connection manager because it assumes that close _doesn't_ wait.
This may also fix a stream leak.
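To make point 2 concrete, here is a hypothetical illustration of why a blocking Close stalls the connection manager; the trim loop below is a stand-in, not the actual connection-manager code.

package relaysketch

import "io"

// trimConnections mimics a connection-manager style trim loop that assumes
// Close returns promptly. If one Close blocks waiting on the remote peer,
// every connection behind it in the loop stays open until that Close returns.
func trimConnections(conns []io.Closer) {
	for _, c := range conns {
		_ = c.Close() // must not block, or the whole trim stalls
	}
}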
Rebased and resolved the conflict; should be ready for merge.
We also need the companion in #77, so that the connection manager doesn't kill the underlying hop relay connections when they have stop streams.
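The companion idea is, roughly, to tag the hop relay's connection whenever a stop stream is created so that connection-manager trimming considers it in use. The sketch below is an assumption about the shape of that change; the tagger interface, tag name, and weight are made up, not the API used in #77.

package relaysketch

// tagger stands in for the connection manager's tagging interface.
type tagger interface {
	TagPeer(peerID string, tag string, weight int)
}

// onNewStopStream gives the hop relay's connection some weight so that
// connection-manager trimming doesn't kill it out from under an active stop stream.
func onNewStopStream(cm tagger, hopRelayID string) {
	cm.TagPeer(hopRelayID, "relay-hop-stream", 1)
}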
Initial feedback; still reviewing.
Previous status should've been "request changes".
Fixed the bug identified; the wrapping is not an issue.
Tag the hop relay when creating stop streams
May fix ipfs/kubo#6237. Basically:
1. We hang while closing a stream (because `Close` waits).
2. This blocks the connection manager because it assumes that close _doesn't_ wait.
Also, closing a stream should cancel pending writes (go-mplex#9). This may also fix a stream leak.