feat: close transports that implement io.Closer #227
Conversation
swarm.go (Outdated)
for t := range transports {
    if closer, ok := t.(io.Closer); ok {
        if err := closer.Close(); err != nil {
In my case (tor and i2p), closer.Close() is slow: it needs to tell other peers in the DHT that it is leaving, which takes some time. I would like each transport to get its own goroutine.
Edit: my bad, it would just be cheaper if transports that need it fire off a goroutine themselves.
Eh, I'd prefer to be able to wait and I'm not too concerned about the cost of firing off goroutines on shutdown.
LGTM I couldn't ask for more 👌
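For context, a minimal sketch of that compromise (hedged; the helper name and the standard-library logger are illustrative, not this repo's code): each transport that implements io.Closer is closed in its own goroutine, and a WaitGroup lets teardown still wait for all of them.

package swarmsketch

import (
    "io"
    "log"
    "sync"
)

// closeTransports closes every transport that implements io.Closer in its
// own goroutine, then waits for all of them, so slow closes (e.g. tor/i2p
// announcing departure to the DHT) run concurrently while teardown still
// blocks until they all finish.
func closeTransports(transports map[interface{}]struct{}) {
    var wg sync.WaitGroup
    for t := range transports {
        if closer, ok := t.(io.Closer); ok {
            wg.Add(1)
            go func(c io.Closer) {
                defer wg.Done()
                if err := c.Close(); err != nil {
                    log.Printf("error closing transport %T: %s", c, err)
                }
            }(closer)
        }
    }
    wg.Wait()
}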
s.transports.m = nil
s.transports.Unlock()

var wg sync.WaitGroup
No need to allocate a new waitgroup; we have s.refs available (it's empty at this point; I know that's not how we're supposed to use it, but it doesn't matter anyway since we're closing).
That isn't safe if Close is called concurrently. Basically, we can only call Add on this waitgroup when we can guarantee that the waitgroup's counter won't be zero.
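To illustrate the hazard (hypothetical code, not from this PR): the sync.WaitGroup contract requires that an Add with a positive delta starting while the counter is zero happens before any concurrent Wait, so reusing the already-drained s.refs can race with a concurrent Close that is still waiting on it.

package swarmsketch

import "sync"

// unsafeReuse sketches the race: reusing a WaitGroup whose counter has
// already reached zero, while another goroutine may still call Wait on it,
// violates the rule that an Add-from-zero must happen before the Wait.
func unsafeReuse(refs *sync.WaitGroup, closeTransport func()) {
    go refs.Wait() // e.g. a concurrent Close() observing the counter at zero

    refs.Add(1) // races with the Wait above; a fresh WaitGroup has no such waiter
    go func() {
        defer refs.Done()
        closeTransport()
    }()
    refs.Wait()
}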
@@ -176,6 +177,27 @@ func (s *Swarm) teardown() error {
    // Wait for everything to finish.
    s.refs.Wait()

    // Now close out any transports (if necessary). Do this after closing
Why only now, why not while closing conns and listeners? I don't see the point of doing it this way if we are not confident that the transport won't receive any more calls to dial or listen.
A call to listen or dial can effectively get lost in starvation (or concurrency randomness) and arrive at the transport later than the Close, so if the transport has to implement its own internal close-syncing logic for that anyway, we might as well close transports after closing the listeners (but before the s.refs.Wait). WDYT?
PS: The transport can get closed before it receives a call to dial or listen, because the transport lock is not held until the dial is complete:
go-libp2p-swarm/swarm_transport.go, line 21 in 58c167a:
defer s.transports.RUnlock()
This could be fixed if the RUnlock were moved here (and the same should also be done in listen and in all functions using Swarm.TransportForDialing), line 576 in 58c167a:
if err != nil {
I'm not going to take a lock in one function and release it in another. The correct fix here would be to call refs.Add(1) before starting a dial, releasing the ref on dial failure.
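A minimal sketch of that pattern (hypothetical helper names and types, not this repo's actual code): take a ref before the transport lookup and dial so teardown's refs.Wait() also covers in-flight dials, and release it as soon as the dial fails.

package swarmsketch

import (
    "errors"
    "net"
    "sync"
)

// dialer is a hypothetical stand-in for a transport that can dial.
type dialer interface {
    Dial(addr string) (net.Conn, error)
}

type dialingSwarm struct {
    refs       sync.WaitGroup // same role as Swarm.refs: teardown waits on it
    mu         sync.RWMutex
    transports map[string]dialer
}

// dial takes a ref before looking up the transport, so refs.Wait() in
// teardown also waits for this in-flight dial; the ref is released right
// away if the dial fails.
func (s *dialingSwarm) dial(proto, addr string) (net.Conn, error) {
    s.refs.Add(1)

    s.mu.RLock()
    t := s.transports[proto]
    s.mu.RUnlock()
    if t == nil {
        s.refs.Done()
        return nil, errors.New("no transport for address")
    }

    c, err := t.Dial(addr)
    if err != nil {
        s.refs.Done() // release the ref on dial failure
        return nil, err
    }
    return c, nil // on success, the ref would be released when the conn closes (assumption)
}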
What's the state of this, is it blocked by anything?
Not that I know of, LGTM. @Stebalien?
I haven't merged it because it's still technically possible to call
I've changed my mind: this is a close, so as long as it works, performance is not an issue.
@Stebalien what is the status of this?
Yeah, this is probably fine as-is. I'll merge for now with an open issue: libp2p/go-libp2p#1545.
Actually, I do need a final signoff from @raulk on this.
wg.Add(1)
go func(c io.Closer) {
    defer wg.Done()
    if err := closer.Close(); err != nil {
A log statement confirming that we're closing down this closeable transport would be nice. But then again, we need better logging all throughout, so I won't object if we don't add this one here right now.
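If we did add it, it could be as simple as a line at the top of the close goroutine (sketch only; the logger calls here are illustrative, not this repo's logging API):

go func(c io.Closer) {
    defer wg.Done()
    log.Printf("closing transport %T", c) // confirm which closable transport is being shut down
    if err := c.Close(); err != nil {
        log.Printf("error closing transport %T: %s", c, err)
    }
}(closer)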
@@ -48,7 +51,10 @@ func (s *Swarm) TransportForListening(a ma.Multiaddr) transport.Transport {
    s.transports.RLock()
    defer s.transports.RUnlock()
    if len(s.transports.m) == 0 {
        log.Error("you have no transports configured")
        // make sure we're not just shutting down.
        if s.transports.m != nil {
Would be nice to actually model a swarm status with atomic ints. Inferring we're closing down based on nil vs. 0-length slice differentiation is a bit of a footgun.
Note: this is a map, not a slice. We could have a second flag, but I'd prefer to avoid storing the state twice.
I'm ok to merge these changes, but there are some edge cases that I don't quite understand. And the swarm needs to store its state (unstarted, started, closing, closed) in an atomic int so we can implement a state machine, rather than rely on hacks to infer which state we're in.
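A hedged sketch of what that could look like (not the library's actual API; names are hypothetical): the lifecycle lives in one atomic int, and Close becomes a compare-and-swap transition instead of a nil-map check.

package swarmsketch

import "sync/atomic"

// Hypothetical swarm lifecycle states stored in a single atomic int,
// instead of inferring "closing" from a nil transports map.
const (
    stateUnstarted int32 = iota
    stateStarted
    stateClosing
    stateClosed
)

type swarmWithState struct {
    state atomic.Int32 // holds one of the state* constants
}

// closeOnce transitions started -> closing exactly once; concurrent callers
// see the CompareAndSwap fail and return immediately.
func (s *swarmWithState) closeOnce(teardown func()) {
    if !s.state.CompareAndSwap(stateStarted, stateClosing) {
        return // already closing, closed, or never started
    }
    teardown() // close listeners, conns, transports
    s.state.Store(stateClosed)
}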
This way, transports with shared resources (e.g., reused sockets) can clean them up. fixes libp2p/go-libp2p#999
Tests are broken due to #246. Otherwise, they passed.
This way, transports with shared resources (e.g., reused sockets) can clean them up. Otherwise, we have no way to tell, e.g., the QUIC transport to close its socket "pool".
fixes libp2p/go-libp2p#999
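For illustration, a hedged sketch (hypothetical names, not the QUIC transport's actual code) of a transport whose Close releases a shared socket:

package swarmsketch

import (
    "io"
    "net"
    "sync"
)

// reuseTransport is a hypothetical transport that hands out connections from
// a single shared UDP socket. Implementing io.Closer lets the swarm release
// that socket on teardown.
type reuseTransport struct {
    mu     sync.Mutex
    socket net.PacketConn // shared among all dials/listens
}

var _ io.Closer = (*reuseTransport)(nil)

// Close releases the shared socket. The swarm calls this (if implemented)
// after all listeners and connections are closed.
func (t *reuseTransport) Close() error {
    t.mu.Lock()
    defer t.mu.Unlock()
    if t.socket == nil {
        return nil
    }
    err := t.socket.Close()
    t.socket = nil
    return err
}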