This repository has been archived by the owner on May 26, 2022. It is now read-only.

feat: close transports that implement io.Closer #227

Merged 1 commit into master on Mar 29, 2021

Conversation

@Stebalien (Member) commented Aug 28, 2020

This way, transports with shared resources (e.g., reused sockets) can clean them up. Otherwise, we have no way to tell, e.g., the QUIC transport to close its socket "pool".

fixes libp2p/go-libp2p#999
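For context, a minimal sketch of the pattern this PR enables. The reuseTransport type and its pool field are hypothetical; only the io.Closer type assertion on the swarm side mirrors the change under review.

```go
package example

import (
	"io"
	"log"
)

// Transport stands in for transport.Transport; only the shape needed for
// this sketch is shown.
type Transport interface{}

// reuseTransport is a hypothetical transport holding a shared resource
// (e.g. a reused socket "pool") that must be released on swarm shutdown.
type reuseTransport struct {
	pool io.Closer
}

// Close implements io.Closer so the swarm can tear the pool down.
func (t *reuseTransport) Close() error {
	return t.pool.Close()
}

// closeTransports mirrors the swarm-side check: any transport that also
// implements io.Closer gets closed during teardown.
func closeTransports(transports []Transport) {
	for _, t := range transports {
		if closer, ok := t.(io.Closer); ok {
			if err := closer.Close(); err != nil {
				log.Printf("error closing transport %T: %s", t, err)
			}
		}
	}
}
```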

swarm.go (outdated)

for t := range transports {
if closer, ok := t.(io.Closer); ok {
if err := closer.Close(); err != nil {
@Jorropo commented Aug 28, 2020

In my case (tor and i2p), closer.Close() is slow: it needs to tell other peers in the DHT that it is leaving, which takes some time. I would like each transport to be closed in its own goroutine.

Edit: my bad, it would just be cheaper if transports that need it fire off their own goroutine.
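A sketch of that alternative, with hypothetical names (slowTransport, announceLeaving): the transport's own Close returns immediately and fires a goroutine itself for the slow cleanup.

```go
package example

import "log"

// slowTransport is a hypothetical transport (think tor/i2p) whose cleanup is
// slow because it announces its departure to other peers.
type slowTransport struct{}

// Close returns immediately; the slow, best-effort cleanup runs in a
// goroutine the transport fires itself, so the swarm's teardown isn't
// blocked on it.
func (t *slowTransport) Close() error {
	go func() {
		if err := t.announceLeaving(); err != nil {
			log.Printf("cleanup failed: %s", err)
		}
	}()
	return nil
}

// announceLeaving is a placeholder for the slow departure announcement.
func (t *slowTransport) announceLeaving() error { return nil }
```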

@Stebalien (Member, Author)

Eh, I'd prefer to be able to wait and I'm not too concerned about the cost of firing off goroutines on shutdown.

@Stebalien requested a review from raulk on August 29, 2020
@Jorropo left a comment

LGTM I couldn't ask for more 👌

s.transports.m = nil
s.transports.Unlock()

var wg sync.WaitGroup

No need to allocate a new waitgroup, we have s.refs available (it's empty at this point; I know it's not how we're supposed to use it, but it doesn't matter anyway since we're closing).

@Stebalien (Member, Author)

That isn't safe if Close is called concurrently. Basically, we can only call Add on this waitgroup when we can guarantee that the waitgroup won't be zero.
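To spell out the constraint being referenced: sync.WaitGroup documents that an Add with a positive delta that starts while the counter is zero must happen before any concurrent Wait. If two callers raced into Close and one reused s.refs, it could call Add while the other was already blocked in refs.Wait() on a zero counter. A locally scoped WaitGroup, as in the diff, sidesteps this; a rough sketch:

```go
package example

import "sync"

// teardownSketch uses a fresh WaitGroup that nothing else can be waiting on,
// so the Add calls below can never race with a concurrent Wait on a zero
// counter (the hazard with reusing s.refs here).
func teardownSketch(closeFns []func()) {
	var wg sync.WaitGroup
	for _, closeFn := range closeFns {
		wg.Add(1)
		go func(f func()) {
			defer wg.Done()
			f()
		}(closeFn)
	}
	wg.Wait()
}
```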

@@ -176,6 +177,27 @@ func (s *Swarm) teardown() error {
// Wait for everything to finish.
s.refs.Wait()

// Now close out any transports (if necessary). Do this after closing
@Jorropo commented Aug 29, 2020

Why only now, why not while closing conns and listeners? I don't see the point of doing it this way if we can't be confident that the transport won't receive any further calls to dial or listen.
A call to listen or dial can effectively get lost in starvation (or concurrency randomness) and arrive at the transport later than the Close, so if the transport has to implement its own close-synchronization logic anyway, we might as well close transports after closing listeners but before s.refs.Wait. WDYT?

PS: The transport can get closed before it receives a call to dial or listen, because the transport lock is not held until the dial is complete:

defer s.transports.RUnlock()

This could be fixed if the RUnlock were moved here (and the same should be done in listen / all functions that use Swarm.TransportForDialing):
if err != nil {

@Stebalien (Member, Author)

I'm not going to take a lock in one function, and release it in another. The correct fix here would be to call refs.Add(1) before starting a dial, releasing the ref on dial failure.
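A rough sketch of that fix against a stripped-down stand-in for the swarm (the field and method names below are illustrative, not the actual swarm code): take a ref before looking up the transport, so teardown's refs.Wait() cannot complete while a dial that already grabbed a transport is still in flight.

```go
package example

import (
	"errors"
	"sync"
)

var errNoTransport = errors.New("no transport for address") // illustrative

// swarmStub is a stand-in with just the pieces needed for this sketch.
type swarmStub struct {
	refs sync.WaitGroup
}

// transportForDialing is a placeholder for Swarm.TransportForDialing; it
// returns a dial function or nil.
func (s *swarmStub) transportForDialing() func() error { return nil }

func (s *swarmStub) dial() error {
	s.refs.Add(1)       // take a ref before touching the transports
	defer s.refs.Done() // simplified: in this sketch the ref only covers the dial attempt itself

	dialFn := s.transportForDialing()
	if dialFn == nil {
		return errNoTransport
	}
	return dialFn()
}
```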

@jacobheun

What's the state of this? Is it blocked by anything?

@vyzo (Contributor) commented Sep 11, 2020

Not that I know of, LGTM. @Stebalien?

@Stebalien (Member, Author)

I haven't merged it because it's still technically possible to call Dial on transports after calling Close. Honestly, merging with that issue is probably fine for now.

@Jorropo left a comment

I've changed my mind: it's a close, so as long as it works, performance is not an issue.

@Jorropo commented Oct 7, 2020

@Stebalien, what is the status of this?
I need it for https://github.com/berty/go-libp2p-tor-transport; without it, I'm seeing memory, CPU, and process leaks.
Plus, I think there is no need to worry as much as I was; it's fine for me as-is.

@Stebalien (Member, Author)

Yeah, this is probably fine as-is. I'll merge for now with an open issue: libp2p/go-libp2p#1545.

@Stebalien (Member, Author)

Actually, I do need a final signoff from @raulk on this.

wg.Add(1)
go func(c io.Closer) {
defer wg.Done()
if err := closer.Close(); err != nil {
Member

A log statement confirming that we're closing down this closeable transport would be nice. But then again, we need better logging all throughout, so I won't object if we don't add this one here right now.
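For illustration, the loop from the diff with such a confirmation log added. The message text is made up, and the standard library logger is used just to keep the sketch self-contained, rather than the swarm's own logger.

```go
package example

import (
	"io"
	"log"
	"sync"
)

// closeTransportsLogged closes every transport that implements io.Closer,
// each in its own goroutine, and waits for all of them before returning.
func closeTransportsLogged(transports map[interface{}]struct{}) {
	var wg sync.WaitGroup
	for t := range transports {
		if closer, ok := t.(io.Closer); ok {
			wg.Add(1)
			go func(c io.Closer) {
				defer wg.Done()
				log.Printf("shutting down transport %T", c) // the suggested confirmation log
				if err := c.Close(); err != nil {
					log.Printf("error closing transport %T: %s", c, err)
				}
			}(closer)
		}
	}
	wg.Wait()
}
```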

swarm_listen.go (resolved)
@@ -48,7 +51,10 @@ func (s *Swarm) TransportForListening(a ma.Multiaddr) transport.Transport {
s.transports.RLock()
defer s.transports.RUnlock()
if len(s.transports.m) == 0 {
log.Error("you have no transports configured")
// make sure we're not just shutting down.
if s.transports.m != nil {
Member

Would be nice to actually model a swarm status with atomic ints. Inferring we're closing down based on nil vs. 0-length slice differentiation is a bit of a footgun.

@Stebalien (Member, Author)

Note: this is a map, not a slice. We could have a second flag, but I'd prefer to avoid storing the state twice.
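For readers less familiar with Go maps, the distinction being leaned on (illustrative only): a nil map reads as empty, so len alone can't distinguish "no transports configured" from "swarm shut down", but a nil check can, without storing a second flag.

```go
package example

// transportsStatus shows the two cases a zero-length map can encode when the
// convention is "nil means we're shutting down".
func transportsStatus(m map[string]struct{}) string {
	if len(m) == 0 {
		if m == nil {
			return "shutting down" // teardown set the map to nil
		}
		return "no transports configured" // map exists but is empty
	}
	return "transports available"
}
```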

@raulk (Member) left a comment

I'm ok to merge these changes, but there are some edge cases that I don't quite understand. And the swarm needs to store its state (unstarted, started, closing, closed) in an atomic int so we can implement a state machine, rather than relying on hacks to infer which state we're in.
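A sketch of that state-machine idea, using the states named above; the type and field names are hypothetical, not existing swarm code. An atomic integer makes "are we shutting down?" an explicit question rather than an inference from a nil map.

```go
package example

import "sync/atomic"

// Hypothetical swarm lifecycle states.
const (
	stateUnstarted int32 = iota
	stateStarted
	stateClosing
	stateClosed
)

// swarmWithState carries only the state field relevant to this sketch.
type swarmWithState struct {
	state int32 // read and written atomically
}

func (s *swarmWithState) setState(st int32) {
	atomic.StoreInt32(&s.state, st)
}

// shuttingDown replaces the "is the transports map nil?" inference.
func (s *swarmWithState) shuttingDown() bool {
	st := atomic.LoadInt32(&s.state)
	return st == stateClosing || st == stateClosed
}
```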

This way, transports with shared resources (e.g., reused sockets) can clean them up.

fixes libp2p/go-libp2p#999
@Stebalien (Member, Author)

Tests are broken due to #246. Otherwise, they passed.

@Stebalien merged commit 3d932ed into master on Mar 29, 2021
@Stebalien deleted the feat/closer branch on March 29, 2021
@Stebalien mentioned this pull request on May 11, 2021
@aschmahmann mentioned this pull request on May 14, 2021
Successfully merging this pull request may close these issues.

Proposal: Closable Transport