This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Move connection management into networking layer #351

Merged
merged 15 commits into from
Apr 18, 2020

Conversation

dirkmc
Contributor

@dirkmc dirkmc commented Apr 15, 2020

Part of the fix for #347

@dirkmc dirkmc marked this pull request as ready for review April 16, 2020 21:00
@dirkmc dirkmc requested a review from Stebalien April 16, 2020 21:00
internal/messagequeue/messagequeue.go Outdated Show resolved Hide resolved
internal/messagequeue/messagequeue.go Outdated Show resolved Hide resolved
		return nil
	case <-s.done:
		return nil
	case <-time.After(s.opts.SendErrorBackoff):
Member

I'd avoid using time.After but this isn't too critical.

network/ipfs_impl.go Show resolved Hide resolved
network/ipfs_impl.go Outdated Show resolved Hide resolved
	}
	state.refs++

	if state.refs == 1 && state.responsive {
Member

I'd consider switching the peer back to "responsive" on connect.

Member

That is:

if state.refs == 1 || !state.responsive {
    state.responsive = true
    ...
}

Contributor Author

We get into the unresponsive state if the remote peer fails to respond to several attempts to dial it, but it's still connected. So we're implicitly saying that we care about responsiveness more than connectivity.
I guess arguably if a peer opens a new connection it can be considered responsive?

Contributor Author

Thinking about this more, we probably want to keep it as it is. For example, if a peer that doesn't support bitswap dials us, we will dial it when broadcasting to connected peers. If the peer responds with an error indicating that the protocol is not supported, we shouldn't try to dial it again, even if it connects to us again.

Member

Given that we're making 3 attempts, it's probably fine. I'm concerned that it can sometimes take some time to know that a connection is actually dead. In that case, we could try several times, say "peer's dead!", then get the new connection, then see the old connection finally die.
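
The two policies discussed in this thread can be compared side by side with a small sketch. Only the connState fields (refs, responsive) and the two conditions come from the snippets above; the tracker type, the resetOnConnect switch, string peer IDs, and the events slice are hypothetical stand-ins, and the sketch assumes the elided "..." in the suggestion is where listeners get notified. Disconnect handling is omitted.

```go
package main

import "fmt"

// connState holds the two fields visible in the diff: a count of open
// connections to a peer and a responsiveness flag.
type connState struct {
	refs       int
	responsive bool
}

// tracker is a hypothetical stand-in for the connection event manager.
type tracker struct {
	conns          map[string]*connState
	resetOnConnect bool     // true: reviewer's suggestion; false: condition in the diff
	events         []string // recorded notifications, in place of a listener
}

func newTracker(resetOnConnect bool) *tracker {
	return &tracker{conns: make(map[string]*connState), resetOnConnect: resetOnConnect}
}

// Connected records a new connection to peer p and decides whether to notify.
func (t *tracker) Connected(p string) {
	state, ok := t.conns[p]
	if !ok {
		state = &connState{responsive: true}
		t.conns[p] = state
	}
	state.refs++

	if t.resetOnConnect {
		// Suggested variant: any new connection makes the peer responsive again.
		if state.refs == 1 || !state.responsive {
			state.responsive = true
			t.events = append(t.events, "connected:"+p)
		}
		return
	}
	// Variant kept in the PR: an unresponsive peer stays unresponsive.
	if state.refs == 1 && state.responsive {
		t.events = append(t.events, "connected:"+p)
	}
}

// MarkUnresponsive records that repeated attempts to reach p failed
// while it was still connected.
func (t *tracker) MarkUnresponsive(p string) {
	state, ok := t.conns[p]
	if !ok || !state.responsive {
		return
	}
	state.responsive = false
	t.events = append(t.events, "disconnected:"+p)
}

func main() {
	for _, reset := range []bool{false, true} {
		t := newTracker(reset)
		t.Connected("A")        // first connection
		t.MarkUnresponsive("A") // several failed attempts
		t.Connected("A")        // peer opens a second connection
		fmt.Println(reset, t.events)
	}
}
```

With resetOnConnect set to false, the second connection produces no event, which is the behavior the author argues for when a peer does not support the protocol. With it set to true, the peer is announced again, which covers the reviewer's case of a stale connection dying after a fresh one has arrived.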

network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager.go Outdated Show resolved Hide resolved
network/connecteventmanager.go Show resolved Hide resolved
	return s.stream, nil
}

// Reset the stream
func (s *streamMessageSender) Reset() error {
	if s.stream != nil {
		err := s.stream.Reset()
		s.stream = nil
		s.connected = false
Member

Why not just set the stream to nil? That will free up the resources as well.

Contributor Author

I believe the crash was caused because we were calling SupportsHave() after a Reset():

func (s *streamMessageSender) SupportsHave() bool {
	return s.bsnet.SupportsHave(s.stream.Protocol())
}

Member

Ah, I see. We shouldn't even construct a streamMessageSender till we have the stream.

@Stebalien
Member

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x14356d7]
goroutine 151492 [running]:
github.com/ipfs/go-bitswap/network.(*streamMessageSender).SupportsHave(0xc004dc0a80, 0x1ef1520)
        pkg/mod/github.com/ipfs/[email protected]/network/ipfs_impl.go:137 +0x37
github.com/ipfs/go-bitswap/internal/messagequeue.(*MessageQueue).sendMessage(0xc00331ac40)
        pkg/mod/github.com/ipfs/[email protected]/internal/messagequeue/messagequeue.go:425 +0x1b8
github.com/ipfs/go-bitswap/internal/messagequeue.(*MessageQueue).sendIfReady(0xc00331ac40)
        pkg/mod/github.com/ipfs/[email protected]/internal/messagequeue/messagequeue.go:406 +0x4b
github.com/ipfs/go-bitswap/internal/messagequeue.(*MessageQueue).runQueue(0xc00331ac40)
        pkg/mod/github.com/ipfs/[email protected]/internal/messagequeue/messagequeue.go:350 +0x2b7
created by github.com/ipfs/go-bitswap/internal/messagequeue.(*MessageQueue).Startup
        pkg/mod/github.com/ipfs/[email protected]/internal/messagequeue/messagequeue.go:300 +0x98

Member

@Stebalien Stebalien left a comment

There are some remaining questions but this PR fixes the immediate problem.

@Stebalien Stebalien merged commit 9d9719e into master Apr 18, 2020
@Stebalien Stebalien deleted the refactor/conn-mgmt branch April 21, 2020 17:53
Jorropo pushed a commit to Jorropo/go-libipfs that referenced this pull request Jan 26, 2023
Move connection management into networking layer

This commit was moved from ipfs/go-bitswap@9d9719e