Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: return the best _acceptable_ conn in NewStream #1604

Merged
merged 2 commits into from
Jul 5, 2022

Conversation

Stebalien
Copy link
Member

Otherwise, we can return, e.g., a transient connection that can't actually be used.

I noticed bitswap was trying to use transient connections, which shouldn't be possible. To really fix this issue, we'd also need to fix #1603 as well, but that's harder.

Otherwise, we can return, e.g., a transient connection that can't
actually be used.
@Stebalien Stebalien requested a review from marten-seemann June 17, 2022 23:19
@marten-seemann
Copy link
Contributor

Not sure I understand the swarm code correctly. Let's assume we only have a transient conn to a peer. We will therefore call dialPeer:

if c == nil {
if nodial, _ := network.GetNoDial(ctx); nodial {
return nil, network.ErrNoConn
}
if dials >= DialAttempts {
return nil, errors.New("max dial attempts exceeded")
}
dials++
var err error
c, err = s.dialPeer(ctx, p)
if err != nil {
return nil, err
}
}

dialPeer will then start a new dial sync (there are currently no active dials), as we only have a transient conn:

// check if we already have an open (usable) connection first
conn := s.bestAcceptableConnToPeer(ctx, p)
if conn != nil {
return conn, nil
}
// apply the DialPeer timeout
ctx, cancel := context.WithTimeout(ctx, network.GetDialPeerTimeout(ctx))
defer cancel()
conn, err = s.dsync.Dial(ctx, p)
if err == nil {
return conn, nil
}

Does that mean we'll end up with two transient connections now?

@Stebalien
Copy link
Member Author

Ah.... ok, this is all broken.

@Stebalien Stebalien closed this Jun 20, 2022
If we have a transient connection, don't want to use a transient
connection, and haven't specified "force direct dial", fail the dial.
@Stebalien Stebalien reopened this Jun 20, 2022
@Stebalien
Copy link
Member Author

Ok, it's "fixed". Horribly broken, but at least it's correct.

Copy link
Collaborator

@MarcoPolo MarcoPolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing that we have two different concepts of "transient" connections in go-libp2p. Transient as in a limited connection over a relay and transient as in a connection that hasn't been upgraded yet and doesn't have a peerid associated with it yet.


forceDirect, _ := network.GetForceDirectDial(ctx)
if forceDirect && !isDirectConn(conn) {
return nil, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here? We may have a transient connection but we'll try to dial again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We need to do that for hole-punching.

@marten-seemann marten-seemann merged commit c8bd6b5 into master Jul 5, 2022
@MarcoPolo MarcoPolo mentioned this pull request Jul 7, 2022
41 tasks
@ajnavarro ajnavarro mentioned this pull request Aug 24, 2022
72 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connect should wait for an acceptable connection
3 participants