-
Notifications
You must be signed in to change notification settings - Fork 12
Conversation
fix bug introduced in last commit
connection state can be determined through the network interface on demand
addresses service TODO
svc.go
Outdated
log.Debugf("error dialing %s: %s", pi.ID.Pretty(), err.Error()) | ||
// wait for the context to timeout to avoid leaking timing information | ||
// this renders the service ineffective as a port scanner | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to select
here.
<-ctx.Done()
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, indeed; habbit.
autonat.go
Outdated
type NATStatus int | ||
|
||
const ( | ||
NATStatusUnknown = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would give those constants explicit type of NATStatus
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do.
autonat.go
Outdated
func (as *AutoNATState) background() { | ||
// wait a bit for the node to come online and establish some connections | ||
// before starting autodetection | ||
time.Sleep(10 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10s might be a bit short here. Automatic port forwarding has about 10s timeout.
From another hand, if it is timing out we won't get and forwarding either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing, it might be useful for it to be configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can bump it a bit, say to 15s.
Re: testing
I don't think it will be useful, as we won't be able to unit-test this aspect with go test.
I plan to run some test programs and try it in the wild.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then again we probably do want to test the autodiscovery without waiting for this time, so maybe we can make it a variable.
We can revisit when I have the test suite ready.
autonat.go
Outdated
func (as *AutoNATState) background() { | ||
// wait a bit for the node to come online and establish some connections | ||
// before starting autodetection | ||
time.Sleep(15 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would pull these durations out into named constants at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
autonat.go
Outdated
PublicAddr() (ma.Multiaddr, error) | ||
} | ||
|
||
type AutoNATState struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the naming here is a bit weird. Its not clear from reading that AutoNATState
is an implementation of AutoNAT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm, good point. Easy to fix, should I call it AutoNATImpl
? boring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like AmbientAutoNAT
:
- it conveys that it is an
AutoNAT
instance - it conveys that it is ambient, meaning that the user doesn't have anything to do other than creating an instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to AmbientAutoNAT
in 1562e1b
autonat.go
Outdated
shufflePeers(peers) | ||
|
||
for _, p := range peers { | ||
cli := NewAutoNATClient(as.host, p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating a single use thingy here feels a bit weird. Maybe just make the dial method take the host and peer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also reuse the client objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on the other hand it's a very simple object, with no state. Not sure it's worth the trouble to cache it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make sense to make it take a peer so that we can reuse the object for all peers in the interaction; no need to hold it across function calls i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in bb5cad4
The client object is reused for dialing all peers as needed.
autonat.go
Outdated
|
||
for _, p := range peers { | ||
cli := NewAutoNATClient(as.host, p) | ||
ctx, cancel := context.WithTimeout(as.ctx, 60*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can give a name to the incantation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
named it and made it a variable so that we can unit test; fa14117
autonat.go
Outdated
return | ||
} | ||
|
||
as.mx.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably want to put this locked section into a separate method so we can use defers nicely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in d16ca79
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a lot simpler than I thought it would be. Nice!
autonat.go
Outdated
func (as *AmbientAutoNAT) background() { | ||
// wait a bit for the node to come online and establish some connections | ||
// before starting autodetection | ||
time.Sleep(AutoNATBootDelay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that important but this should probably select in a context and a time.After
.
autonat.go
Outdated
|
||
peers := make([]peer.ID, 0, len(as.peers)) | ||
for p := range as.peers { | ||
if len(as.host.Network().ConnsToPeer(p)) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Network().Connectedness() == inet.Connected
avoids allocating.
// NAT status is publicly dialable | ||
NATStatusPublic | ||
// NAT status is private network | ||
NATStatusPrivate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "no nat"? Do we need that state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does that state mean though? We have Uknown and Public -- no nat is equivalent to public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was thinking:
- NatStatusPrivate -> Behind a nat and undiablable.
- NatStatusPublic -> Behind a nat and dialable.
(although we may not need to track the undialable case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more of "dialable" or not "dialable".
Do we gain anything by knowing that there is no NAT whatsoever?
Note that the inference might be hard to make.
client.go
Outdated
// AutoNATClient is a stateless client interface to AutoNAT peers | ||
type AutoNATClient interface { | ||
// Dial requests from a peer providing AutoNAT services to test dial back | ||
Dial(ctx context.Context, p peer.ID) (ma.Multiaddr, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this something else? When I see "Dial" I think "establish a connection". When I saw this function used in the code, I had absolutely no idea why dialing a peer would tell us anything about our NAT status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will rename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Called it DialBack
.
log.Debugf("error dialing %s: %s", pi.ID.Pretty(), err.Error()) | ||
// wait for the context to timeout to avoid leaking timing information | ||
// this renders the service ineffective as a port scanner | ||
<-ctx.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
svc.go
Outdated
} | ||
|
||
ra := conns[0].RemoteMultiaddr() | ||
conns[0].Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to call as.dialer.Network.ClosePeer
. We can, unfortunately, successfully open multiple connections.
// NewAutoNATService creates a new AutoNATService instance attached to a host | ||
func NewAutoNATService(ctx context.Context, h host.Host, opts ...libp2p.Option) (*AutoNATService, error) { | ||
opts = append(opts, libp2p.NoListenAddrs) | ||
dialer, err := libp2p.New(ctx, opts...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could configure libp2p with a custom connection manager that just kills all connections after 30s. Just to be extra safe :wink.
func (as *AmbientAutoNAT) OpenedStream(net inet.Network, s inet.Stream) {} | ||
func (as *AmbientAutoNAT) ClosedStream(net inet.Network, s inet.Stream) {} | ||
|
||
func (as *AmbientAutoNAT) Connected(net inet.Network, c inet.Conn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we really don't need a large set of peers that support this protocol. Instead of testing every one, How about we:
a. Keep a list of known autonat peers (discovered as we try to use them, not when we first connect).
b. Keep a list of known non-autonat peers.
Then, periodically*, we can:
- Try every connected peer in the known good set.
- Try every open connection not in the bad set, adding peers to the good set and bad set as we try them.
That way we aren't unnecessarily noisy.
*later, we can get even fancier and set the period to be "time since last inbound connection from a public address", or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can reduce the noise by simply checking on the protocols reported by identify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to look at the protocols reported by identify through the peerstore in 46d352f
I don't quite like the delay for identify, but it seems to be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite like the delay for identify, but it seems to be necessary.
Yeah... that annoys me to me to no end as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identify
keeps the stream open after identification. If we made it close the stream, we could hook onto the ClosedStream
event to know when identify was over deterministically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's really not. I know it should work, but that's just horrible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, but that's all we have now 😅
Should we think about setting up some kind of "in-mem event bus" so that different layers of libp2p can emit and react to events? Identify would then emit a protocols:identify/1.0.0:complete
event when it finished, or a ...:error
one if it failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... Ideally services would just hook into identify (or the peerstore? that doesn't seem right) and get called when we connect to a peer supporting protocol X.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the delay configurable, with an initial value of 5 sec (per @magik6k's suggestion)
notify.go
Outdated
|
||
go func() { | ||
// add some delay for identify | ||
time.Sleep(250 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just walk over each connected peer and check lazily? That way, we don't even need this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we gain anything by the lazy check.
We'll have to iterate through the list of peers, which also adds complexity for recognizing new autonat servers after we have iterated once.
So we might as well do it at connect time, the goroutine cost is marginal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not worried about the goroutine cost, I'm worried about the race. However, this is probably fine as an initial implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The race is probably benign, as we are just collecting peer IDs of peers that implement the protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm just worried that if the peer supported AutoNAT –but their Identify took longer than 250ms– we will have missed that peer forever, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can increase a tad... but how much is too much?
autonat.go
Outdated
} | ||
} | ||
|
||
if len(peers) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this black or white decision could yield erratic results, e.g. if you only have 1 active connection to an autonat peer, we're going to restrict our query to a single peer. I much rather have a minimum threshold we strive for, e.g. 5 peers, for resilience purposes, starting with peers we hold a connection to.
As it is, the autodetect
does a round-robin, so no risk of establishing redundant connections if we shuffle the connected and non-connected sublists separately, i.e.
A..G: connected
U..Z: not connected
A B C D E F G || U V W X Y Z
<- shuffle -> <- shuffle ->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow. The code tries to use an already existing connection purely to avoid creating unnecessary new connections.
Do you want to try multiple peers? And each peer multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code tries to use an already existing connection purely to avoid creating unnecessary new connections.
Currently, if we happen to be connected to 1 autonat peer only, we'll restrict ourselves to it. If it fails, we're out of luck. This makes us fragile, especially because we expect scarcity in autonat peers.
What I'm proposing is to target N peers (e.g. 5), preferring connected peers, and falling back to disconnected ones to fill up the slice. To avoid connected and unconnected peers getting mixed up in the shuffle, we keep track of the pivot index and shuffle both sublists separately.
Since autodetect
is round-robin, it'll only resort to disconnected peers if the connected ones fail. This makes us more resilient overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, but do we want to dial more than one peers when we get a DIAL_ERROR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. I'm not sure I have the answer. Right now we flag NATStatusPrivate
on the first dial err, and abort. However, what if that peer is behind some kind of firewall (corporate, geographical, etc.)? Wouldn't it be better to corroborate with more observations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could do a few more tries if we have more known autonat peers, but accept the failure if we don't have enough.
I would arbitrarily go for "3 times is enemy action".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could do a few more tries if we have more known autonat peers, but accept the failure if we don't have enough.
Yep. If we don't have enough, we'd defer to the next iteration. If by then we've found more autonat peers, with this new logic we'll query them even if not connected, and hence have a chance to improve our connectivity.
I would arbitrarily go for "3 times is enemy action".
We detect "enemy action" on the receiving side through the throttling, no? (3 is fine for that)
That makes me realise that we should probably move peers who have sent us E_DIAL_REFUSED
or E_INTERNAL_ERROR
to a blacklist, to avoid dwelling on them, and to be well-behaved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably too much complexity for marginal improvement :)
Also, I think I want some slightly more clever strategy for making multiple dial attempts -- if our nat status was unknown or public, then try 3 times.
If it was private, then a single failure should be enough to convince us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented the "3 times is enemy action" strategy in aadb8db, with memory of past failures so that it stops asking multiple peers once it has enough confidence we are NATed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In d7f55b0 we ensure that we have at least 3 autonat peers in the candidate set, even when we are connected to less than that.
This uses the strategy you suggested for ordering.
func (as *AmbientAutoNAT) OpenedStream(net inet.Network, s inet.Stream) {} | ||
func (as *AmbientAutoNAT) ClosedStream(net inet.Network, s inet.Stream) {} | ||
|
||
func (as *AmbientAutoNAT) Connected(net inet.Network, c inet.Conn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identify
keeps the stream open after identification. If we made it close the stream, we could hook onto the ClosedStream
event to know when identify was over deterministically.
svc.go
Outdated
// rate limit check | ||
as.mx.Lock() | ||
_, ok := as.peers[pi.ID] | ||
if ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments:
- Perhaps this n=1 limiting is too optimistic. The first attempt could've failed due to an intermittent issue, not because the address is unroutable. How about allowing n=3 (or configurable) grace attempts per peer?
- It's confusing that
AutoNATService
andAmbientAutoNAT
both have apeers
field, and they're both referred to byas.peers
in code. I suggest renaming this one todialedPeers
to reduce ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a related note, both peers
maps are never cleaned, so they'll keep growing forever. Any thoughts on adding a cleanup routine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a related note, both peers maps are never cleaned, so they'll keep growing forever. Any thoughts on adding a cleanup routine?
The number of autonat peers is expected to be relatively small for the network size and we'd like to know about our peers even if we are not connected to them any more.
If this becomes a problem, we can add some cleanup logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about allowing n=3 (or configurable) grace attempts per peer?
ok, although we should probably have some backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's confusing that AutoNATService and AmbientAutoNAT both have a peers field, and they're both referred to by as.peers in code.
They serve different purposes, but they are both sequences of peer ids -- hence the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a comment in the struct though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented configurable throttle in 8ea9f1b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They serve different purposes, but they are both sequences of peer ids -- hence the name.
It took me several reads to understand that as.peers
referred to different things in the same package. Let's be nice to future maintainers and rename one of them to contextualise it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fine. I'll be nice to future self and call the limiter requests
reqs
.
3 times is enemy action; this is more resilient to transient dial errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for the most part, there are some points in other discussions which seem to need adressing
"hash": "QmPL3AKtiaQyYpchZceXBZhZ3MSnoGqJvLZrc7fzDTTQdJ", | ||
"name": "go-libp2p", | ||
"version": "6.0.19" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be missing a few deps, but since this needs to be split anyways I guess it's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, the rationale was that all the deps get pulled by go-libp2p, but it will be split and not depend on that at all.
Extract service implementation from go-libp2p-autonat
Provides an ambient NAT auto-discovery service.
TBD: