-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
relay: turn autorelay into a service and always filter out relay addresses #578
Conversation
config/config.go
Outdated
// advertise ourselves | ||
// TODO: Why do we only do this when EnableAutoRelay is | ||
// set? This has absolutely _nothing_ to do with | ||
// autorelay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it has to do with autorelay -- it's the advertisement of relay services for autorelayed hosts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at this from an "I'm enabling the autorelay subsystem" perspective, yes. However, users will just see an autorelay option. When looking at these as options,
- If I enable relay, I expect my node to make use of relays and accept relay connections.
- If I enable relay hop, I expect my node to act as a relay. That includes telling other users that I'm willing to be a relay.
- If I enable autorelay, I expect my node to automatically pick and advertise a relay.
Having to enable both relay hop + autorelay to get other users to use me as a relay is really not what I would expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need a mechanism to allow users to enable relay for their own purposes (ie their own little fleet of nodes) but without having to be inundated by the torrent of traffic that comes when announcing the relay functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but that should really be an option on the relay service. I'll try to document all this to be less confusing but I know we're still going to be confusing users.
Hm. Actually, I'm going to try to simply remove both of these custom hosts. I have a feeling we don't actually need them. |
We don't need the host wrappers, we can just replace the filter AddrsFactory. Also, always filter out relay addresses.
1d096af
to
5b66564
Compare
if ar.host.Network().Connectedness(p) == inet.Connected { | ||
// We have a second connection. | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes a small race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to return/wrap the host constructed by AutoRelay.
config/config.go
Outdated
// advertise ourselves | ||
// TODO: Why do we only do this when EnableAutoRelay is | ||
// set? This has absolutely _nothing_ to do with | ||
// autorelay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure it has to do with autorelay -- it's the advertisement of relay services for autorelayed hosts!
// 2. Introduce a service management system (e.g., | ||
// uber's fx) so we can actually manage the lifetime of | ||
// this service. | ||
_ = relay.NewAutoRelay(swrm.Context(), h, discovery, router) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to keep the autorelay host, otherwise we will wrap/return the basic host and all the address manipulation will be voided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it's probably ok, we are overriding the AddressFactory
at the end of the day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's not really an abuse of context. Contexts are the only construct go has for denoting dynamic scopes, so it's ok to use it for long-lived ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context in go are rather... overloaded. They are the only construct for dynamic scopes so, in that sense, it's correct to pass them into constructors. However, they're more commonly (e.g., here) used for cancellation and that's where we get into trouble: contexts don't allow waiting for something to finish.
They're useful for requests where either:
- You pass the context and then wait on a returned channel. This channel allows you to actually wait for the request.
- You pass the context to a synchronous request.
- You just don't care about waiting.
However, services should really just provide a Stop()
or Close()
function.
I bring this up because it prevents orderly shutdown in go-ipfs. For example, I'd like to be able to say:
- Shut down all services using the datastore.
- Shut down the datastore.
However, when we use contexts to manage services, we can't wait for 1 to finish before we move on to 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And how are we going to implement Stop
or Close
when there are background goroutines doing work? Why, with contextx of course!
So we can't really get rid of them and I think it's better to have one global context for the entire host and tree of services rather than have each service make its own background one with cancel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not arguing against Close
methods; all I am saying is that contexts will inevitably be used for this and it's convenient to have a global context with user cancellation.
When we go services, we might have GlobalContext
service dependency that provides this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I'm explicitly arguing against using contexts for canceling services at all because:
- It causes all services to shut down all at once. This has been causing real and really annoying problems in go-ipfs where we'll, e.g., close the blockservice before we stop all services trying to write blocks.
- It makes it really easy to incorrectly implement services. Before the transport refactor, we had quite a few services that wouldn't actually stop unless you both canceled the context and called Close.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a Close
method for services is undeniably a useful abstraction, as it could handle orderly dependency shutdown as well (but it won't be easy).
The global context could be used by services that need a context to manager their internal goroutines, there is nothing stopping them from wrapping this context in a WithCancel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that canceling this context will cause the service to start shutting down before close is called, throwing any orderly shutdown out the window. Now, of course, we could just say "don't cancel this context till you've finished the orderly shutdown" but that's not exactly user-friendly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put together a requirements/design document outlining what we actually need from the refactor. That will help us drive alignment and consensus.
I sense we all have slightly different views, understandings and wishes, and it'll cause frustration if we jump into technical solutions before we decide what problems we want to solve.
Things we have discussed:
- dependency injection
- service lifecycles
- hot replacement of runtime services
- service tree (lifecycle dependencies)
- eventbus
- etc.
So why are we killing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to not wrap the hosts and go back to the basic host, we are overriding the address factory at the end of the day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, on second (and third) reading, this is all fine; it simplifies a bunch by turning the host into a service.
// 2. Introduce a service management system (e.g., | ||
// uber's fx) so we can actually manage the lifetime of | ||
// this service. | ||
_ = relay.NewAutoRelay(swrm.Context(), h, discovery, router) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's not really an abuse of context. Contexts are the only construct go has for denoting dynamic scopes, so it's ok to use it for long-lived ones.
// AutoRelayHost is a Host that uses relays for connectivity when a NAT is detected. | ||
type AutoRelayHost struct { | ||
*basic.BasicHost | ||
// AutoRelay is a Host that uses relays for connectivity when a NAT is detected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's no longer a host, but rather a service.
p2p/host/relay/doc.go
Outdated
`libp2p.New(libp2p.EnableRelay(circuit.OptHop), libp2p.Routing(makeDHT))`. | ||
They provide Relay Hop services, and advertise through the DHT | ||
in the `/libp2p/relay` namespace | ||
|
||
- `AutoRelayHost`s are constructed with `libp2p.New(libp2p.Routing(makeDHT))` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs an update too, it's no longer a host.
The documentation has become obsolete. |
p2p/host/relay/autorelay.go
Outdated
need := DesiredRelays - len(h.relays) | ||
h.mx.Unlock() | ||
need := DesiredRelays - len(ar.relays) | ||
ar.mx.Unlock() | ||
|
||
limit := 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we bump this limit to 50? We have 25 relays deployed plus whatever user-run relays, so we should try to see them all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a commit that bumps that.
These are issues for another day.
// as a public relay using the provided routing system. | ||
// 2. When this libp2p node is _not_ configured as a relay "hop", it will | ||
// automatically if it is unreachable (e.g., behind a NAT). If so, it will | ||
// find, configure, and announce a set of public relays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vyzo, this is why I don't like this option design. I now have to explain to the user that the "AutoRelay" option can do two very different things depending on how they've configured their relay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AutoRelay is a system service.
If you are a hop relay, then the functionality is to advertise relay services. If you are a client, then the functionality is to use automatically use the services advertised by hop relays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What service? There are, and always have been, two completely separate and orthogonal services: one for "hop" relays and one for clients. That's why this is so damn confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to think of autorelay
as a single service. But yes, at some level you are right that these are two seperate services and that could potentially be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are, and always have been, two completely separate and orthogonal services: one for "hop" relays and one for clients.
Actually, it's even more confusing than that. A client can be a consumer of a relay service, or a target of a relayed connection. Currently, all of these flow over the same thick protocol. I really have a hard time wrapping my head around this, and so do our users.
I'd like to segregate this protocol if possible into at least two:
/libp2p/circuit/relay/hop/v0.0.0
: exposed only by relay nodes. Used inside the initiator -> relay connection./libp2p/circuit/relay/target/v0.0.0
: exposed by clients accepting relayed connections. Used in the relay -> target connection.
This would allow for better implicit observability too. Would've helped a lot in the incident we're analysing.
options.go
Outdated
// (circuit.OptHop is passed to EnableRelay), this node will advertise itself | ||
// as a public relay using the provided routing system. | ||
// 2. When this libp2p node is _not_ configured as a relay "hop", it will | ||
// automatically if it is unreachable (e.g., behind a NAT). If so, it will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "detect" word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No code review; just contributions to the great discussion.
// as a public relay using the provided routing system. | ||
// 2. When this libp2p node is _not_ configured as a relay "hop", it will | ||
// automatically if it is unreachable (e.g., behind a NAT). If so, it will | ||
// find, configure, and announce a set of public relays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are, and always have been, two completely separate and orthogonal services: one for "hop" relays and one for clients.
Actually, it's even more confusing than that. A client can be a consumer of a relay service, or a target of a relayed connection. Currently, all of these flow over the same thick protocol. I really have a hard time wrapping my head around this, and so do our users.
I'd like to segregate this protocol if possible into at least two:
/libp2p/circuit/relay/hop/v0.0.0
: exposed only by relay nodes. Used inside the initiator -> relay connection./libp2p/circuit/relay/target/v0.0.0
: exposed by clients accepting relayed connections. Used in the relay -> target connection.
This would allow for better implicit observability too. Would've helped a lot in the incident we're analysing.
// 2. Introduce a service management system (e.g., | ||
// uber's fx) so we can actually manage the lifetime of | ||
// this service. | ||
_ = relay.NewAutoRelay(swrm.Context(), h, discovery, router) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put together a requirements/design document outlining what we actually need from the refactor. That will help us drive alignment and consensus.
I sense we all have slightly different views, understandings and wishes, and it'll cause frustration if we jump into technical solutions before we decide what problems we want to solve.
Things we have discussed:
- dependency injection
- service lifecycles
- hot replacement of runtime services
- service tree (lifecycle dependencies)
- eventbus
- etc.
Also, remove the "RelayHost".