Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relay: turn autorelay into a service and always filter out relay addresses #578

Merged
merged 8 commits into from
Apr 6, 2019

Conversation

Stebalien
Copy link
Member

Also, remove the "RelayHost".

@ghost ghost assigned Stebalien Apr 5, 2019
@ghost ghost added the status/in-progress In progress label Apr 5, 2019
@Stebalien Stebalien requested a review from vyzo April 5, 2019 21:19
config/config.go Outdated
// advertise ourselves
// TODO: Why do we only do this when EnableAutoRelay is
// set? This has absolutely _nothing_ to do with
// autorelay.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it has to do with autorelay -- it's the advertisement of relay services for autorelayed hosts!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at this from an "I'm enabling the autorelay subsystem" perspective, yes. However, users will just see an autorelay option. When looking at these as options,

  1. If I enable relay, I expect my node to make use of relays and accept relay connections.
  2. If I enable relay hop, I expect my node to act as a relay. That includes telling other users that I'm willing to be a relay.
  3. If I enable autorelay, I expect my node to automatically pick and advertise a relay.

Having to enable both relay hop + autorelay to get other users to use me as a relay is really not what I would expect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need a mechanism to allow users to enable relay for their own purposes (ie their own little fleet of nodes) but without having to be inundated by the torrent of traffic that comes when announcing the relay functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but that should really be an option on the relay service. I'll try to document all this to be less confusing but I know we're still going to be confusing users.

@Stebalien
Copy link
Member Author

Hm. Actually, I'm going to try to simply remove both of these custom hosts. I have a feeling we don't actually need them.

We don't need the host wrappers, we can just replace the filter AddrsFactory.

Also, always filter out relay addresses.
@Stebalien Stebalien force-pushed the fix/relay-host-stuff branch from 1d096af to 5b66564 Compare April 6, 2019 02:55
if ar.host.Network().Connectedness(p) == inet.Connected {
// We have a second connection.
return
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes a small race.

config/config.go Show resolved Hide resolved
Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to return/wrap the host constructed by AutoRelay.

config/config.go Outdated
// advertise ourselves
// TODO: Why do we only do this when EnableAutoRelay is
// set? This has absolutely _nothing_ to do with
// autorelay.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it has to do with autorelay -- it's the advertisement of relay services for autorelayed hosts!

// 2. Introduce a service management system (e.g.,
// uber's fx) so we can actually manage the lifetime of
// this service.
_ = relay.NewAutoRelay(swrm.Context(), h, discovery, router)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to keep the autorelay host, otherwise we will wrap/return the basic host and all the address manipulation will be voided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it's probably ok, we are overriding the AddressFactory at the end of the day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's not really an abuse of context. Contexts are the only construct go has for denoting dynamic scopes, so it's ok to use it for long-lived ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context in go are rather... overloaded. They are the only construct for dynamic scopes so, in that sense, it's correct to pass them into constructors. However, they're more commonly (e.g., here) used for cancellation and that's where we get into trouble: contexts don't allow waiting for something to finish.

They're useful for requests where either:

  1. You pass the context and then wait on a returned channel. This channel allows you to actually wait for the request.
  2. You pass the context to a synchronous request.
  3. You just don't care about waiting.

However, services should really just provide a Stop() or Close() function.


I bring this up because it prevents orderly shutdown in go-ipfs. For example, I'd like to be able to say:

  1. Shut down all services using the datastore.
  2. Shut down the datastore.

However, when we use contexts to manage services, we can't wait for 1 to finish before we move on to 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how are we going to implement Stop or Close when there are background goroutines doing work? Why, with contextx of course!
So we can't really get rid of them and I think it's better to have one global context for the entire host and tree of services rather than have each service make its own background one with cancel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not arguing against Close methods; all I am saying is that contexts will inevitably be used for this and it's convenient to have a global context with user cancellation.
When we go services, we might have GlobalContext service dependency that provides this context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm explicitly arguing against using contexts for canceling services at all because:

  1. It causes all services to shut down all at once. This has been causing real and really annoying problems in go-ipfs where we'll, e.g., close the blockservice before we stop all services trying to write blocks.
  2. It makes it really easy to incorrectly implement services. Before the transport refactor, we had quite a few services that wouldn't actually stop unless you both canceled the context and called Close.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a Close method for services is undeniably a useful abstraction, as it could handle orderly dependency shutdown as well (but it won't be easy).
The global context could be used by services that need a context to manager their internal goroutines, there is nothing stopping them from wrapping this context in a WithCancel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that canceling this context will cause the service to start shutting down before close is called, throwing any orderly shutdown out the window. Now, of course, we could just say "don't cancel this context till you've finished the orderly shutdown" but that's not exactly user-friendly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put together a requirements/design document outlining what we actually need from the refactor. That will help us drive alignment and consensus.

I sense we all have slightly different views, understandings and wishes, and it'll cause frustration if we jump into technical solutions before we decide what problems we want to solve.

Things we have discussed:

  • dependency injection
  • service lifecycles
  • hot replacement of runtime services
  • service tree (lifecycle dependencies)
  • eventbus
  • etc.

@vyzo
Copy link
Contributor

vyzo commented Apr 6, 2019

So why are we killing RelayHost? I think we should keep it instead of having the advertisement hiding in the constructor.

Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to not wrap the hosts and go back to the basic host, we are overriding the address factory at the end of the day.

Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, on second (and third) reading, this is all fine; it simplifies a bunch by turning the host into a service.

config/config.go Show resolved Hide resolved
// 2. Introduce a service management system (e.g.,
// uber's fx) so we can actually manage the lifetime of
// this service.
_ = relay.NewAutoRelay(swrm.Context(), h, discovery, router)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's not really an abuse of context. Contexts are the only construct go has for denoting dynamic scopes, so it's ok to use it for long-lived ones.

// AutoRelayHost is a Host that uses relays for connectivity when a NAT is detected.
type AutoRelayHost struct {
*basic.BasicHost
// AutoRelay is a Host that uses relays for connectivity when a NAT is detected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's no longer a host, but rather a service.

p2p/host/relay/autorelay.go Show resolved Hide resolved
`libp2p.New(libp2p.EnableRelay(circuit.OptHop), libp2p.Routing(makeDHT))`.
They provide Relay Hop services, and advertise through the DHT
in the `/libp2p/relay` namespace

- `AutoRelayHost`s are constructed with `libp2p.New(libp2p.Routing(makeDHT))`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs an update too, it's no longer a host.

@vyzo
Copy link
Contributor

vyzo commented Apr 6, 2019

The documentation has become obsolete.

need := DesiredRelays - len(h.relays)
h.mx.Unlock()
need := DesiredRelays - len(ar.relays)
ar.mx.Unlock()

limit := 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we bump this limit to 50? We have 25 relays deployed plus whatever user-run relays, so we should try to see them all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit that bumps that.

@vyzo vyzo changed the title relay: always filter out relay addresses relay: turn autorelay into a service and always filter out relay addresses Apr 6, 2019
@ghost ghost assigned vyzo Apr 6, 2019
// as a public relay using the provided routing system.
// 2. When this libp2p node is _not_ configured as a relay "hop", it will
// automatically if it is unreachable (e.g., behind a NAT). If so, it will
// find, configure, and announce a set of public relays.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vyzo, this is why I don't like this option design. I now have to explain to the user that the "AutoRelay" option can do two very different things depending on how they've configured their relay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoRelay is a system service.
If you are a hop relay, then the functionality is to advertise relay services. If you are a client, then the functionality is to use automatically use the services advertised by hop relays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What service? There are, and always have been, two completely separate and orthogonal services: one for "hop" relays and one for clients. That's why this is so damn confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think of autorelay as a single service. But yes, at some level you are right that these are two seperate services and that could potentially be confusing.

Copy link
Member

@raulk raulk Apr 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are, and always have been, two completely separate and orthogonal services: one for "hop" relays and one for clients.

Actually, it's even more confusing than that. A client can be a consumer of a relay service, or a target of a relayed connection. Currently, all of these flow over the same thick protocol. I really have a hard time wrapping my head around this, and so do our users.

I'd like to segregate this protocol if possible into at least two:

  • /libp2p/circuit/relay/hop/v0.0.0: exposed only by relay nodes. Used inside the initiator -> relay connection.
  • /libp2p/circuit/relay/target/v0.0.0: exposed by clients accepting relayed connections. Used in the relay -> target connection.

This would allow for better implicit observability too. Would've helped a lot in the incident we're analysing.

options.go Outdated
// (circuit.OptHop is passed to EnableRelay), this node will advertise itself
// as a public relay using the provided routing system.
// 2. When this libp2p node is _not_ configured as a relay "hop", it will
// automatically if it is unreachable (e.g., behind a NAT). If so, it will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing "detect" word.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@Stebalien Stebalien requested a review from vyzo April 6, 2019 15:37
Copy link
Contributor

@vyzo vyzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Stebalien Stebalien merged commit a5ef3bd into master Apr 6, 2019
@Stebalien Stebalien deleted the fix/relay-host-stuff branch April 6, 2019 15:53
@ghost ghost removed the status/in-progress In progress label Apr 6, 2019
Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No code review; just contributions to the great discussion.

// as a public relay using the provided routing system.
// 2. When this libp2p node is _not_ configured as a relay "hop", it will
// automatically if it is unreachable (e.g., behind a NAT). If so, it will
// find, configure, and announce a set of public relays.
Copy link
Member

@raulk raulk Apr 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are, and always have been, two completely separate and orthogonal services: one for "hop" relays and one for clients.

Actually, it's even more confusing than that. A client can be a consumer of a relay service, or a target of a relayed connection. Currently, all of these flow over the same thick protocol. I really have a hard time wrapping my head around this, and so do our users.

I'd like to segregate this protocol if possible into at least two:

  • /libp2p/circuit/relay/hop/v0.0.0: exposed only by relay nodes. Used inside the initiator -> relay connection.
  • /libp2p/circuit/relay/target/v0.0.0: exposed by clients accepting relayed connections. Used in the relay -> target connection.

This would allow for better implicit observability too. Would've helped a lot in the incident we're analysing.

// 2. Introduce a service management system (e.g.,
// uber's fx) so we can actually manage the lifetime of
// this service.
_ = relay.NewAutoRelay(swrm.Context(), h, discovery, router)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put together a requirements/design document outlining what we actually need from the refactor. That will help us drive alignment and consensus.

I sense we all have slightly different views, understandings and wishes, and it'll cause frustration if we jump into technical solutions before we decide what problems we want to solve.

Things we have discussed:

  • dependency injection
  • service lifecycles
  • hot replacement of runtime services
  • service tree (lifecycle dependencies)
  • eventbus
  • etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants