Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host/Service interface design #467

Open
vyzo opened this issue Oct 29, 2018 · 33 comments · Fixed by ipfs/kubo#5785
Open

Host/Service interface design #467

vyzo opened this issue Oct 29, 2018 · 33 comments · Fixed by ipfs/kubo#5785
Labels
need/community-input Needs input from the wider community

Comments

@vyzo
Copy link
Contributor

vyzo commented Oct 29, 2018

It is becoming apparent that our Host interface needs to be extended with the concept of services and possibly an event bus.

For instance, the autorelay module in #454 needs access to the Identify service in BasicHost in order to push address updates. It also needs to be able to override the address factory, which necessitates the AutoRelay host wrapping a BasicHost directly.
On top of that, it embeds the ambient autonat service, which we would like to expose to other parts of the system.

Additional considerations include the ability to emit events about things like peer identification completing and other internal state changes that are now handled with delays.

This issue is here to discuss the evolution of our Host interface and drive future refactoring.

@vyzo vyzo added the need/community-input Needs input from the wider community label Oct 29, 2018
@vyzo
Copy link
Contributor Author

vyzo commented Oct 29, 2018

cc @Stebalien @bigs @raulk @diasdavid

@bigs
Copy link
Contributor

bigs commented Oct 29, 2018

under this scheme, would the host become quite thin, bolstered by services and the like? i think that would be nice. the dial refactor underway is a clear example of how we already wrapped up too much logic in a central area (the swarm)

@Stebalien
Copy link
Member

So @raulk pointed me to dig which is pretty much what I was trying to implement myself (just fleshed out and, you know, actually working). I think we may need some kind of "default service" support to fully use it (not sure) but I think it'll vastly simplify the host and possibly even the swarm.

Even better, we can have the libp2p constructor construct it internally by default or accept it as an option. That means go-ipfs can use the same library internally for managing it's services and then share the same "container" with go-libp2p.

Even better still, we don't have to bake this concept into libp2p, in general; we just need to integrate it into the libp2p constructor. Unless they want to, services really don't even need to know we're using this library (as long as they provide reasonable constructors).

We'll also be able to delete all the custom reflection code we're currently using to construct transports. Transports are just a type of service. We can use container.Invoke(func(ts []Transport, sw Swarm) {...}) to register the transports constructed this way (again, without leaking this concept into the rest of the code).

@vyzo
Copy link
Contributor Author

vyzo commented Oct 30, 2018

There are a few issues with the dig api:

  • our services will need a host parameter to be constructed (and possibly a context, unless we expose this from the host), which is not a container itself.
  • concrete structs, not interfaces -- we want interfaces damn it!
  • I would very much have an interface that provides a pointer to the service instead of the Invoke contraption.
  • No ability to shutdown things

@vyzo
Copy link
Contributor Author

vyzo commented Oct 30, 2018

We can use container.Invoke(func(ts []Transport, sw Swarm) {...}) to register the transports constructed this way

I don't think this will work with dig; we need an aggregation of all containers implementing the Transport interface, which dig doesn't seem to have support for.

@vyzo
Copy link
Contributor Author

vyzo commented Oct 30, 2018

I think we should approach this is from exact use cases, so that we can edge a design that is well-tailored to our programming patterns and existing code. I don't think we should base our design on dig just because it's there -- if it fits, that's fine we can use it, but unfortunately I don't see it fitting like a glove.

So let's start with the basic host construct: I think we should redesign the basic host to be modular and implement an extended host interface that provides an API for service activation and access. Wrapper hosts can wrap to provide additional functionality by injecting or activating components, while still conforming to the extended host interface. I also propose to keep the existing Host interface intact so that we minimize the impact to existing code.

The trigger of all this can be our first use case: the autorelay and routed host (mal)abstractions. That's what we want to refactor so that it has a more stebalien-conformant architecture (TM).

Conceptually our abstractions look like this:


Host
 |
 +-- BasicHost
 |      | + Identify
 |      | + AddrFactory
 |      |
 |      +-- RelayHost
 |      |    + AddrFactory
 |      |
 |      +-- AutoRelayHost
 |            + AmbientAutoNAT
 |            + Discovery
 |               + ContentRouting
 |            + AddrFactory
 |
 +-- RoutedHost
      + PeerRouting

The obvious problem with this is that the *RelayHosts depend on the basic host because they need to access specific functionality. The bigger problem is that all the services we want to access are not accessible through a high level interface and are instead burried in concrete implementations.

But if we rebase everything on ExtendedHost (tentatively; naming is hard) and interpret the + as service activations, then it doesn't look so bad:

ExtendedHost
 |
 +-- BasicHost
 |    + Identify
 |    + AddrFactory
 |      
 +-- RelayHost
 |    + AddrFactory
 |
 +-- AutoRelayHost
 |    + AmbientAutoNAT
 |    + Discovery
 |        + ContentRouting
 |    + AddrFactory
 |
 +-- RoutedHost
      + PeerRouting

So let's start with the ExtendedHost interface and the BasicHost (re)implementation, and we will arrive to our service manager/dependency injector/buzzword of the day description for our component system.

@vyzo vyzo mentioned this issue Oct 30, 2018
8 tasks
@Stebalien
Copy link
Member

Comments on Dig:

First off, I'm not married to dig. I'd be happy with some other service manager, it's just that dig looks like it fits the bill (although I agree the interface looks a bit funky at first glance, I'd be fine wrapping it).

concrete structs, not interfaces -- we want interfaces damn it!

I had the same reaction but @raulk pointed out that we can use interfaces. We just have to explicitly specify the interface types we want to satisfy. That is:

func MyConstructor() (*Concrete, InterfaceA, InterfaceB, InterfaceC, error) {
}

(or it can use the struct form)

Really, I don't think we want to say "just give me something that implements interface X". Unfortunately, in go, that could be anything. It also doesn't allow users to choose which services should do what.

our services will need a host parameter to be constructed (and possibly a context, unless we expose this from the host), which is not a container itself.

How hard would it be to get rid of that context? We should just have close functions on everything and that "shutdown" context really needs to go (that's not what contexts are for!).

As for the host, we should be able to do this in stages, right? Although it will make wrapping tricky.

I would very much have an interface that provides a pointer to the service instead of the Invoke contraption.

The alternative would have been to pass in pointers to interfaces and have dig fill them. IMO, there way is actually cleaner.

Note: Only the libp2p constructor will need to call Invoke.

No ability to shutdown things

We'd probably need to add support for this (my half-finish service manager did provide this but didn't provide a bunch of other nice features that dig provides).

I don't think this will work with dig; we need an aggregation of all containers implementing the Transport interface, which dig doesn't seem to have support for.

Dig does provide aggregation, as long as the providers implement the same type.

I don't think we should base our design on dig just because it's there -- if it fits, that's fine we can use it, but unfortunately I don't see it fitting like a glove.

So, I'm suggesting dig because I started writing something very similar to fit our use-case and then raul introduced me to dig. Now, I agree, dig doesn't quite fit. However, I think we can make a few changes (that upstream will likely accept) to make it work for us. If push comes to shove, we can always wrap it in something that works better (but I'd rather not).

Additionally, I believe we can do this without leaking dig into all parts of our system (contain it (mostly) to the constructor). If we can't, we should probably ditch it. But I think we should try.


Comments on ExtendedHost:

But if we rebase everything on ExtendedHost (tentatively; naming is hard) and interpret the + as service activations, then it doesn't look so bad:

Won't this turn into a kitchen sink host? My interest in some service manager (maybe not dig but something like it) is avoid having to modify the host every time we want to introduce a new type of service.

@vyzo
Copy link
Contributor Author

vyzo commented Oct 30, 2018

Won't this turn into a kitchen sink host? My interest in some service manager (maybe not dig but something like it) is avoid having to modify the host every time we want to introduce a new type of service.

Oh, the ExtendedHost will be just Host + the interface to the service manager!
The basic host will be the concrete base implementation with a service manager and activation of essential services.

@vyzo
Copy link
Contributor Author

vyzo commented Oct 30, 2018

However, I think we can make a few changes (that upstream will likely accept) to make it work for us.

Sure, but let's sketch out our own service manager first, and then adapt that design to what dig can do (and what it can be made to do)?
As for upstream, we don't care about that initially. We can fork, develop the changes necessary to suit our needs, and when we are happy we can submit patches upstream.

@bigs
Copy link
Contributor

bigs commented Oct 30, 2018

what if modules that wish to provide services export a Service interface with subpackages or implementation packages that provide Service structs that implement it. that way we can have an alias or token of sorts to match against with some reflection based API like dig or something of our own creation. we could even have a meta Shutdown function that iterates through all provided structs for things that implement io.Closer and closes them

@cannium
Copy link
Contributor

cannium commented Oct 31, 2018

I should say I could smell gunpowder in the air. Below are some random thoughts that might be related, ignore me if off-topic.

  • It seems ipfs know too many details about libp2p(say, init/start/bootstrap libp2p related instances a couple of times). IMHO ipfs should only hold a handle to libp2p and query everything from it.
  • What if split the current identify into a service(a running part) and maintain data in peerstore or somewhere(a "persistence" part). As identify is so fundamental to libp2p, other services might need to access its data. Also, currently identify's private data intersect with peerstore.

@raulk
Copy link
Member

raulk commented Oct 31, 2018

Just a fleeting thought. In my view, Identify is really part of a larger “wire protocol” than encompasses hello, disconnect, maybe pings/pongs, goaways, throttle/backoffs, etc. Anything that governs the lifecycle of a connection.

@cannium
Copy link
Contributor

cannium commented Nov 1, 2018

btw, golang has an "official" dependency injection lib wire

@vyzo
Copy link
Contributor Author

vyzo commented Nov 1, 2018

This is more suited for static compile time dependencies. We need runtime instantiation and extension.

@Stebalien
Copy link
Member

Oh, the ExtendedHost will be just Host + the interface to the service manager!

Got it. I think I'll need to see some code/interfaces to really understand where the service manager would plug in in that architecture.

@cannium
Copy link
Contributor

cannium commented Nov 6, 2018

After think for a while(really!), I come up with the abstraction I think is right. Please take a look.

I'd always like the Linus way to design a system, which is to look at data to manipulate first. For IPFS as a whole, we need ability to 1) translate a name to a hash(currently ipns), 2) find peers who can provide the hash, 3) find a way to connect the peer(tcp, relay, etc), 4) get the data behind hash. So we need 4 types of data structures with names(I coined) as "NameStore", "ContentStore", "PeerStore", and "DataStore":

  • NameStore: map a human-readable name to its hash
  • ContentStore: map a peer(peer ID) to hashes it owns, but usually the reverse map is more useful, which is to find peers for specific hash
  • PeerStore: map a peer to its related connection information, like addresses, active connections, etc
  • DataStore: map a hash to actual bytes

We need a way to observe changes on those stores, an event bus or some callbacks should both apply. Some contents need persistent store(e.g. DataStore), some can only reside in memory(e.g. active connections).
Services are built on those stores, for example, connection manager(or swarm service?) should be able to access PeerStore; kad-dht service should be able to access ContentStore and PeerStore. Services communicate by API calls or message bus.

Now zoom in to libp2p. I think the main purpose of libp2p is to solve the problem 3 above. Ideally one service in libp2p should own(has exclusive write access to) part of PeerStore, while other services could only have read access to that part.
Take autonat as an example, it owns the info about "if I'm behind an NAT". When identify announces about a node itself, it should query this info; when other nodes try to connect to this node, it should query this node's announced info. As you're investigating, some kind of plugin mechanism is necessary, so autonat could plug the info-check code into identify. Still, and also noted by @raulk, I'd like to see identify become a more generic service(maybe a "meta-service" in libp2p) to handle connection metadata and life cycle.
Take those "hosts" as another example, the RoutedHost and BasicHost actually query different parts of PeerStore, so instead of currently nested calls to find address of a peer ID, a step-by-step model may be more plugin-friendly. For such a query, the procedure becomes 1) query PeerStore, if found, use it; if not, call kad-dht service(or some other, just a placeholder, you get the idea) and register a callback on PeerStore 2) if DHT could find it, update PeerStore and trigger the callback; if not found, Connect timeout and throw an error. Connect only read PeerStore, leave write access to DHTs and name resolvers.

@Stebalien
Copy link
Member

@cannium This issue is really about that plugin system you referred to. We've found that our current system of manually wiring together all of our services requires us to make architectural changes (at least to the libp2p constructor) every time we want to add a new service or dependency.

@vyzo
Copy link
Contributor Author

vyzo commented Dec 19, 2018

this was accidentally closed.

@vyzo vyzo reopened this Dec 19, 2018
@cannium
Copy link
Contributor

cannium commented Dec 20, 2018

@Stebalien From my observation, there exist some dependency circles between services so initializing new services is really painful. Refactoring is unavoidable, and we all know it's not easy. Recently I'm learning rust, its concept of ownership and lifetime are very inspiring. Although we don't have those rules built in compiler in go, a good practice is good no matter what language is used. Now that auto-relay is merged, no more OKR pressures, I think you guys as maintainers should take time to do some cleaning, like Snow Leopard and High Sierra did to macOS. I'd like to get involved if that's your plan.

@anacrolix
Copy link
Contributor

I'm finding the layered Host abstractions very difficult. Can the services be completely divorced from the Host abstraction? Where services depend on one another can be exposed in those services individual constructors. For instance if I want auto relaying, I create the host, then the identify, NAT, auto NAT services as required, before passing references to them to the auto relay constructor. If there is a circular dependency, it should be removed. The host implementation can be redesigned to enable switching of whatever fundamental behaviour is dependent on it, or allow initialization to be deferred until all services are registered. In other words, build the service dependency graph first, then initiate the host.

@anacrolix
Copy link
Contributor

Also there's a Go idiom "return concrete types, accept interfaces" that's very relevant here. Constructors that return interfaces are an indication that there might already be too much abstraction (for instance the host nesting that occurs currently). Returning concrete types allows more specialization of the type's value before passing it around. To my understanding "github.com/libp2p/go-libp2p".New should be "github.com/libp2p/go-libp2p".NewHost that returns a concrete "github.com/libp2p/go-libp2p".*Host, that just happens to implement the Host interface defined elsewhere. Possibly the concrete Host type might be StandardHost or something along those lines to indicate that it's the usual, actual-net-affecting implementation.

@Stebalien
Copy link
Member

I'm finding the layered Host abstractions very difficult. Can the services be completely divorced from the Host abstraction?

That's the hope here (and much of the motivation). We're finding ourselves in a place where we need to either create a new host or refactor an existing host whenever we want to add a new service.

build the service dependency graph first, then initiate the host.

That's what we're hoping to use dependency injection for. Basically:

  1. The options will create a set of functions with inputs and outputs.
  2. The dependency injection library will create a graph from these functions.
  3. The dependency injection library will walk the dependency graph, starting everything in-order.

To my understanding "github.com/libp2p/go-libp2p".New should be "github.com/libp2p/go-libp2p".NewHost that returns a concrete "github.com/libp2p/go-libp2p".*Host, that just happens to implement the Host interface defined elsewhere.

New is special and may warrant a bit more abstraction. We have concrete constructors for specific host types but New is kind of a meta-constructor that picks the right host implementation depending on the options.

Eventually, we may be able to have a single host implementation with no wrappers but, even then, we'll probably want to allow users to pass in a custom host constructor as an option.

@vyzo
Copy link
Contributor Author

vyzo commented Jan 9, 2019

Perhaps we should start with the omnihost instead of making it an eventual goal.

@anacrolix
Copy link
Contributor

Relevant appeal to authority on interfaces.

@Kubuxu
Copy link
Member

Kubuxu commented Feb 10, 2019

Another example might be https://godoc.org/go.uber.org/fx which supports application lifecycles.

@Stebalien
Copy link
Member

@anacrolix,

Relevant appeal to authority on interfaces.

From #467 (comment),

New is special and may warrant a bit more abstraction. We have concrete constructors for specific host types but New is kind of a meta-constructor that picks the right host implementation depending on the options.

Basically, New isn't a constructor for a concrete type (we have separate constructors for those types). It's a "constructor" for a libp2p "node".


@Kubuxu

That looks like what we're looking for although I'm a bit worried it's not dynamic enough. Ideally, the user would be able to pass in their own "service manager" instance and have libp2p use that. Personally, I want this so we can use the same "service manager" in both go-ipfs and go-libp2p.

@Kubuxu
Copy link
Member

Kubuxu commented Feb 11, 2019

We can always modify one or the other or look for other examples.
I would prefer us not going through writing DI infrastructure if we can use something already existing.

@bigs
Copy link
Contributor

bigs commented Feb 11, 2019

i had not seen fx! that looks perfect

@Kubuxu
Copy link
Member

Kubuxu commented Feb 11, 2019

It doesn't support assigning structs to interfaces as far as I can see but we can either add that (based on dig`) or decide we don't need it. Or we can add life cycles to Dig, it seems like it would work as an additional dependency.

@Kubuxu
Copy link
Member

Kubuxu commented Feb 11, 2019

Aah, I didn't notice it at first but Fx expands upon Dig. It is essentially a wrapper around Dig.

@bigs
Copy link
Contributor

bigs commented Feb 14, 2019

imo it's still worth starting with fx at least in an exploratory capacity!

@magik6k
Copy link
Contributor

magik6k commented Apr 15, 2019

I've started refactoring go-ipfs to use fx - ipfs/kubo#6162.

So far it looks good, if we ever need more dynamic capabilities, it should be relatively easy to add them to fx

@bigs
Copy link
Contributor

bigs commented Apr 18, 2019

the recent merging of ipfs/kubo#6162 is super inspiring and, i think, an indicator that we should follow suit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/community-input Needs input from the wider community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants