-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor to allow out-of-tree router implementations #199
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
I believe we need to export from
from pubsub.RPC
.
|
||
// Eval schedules a thunk for evaluation in the pubsub process loop. | ||
// Returns false if the evaluation was aborted because the pubsub system has been shutdown. | ||
Eval(thunk func()) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is funky.
- Are we allowed to use the rest of the interface outside of a "thunk"?
- Are we using this to protect gossipsub state? If so, what if we just used a lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface in general is unsafe to use outside of the event loop, this should probably be specified in the documentation; will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I wonder if we should put that part into a separate interface and pass it to the thunk? That should make it clear that that's the only safe way to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Eval
wait for the thunk to eval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It waits for the evaluation to start, but it doesn't wait for completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marked the unsafe outside the event loop methods as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still like to consider moving these to a separate interface and passing it into the thunk. I really wish go had some kind of goroutine local storage so we could just make all this automatic but it doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the interface gets attached in Router.Attach
and can get used without explicitly re-entering with Eval
-- all callbacks to the router from the pubsub system are inside the event loop and that would be awkward to use.
I think the contract should be "you call the pubsub system inside the event loop; if you have a goroutine that is outside (eg a timer, like in gossipsub), use Eval to re-enter the event loop."
We should probably document this in the Router
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed synchronously: All the router callbacks are also called on the main thread so we'd have to feed this new interface in there as well. TL;DR: not worth it.
Instead, we should:
- Carefully document this.
- Maybe call this
InvokeInMainThread
(or something less java-y).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a smell. Alternative design proposal in comments above.
pb/rpc.proto
Outdated
@@ -12,6 +12,7 @@ message RPC { | |||
} | |||
|
|||
optional ControlMessage control = 3; | |||
optional bytes extData = 4; // protocol extension data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gofmt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops! emacs did this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(use-package go-mode
:config (gofmt-before-save))
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any is probably a bad idea because it needs metadata, which may not be supported by some programming languages.
Also, our protobuf is proto2 so we don't want to pollute with proto3 constructs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do have a gofmt save hook, but this is a protobuf file!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:roll:
So, I'm just cheating and using spacemacs. (add-to-list 'protobuf-mode-hooks (lambda () (setq-local indent-tabs-mode t)))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed the indentation; the hook looks good, i might adopt it :p
Good point, the |
var ErrQueueFull = errors.New("queue full") | ||
|
||
func (p *PubSubControlView) PeersForTopic(topic string) (map[peer.ID]struct{}, bool) { | ||
tmap, ok := p.topics[topic] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying the peer list into a slice shouldn't really cost us anything in the grand scheme of things. I'd do that to avoid exposing internal state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this method is on the hot path, hence the intention to avoid copying by introducing the thunk scheduling to guard state...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is on the hot path indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR: I think the Eval()
is a smell. Its goal is to curtail state leakage caused the preexisting design. An event-based model would be cleaner and more comfortable to develop against.
// PeersForTopic returns the set of peers in a topic; the boolean indicator will be false | ||
// if the topic is unknown. | ||
// This method is not safe to call outside the pubsub process loop. | ||
PeersForTopic(topic string) (map[peer.ID]struct{}, bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, would a router want to limit their view of topic membership by only seeing peers that support their routing protocol? The reason I ask is because all other methods can be regarded as events, except for this one, which is pure state access. It seems to break the abstraction a bit.
I'd be more inclined to have the router and control layers be event-based, with no shared state. That could remove the need for the synthetic Eval()
method -- which is only needed to protect state at an adjacent layer...
I see this model working as follows:
- On startup, routers enrol with the control layer, providing a predicate that, given a
[]protocol.ID
, returns whether the router can interact with that peer or not. - On every new peer that advertises the umbrella pubsub protocol, we call the predicate on all routers, and emit a
NewPeer
event to routers that signal support. - The control layer tracks which routers are handling which peers. When a peer goes away, we emit an event relevant routers.
- For topic membership control messages (join or leave), the control layer emit events to the supporting routers.
- Routers send messages by pushing events on a channel that the control layer is looping over.
- ...
This approach would insulate the state of each layer and we'd remove the need for cross-boundary scope changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a major redesign i'd rather not tackle in this pr!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, programming this with events makes it a lot harder to reason about; I really like the simplicity of a lock-free event loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I get that, but pubsub/gossipsub is relied on by major downstream adopters, and I'm happy to invest time and effort in that refactor. I think it would make the code more decoupled, elegant, accessible for contributors, and less spaghetti.
Isn't the lock-free event loop inherently mono-threaded, anyway? It can become a major bottleneck.
If instead we let routers keep their own state locally, and be responsible for mutating in reaction to events they receive from the control layer, I think it makes the system more functional and a lot easier to reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 that having an Eval()
function on the interface seems ridiculous (also, this being ridiculous was the reason given for #171 not being it's own router). If there's a short term need for users that don't want to fork go-libp2p-pubsub just to get their work done I'd rather be even grosser and just export the fields in PubSub so people know how unstable/temporary the state is.
A more approachable, and less intrusive, approach might be to invert a small amount of the design. For example PeersForTopic
could easily be part of the PubSubRouter
interface instead utilizing a PubSubControl
interface. If certain structs/functions are really the same for various different routers then we could have some PubSubRouterBase
struct that could be embedded in the individual routers.
|
||
// Eval schedules a thunk for evaluation in the pubsub process loop. | ||
// Returns false if the evaluation was aborted because the pubsub system has been shutdown. | ||
Eval(thunk func()) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a smell. Alternative design proposal in comments above.
var ErrQueueFull = errors.New("queue full") | ||
|
||
func (p *PubSubControlView) PeersForTopic(topic string) (map[peer.ID]struct{}, bool) { | ||
tmap, ok := p.topics[topic] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this method is on the hot path, hence the intention to avoid copying by introducing the thunk scheduling to guard state...
closing as we decided to refactor internals before allowing out of tree routers. |
By popular demand, this refactors the pubsub system to allow out-of-tree router implementations.
This is accomplished by introducing a
PubSubControl
interface, which is the pubsub view presented to routers.It is currently sufficient to implement our in-tree routers, but we may want to extend it in the future.
It also adds a new field to the protobuf,
extData
, to be used by out-of-tree protocols for their own purposes.