RFC: Signed Address Records #217
What do people think about IPLD? I mostly just thought the schema notation is pretty cool, but I think if we want to use it for real we'd need a block store (at least an in-memory one), which doesn't feel like it's really a libp2p-layer concern. Maybe I'm overthinking that though.

Other options that I can think of are protobuf, which plays nice with most other libp2p things, or CBOR. Raw CBOR is as annoying to work with as raw JSON, but it does have a canonical encoding that is deterministic, which protobuf lacks. I'm not sure that a deterministic encoding is strictly necessary, and as far as I can tell, IPLD doesn't enforce one either.

Anyway, protobuf is likely the simplest to use, so that seems like a decent way to go.
I should note that I haven't tried actually using IPLD yet :) I'm going to play with it a bit and see how it works in practice. If anyone has experience with it, how is it?
I've been thinking about this some more and just wanted to write up where my head is at before I start revising. Ultimately I think IPLD is overkill for this, and we should just use protobufs. Earlier I was leaning away from this because there are no guarantees of deterministic encoding for protobufs, which was an issue with the peer id spec. However, as long as we sign, transmit and store serialized protobufs, we can still validate the record before deserializing it, and we don't really have to worry about whether our encoder has the same behavior as the one that produced the record.

I think I want to tease this apart into two RFCs:
This is basically the same idea as the original draft, except instead of an e.g.:

    message SignedEnvelope {
      bytes peerId = 1;
      bytes contents = 2;
      bytes signature = 3;
      optional PublicKey pubkey = 4;
    }

    message AddressInfo {
      bytes addr = 1;
      Routability routability = 2;
      Confidence confidence = 3;
    }

    message AddressState {
      bytes subject = 1;
      repeated AddressInfo addresses = 2;
      // etc...
    }

Storing serialized protobufs as a byte string inside another protobuf feels weird, but it seems necessary, given the lack of a deterministic encoder. Does this make sense?
Also note: we need to tag the data with some kind of "purpose" string so we can't reuse these records as messages in other protocols with signed messages. We do this in pubsub by prepending |
That is better 👍 Good question about the "domain separation" string. Maybe we could have a fixed string, e.g. "libp2p-signed-message:" but also allow the user to add an arbitrary "purpose" string that gets appended to the fixed string. This could be included in plain text as a field in the

Something like

    {
      publicKey: {
        //...
      },
      contents: Buffer(),
      purpose: "AddressState",
      signature: Buffer()
    }

And you prepend
That will take extra space in the message itself which may discourage high entropy names. I actually like forcing users to explicitly specify the purpose/domain. If we include it in the message, users can be lazy and ignore it. If we don't, users will have to call some function like:

    func Open(domain string, envelope []byte) (message []byte, author PublicKey, err error) { ... }
I'm not sure if
BTW, my concern here is that |
Ah, I see where you're coming from with the out-of-band domain string, that makes sense. Using only user-controlled data for the domain string makes me nervous, although I'm not sure I can really come up with a plausible attack. Still, there's not really any cost to using the fixed string plus the user-provided one, so it might be worth doing just in case.
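To make the idea concrete, here is a minimal Go sketch of an envelope that signs the already-serialized payload bytes together with a domain string supplied out of band. It assumes Ed25519 keys from the standard library, and the `Envelope` struct, `Seal`, `newKey` and `signedBytes` are hypothetical names for illustration only; `Open` follows the shape of the signature suggested above but takes a struct instead of raw bytes to keep the sketch short. The fixed prefix is the "libp2p-signed-message:" string floated in this thread.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"errors"
	"fmt"
)

// fixedPrefix is the constant part of the domain string (suggested in this thread).
const fixedPrefix = "libp2p-signed-message:"

// Envelope is a hypothetical stand-in for the SignedEnvelope message.
// Contents holds the payload exactly as it was serialized by the author.
type Envelope struct {
	PublicKey ed25519.PublicKey
	Contents  []byte
	Signature []byte
}

// signedBytes builds the buffer that is actually signed:
// fixed prefix, then the caller's domain, then the serialized payload.
func signedBytes(domain string, contents []byte) []byte {
	buf := make([]byte, 0, len(fixedPrefix)+len(domain)+len(contents))
	buf = append(buf, fixedPrefix...)
	buf = append(buf, domain...)
	return append(buf, contents...)
}

// newKey generates a fresh Ed25519 private key (illustration only).
func newKey() (ed25519.PrivateKey, error) {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	return priv, err
}

// Seal signs the serialized payload under the given domain.
func Seal(priv ed25519.PrivateKey, domain string, contents []byte) Envelope {
	return Envelope{
		PublicKey: priv.Public().(ed25519.PublicKey),
		Contents:  contents,
		Signature: ed25519.Sign(priv, signedBytes(domain, contents)),
	}
}

// Open verifies the signature before the caller deserializes Contents.
// The domain is not stored in the envelope, so the caller must supply it.
func Open(domain string, env Envelope) ([]byte, ed25519.PublicKey, error) {
	if len(env.PublicKey) != ed25519.PublicKeySize {
		return nil, nil, errors.New("bad public key length")
	}
	if !ed25519.Verify(env.PublicKey, signedBytes(domain, env.Contents), env.Signature) {
		return nil, nil, errors.New("signature does not match domain and contents")
	}
	return env.Contents, env.PublicKey, nil
}

func main() {
	priv, err := newKey()
	if err != nil {
		panic(err)
	}
	env := Seal(priv, "AddressState", []byte("serialized record bytes"))

	_, _, err = Open("AddressState", env)
	fmt.Println("same domain, error:", err)

	_, _, err = Open("SomeOtherProtocol", env)
	fmt.Println("other domain, error:", err)
}
```

Because verification runs over the stored bytes, the verifier never needs to re-encode the payload, which is why a deterministic encoder is not required. Plain concatenation as used here is ambiguous when more than one variable-length field is prefixed, which is the motivation for the length-prefixing discussed later in the thread.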
From my experience with security auditors, letting arbitrary or user-defined input control the interpretation of a message is going to raise a yellow flag for them even if it's not exploitable. (A plain text field in the protobuf does not seem as likely to raise those concerns.) Perhaps something to consider, since we've fixed a lot of other stuff that was not technically broken (but was a nonstandard approach) to satisfy auditor concerns.
This is less of an arbitrary user-defined input than the bytes being signed, as the purpose string is fixed at compile time. My only concerns are:
Good point - the "user" that controls the domain string is another libp2p subsystem, and there's no external input at runtime. I guess I lean towards using a |
Shaping up well. Just a handful of initial comments.
RFC/0002-signed-envelopes.md (outdated)

    ```protobuf
    message SignedEnvelope {
      PublicKey publicKey = 1; // see peer id spec for definition
Food for thought. Including the pubkey may be superfluous for some signature schemes.
Given an ECDSA signature, one can recover the public key provided we know the curve, the hash function, and the plaintext that was signed. Bitcoin and Ethereum use that trick heavily to validate transactions.
See:
https://crypto.stackexchange.com/questions/18105/how-does-recovering-the-public-key-from-an-ecdsa-signature-work
https://crypto.stackexchange.com/questions/60218/recovery-public-key-from-secp256k1-signature-and-message
RFC/0003-address-records.md (outdated)

    LOCAL = 3;

    // public internet
    GLOBAL = 4;
We'll need a PROXIMITY enum value for network-less transports that rely on physical proximity, e.g. Bluetooth.
Suggested change:

    - GLOBAL = 4;
    + PUBLIC = 4;
RFC/0003-address-records.md (outdated)

    }
    ```

    If Alice wants to publish her address to a public shared resource like a DHT,
Should we recommend that records are rejected outright when they leak addresses that are outside the scope of the discovery mechanism?
RFC/0003-address-records.md (outdated)

    // Confidence indicates how much we believe in the validity of the
    // address.
    enum Confidence {
What classes of problems do you foresee communicating our perceived confidence of our own addresses would solve?
If it's to signal which dials to prioritise, we can simplify this by conveying a 1-byte precedence value in the range (-128, 128).
Notes from a sync call with @yusefnapora.
Regarding the peer routing record:
- For now, modelling confidence and routability should be out of scope, because these two concepts are not yet modelled in the libp2p stack.
- We should think about "record extensions", where confidence and routability would be two examples of future extensions.
- Another extension I'd like to see is a "service bloom filter" that allows peers to advertise their protocols compactly and efficiently (without enumerating them, therefore with a degree of privacy preservation), and receivers of that record would be able to probabilistically test whether that authoring peer supports a protocol they're interested in, without having to connect to them.
Regarding the envelope:
- Add a type field: not sure if this should be a string, an enumeration, a multicodec.
- Hoist the sequence number from the peer routing record to the envelope.
- These two fields need to be part of the plaintext to sign.
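As a rough illustration of the "service bloom filter" extension mentioned in the notes above, the following Go sketch shows how a peer might pack its protocol IDs into a fixed-size bit set and how a receiver might test membership. Everything here is an assumption made for illustration: the `ProtocolFilter` type, the FNV-1a hash with double hashing, and the filter size and probe count are not taken from any libp2p spec.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ProtocolFilter is a hypothetical fixed-size Bloom filter over protocol IDs.
type ProtocolFilter struct {
	bits []uint64 // bit set, 64 bits per word
	k    int      // number of probes per protocol ID
}

// NewProtocolFilter allocates a filter with at least nbits bits and k probes.
func NewProtocolFilter(nbits, k int) *ProtocolFilter {
	words := (nbits + 63) / 64
	if words < 1 {
		words = 1
	}
	if k < 1 {
		k = 1
	}
	return &ProtocolFilter{bits: make([]uint64, words), k: k}
}

// positions derives k bit positions from one FNV-1a sum via double hashing.
func (f *ProtocolFilter) positions(proto string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(proto))
	sum := h.Sum64()
	h1 := sum & 0xffffffff
	h2 := (sum >> 32) | 1 // force odd so successive probes differ
	m := uint64(len(f.bits)) * 64
	out := make([]uint64, f.k)
	for i := range out {
		out[i] = (h1 + uint64(i)*h2) % m
	}
	return out
}

// Add records a protocol ID in the filter.
func (f *ProtocolFilter) Add(proto string) {
	for _, p := range f.positions(proto) {
		f.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// MayContain reports false only if the protocol was definitely never added;
// true means "probably supported" and can be a false positive.
func (f *ProtocolFilter) MayContain(proto string) bool {
	for _, p := range f.positions(proto) {
		if f.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := NewProtocolFilter(256, 3)
	f.Add("/ipfs/kad/1.0.0")
	f.Add("/meshsub/1.1.0")
	fmt.Println(f.MayContain("/ipfs/kad/1.0.0"))
	fmt.Println(f.MayContain("/some/other/protocol"))
}
```

A receiver can only ask "does this peer probably speak protocol Y", and a positive answer still has to be confirmed by connecting, since Bloom filters admit false positives but no false negatives.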
More thoughts regarding the "routability" aspect:
- What addresses do we want to communicate in a peer routing record?
- Discovery mechanisms have different scopes of operation, e.g. mDNS operates on LAN and loopback, PEX and the DHT are mostly public, but we do want to facilitate two local peers finding each other and establishing a local connection.
- For PEX and DHT, maybe we shouldn't even publish local addresses? Doing so has implications in security (you can order a machine to perform local dials to some process in their local network).
- Instead of publishing local addresses in a global venue (e.g. DHT), we should be having a mechanism so that peers establishing a connection via a public endpoint can detect if they're behind the same network, and "upgrade" to a better route.
- I guess in some scenarios, NAT hairpinning takes care of the above.
A possibility is having the address filtering be explicit. In most cases, we can communicate local and public endpoints, filtering out the localhost endpoints.
If the user wants to enable localhost endpoint advertisement, they can do so by enabling a flag on the Host.
In the implementation realm, assuming there'll be a peer routing record generator component, which knows all addresses and spits out records whenever they change, notifying downstream components (e.g. DHT, PEX, etc.), those components could provide an address filter when registering.
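A minimal Go sketch of that registration-time address filter might look like the following. It is a simplification under stated assumptions: addresses are plain IP strings rather than multiaddrs, the `AddrFilter` type and the `PublicOnly`, `NoLoopback` and `filterAddrs` names are hypothetical, and `net.IP.IsPrivate` requires Go 1.17 or newer.

```go
package main

import (
	"fmt"
	"net"
)

// AddrFilter decides whether an address may appear in records for one venue.
type AddrFilter func(ip net.IP) bool

// PublicOnly keeps addresses that are neither loopback, private, nor link-local.
// A component publishing to a public venue such as the DHT might register this.
func PublicOnly(ip net.IP) bool {
	return !ip.IsLoopback() && !ip.IsPrivate() && !ip.IsLinkLocalUnicast()
}

// NoLoopback keeps local and public endpoints but drops localhost,
// matching the "most cases" default described above.
func NoLoopback(ip net.IP) bool {
	return !ip.IsLoopback()
}

// filterAddrs returns the addresses accepted by keep, preserving order.
// Strings that do not parse as IP addresses are skipped.
func filterAddrs(addrs []string, keep AddrFilter) []string {
	var out []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip != nil && keep(ip) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	all := []string{"127.0.0.1", "192.168.1.10", "8.8.8.8"}
	fmt.Println("public venue:", filterAddrs(all, PublicOnly))
	fmt.Println("local venue: ", filterAddrs(all, NoLoopback))
}
```

Each downstream component would receive a record built from its own filtered view, so two venues can end up with different records for the same peer.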
The problem with the above approach is that it can lead to a "chimera" scenario. If two peers find each other via two venues (e.g. mDNS, DHT), the routing records with different scopes could collide and override each other. Unless we isolate routing records per domain/scope in the peerstore.
Food for thought.
@yusefnapora aims to have a final draft incorporating the comments by today, so that it can be reviewed and merged on Monday, with a minimal implementation in Go ready by the end of the week.
Hm, not really. If these bloom filters are at all useful, peers will be able to ask "does peer X likely speak protocol Y". I'd expect an attacker to ask "which peers speak protocol Y" (e.g., trying to find all the IPFS, Filecoin, etc. peers). However, shrinking the size of the protocol list is likely worth it (but multicodecs!).
YES!
Why? Are we planning on duplicating these? I thought the idea was to make the envelope general-purpose.
Multicodec prefixed byte string? That allows:
This is also consistent with ENS records. |
@Stebalien about moving the sequence number to the envelope, we were thinking it might be broadly useful for other kinds of payload, but as I was writing it up, I realized that each system that uses envelopes would need to have their own rules about how to interpret the sequence number (e.g. whether to invalidate & replace old records, merge them, etc). So it may be best to just leave it in the routing record and keep the envelope as simple as we can make it.

For the type field, I think your suggestion makes sense and gives us a lot of flexibility. 👍

Since we're talking about prefixing multiple fields to the content to sign instead of just the domain string, I'm now thinking that length-prefixing the fields is better than using a
Just wanted to update this, since I've been a bit quiet this week. I've been working on an initial implementation in Go that I'm hoping to make PRs for today.

I diverged from what's written up here in one place. It seemed a bit silly to use varints for length-prefixing when building up the buffer to sign, since we don't need to save space on the wire or anything. So I just used uint64, which seems like way more than enough to fit any payload we might want to use envelopes for. I'll update this RFC to reflect that in a second.

Also, how does everyone feel about the name "routing records"? In the implementation I ended up calling them RoutingStateRecords to emphasize that they contain transient state, but I don't know if that's really any better. I'm also wondering if an even more generic name like "peer records" makes sense, since we're thinking about putting things like the protocol bloom filter in here, which isn't strictly a routing concern. Maybe "availability records"? IDK 🚲 🏠

Anyway, my plan for today is to fix up a few things in the go implementation and push those branches. Then I was thinking I'd merge this RFC in and start new PRs to turn the RFCs into Working Draft specs and move the conversation over to those PRs.
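For readers following along, here is a small Go sketch of what length-prefixing the fields of the buffer to sign with a fixed-width uint64 could look like. The field order (domain, type, payload), the big-endian byte order, and the `appendField` and `bufferToSign` helpers are assumptions made for illustration; the comment above only states that uint64 lengths were used instead of varints.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendField appends an 8-byte length followed by the field bytes.
// Big-endian is an assumption; the thread only says "uint64".
func appendField(buf, field []byte) []byte {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(field)))
	buf = append(buf, n[:]...)
	return append(buf, field...)
}

// bufferToSign length-prefixes each field in a fixed order, so that field
// boundaries are unambiguous in the bytes that get signed.
func bufferToSign(domain string, payloadType, payload []byte) []byte {
	var buf []byte
	buf = appendField(buf, []byte(domain))
	buf = appendField(buf, payloadType)
	buf = appendField(buf, payload)
	return buf
}

func main() {
	// Without length prefixes, "ab"+"c" and "a"+"bc" would produce the same bytes.
	a := bufferToSign("ab", []byte("c"), []byte("payload"))
	b := bufferToSign("a", []byte("bc"), []byte("payload"))
	fmt.Println("same length:", len(a) == len(b))
	fmt.Println("same bytes: ", string(a) == string(b))
}
```

The buffer is only an input to signing and verification and is never sent on the wire, which is why the extra bytes of a fixed-width length cost nothing in practice.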
We've already merged a reference implementation into go-libp2p, so this can be promoted to a spec and be tagged with status 2A (Candidate Recommendation).
@vasco-santos -- are you planning to implement in js-libp2p?
Co-authored-by: tmakarios <[email protected]>
Now that this has been implemented in Go and JS I intend to merge this later today unless there is a reasonable objection. Ideally we would couple the approved RFC in a merge with its spec per #198, but I'd prefer this PR not sit open, completed, until someone can dedicate time to completing it.
@jacobheun Should we go ahead and merge this?
This commit upgrades the current gossipsub implementation to support the [v1.1 spec](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md). It adds a number of features, bug fixes and performance improvements.

Besides support for all new 1.1 features, other improvements that are of particular note:

- Improved duplicate LRU-time cache (this was previously a severe bottleneck for large message throughput topics)
- Extended message validation configuration options
- Arbitrary topics (users can now implement their own hashing schemes)
- Improved message validation handling - Invalid messages are no longer dropped but sent to the behaviour for application-level processing (including scoring)
- Support for floodsub, gossipsub v1 and gossipsub v2
- Protobuf encoding has been shifted into the behaviour. This has permitted two improvements:
  1. Message size verification during publishing (report to the user if the message is too large before attempting to send).
  2. Message fragmentation. If an RPC is too large it is fragmented into its sub components and sent in smaller chunks.

Additional Notes

The peer eXchange protocol defined in the v1.1 spec is inactive in its current form. The current implementation permits sending `PeerId` in `PRUNE` messages, however a `PeerId` is not sufficient to form a new connection to a peer. A `Signed Address Record` is required to safely transmit peer identity information. Once these are confirmed (libp2p/specs#217) a future PR will implement these and make PX usable.

Co-authored-by: Max Inden <[email protected]>
Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: blacktemplar <[email protected]>
Co-authored-by: Rüdiger Klaehn <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman S. Borschel <[email protected]>
Co-authored-by: Roman Borschel <[email protected]>
Co-authored-by: David Craven <[email protected]>
Hey all, this introduces an RFC for a self-certified address record that could be published to a DHT or gossiped about in a pubsub topic without worrying about it being altered in-flight.
This is motivated by the discussion in libp2p/libp2p#47 and libp2p/go-libp2p#436.
I wrote up the record as an IPLD schema, but we could go with another format if we'd rather not have a dependency on IPLD.
I also wasn't sure whether we might want to issue records about peers other than ourselves - this draft does not, but we could have records with distinct subject and issuer peers.

There are some TODOs at the bottom that are better tracked here:
@bigs @raulk @mgoelzer @Stebalien @vyzo