RFC: Signed Address Records #217

# RFC 0002 - Signed Envelopes

- Start Date: 2019-10-21
- Related RFC: [0003 Address Records][addr-records-rfc]

## Abstract

This RFC proposes a "signed envelope" structure that contains an arbitrary byte
string payload, a signature of the payload, and the public key that can be used
to verify the signature.

This was spun out of an earlier draft of the [address records
RFC][addr-records-rfc], since it's generically useful.

## Problem Statement

Sometimes we'd like to store some data in a public location (e.g. a DHT), or
make use of potentially untrustworthy intermediaries to relay information. It
would be nice to have an all-purpose data container that includes a signature of
the data, so we can verify that the data came from a specific peer and that it
hasn't been tampered with.

## Domain Separation

Signatures can be used for a variety of purposes, and a signature made for a
specific purpose MUST NOT be considered valid for a different purpose.

Without this property, an attacker could convince a peer to sign a payload in
one context and present it as valid in another, for example, presenting a signed
address record as a pubsub message.

We separate signatures into "domains" by prefixing the data to be signed with a
string unique to each domain. This string is not contained within the payload or
the outer envelope structure. Instead, each libp2p subsystem that makes use of
signed envelopes provides its own domain string when constructing the envelope,
and again when validating it. If the domain string used to validate differs from
the one used to sign, signature validation will fail.

Domain strings may be any valid UTF-8 string, but should be fairly short and
descriptive of their use case, for example `"libp2p-routing-state"`.

## Payload Type Information

The envelope record can contain an arbitrary byte string payload, which will
need to be interpreted in the context of a specific use case. To assist in
"hydrating" the payload into an appropriate domain object, we include a "payload
type" field. This field consists of a [multicodec][multicodec] code,
optionally followed by an arbitrary byte sequence.

This allows very compact type hints that contain just a multicodec, as well as
"path" multicodecs of the form `/some/thing`, using the ["namespace"
multicodec](https://github.com/multiformats/multicodec/blob/master/table.csv#L23),
whose binary value is equivalent to the UTF-8 `/` character.

Use of the payload type field is encouraged, but the field may be left empty
without invalidating the envelope.
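
As a rough illustration, the JavaScript sketch below (Node.js, standard library only) shows how a receiver might interpret `payload_type`. It assumes the leading multicodec code is encoded as an unsigned varint, and relies only on the property stated above that the namespace code equals the UTF-8 `/` byte (0x2f). The names `decodeUvarint` and `parsePayloadType` are illustrative, not part of any libp2p API.

```javascript
// Sketch: interpret an envelope's payload_type bytes (Node.js, no dependencies).
// Assumption: the leading multicodec code is an unsigned varint; the
// "namespace" code is 0x2f, the same byte as the UTF-8 "/" character.
const NAMESPACE_CODE = 0x2f;

// Decode one unsigned varint from the start of `bytes`.
// Returns { value, length }, where `length` is the number of bytes consumed.
function decodeUvarint(bytes) {
  let value = 0;
  let shift = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    value += (b & 0x7f) * 2 ** shift; // multiply instead of << to avoid 32-bit overflow
    if ((b & 0x80) === 0) return { value, length: i + 1 };
    shift += 7;
    if (shift > 42) throw new RangeError("varint too large for this sketch");
  }
  throw new RangeError("truncated varint");
}

// Returns null for an empty field (allowed by this RFC), a path string for
// namespace-prefixed types, or the bare code plus any trailing bytes.
function parsePayloadType(bytes) {
  if (bytes.length === 0) return null;
  const { value: code, length } = decodeUvarint(bytes);
  if (code === NAMESPACE_CODE) {
    // The "/" byte is both the code and the first character of the path.
    return { kind: "path", path: Buffer.from(bytes).toString("utf8") };
  }
  return { kind: "code", code, rest: bytes.subarray(length) };
}
```

With this sketch, `parsePayloadType(Buffer.from("/libp2p/routing-state-record"))` yields `{ kind: "path", path: "/libp2p/routing-state-record" }`, while an empty field yields `null`, matching the rule that the field may be left empty.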

## Wire Format

Since we already have a [protobuf definition for public keys][peer-id-spec], we
can use protobuf for this as well and easily embed the key in the envelope:

```protobuf
message SignedEnvelope {
  PublicKey public_key = 1; // see peer id spec for definition
  bytes payload_type = 2;   // payload type indicator
  bytes payload = 3;        // opaque binary payload
  bytes signature = 4;      // see below for signing rules
}
```

The `public_key` field contains the public key whose secret counterpart was used
to sign the message. This MUST be consistent with the peer id of the signing
peer, as the recipient will derive the peer id of the signer from this key.

The `payload_type` field contains a [multicodec][multicodec]-prefixed type
indicator as described in the [Payload Type Information
section](#payload-type-information).

The `payload` field contains the arbitrary byte string payload.

The `signature` field contains a signature over the domain separation string and
the `payload_type` and `payload` fields, generated as described below.

## Signature Production / Verification

When signing, a peer will prepare a buffer by concatenating the following:

- The length of the [domain separation string](#domain-separation) in bytes
- The domain separation string, encoded as UTF-8
- The length of the `payload_type` field in bytes
- The value of the `payload_type` field
- The length of the `payload` field in bytes
- The value of the `payload` field

The length values for each field are encoded as unsigned variable-length
integers as defined in the [multiformats uvarint spec][uvarint].
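
The concatenation above can be sketched in JavaScript using Node's `Buffer`. This is an illustrative encoding of the listed steps, not a reference implementation; `encodeUvarint` and `makeSigningBuffer` are hypothetical helper names, and the varint encoder only handles lengths that fit in a JavaScript safe integer.

```javascript
// Encode a non-negative integer as a multiformats unsigned varint:
// 7 bits per byte, least significant group first, high bit = "more follows".
function encodeUvarint(n) {
  if (!Number.isSafeInteger(n) || n < 0) throw new RangeError("invalid length");
  const out = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80); // low 7 bits with the continuation bit set
    n = Math.floor(n / 0x80);
  }
  out.push(n); // final byte, continuation bit clear
  return Buffer.from(out);
}

// Build the bytes to be signed (and later verified), in the order listed above.
// `domain` is a string; `payloadType` and `payload` are Buffers or Uint8Arrays.
function makeSigningBuffer(domain, payloadType, payload) {
  const domainBytes = Buffer.from(domain, "utf8");
  return Buffer.concat([
    encodeUvarint(domainBytes.length), domainBytes,
    encodeUvarint(payloadType.length), Buffer.from(payloadType),
    encodeUvarint(payload.length), Buffer.from(payload),
  ]);
}
```

For example, `makeSigningBuffer("ab", Buffer.from([1]), Buffer.alloc(0))` produces the bytes `02 61 62 01 01 00`: every field is preceded by its length, and an empty `payload` still contributes a zero-length prefix. A 300-byte field would get the two-byte prefix `ac 02`.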

The peer then signs the buffer according to the rules in the [peer id
spec][peer-id-spec] and sets the `signature` field accordingly.

To verify, a peer will "inflate" the `public_key` into a domain object that can
verify signatures, prepare a buffer as above, and verify the `signature` field
against it.
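
To make the sign/verify round trip concrete, here is a sketch using Node's built-in `crypto` module, assuming an Ed25519 host key (for which `crypto.sign` and `crypto.verify` take `null` as the algorithm). Protobuf encoding is skipped: the envelope is a plain object, and the public key is kept as a Node `KeyObject` rather than the protobuf `PublicKey` bytes. The helper names are hypothetical. It also illustrates the domain separation property: a signature produced under one domain string does not verify under another.

```javascript
const crypto = require("crypto");

// Encode a non-negative integer as a multiformats unsigned varint.
function encodeUvarint(n) {
  if (!Number.isSafeInteger(n) || n < 0) throw new RangeError("invalid length");
  const out = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return Buffer.from(out);
}

// Length-prefixed concatenation of domain, payload_type and payload.
function makeSigningBuffer(domain, payloadType, payload) {
  const domainBytes = Buffer.from(domain, "utf8");
  return Buffer.concat([
    encodeUvarint(domainBytes.length), domainBytes,
    encodeUvarint(payloadType.length), Buffer.from(payloadType),
    encodeUvarint(payload.length), Buffer.from(payload),
  ]);
}

// Produce an envelope-like object; note the domain itself is never stored in it.
function sealEnvelope(domain, keyPair, payloadType, payload) {
  const toSign = makeSigningBuffer(domain, payloadType, payload);
  const signature = crypto.sign(null, toSign, keyPair.privateKey); // Ed25519: algorithm is null
  return { publicKey: keyPair.publicKey, payloadType, payload, signature };
}

// Rebuild the same buffer using the caller's domain and check the signature.
function verifyEnvelope(domain, envelope) {
  const toVerify = makeSigningBuffer(domain, envelope.payloadType, envelope.payload);
  return crypto.verify(null, toVerify, envelope.publicKey, envelope.signature);
}
```

Sealing a payload with the domain `"libp2p-routing-state"` (from `crypto.generateKeyPairSync("ed25519")`) and verifying it with the same domain returns `true`; verifying the same envelope with a different domain string, or after changing a single payload byte, returns `false`.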

[addr-records-rfc]: ./0003-address-records.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[multicodec]: https://github.com/multiformats/multicodec
[uvarint]: https://github.com/multiformats/unsigned-varint

# RFC 0003 - Peer Routing Records

- Start Date: 2019-10-04
- Related Issues:
  - [libp2p/issues/47](https://github.com/libp2p/libp2p/issues/47)
  - [go-libp2p/issues/436](https://github.com/libp2p/go-libp2p/issues/436)

## Abstract

This RFC proposes a method for distributing peer routing records, which contain
a peer's publicly reachable listen addresses, and may be extended in the future
to contain additional metadata relevant to routing. This serves a similar
purpose to [Ethereum Node Records][eip-778]. Like ENR records, libp2p routing
records should be extensible, so that we can add information relevant to as-yet
unknown use cases.

The record described here does not include a signature, but it is expected to
be serialized and wrapped in a [signed envelope][envelope-rfc], which will
prove the identity of the issuing peer. The dialer can then prioritize
self-certified addresses over addresses from an unknown origin.

## Problem Statement

All libp2p peers keep a "peer store", which maps [peer ids][peer-id-spec] to a
set of known addresses for each peer. When the application layer wants to
contact a peer, the dialer will pull addresses from the peer store and try to
initiate a connection on one or more addresses.

Addresses for a peer can come from a variety of sources. If we have already made
a connection to a peer, the libp2p [identify protocol][identify-spec] will
inform us of other addresses that they are listening on. We may also discover
their address by querying the DHT, checking a fixed "bootstrap list", or perhaps
through a pubsub message or an application-specific protocol.

Review comment: probably worth mentioning rendezvous here.

Review comment: also Peer eXchange.

Review comment: Yes, and mDNS. Linking to the specs would be ideal.

In the case of the identify protocol, we can be fairly certain that the
addresses originate from the peer we're speaking to, assuming that we're using a
secure, authenticated communication channel. However, more "ambient" discovery
methods such as DHT traversal and pubsub depend on potentially untrustworthy
third parties to relay address information.

Even in the case of receiving addresses via the identify protocol, our
confidence that the address came directly from the peer is not actionable, because
the peer store does not track the origin of an address. Once added to the peer
store, all addresses are considered equally valid, regardless of their source.

We would like to have a means of distributing _verifiable_ address records,
which we can prove originated from the addressed peer itself. We also need a way to
track the "provenance" of an address within libp2p's internal components such as
the peer store. Once those pieces are in place, we will also need a way to
prioritize addresses based on their authenticity, with the strictest strategy
being to only dial certified addresses.

### Complications

While producing a signed record is fairly trivial, there are a few aspects to
this problem that complicate things.

1. Addresses are not static. A given peer may have several addresses at any given
   time, and the set of addresses can change at arbitrary times.
2. Peers may not know their own addresses. It's often impossible to automatically
   infer one's own public address, and peers may need to rely on third party
   peers to inform them of their observed public addresses.
3. A peer may inadvertently or maliciously sign an address that they do not
   control. In other words, a signature isn't a guarantee that a given address is
   valid.
4. Some addresses may be ambiguous. For example, addresses on a private subnet
   are valid within that subnet but are useless on the public internet.

The first point can be addressed by having records contain a sequence number
that increases monotonically when new records are issued, and by having newer
records replace older ones.

The other points, while worth thinking about, are out of scope for this RFC.
However, we can take care to make our records extensible so that we can add
additional metadata in the future. Some thoughts along these lines are in the
[Future Work section below](#future-work).

## Address Record Format

Here's a protobuf that might work:

```protobuf
// RoutingState contains the listen addresses for a peer at a particular point in time.
message RoutingState {
  // AddressInfo wraps a multiaddr. In the future, it may be extended to
  // contain additional metadata, such as "routability" (whether an address is
  // local or global, etc).
  message AddressInfo {
    bytes multiaddr = 1;
  }

  // the peer id of the subject of the record (who these addresses belong to).
  bytes peer_id = 1;

  // A monotonically increasing sequence number, used for record ordering.
  uint64 seq = 2;

  // All current listen addresses
  repeated AddressInfo addresses = 4;
}
```

The `AddressInfo` wrapper message is used instead of a bare multiaddr to allow
us to extend addresses with additional metadata [in the future](#future-work).

The `seq` field contains a sequence number that MUST increase monotonically as
new records are created. Newer records MUST have a higher `seq` value than older
records. To avoid persisting state across restarts, implementations MAY use Unix
epoch time as the `seq` value; however, they MUST NOT attempt to interpret a
`seq` value from another peer as a valid timestamp.
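
A minimal JavaScript sketch of a local `seq` generator that takes the option of using Unix epoch seconds. As an assumption beyond this RFC's text, it bumps the value by one whenever the clock has not advanced (or has gone backwards), so records issued within the same second still get strictly increasing `seq` values. It does not guard against a clock that falls behind a value issued before a restart. `SeqCounter` is a hypothetical name.

```javascript
// Sketch: issue strictly increasing seq values, seeded from Unix time in seconds.
class SeqCounter {
  constructor(now = () => Math.floor(Date.now() / 1000)) {
    this.now = now; // clock returning whole seconds; injectable for testing
    this.last = 0;  // last value handed out by this counter
  }

  next() {
    const candidate = this.now();
    // Use the clock if it moved forward, otherwise step past the last value.
    this.last = candidate > this.last ? candidate : this.last + 1;
    return this.last;
  }
}
```

With a clock that returns 100, 100, 99 and then 200, successive calls to `next()` yield 100, 101, 102 and 200. Receivers still treat these values as opaque ordering numbers, never as timestamps.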

### Example

```javascript
{
  peer_id: "QmAlice...",
  seq: 1570215229,

  addresses: [
    {
      multiaddr: "/ip4/1.2.3.4/tcp/42/p2p/QmAlice",
    },
    {
      multiaddr: "/ip4/10.0.1.2/tcp/42/p2p/QmAlice",
    }
  ]
}
```

A peer SHOULD only include addresses that it believes are routable via the
public internet, ideally having confirmed that this is the case via some
external mechanism such as a successful [AutoNAT][autonat] dial-back.

In some cases we may want to include localhost or LAN-local addresses; for
example, when testing the DHT using many processes on a single machine. To
support this, implementations may use a global runtime configuration flag or
environment variable to control whether local addresses will be included.
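
The address selection described above might look roughly like this in JavaScript. The sketch only classifies IPv4 multiaddr strings (loopback, RFC 1918 private ranges, and link-local), passes other address types through unchanged, and performs no AutoNAT check. The environment variable `LIBP2P_INCLUDE_LOCAL_ADDRS` is a hypothetical name, not one defined by libp2p.

```javascript
// Returns true for loopback, RFC 1918 private, and link-local IPv4 addresses.
function isLocalIPv4(ip) {
  const [a, b] = ip.split(".").map(Number);
  return (
    a === 127 ||
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Choose which multiaddr strings go into a RoutingState. The flag defaults to a
// hypothetical environment variable so local testing can opt in to local addrs.
function selectRecordAddrs(
  multiaddrs,
  includeLocal = process.env.LIBP2P_INCLUDE_LOCAL_ADDRS === "1"
) {
  return multiaddrs.filter((ma) => {
    // "/ip4/1.2.3.4/tcp/42" splits into ["", "ip4", "1.2.3.4", "tcp", "42"]
    const parts = ma.split("/");
    if (parts[1] !== "ip4") return true; // not classified by this sketch
    return includeLocal || !isLocalIPv4(parts[2]);
  });
}
```

Applied to the two addresses in the example record above, only `/ip4/1.2.3.4/tcp/42/p2p/QmAlice` is kept unless local addresses are explicitly enabled.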

## Certification / Verification

This structure can be serialized and contained in a [signed
envelope][envelope-rfc], which lets us issue "self-certified" address records
that are signed by the peer that the addresses belong to.

To produce a "self-certified" address, a peer will construct a `RoutingState`
containing their listen addresses and serialize it to a byte array using a
protobuf encoder. The serialized record is then wrapped in a [signed
envelope][envelope-rfc], which is signed with the libp2p peer's private host
key. The corresponding public key MUST be included in the envelope's
`public_key` field.

When receiving a `RoutingState` wrapped in a signed envelope, a peer MUST
validate the signature before deserializing the `RoutingState` record. If the
signature is invalid, the envelope MUST be discarded without deserializing the
envelope payload.

Once the signature has been verified and the `RoutingState` has been
deserialized, the receiving peer MUST verify that the `peer_id` contained in the
`RoutingState` matches the `public_key` from the envelope. If the peer id
derived from the envelope's public key does not match the `peer_id` contained in
the routing state record, the `RoutingState` MUST be discarded.
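
The required order of checks can be sketched as below. The three helpers are injected and hypothetical: `verifySignature` stands for the envelope check from the signed envelope RFC, `decodeRoutingState` for a protobuf decoder that returns an object with a `peer_id` field, and `peerIdFromPublicKey` for peer id derivation per the peer id spec. The sketch only demonstrates ordering: nothing is deserialized until the signature is valid, and the record is rejected if its `peer_id` does not belong to the signing key.

```javascript
// Domain string required for RoutingState envelopes by this RFC.
const ROUTING_STATE_DOMAIN = "libp2p-routing-state";

// Throwing models "discard"; a real implementation might return an error value.
function openRoutingState(envelope, { verifySignature, decodeRoutingState, peerIdFromPublicKey }) {
  // 1. Check the signature before touching the payload.
  if (!verifySignature(ROUTING_STATE_DOMAIN, envelope)) {
    throw new Error("invalid signature: envelope discarded");
  }
  // 2. Only now deserialize the payload.
  const state = decodeRoutingState(envelope.payload);
  // 3. The record must describe the peer that signed it.
  if (peerIdFromPublicKey(envelope.publicKey) !== state.peer_id) {
    throw new Error("peer id mismatch: record discarded");
  }
  return state;
}
```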

### Signed Envelope Domain

Signed envelopes require a "domain separation" string that defines the scope
or purpose of a signature.

When wrapping a `RoutingState` in a signed envelope, the domain string MUST be
`libp2p-routing-state`.

### Signed Envelope Payload Type

Signed envelopes contain a `payload_type` field that indicates how to interpret
the contents of the envelope.

Ideally, we should define a new multicodec for routing records, so that we can
identify them in a few bytes. While we're still spec'ing and working on the
initial implementation, we can use the UTF-8 string
`"/libp2p/routing-state-record"` as the `payload_type` value.

## Peer Store APIs

We will need to add a few methods to the peer store:

- `AddCertifiedAddrs(envelope) -> Maybe<Error>`
  - Add self-certified addresses, wrapped in a signed envelope. This should
    validate the envelope signature and store the envelope for future reference.
    If any certified addresses already exist for the peer, only accept the new
    envelope if it has a greater `seq` value than existing envelopes.

- `CertifiedAddrs(peer_id) -> Set<Multiaddr>`
  - Return the set of self-certified addresses for the given peer id.

- `SignedRoutingState(peer_id) -> Maybe<SignedEnvelope>`
  - Retrieve the signed envelope that was most recently added to the peer store
    for the given peer, if any exists.

And possibly:

- `IsCertified(peer_id, multiaddr) -> Boolean`
  - Has a particular address been self-certified by the given peer?

We'll also need a method that constructs a new `RoutingState` containing our
listen addresses and wraps it in a signed envelope. This may belong on the Host
instead of the peer store, since it needs access to the private signing key.

When adding records to the peer store, a receiving peer MUST keep track of the
latest `seq` value received for each peer and reject incoming `RoutingState`
messages unless they contain a greater `seq` value than the last received.

After integrating the information from the `RoutingState` into the peer store,
implementations SHOULD retain the original signed envelope. This will allow
other libp2p systems to share signed `RoutingState` records with other peers in
the network, preserving the signature of the issuing peer. The [Exchanging
Records section](#exchanging-records) lists some systems that would need to
retrieve the original signed record from the peer store.
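
An in-memory JavaScript sketch of these peer store methods follows. The method names are camelCase renderings of the RFC's names, and, as a simplifying assumption, `addCertifiedAddrs` receives an envelope that has already passed the checks from the Certification / Verification section together with its decoded `RoutingState`, rather than validating the envelope itself. It enforces the greater-`seq` rule and keeps the original envelope so it can be forwarded unchanged. `CertifiedAddrBook` is a hypothetical name.

```javascript
// Sketch: certified-address storage keyed by peer id.
// `state` is assumed to look like { peer_id, seq, addresses: [{ multiaddr }] }.
class CertifiedAddrBook {
  constructor() {
    this.records = new Map(); // peer id -> { envelope, state }
  }

  // Returns null on success or an Error, mirroring Maybe<Error>.
  addCertifiedAddrs(envelope, state) {
    const existing = this.records.get(state.peer_id);
    if (existing && state.seq <= existing.state.seq) {
      return new Error(`stale record: seq ${state.seq} <= ${existing.state.seq}`);
    }
    // Keep the original envelope so it can be re-shared with its signature intact.
    this.records.set(state.peer_id, { envelope, state });
    return null;
  }

  certifiedAddrs(peerId) {
    const rec = this.records.get(peerId);
    return new Set(rec ? rec.state.addresses.map((a) => a.multiaddr) : []);
  }

  signedRoutingState(peerId) {
    const rec = this.records.get(peerId);
    return rec ? rec.envelope : null;
  }

  isCertified(peerId, multiaddr) {
    return this.certifiedAddrs(peerId).has(multiaddr);
  }
}
```

Rejecting an equal `seq` (not only a lower one) follows the "greater `seq` value" wording above, so replaying the current envelope is also refused.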

## Dialing Strategies

Once self-certified addresses are available via the peer store, we can update
the dialer to prefer using them when possible. Some systems may want to _only_
dial self-certified addresses, so we should include some configuration options
to control whether non-certified addresses are acceptable.
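
One possible dial-ordering policy, sketched in JavaScript: certified addresses are tried first, and an option (here called `certifiedOnly`, a hypothetical name) drops non-certified addresses entirely. The `certifiedAddrs` set could come from something like the `CertifiedAddrs(peer_id)` peer store method above.

```javascript
// Sketch: order a peer's known addresses for dialing.
// knownAddrs: array of multiaddr strings; certifiedAddrs: Set of multiaddr strings.
function orderDialAddrs(knownAddrs, certifiedAddrs, { certifiedOnly = false } = {}) {
  const certified = knownAddrs.filter((a) => certifiedAddrs.has(a));
  if (certifiedOnly) return certified;
  const uncertified = knownAddrs.filter((a) => !certifiedAddrs.has(a));
  // Certified first; each group keeps its original relative order.
  return [...certified, ...uncertified];
}
```

For three known addresses where only the last one is certified, the default order puts that address first and keeps the other two after it; with `certifiedOnly: true`, only the certified address is returned.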

## Exchanging Records

We currently have several systems in libp2p that deal with peer addressing and
which could be updated to use signed routing records:

- Public peer discovery using [libp2p's DHT][dht-spec]
- Local peer discovery with [mDNS][mdns-spec]
- Direct exchange using the [identify protocol][identify-spec]
- Service discovery via the [rendezvous protocol][rendezvous-spec]
- A proposal for [a public peer exchange protocol][pex-proposal]

Of these, the highest priority for updating seems to be the DHT, since it's
actively used by several deployed systems and is vulnerable to routing attacks
by malicious peers. We should work on extending the `FIND_NODE`, `ADD_PROVIDER`,
and `GET_PROVIDERS` RPC messages to support returning signed records in addition
to the unsigned address information they currently support.

We should also either define a new "secure peer routing" interface or extend the
existing peer routing interfaces to support signed records, so that we don't end
up with a bunch of similar but incompatible APIs for exchanging signed address
records.

## Future Work

Some things that were originally considered in this RFC were trimmed so that we
can focus on delivering a basic self-certified record, which is a pressing need.

This includes a notion of "routability", which could be used to communicate
whether a given address is global (reachable via the public internet),
LAN-local, etc. We may also want to include some kind of confidence score or
priority ranking, so that peers can communicate which addresses they would
prefer other peers to use.

To allow these fields to be added in the future, we wrap multiaddrs in the
`AddressInfo` message instead of having the `addresses` field be a list of "raw"
multiaddrs.

Another potentially useful extension would be a compact protocol table or bloom
filter that could be used to test whether a peer supports a given protocol
before interacting with them directly. This could be added as a new field in the
`RoutingState` message.

[identify-spec]: ../identify/README.md
[peer-id-spec]: ../peer-ids/peer-ids.md
[mdns-spec]: ../discovery/mdns.md
[rendezvous-spec]: ../rendezvous/README.md
[pex-proposal]: https://github.com/libp2p/notes/issues/7
[autonat]: https://github.com/libp2p/specs/issues/180
[envelope-rfc]: ./0002-signed-envelopes.md
[eip-778]: https://eips.ethereum.org/EIPS/eip-778

Review comment:

It may be worth considering how generically useful this structure is, given that the payload must be kept exactly as it is received (instead of allowing it to be deserialized and then reserialized).

If we chose to use a deterministic encoding scheme (e.g. Canonical CBOR or IPLD) instead of Protobufs, this would be less of a problem. However, if we'd like to keep using Protobufs, then it'd be great to have some documentation letting people know.

Thanks @yusefnapora for the great work putting this together.

Review comment:

The envelope contains both the byte payload, and the signature over that byte payload. The serialisation scheme is irrelevant at this layer.

The recipient of this payload validates that the signature matches the plaintext and the key, then deserialises the payload with the serialisation format mandated for the payload type, in order to process it (e.g. to consume the multiaddrs).

If the recipient intends to relay this payload (as is the case of p2p discovery mechanisms), it does not send a re-serialised form, but rather it forwards the original envelope. In general, it's bad and fragile practice to reconstitute a payload in the hope that it'll continue matching the original signature that was annexed to it.

Review comment:

These two statements are tied together and follow a rule set that you may think is correct, but is not obvious. Non-obvious restrictions dictating how the data may be interacted with should be documented. Additionally, this restriction does not have to exist; it's just something that's been decided is ok/insufficiently problematic to bother dealing with.

I also disagree with it being "bad" to allow consistent serialization/deserialization of objects. If I have data which I need to propagate frequently and access infrequently then I'll just store the message bytes and deserialize every time I need to access the data. If I frequently propagate and access the data I'll store the data both serialized and deserialized. If, however, I infrequently propagate the data and frequently access it, I'm now forced to waste space by storing both the serialized and deserialized versions for no reason other than we like Protobufs.

Review comment:

They are not. You are mixing up the concerns of a cryptographic envelope, with the details of how the inner opaque payload is constructed. These two layers are decoupled, and @yusefnapora has done a good job of modelling that in this spec.

That's not what I said.

Yes, and it's a cost you assume to preserve the integrity of a signature.

Incorrect. Systems preserve the original data along with the signature for many reasons including reducing the surface for bugs, traceability/auditability, and others.

I insist it is a terrible idea to assume that, even with canonical serialisation, your system will be perennially capable of reconstituting a payload out of its constituents, in a way that it matches the original signature. Developers introduce bugs, systems change, schemas change, and maintaining such hypothetical logic is error-prone, brittle and convoluted.

Review comment:

Could you please explain what you think the downsides are of utilizing a format with canonical serialization?

I've already given a use case that would be helped by enabling canonical serialization, a record type that is infrequently propagated but frequently used would benefit from reduced memory and storage consumption.

Are you suggesting that we are intentionally using a format that has non-canonical serialization to dissuade other people from making design decisions you think are "terrible"? Are there other reasons you feel using IPLD or Canonical CBOR would be bad?

The point I'm trying to make here is that you seem to think "it's a terrible idea" for people to assume de+re serializing data will keep it identical, I think that in some situations it could be useful. Could you please list some of the negatives of utilizing a canonical serialization format and enabling developers to make their own decisions about whether to rely on its ability to de+re serialize accurately?

Review comment:

@raulk this "generic and not opinionated" wrapper cannot be used if someone wanted to share (using a CID as a reference) a collection of envelopes and still access them efficiently.

Concrete example. If IPNS was being created today it could easily use one of these signed envelopes to contain its data. However, if I wanted to share over IPFS a set of IPNS records (e.g. here are the 10 public keys corresponding to my favorite website authors) I could not just take the IPNS records and stuff them into an IPLD object without compromising on storing two copies of the envelope.

I'm not saying the above example is common or something we should definitely do, but it shows a use case that your approach blocks. If we have a justification for ignoring this use case (e.g. you think protobuf is a more "amply supported, performant, well-vetted format" than the alternatives that support canonical serialization) then that's fine rationale.

Review comment:

@aschmahmann

Even better, we have a type field so we can:

TL;DR: you are absolutely free to re-serialize the content on the fly as long as you have chosen a format with a deterministic serialization.

Review comment:

Thanks for all the discussion :) I definitely feel @aschmahmann about signing structured data that doesn't have a deterministic encoding - it just feels kind of wrong. Serializing to bytes before signing side-steps the issue, but it's also a bit awkward.

My first pass at this did use IPLD, mostly because of this issue of deterministic encoding, and also because I think the IPLD schema DSL is pretty cool. I ended up backing away from that, but I don't think I explained my thought process very well.

IPLD is attractive because you can get deterministic output with the CBOR encoding, but I was hesitant to rely on that, mostly because IPLD is still pretty new. If we start assuming that we can always serialize IPLD to the same bytes, that seems like it kind of limits the future evolution of the IPLD CBOR format. If we ever need to change how IPLD gets serialized to CBOR, any signatures made with the older implementation will be invalid.

The other problem with IPLD is just that we seem to be in the middle of a Cambrian explosion of libp2p implementations, and it seems like a tough ask to make libp2p implementers also implement IPLD.

I don't think either of those arguments really apply to just using plain CBOR & requiring the canonical encoding (sorted map keys, etc). CBOR has broad language support, and the canonical encoding is (hopefully) stable. And of course, if we did use CBOR, you could embed our records into an IPLD graph as-is without having to treat them as opaque blobs, since a valid CBOR map will presumably always be valid IPLD.

Honestly, I ended up going with protobuf instead simply because it seemed easier to define a protobuf schema than to specify the map keys, value types, etc that we'd need to define for a CBOR-based format. Also, we need to include the public key, and there's already a protobuf definition for that. That's mostly just me being lazy though, and I'd rather revisit this now than after we've baked it into a bunch of implementations.

I do like the idea of having a standard way to ship signed byte arrays around, but it's also possible that because I'm focused on this one use case of routing records that it's not actually as generic or broadly useful as I'm hoping.

You could certainly argue that it would be even more useful to have a standard way of shipping signed structured data around. We could potentially define an envelope as something like

Review comment:

I realized I didn't address @raulk's point in my last comment

That's the other reason I "gave up" on IPLD / CBOR and just went with the signed binary blob, although I don't know if I feel as strongly as Raúl does about it. We could potentially try to guard against differences in encoder implementations by having a ton of test vectors, but of course there's no way to guarantee we'd catch everything.

Review comment:

@yusefnapora thanks for the detailed explanation here. I get IPLD being a big ask here, although it's probably worth thinking about (for the future) if there's a minimal subset of IPLD that it would be useful for libp2p to have access to.

It being easier to implement and wanting to get this shipped are totally reasonable reasons for us to want to go with protobufs. I guess I just wanted to clarify why the decision was made.

Also, I'm not sure if this is what @raulk was trying to explain but after speaking with @Stebalien I see that if a new format came along that wanted to have a consistent hash for a set of envelopes that we could just define the encoding for that new format. It's unfortunate, from a developer perspective, that we'd have to define and implement a canonical protobuf encoding instead of just using a pre-standardized and packaged encoder but it's still achievable within the spec. Given that IPLD defines codecs for each serialization format we import if we're not going with a pre-supported IPLD format then we'd have to define a new codec anyway.

@yusefnapora your suggestion would certainly do the job.