Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Signed Address Records #217
RFC: Signed Address Records #217
Changes from 5 commits
77e3b66
5351d94
8d10f25
59f660b
b8f1c5e
107ddde
cba046f
35fda19
627a57c
238ca9f
4accd0a
5e06842
61617d6
536ae93
47606a0
377f05a
e401b14
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth considering how generically useful this structure is given that the payload must be kept exactly as it is received (instead of allowing it to be deserailized and then reserialized).
If we chose to use a deterministic encoding scheme (e.g. Canonical CBOR or IPLD) instead of Protobufs this would be less of a problem. However, if we'd like to keep using Protobufs then it'd be great to have some documentation letting people know.
Thanks @yusefnapora for the great work putting this together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The envelope contains both the byte payload, and the signature over that byte payload. The serialisation scheme is irrelevant at this layer.
The recipient of this payload validates that the signature matches the plaintext and the key, then deserialises the payload with the serialisation format mandated for the payload type, in order to process it (e.g. to consume the multiaddrs).
If the recipient intends to relay this payload (as is the case of p2p discovery mechanisms), it does not send a re-serialised form, but rather it forwards the original envelope. In general, it's bad and fragile practice to reconstitute a payload in the hope that it'll continue matching the original signature that was annexed to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two statements are tied together and are following a rule set that you may think is correct, but is not obvious. Not obvious restrictions dictating how the data may be interacted with should be documented. Additionally, this restriction does not have to exist it's just something that's been decided is ok/insufficiently problematic to bother dealing with.
I also disagree with it being "bad" to allow consistent serialization/deserialization of objects. If I have data which I need to propagate frequently and access infrequently then I'll just store the message bytes and deserialize every time I need to access the data. If I frequently propagate and access the data I'll store the data both serialized and deserialized. If however, I infrequently propagate the data and frequently access it I'm now forced to waste space by storing both the serialized and deserialized versions for no reason other than we like Protobufs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not. You are mixing up the concerns of a cryptographic envelope, with the details of how the inner opaque payload is constructed. These two layers are decoupled, and @yusefnapora has done a good job of modelling that in this spec.
That's not what I said.
Yes, and it's a cost you assume to preserve the integrity of a signature.
Incorrect. Systems preserve the original data along with the signature for many reasons including reducing the surface for bugs, traceability/auditability, and others.
I insist it is a terrible idea to assume that, even with canonical serialisation, your system will be perennially capable of reconstituting a payload out of its constituents, in a way that it matches the original signature. Developers introduce bugs, systems change, schemas change, and maintaining such hypothetical logic is error-prone, brittle and convoluted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain what you think the downsides are of utilizing a format with canonical serialization?
I've already given a use case that would be helped by enabling canonical serialization, a record type that is infrequently propagated but frequently used would benefit from reduced memory and storage consumption.
Are you suggesting that we are intentionally using a format that has non-canonical serialization to dissuade other people from making design decisions you think are "terrible"? Are there other reasons you feel using IPLD or Canonical CBOR would be bad?
The point I'm trying to make here is that you seem to think "it's a terrible idea" for people to assume de+re serializing data will keep it identical, I think that in some situations it could be useful. Could you please list some of the negatives of utilizing a canonical serialization format and enabling developers to make their own decisions about whether to rely on its ability to de+re serialize accurately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raulk this "generic and not opinionated" wrapper cannot be used if someone wanted to share (using a CID as a reference) a collection of envelopes and still access them efficiently.
Concrete example. If IPNS was being created today it could easily use one of these signed envelopes to contain its data. However, if I wanted to share over IPFS a set of IPNS records (e.g. here are the 10 public keys corresponding to my favorite website authors) I could not just take the IPNS records and stuff them into an IPLD object without compromising on storing two copies of the envelope.
I'm not saying the above example is common or something we should definitely do, but it shows a use case that your approach blocks. If we have a justification for ignoring this use case (e.g. you think protobuf is a more "amply supported, performant, well-vetted format" then the alternatives that support canonical serialization) then that's fine rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann
Even better, we have a type field so we can:
TL;DR: you absolutely free to re-serialize the content on the fly as long as you have chosen a format with a deterministic serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the discussion :) I definitely feel @aschmahmann about signing structured data that doesn't have a deterministic encoding - it just feels kind of wrong. Serializing to bytes before signing side-steps the issue, but it's also a bit awkward.
My first pass at this did use IPLD, mostly because of this issue of deterministic encoding, and also because I think the IPLD schema DSL is pretty cool. I ended up backing away from that, but I don't think I explained my thought process very well.
IPLD is attractive because you can get deterministic output with the CBOR encoding, but I was hesitant to rely on that, mostly because IPLD is still pretty new. If we start assuming that we can always serialize IPLD to the same bytes, that seems like it kind of limits the future evolution of the IPLD CBOR format. If we ever need to change how IPLD gets serialized to CBOR, any signatures made with the older implementation will be invalid.
The other problem with IPLD is just that we seem to be in the middle of a Cambrian explosion of libp2p implementations, and it seems like a tough ask to make libp2p implementers also implement IPLD.
I don't think either of those arguments really apply to just using plain CBOR & requiring the canonical encoding (sorted map keys, etc). CBOR has broad language support, and the canonical encoding is (hopefully) stable. And of course, if we did use CBOR, you could embed our records into an IPLD graph as-is without having to treat them as opaque blobs, since a valid CBOR map will presumably always be valid IPLD.
Honestly, I ended up going with protobuf instead simply because it seemed easier to define a protobuf schema than to specify the map keys, value types, etc that we'd need to define for a CBOR-based format. Also, since we need to include the public key & there's already a protobuf definition for that. That's mostly just me being lazy though, and I'd rather revisit this now than after we've baked it into a bunch of implementations.
I do like the idea of having a standard way to ship signed byte arrays around, but it's also possible that because I'm focused on this one use case of routing records that it's not actually as generic or broadly useful as I'm hoping.
You could certainly argue that it would be even more useful to have a standard way of shipping signed structured data around. We could potentially define an envelope as something like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized I didn't address @raulk's point in my last comment
That's the other reason I "gave up" on IPLD / CBOR and just went with the signed binary blob, although I don't know if I feel as strongly as Raúl does about it. We could potentially try to guard against differences in encoder implementations by having a ton of test vectors, but of course there's no way to guarantee we'd catch everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yusefnapora thanks for the detailed explanation here. I get IPLD being a big ask here, although it's probably worth thinking about (for the future) if there's a minimal subset of IPLD that it would be useful for libp2p to have access to.
It being easier to implement and wanting to get this shipped are totally reasonable reasons for us to want to go with protobufs. I guess I just wanted to clarify why the decision was made.
Also, I'm not sure if this is what @raulk was trying to explain but after speaking with @Stebalien I see that if a new format came along that wanted to have a consistent hash for a set of envelopes that we could just define the encoding for that new format. It's unfortunate, from a developer perspective, that we'd have to define and implement a canonical protobuf encoding instead of just using a pre-standardized and packaged encoder but it's still achievable within the spec. Given that IPLD defines codecs for each serialization format we import if we're not going with a pre-supported IPLD format then we'd have to define a new codec anyway.
@yusefnapora your suggestion would certainly do the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Food for thought. Including the pubkey may be superfluous for some signature schemes.
Given an ECDSA signature, one can recover the public key provided we know the curve, the hash function, and the plaintext that was signed. Bitcoin and Ethereum use that trick heavily to validate transactions.
See:
https://crypto.stackexchange.com/questions/18105/how-does-recovering-the-public-key-from-an-ecdsa-signature-work
https://crypto.stackexchange.com/questions/60218/recovery-public-key-from-secp256k1-signature-and-message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need a
PROXIMITY
enum value for network-less transports that rely on physical proximity, e.g. Bluetooth.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What classes of problems do you foresee communicating our perceived confidence of our own addresses would solve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's to signal which dials to prioritise, we can simplify this by conveying a 1-byte precedence value in the range (-128, 128).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we recommend that records are rejected outright when they leak addresses that are outside the scope of the discovery mechanism?