From df4e932851a49663c9b1e82861f8e58c6d6ab955 Mon Sep 17 00:00:00 2001 From: protolambda Date: Mon, 14 Sep 2020 22:59:07 +0200 Subject: [PATCH 01/13] gossipsub: introduce message signing policy, see libp2p/go-libp2p-pubsub#359 --- pubsub/README.md | 5 +++++ pubsub/gossipsub/gossipsub-v1.1.md | 24 ++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/pubsub/README.md b/pubsub/README.md index 687851ec5..eadd8c51d 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -153,6 +153,10 @@ and Messages can be optionally signed, and it is up to the peer whether to accept and forward unsigned messages. +When the receiver expects unsigned content-based messages, and thus does not expect +the `from`, `seqno`, `signature`, or `key` fields, it may reject the messages (`StrictNoSign`). +And if not, the receiver may choose to enforce signatures strictly (`StrictSign`). +This optionality is configurable with the signing policy options starting from `v1.1`. For signing purposes, the `signature` and `key` fields are used: - The `signature` field contains the signature. @@ -160,6 +164,7 @@ For signing purposes, the `signature` and `key` fields are used: When present, it must match the peer ID. The signature is computed over the marshalled message protobuf _excluding_ the key field. +This includes any fields that are not recognized, but still included in the marshalled data. The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing. 
When signature validation fails for a signed message, the implementation must diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index 6b754606e..b56e6d609 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -37,6 +37,7 @@ See the [lifecycle document][lifecycle-spec] for context about maturity level an - [Explicit Peering Agreements](#explicit-peering-agreements) - [PRUNE Backoff and Peer Exchange](#prune-backoff-and-peer-exchange) - [Protobuf](#protobuf) + - [Signature policy](#signature-policy) - [Flood Publishing](#flood-publishing) - [Adaptive Gossip Dissemination](#adaptive-gossip-dissemination) - [Outbound Mesh Quotas](#outbound-mesh-quotas) @@ -134,6 +135,29 @@ message PeerInfo { } ``` +### Signature policy + +The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable. +These fields may negatively affect privacy in content-addressed messaging, +and may need to be strictly enforced in author-addressed messaging. + +In gossipsub v1.0, a "lax" signing policy is effective: verify signatures, and if not, only when present. +In gossipsub v1.1, these fields are strictly present and verified, or completely omitted altogether. +An implementation may choose to support the legacy v1.0 "lax" signing policy, + along with an explicit message authoring option. + +Gossipsub v1.1 has two policies to choose from: +- `StrictSign`: + - Produces the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. + - Enforce the fields to be present, reject otherwise. + - Verify the signature, reject otherwise. + - Propagate the fields if valid. +- `StrictNoSign`: + - Produces messages without the `signature`, `key`, `from` and `seqno` fields. + The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty. + - Enforce the fields to be absent, reject otherwise. + - Propagate only if the fields are absent. 
+ ### Flood Publishing In gossipsub v1.0, peers publish new messages to the members of their mesh if they are subscribed to From e4eb5ee1135aef19d62ed19cec2615185f3b55d9 Mon Sep 17 00:00:00 2001 From: protolambda Date: Thu, 24 Sep 2020 18:37:56 +0200 Subject: [PATCH 02/13] PR suggestions, formatting, clarification --- pubsub/README.md | 9 +++--- pubsub/gossipsub/gossipsub-v1.1.md | 51 +++++++++++++++++++++--------- 2 files changed, 41 insertions(+), 19 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index eadd8c51d..7b602c58a 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -153,17 +153,18 @@ and Messages can be optionally signed, and it is up to the peer whether to accept and forward unsigned messages. -When the receiver expects unsigned content-based messages, and thus does not expect +The default choice of origin-stamped messaging, the receiver should enforce signatures strictly (`StrictSign`). +When the receiver expects unsigned content-stamped messages, and thus does not expect the `from`, `seqno`, `signature`, or `key` fields, it may reject the messages (`StrictNoSign`). -And if not, the receiver may choose to enforce signatures strictly (`StrictSign`). -This optionality is configurable with the signing policy options starting from `v1.1`. + +This optionality is configurable with the signature policy options starting from gossipsub v1.1. For signing purposes, the `signature` and `key` fields are used: - The `signature` field contains the signature. - The `key` field contains the signing key when it cannot be inlined in the source peer ID. When present, it must match the peer ID. -The signature is computed over the marshalled message protobuf _excluding_ the key field. +The signature is computed over the marshalled message protobuf _excluding_ the `signature` field itself. This includes any fields that are not recognized, but still included in the marshalled data. The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing. 
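The signing rule in the README hunk above can be sketched in Go. This is only an illustration: it assumes Ed25519 keys, and it assumes the caller already holds the marshalled `Message` bytes with the `signature` field left out (no protobuf library is used). The names `signingPayload`, `signMessage`, `verifyMessage` and `keyFromSeed` are hypothetical and are not part of go-libp2p-pubsub.

```go
package main

import "crypto/ed25519"

// signPrefix is the string the spec text says is prepended before signing.
const signPrefix = "libp2p-pubsub:"

// signingPayload prepends the prefix to the marshalled message bytes.
// Assumption: `marshalled` was produced without the `signature` field,
// and still contains any unrecognized fields that were on the wire.
func signingPayload(marshalled []byte) []byte {
	payload := make([]byte, 0, len(signPrefix)+len(marshalled))
	payload = append(payload, signPrefix...)
	return append(payload, marshalled...)
}

// signMessage signs the prefixed payload with the author's private key.
func signMessage(priv ed25519.PrivateKey, marshalled []byte) []byte {
	return ed25519.Sign(priv, signingPayload(marshalled))
}

// verifyMessage checks a signature against the prefixed payload.
func verifyMessage(pub ed25519.PublicKey, marshalled, sig []byte) bool {
	return ed25519.Verify(pub, signingPayload(marshalled), sig)
}

// keyFromSeed derives a deterministic key pair from a 32-byte seed.
// It exists only to make the sketch easy to exercise.
func keyFromSeed(seed [32]byte) (ed25519.PublicKey, ed25519.PrivateKey) {
	priv := ed25519.NewKeyFromSeed(seed[:])
	return priv.Public().(ed25519.PublicKey), priv
}
```

Because the prefix is part of the signed bytes, a signature made over the bare marshalled message would not verify here, and neither would one made over a message whose bytes were altered in transit.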
diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index b56e6d609..8c6b9d8c3 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -138,25 +138,46 @@ message PeerInfo { ### Signature policy The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable. -These fields may negatively affect privacy in content-addressed messaging, -and may need to be strictly enforced in author-addressed messaging. +Initially this could be configured globally, however, configuration on a per-topic basis will facilitate mixed protocols better. -In gossipsub v1.0, a "lax" signing policy is effective: verify signatures, and if not, only when present. -In gossipsub v1.1, these fields are strictly present and verified, or completely omitted altogether. -An implementation may choose to support the legacy v1.0 "lax" signing policy, - along with an explicit message authoring option. +In the default origin-stamped messaging, the fields need to be strictly enforced: +the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions. -Gossipsub v1.1 has two policies to choose from: +In content-stamped messaging, the fields may negatively affect privacy: +revealing the relationship between `data` and `from`/`seqno`. + +#### Signature policy options + +In gossipsub v1.1, these fields are strictly present and verified, or completely omitted altogether: - `StrictSign`: - - Produces the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. - - Enforce the fields to be present, reject otherwise. - - Verify the signature, reject otherwise. - - Propagate the fields if valid. + - On the producing side: + - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. + - On the consuming side: + - Enforce the fields to be present, reject otherwise. 
+ - Propagate only if the fields are valid and signature can be verified, reject otherwise. - `StrictNoSign`: - - Produces messages without the `signature`, `key`, `from` and `seqno` fields. - The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty. - - Enforce the fields to be absent, reject otherwise. - - Propagate only if the fields are absent. + - On the producing side: + - Build messages without the `signature`, `key`, `from` and `seqno` fields. + - The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty. + - On the consuming side: + - Enforce the fields to be absent, reject otherwise. + - Propagate only if the fields are absent, reject otherwise. + - A `message_id` function will not be able to use the above fields, and may instead rely on the `data` field. + +In gossipsub v1.0, a legacy "lax" signing policy could be configured, to not verify signatures except when present: +- `LaxSign`: *Defined for completeness, insecure*. Also known as authoring but not verifying. + - On the producing side: + - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. + - On the consuming side: + - `signature` may be absent, and not verified. + - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid. +- `LaxNoSign`: *Previous default for no-verification* + - On the producing side: + - Build messages without the `signature`, `key`, `from` and `seqno` fields. + - On the consuming side: + - Accept and propagate messages with above fields. + - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid. 
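The consuming-side rules of the four policies listed in this hunk can be sketched as a single check. This is a minimal Go illustration, not the go-libp2p-pubsub implementation: the `Message` struct, the `checkInbound` function and the caller-supplied `verify` callback are hypothetical, and a `nil` slice stands for a field that is absent from the marshalled message.

```go
package main

import "errors"

// SignaturePolicy enumerates the policies named in the hunk above.
type SignaturePolicy int

const (
	StrictSign SignaturePolicy = iota
	StrictNoSign
	LaxSign
	LaxNoSign
)

// Message holds only the fields relevant to the policy check.
// A nil slice means the field was absent on the wire (assumption).
type Message struct {
	From, Seqno, Signature, Key []byte
	Data                        []byte
}

var (
	errMissingFields    = errors.New("required signing fields are missing")
	errUnexpectedFields = errors.New("signing fields present under StrictNoSign")
	errInvalidSignature = errors.New("signature verification failed")
	errUnknownPolicy    = errors.New("unknown signature policy")
)

// checkInbound returns nil if the message may be accepted and propagated.
// verify is a caller-supplied signature check (hypothetical).
func checkInbound(p SignaturePolicy, m *Message, verify func(*Message) bool) error {
	switch p {
	case StrictSign:
		// `key` may be absent when the public key is inlined in `from`.
		if m.From == nil || m.Seqno == nil || m.Signature == nil {
			return errMissingFields
		}
		if !verify(m) {
			return errInvalidSignature
		}
	case StrictNoSign:
		if m.From != nil || m.Seqno != nil || m.Signature != nil || m.Key != nil {
			return errUnexpectedFields
		}
	case LaxSign, LaxNoSign:
		// Legacy behaviour: verify only when a signature is present.
		if m.Signature != nil && !verify(m) {
			return errInvalidSignature
		}
	default:
		return errUnknownPolicy
	}
	return nil
}
```

The two lax policies share one branch because they differ only on the producing side. That shared branch also shows why the hunk calls them insecure: an unsigned message passes without any check.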
+ ### Flood Publishing From 0ccb3b01ef744daa8f1c62292e5c2fdd724c6145 Mon Sep 17 00:00:00 2001 From: protolambda Date: Thu, 24 Sep 2020 19:18:28 +0200 Subject: [PATCH 03/13] define message ID option, clarify two common flavors --- pubsub/README.md | 38 +++++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index 7b602c58a..66da852aa 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -112,6 +112,9 @@ message Message { } ``` +The `optional` fields may be omitted, depending on the +[signature policy](#message-signing) and [message ID function](#message-identification) + The `from` field denotes the author of the message, note that this is not necessarily the peer who sent the RPC this message is contained in. This is done to allow content to be routed through a swarm of pubsubbing peers. @@ -123,14 +126,7 @@ The `seqno` field is a 64-bit big-endian uint that is a linearly increasing number that is unique among messages originating from each given peer. No two messages on a pubsub topic from the same peer should have the same `seqno` value, however messages from different peers may have the same sequence number, -so this number alone cannot be used to address messages. Notably the -'timecache' in use by the go implementation contains a `message_id`, -which is constructed from the concatenation of the `seqno` and the `from` -fields. This `message_id` is then unique among messages. It was also proposed -in [#116](https://github.com/libp2p/specs/issues/116) to use a `message_hash`, -however, it was noted: "a potential caveat with using hashes instead of seqnos: -the peer won't be able to send identical messages (e.g. keepalives) within the -timecache interval, as they will get rejected as duplicates." +so this number alone cannot be used to address messages by origin-stamping. The `topicIDs` field specifies a set of topics that this message is being published to. 
@@ -149,6 +145,30 @@ economics (see e.g. and [here](https://ethresear.ch/t/improving-the-ux-of-rent-with-a-sleeping-waking-mechanism/1480)). +## Message identification + +To uniquely identify a message in a set of topics, a `message_id` is computed based on the message. +This can be configured on the application layer, as `message_id_fn(*Message) => message_id`, +which generally fits in two flavors: +- **origin-stamped** messaging: the concatenation of the `seqno` and `from` fields + uniquely identifies a message based on the *author*. +- **content-stamped** messaging: a message ID derived from the `data` field + uniquely identifies a message based on the *data*. + +If fabricated collisions are not a concern, or difficult enough within the window the message is relevant in, +a `message_id` based on a short digest of inputs may benefit performance. + +Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation, +may use the `message_id` to key messages. + +It was also proposed in [#116](https://github.com/libp2p/specs/issues/116) +to use a `message_hash`, however, it was noted: +> a potential caveat with using hashes instead of seqnos: +the peer won't be able to send identical messages (e.g. keepalives) within the +timecache interval, as they will get rejected as duplicates. + +Some applications may not need keepalives, or choose to implement something more specific than a message hash. + ## Message Signing Messages can be optionally signed, and it is up to the peer whether to accept and forward @@ -161,7 +181,7 @@ This optionality is configurable with the signature policy options starting from For signing purposes, the `signature` and `key` fields are used: - The `signature` field contains the signature. -- The `key` field contains the signing key when it cannot be inlined in the source peer ID. +- The `key` field contains the signing key when it cannot be inlined in the source peer ID (`from`). 
When present, it must match the peer ID. The signature is computed over the marshalled message protobuf _excluding_ the `signature` field itself. From e80c67cc1502faa10bb2344d0e948ba52116e946 Mon Sep 17 00:00:00 2001 From: protolambda Date: Thu, 24 Sep 2020 19:23:27 +0200 Subject: [PATCH 04/13] update TOC --- pubsub/README.md | 3 ++- pubsub/gossipsub/gossipsub-v1.1.md | 7 ++++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index 66da852aa..7613a69d5 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -32,6 +32,7 @@ and spec status. - [The RPC](#the-rpc) - [The Message](#the-message) - [Message Signing](#message-signing) + - [Message Identification](#message-identification) - [The Topic Descriptor](#the-topic-descriptor) - [AuthOpts](#authopts) - [AuthMode 'NONE'](#authmode-none) @@ -145,7 +146,7 @@ economics (see e.g. and [here](https://ethresear.ch/t/improving-the-ux-of-rent-with-a-sleeping-waking-mechanism/1480)). -## Message identification +## Message Identification To uniquely identify a message in a set of topics, a `message_id` is computed based on the message. 
This can be configured on the application layer, as `message_id_fn(*Message) => message_id`, diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index 8c6b9d8c3..5d5991929 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -37,7 +37,8 @@ See the [lifecycle document][lifecycle-spec] for context about maturity level an - [Explicit Peering Agreements](#explicit-peering-agreements) - [PRUNE Backoff and Peer Exchange](#prune-backoff-and-peer-exchange) - [Protobuf](#protobuf) - - [Signature policy](#signature-policy) + - [Signature Policy](#signature-policy) + - [Signature Policy Options](#signature-policy-options) - [Flood Publishing](#flood-publishing) - [Adaptive Gossip Dissemination](#adaptive-gossip-dissemination) - [Outbound Mesh Quotas](#outbound-mesh-quotas) @@ -135,7 +136,7 @@ message PeerInfo { } ``` -### Signature policy +### Signature Policy The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable. Initially this could be configured globally, however, configuration on a per-topic basis will facilitate mixed protocols better. @@ -146,7 +147,7 @@ the `seqno` and `from` fields form the `message_id`, and should be verified to a In content-stamped messaging, the fields may negatively affect privacy: revealing the relationship between `data` and `from`/`seqno`. 
-#### Signature policy options +#### Signature Policy Options In gossipsub v1.1, these fields are strictly present and verified, or completely omitted altogether: - `StrictSign`: From b585401d35814c53240c43cb6e591f9b1a03a471 Mon Sep 17 00:00:00 2001 From: Diederik Loerakker Date: Thu, 24 Sep 2020 19:39:38 +0200 Subject: [PATCH 05/13] Apply suggestion to clarify policy incompatibility Co-authored-by: Jacek Sieka --- pubsub/gossipsub/gossipsub-v1.1.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index 5d5991929..1631e0cc6 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -139,7 +139,7 @@ message PeerInfo { ### Signature Policy The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable. -Initially this could be configured globally, however, configuration on a per-topic basis will facilitate mixed protocols better. +Initially this could be configured globally, however, because the policies are mutually incompatible, configuration on a per-topic basis will facilitate mixed protocols better. In the default origin-stamped messaging, the fields need to be strictly enforced: the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions. From dc1fe8ba67996dfb65f5dee2f60e3fcc428b1673 Mon Sep 17 00:00:00 2001 From: protolambda Date: Thu, 24 Sep 2020 19:58:48 +0200 Subject: [PATCH 06/13] clarify message ID per topic, and default function --- pubsub/README.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index 7613a69d5..c438a6c98 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -149,13 +149,17 @@ and ## Message Identification To uniquely identify a message in a set of topics, a `message_id` is computed based on the message. 
-This can be configured on the application layer, as `message_id_fn(*Message) => message_id`, -which generally fits in two flavors: -- **origin-stamped** messaging: the concatenation of the `seqno` and `from` fields +This can be configured on the application layer, as `message_id_fn(*Message) => message_id`. +A `message_id_fn` may conditionally call different `message_id_fn` implementations per topic (or group thereof). + +The message ID approach generally fits in two flavors: +- **origin-stamped** messaging: the combination of the `seqno` and `from` fields uniquely identifies a message based on the *author*. - **content-stamped** messaging: a message ID derived from the `data` field uniquely identifies a message based on the *data*. +The default `message_id_fn` is origin-stamped, and defined as the string concatenation of `from` and `seqno`. + If fabricated collisions are not a concern, or difficult enough within the window the message is relevant in, a `message_id` based on a short digest of inputs may benefit performance. From 58df0ddfa90a6ec0472ad5b00fce972d4a312631 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Thu, 24 Sep 2020 23:01:10 +0100 Subject: [PATCH 07/13] apply @raulk's suggestions. --- pubsub/README.md | 21 +++++++++++---------- pubsub/gossipsub/gossipsub-v1.1.md | 12 ++++++------ 2 files changed, 17 insertions(+), 16 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index c438a6c98..67afc9062 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -148,23 +148,24 @@ and ## Message Identification -To uniquely identify a message in a set of topics, a `message_id` is computed based on the message. -This can be configured on the application layer, as `message_id_fn(*Message) => message_id`. -A `message_id_fn` may conditionally call different `message_id_fn` implementations per topic (or group thereof). 
+To uniquely identify a message in a set of topics (for de-duplication, tracking, scoring and other purposes), a `message_id` is calculated based on the message. +How the calculated happens can be configured on the application layer by supplying a function `message_id_fn`, such that `message_id_fn(*Message) => message_id`. -The message ID approach generally fits in two flavors: +> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) only allows configuring a single top-level `message_id_fn`. This function may, however, vary its behaviour based on the topic (contained inside its `*Message`) argument. Thus, it's feasible to implement a per-topic policy using branch selection control flow logic. go-libp2p-pubsub plans to push down the configuration of the `message_id_fn` to the topic level. Other implementations are encouraged to do the same. + +The message ID calculation approach generally fits in two flavors: - **origin-stamped** messaging: the combination of the `seqno` and `from` fields uniquely identifies a message based on the *author*. -- **content-stamped** messaging: a message ID derived from the `data` field +- **content-addressed** messaging: a message ID derived from the `data` field uniquely identifies a message based on the *data*. -The default `message_id_fn` is origin-stamped, and defined as the string concatenation of `from` and `seqno`. +**The default `message_id_fn` is origin-stamped,** and defined as the string concatenation of `from` and `seqno`. If fabricated collisions are not a concern, or difficult enough within the window the message is relevant in, -a `message_id` based on a short digest of inputs may benefit performance. +a `message_id` based on a short digest of inputs may benefit performance. Whichever the choice, it is crucial that **all peers** participating in a topic implement the same message ID calculation logic, or the topic may function suboptimally. 
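The two message ID flavors, and the per-topic branching mentioned in the implementation note, can be sketched in Go. This is only an illustration: the `Message` struct and the function names are hypothetical, SHA-256 is one possible digest rather than one the text prescribes, and the signatures do not match the go-libp2p-pubsub API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// Message holds only the fields a message ID function needs here.
type Message struct {
	From, Seqno, Data []byte
	TopicIDs          []string
}

// originStampedID is the default described above:
// the string concatenation of `from` and `seqno`.
func originStampedID(m *Message) string {
	return string(m.From) + string(m.Seqno)
}

// contentAddressedID derives the ID from `data` alone.
// SHA-256 is an assumption; any digest agreed on by all peers would do.
func contentAddressedID(m *Message) string {
	sum := sha256.Sum256(m.Data)
	return hex.EncodeToString(sum[:])
}

// perTopicID builds one top-level function that branches on the topic,
// using the content-addressed ID for the listed topics and the
// origin-stamped ID for everything else.
func perTopicID(contentTopics map[string]bool) func(*Message) string {
	return func(m *Message) string {
		for _, t := range m.TopicIDs {
			if contentTopics[t] {
				return contentAddressedID(m)
			}
		}
		return originStampedID(m)
	}
}
```

With this sketch, two peers publishing the same `data` on a content-addressed topic produce the same ID, so the second copy is treated as a duplicate. On an origin-stamped topic the IDs differ because `from` and `seqno` differ.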
-Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation, -may use the `message_id` to key messages. +Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation, scoring functions or circuit-breakers +may use the `message_id` to key and track messages. It was also proposed in [#116](https://github.com/libp2p/specs/issues/116) to use a `message_hash`, however, it was noted: @@ -172,7 +173,7 @@ to use a `message_hash`, however, it was noted: the peer won't be able to send identical messages (e.g. keepalives) within the timecache interval, as they will get rejected as duplicates. -Some applications may not need keepalives, or choose to implement something more specific than a message hash. +Some applications may not need keepalives, or choose to implement something more specific than a message hash. In those cases where duplicate payloads are not desirable, a `content-based` message ID function may be more appropriate. ## Message Signing diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index 1631e0cc6..f70f2cc02 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -139,7 +139,7 @@ message PeerInfo { ### Signature Policy The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable. -Initially this could be configured globally, however, because the policies are mutually incompatible, configuration on a per-topic basis will facilitate mixed protocols better. +> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) allows for configuring the signature policy at a global pubsub instance level. This needs to be pushed down to topic-level configuration. Other implementations are encouraged to support topic-level configuration, as the spec mandates. 
In the default origin-stamped messaging, the fields need to be strictly enforced: the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions. @@ -152,7 +152,7 @@ revealing the relationship between `data` and `from`/`seqno`. In gossipsub v1.1, these fields are strictly present and verified, or completely omitted altogether: - `StrictSign`: - On the producing side: - - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. + - Build messages with the `signature`, `key` (`from` may be enough for certain inlineable public key types), `from` and `seqno` fields. - On the consuming side: - Enforce the fields to be present, reject otherwise. - Propagate only if the fields are valid and signature can be verified, reject otherwise. @@ -163,16 +163,16 @@ In gossipsub v1.1, these fields are strictly present and verified, or completely - On the consuming side: - Enforce the fields to be absent, reject otherwise. - Propagate only if the fields are absent, reject otherwise. - - A `message_id` function will not be able to use the above fields, and may instead rely on the `data` field. + - A `message_id` function will not be able to use the above fields, and should instead rely on the `data` field. A commonplace strategy is to calculate a hash. -In gossipsub v1.0, a legacy "lax" signing policy could be configured, to not verify signatures except when present: -- `LaxSign`: *Defined for completeness, insecure*. Also known as authoring but not verifying. +In gossipsub v1.0, a legacy "lax" signing policy could be configured, to only verify signatures when present. For security reasons, this strategy is discarded in subsequent versions, but MAY still be supported for backwards-compatibility. If so, its use should be discouraged through prominent deprecation warnings. These strategies will be entirely dropped in the future. 
+- `LaxSign`: *this was never an original gossipsub 1.0 option, but it's defined here for completeness, and considered insecure*. Always sign, and verify incoming signatures, but accept unsigned messages. - On the producing side: - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. - On the consuming side: - `signature` may be absent, and not verified. - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid. -- `LaxNoSign`: *Previous default for no-verification* +- `LaxNoSign`: *Previous default for no-verification*. Do not sign nor origin-stamp, but verify incoming signatures, and accept unsigned messages. - On the producing side: - Build messages without the `signature`, `key`, `from` and `seqno` fields. - On the consuming side: From a7862dd1f57235b718da6590c006cae62ea85663 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Thu, 24 Sep 2020 23:01:41 +0100 Subject: [PATCH 08/13] apply @raulk's suggestions. --- pubsub/gossipsub/gossipsub-v1.1.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index f70f2cc02..466074169 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -138,7 +138,7 @@ message PeerInfo { ### Signature Policy -The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable. +The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable per topic, in the manners specified in this section. > [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) allows for configuring the signature policy at a global pubsub instance level. This needs to be pushed down to topic-level configuration. Other implementations are encouraged to support topic-level configuration, as the spec mandates. 
In the default origin-stamped messaging, the fields need to be strictly enforced: From e8b85d0b12eb974c7a18358539eea81146c05070 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Fri, 25 Sep 2020 13:43:52 +0100 Subject: [PATCH 09/13] Update pubsub/README.md --- pubsub/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pubsub/README.md b/pubsub/README.md index 67afc9062..3404e32d7 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -171,7 +171,7 @@ It was also proposed in [#116](https://github.com/libp2p/specs/issues/116) to use a `message_hash`, however, it was noted: > a potential caveat with using hashes instead of seqnos: the peer won't be able to send identical messages (e.g. keepalives) within the -timecache interval, as they will get rejected as duplicates. +timecache interval, as they will get treated as duplicates. Some applications may not need keepalives, or choose to implement something more specific than a message hash. In those cases where duplicate payloads are not desirable, a `content-based` message ID function may be more appropriate. From 03a68d53ebeeb92a0510e55041b49282894e612b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Fri, 25 Sep 2020 15:50:16 +0100 Subject: [PATCH 10/13] pubsub: signing policy editorial changes. --- pubsub/README.md | 207 +++++++++++++++++++++++------ pubsub/gossipsub/gossipsub-v1.1.md | 46 ------- 2 files changed, 165 insertions(+), 88 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index 3404e32d7..31ed627f8 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -4,9 +4,9 @@ | Lifecycle Stage | Maturity | Status | Latest Revision | |-----------------|----------------|--------|-----------------| -| 3A | Recommendation | Active | r2, 2019-02-01 | +| 3A | Recommendation | Active | r3, 2020-09-25 | -Authors: [@whyrusleeping] +Authors: [@whyrusleeping], [@protolambda], [@raulk], [@vyzo]. 
Interest Group: [@yusefnapora], [@raulk], [@vyzo], [@Stebalien], [@jamesray1], [@vasco-santos] @@ -17,6 +17,7 @@ Interest Group: [@yusefnapora], [@raulk], [@vyzo], [@Stebalien], [@jamesray1], [ [@Stebalien]: https://github.com/Stebalien [@jamesray1]: https://github.com/jamesray1 [@vasco-santos]: https://github.com/vasco-santos +[@protolambda]: https://github.com/protolambda See the [lifecycle document][lifecycle-spec] for context about maturity level and spec status. @@ -32,6 +33,7 @@ and spec status. - [The RPC](#the-rpc) - [The Message](#the-message) - [Message Signing](#message-signing) + - [Signature Policy](#signature-policy) - [Message Identification](#message-identification) - [The Topic Descriptor](#the-topic-descriptor) - [AuthOpts](#authopts) @@ -114,25 +116,31 @@ message Message { ``` The `optional` fields may be omitted, depending on the -[signature policy](#message-signing) and [message ID function](#message-identification) +[signature policy](#message-signing) and +[message ID function](#message-identification). -The `from` field denotes the author of the message, note that this is not -necessarily the peer who sent the RPC this message is contained in. This is -done to allow content to be routed through a swarm of pubsubbing peers. +The `from` field denotes the author of the message. This is the peer who +initially authored the message, and NOT the peer who propagated it. Thus, as +the message is routed through a swarm of pubsubbing peers, the original +authorship is preserved. -The `data` field is an opaque blob of data, it can contain any data that the -publisher wants it to. +The `data` field is an opaque blob of data representing the payload. It can +contain any data that the publisher wants it to. The `seqno` field is a 64-bit big-endian uint that is a linearly increasing number that is unique among messages originating from each given peer. 
No two messages on a pubsub topic from the same peer should have the same `seqno` -value, however messages from different peers may have the same sequence number, -so this number alone cannot be used to address messages by origin-stamping. +value, however messages from different peers may (and likely will) have the same +sequence number, so this number alone cannot be used to address messages by +**origin-stamping**. In other words, this number is not globally unique. It is +used in conjunction with `from` to derive a unique `message_id` (in the default +configuration). The `topicIDs` field specifies a set of topics that this message is being published to. -The `signature` and `key` fields are used for message signing, as explained below. +The `signature` and `key` fields are used for message signing, if that feature +is enabled, as explained below. The size of the `Message` should be limited, say to 1 MiB, but could also be configurable, for more information see @@ -148,55 +156,170 @@ and ## Message Identification -To uniquely identify a message in a set of topics (for de-duplication, tracking, scoring and other purposes), a `message_id` is calculated based on the message. -How the calculated happens can be configured on the application layer by supplying a function `message_id_fn`, such that `message_id_fn(*Message) => message_id`. +Pubsub requires messages to be uniquely identified via a message ID. This enables +a wide range of processes like de-duplication, tracking, scoring, +circuit-breaking, and others. -> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) only allows configuring a single top-level `message_id_fn`. This function may, however, vary its behaviour based on the topic (contained inside its `*Message`) argument. Thus, it's feasible to implement a per-topic policy using branch selection control flow logic. 
go-libp2p-pubsub plans to push down the configuration of the `message_id_fn` to the topic level. Other implementations are encouraged to do the same. +**The `message_id` is calculated from the `Message` struct.** -The message ID calculation approach generally fits in two flavors: -- **origin-stamped** messaging: the combination of the `seqno` and `from` fields - uniquely identifies a message based on the *author*. -- **content-addressed** messaging: a message ID derived from the `data` field - uniquely identifies a message based on the *data*. +By default, **origin-stamping** is in force. This strategy relies on the string +concatenation of the `from` and `seqno` fields, to uniquely identify a message +based on the *author*. -**The default `message_id_fn` is origin-stamped,** and defined as the string concatenation of `from` and `seqno`. +Alternatively, a user-defined `message_id_fn` may be supplied, where +`message_id_fn(Message) => message_id`. Such a function could compute the hash +of the `data` field within the `Message`, and thus one could reify +**content-addressed messaging**. -If fabricated collisions are not a concern, or difficult enough within the window the message is relevant in, -a `message_id` based on a short digest of inputs may benefit performance. Whichever the choice, it is crucial that **all peers** participating in a topic implement the same message ID calculation logic, or the topic may function suboptimally. +If fabricated collisions are not a concern, or difficult enough within the +window the message is relevant in, a `message_id` based on a short digest of +inputs may benefit performance. -Note that different specialized pubsub components, such as the 'timecache' used in the Go implementation, scoring functions or circuit-breakers -may use the `message_id` to key and track messages. +> **[[ Margin note ]]:** There's a potential caveat with using hashes instead of +> seqnos: the peer won't be able to send identical messages (e.g. 
keepalives) +> within the timecache interval, as they will get treated as duplicates. This +> consequence may or may not be relevant to the application at hand. +> Reference: [#116](https://github.com/libp2p/specs/issues/116). -It was also proposed in [#116](https://github.com/libp2p/specs/issues/116) -to use a `message_hash`, however, it was noted: -> a potential caveat with using hashes instead of seqnos: -the peer won't be able to send identical messages (e.g. keepalives) within the -timecache interval, as they will get treated as duplicates. +**Note that the availability of these fields on the `Message` object will depend +on the [signature policy](#signature-policy) configured for the topic.** -Some applications may not need keepalives, or choose to implement something more specific than a message hash. In those cases where duplicate payloads are not desirable, a `content-based` message ID function may be more appropriate. +Whichever the choice, it is crucial that **all peers** participating in a topic +implement identical message ID calculation logic, or the topic may function +suboptimally. + +> **[[ Implementation note ]]:** At the time of writing this section, +> go-libp2p-pubsub (reference implementation of this spec) only allows +> configuring a single top-level `message_id_fn`. This function may, however, +> vary its behaviour based on the topic (contained inside its `Message`) +> argument. Thus, it's feasible to implement a per-topic policy using branch +> selection control flow logic. In the near future, go-libp2p-pubsub plans to +> push down the configuration of the `message_id_fn` to the topic level. Other +> implementations are encouraged to do the same. ## Message Signing -Messages can be optionally signed, and it is up to the peer whether to accept and forward -unsigned messages. -The default choice of origin-stamped messaging, the receiver should enforce signatures strictly (`StrictSign`). 
-When the receiver expects unsigned content-stamped messages, and thus does not expect -the `from`, `seqno`, `signature`, or `key` fields, it may reject the messages (`StrictNoSign`). +Signature behavior is configured in two axes: signature creation, and signature +verification. + +**Signature creation.** There are two configurations possible: -This optionality is configurable with the signature policy options starting from gossipsub v1.1. +* `Sign`: when publishing a message, perform **origin-stamping** and produce a + signature. +* `NoSign`: when publishing a message, do not perform **origin-stamping** and + do not produce a signature. For signing purposes, the `signature` and `key` fields are used: - The `signature` field contains the signature. -- The `key` field contains the signing key when it cannot be inlined in the source peer ID (`from`). - When present, it must match the peer ID. +- The `key` field contains the signing key when it cannot be inlined in + the source peer ID (`from`). When present, it must match the peer ID. + +The signature is computed over the marshalled message protobuf _excluding_ the +`signature` field itself. + +This includes any fields that are not recognized, but still included in the +marshalled data. -The signature is computed over the marshalled message protobuf _excluding_ the `signature` field itself. -This includes any fields that are not recognized, but still included in the marshalled data. The protobuf blob is prefixed by the string `libp2p-pubsub:` before signing. +> **[[ Margin note: ]]** Protobuf serialization is non-deterministic/canonical, +> and the same data structure may result in different, valid serialised bytes +> across implementations, as well as other issues. In the near future, the +> signature creation and verification algorithm will be made deterministic. + +**Signature verification.** There are two configurations possible: + +* `Strict`: either expect or not expect a signature. 
+* `Lax` (legacy, insecure, non-deterministic, to be deprecated): accept a signed + message if the signature verification passes, or if it's unsigned. + When signature validation fails for a signed message, the implementation must -drop the message and omit propagation. Locally, it may treat this event in whichever -manner it wishes (e.g. logging). +drop the message and omit propagation. Locally, it may treat this event in +whichever manner it wishes (e.g. logging, penalization, etc.). + +#### Signature Policy Options + +The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` +is configurable per topic. + +> **[[ Implementation note ]]:** At the time of writing this section, +> go-libp2p-pubsub (reference implementation of this spec) allows for +> configuring the signature policy at the **global pubsub instance level**. +> This needs to be pushed down to topic-level configuration. +> Other implementations should support topic-level configuration, as this spec +> mandates. + +The intersection of signing behaviours across the two axes (signature creation +and signature verification) gives rise to four signature policy options: + +* `StrictSign`, `StrictNoSign`. Deterministic, usage encouraged. +* `LaxSign`, `LaxNoSign`. Non-deterministic, legacy, usage discouraged. Mostly + for backwards compatibility. Will be deprecated. If the implementation decides + to support these, their use should be discouraged through deprecation warnings. + +**`StrictSign` option** + +On the producing side: + - Build messages with the `signature`, `key` (`from` may be enough for + certain inlineable public key types), `from` and `seqno` fields. + +On the consuming side: + - Enforce the fields to be present, reject otherwise. + - Propagate only if the fields are valid and signature can be verified, + reject otherwise. + +**`StrictNoSign` option** + +On the producing side: + - Build messages without the `signature`, `key`, `from` and `seqno` fields.
+ - The corresponding protobuf key-value pairs are absent from the marshalled + message, not just empty. + +On the consuming side: + - Enforce the fields to be absent, reject otherwise. + - Propagate only if the fields are absent, reject otherwise. + - A `message_id` function will not be able to use the above fields, and should + instead rely on the `data` field. A commonplace strategy is to calculate + a hash. + +**`LaxSign` legacy option** + +_Not required for backwards-compatibility. Considered insecure, nevertheless +defined for completeness._ + +Always sign, and verify incoming signatures, but accept unsigned messages. + +On the producing side: + - Build messages with the `signature`, `key` (`from` may be enough), `from` + and `seqno` fields. + +On the consuming side: + - `signature` may be absent, and not verified. + - Verify `signature`, iff the `signature` is present, then reject if + `signature` is invalid. + +**`LaxNoSign` option** + +_Previous default_. + +Do not sign nor origin-stamp, but verify incoming signatures, and accept +unsigned messages. + +On the producing side: + - Build messages without the `signature`, `key`, `from` and `seqno` fields. + +On the consuming side: + - Accept and propagate messages with above fields. + - Verify `signature`, iff the `signature` is present, then reject if + `signature` is invalid. + +> **[[ Margin note: ]]** For content-addressed messaging, `StrictNoSign` is the +> most appropriate policy option, coupled with a user-defined `message_id_fn`, +> and a validator function to verify protocol-defined signatures. +> +> When publisher anonymity is being sought, `StrictNoSign` is also the most +> appropriate policy, as it refrains from outputting the `from` and `seqno` +> fields.
## The Topic Descriptor diff --git a/pubsub/gossipsub/gossipsub-v1.1.md b/pubsub/gossipsub/gossipsub-v1.1.md index 466074169..6b754606e 100644 --- a/pubsub/gossipsub/gossipsub-v1.1.md +++ b/pubsub/gossipsub/gossipsub-v1.1.md @@ -37,8 +37,6 @@ See the [lifecycle document][lifecycle-spec] for context about maturity level an - [Explicit Peering Agreements](#explicit-peering-agreements) - [PRUNE Backoff and Peer Exchange](#prune-backoff-and-peer-exchange) - [Protobuf](#protobuf) - - [Signature Policy](#signature-policy) - - [Signature Policy Options](#signature-policy-options) - [Flood Publishing](#flood-publishing) - [Adaptive Gossip Dissemination](#adaptive-gossip-dissemination) - [Outbound Mesh Quotas](#outbound-mesh-quotas) @@ -136,50 +134,6 @@ message PeerInfo { } ``` -### Signature Policy - -The usage of the `signature`, `key`, `from`, and `seqno` fields in `Message` is now configurable per topic, in the manners specified in this section. -> [[ Implementation note ]]: At the time of writing this section, go-libp2p-pubsub (reference implementation of this spec) allows for configuring the signature policy at a global pubsub instance level. This needs to be pushed down to topic-level configuration. Other implementations are encouraged to support topic-level configuration, as the spec mandates. - -In the default origin-stamped messaging, the fields need to be strictly enforced: -the `seqno` and `from` fields form the `message_id`, and should be verified to avoid `message_id` collisions. - -In content-stamped messaging, the fields may negatively affect privacy: -revealing the relationship between `data` and `from`/`seqno`. - -#### Signature Policy Options - -In gossipsub v1.1, these fields are strictly present and verified, or completely omitted altogether: -- `StrictSign`: - - On the producing side: - - Build messages with the `signature`, `key` (`from` may be enough for certain inlineable public key types), `from` and `seqno` fields. 
- - On the consuming side: - - Enforce the fields to be present, reject otherwise. - - Propagate only if the fields are valid and signature can be verified, reject otherwise. -- `StrictNoSign`: - - On the producing side: - - Build messages without the `signature`, `key`, `from` and `seqno` fields. - - The corresponding protobuf key-value pairs are absent from the marshalled message, not just empty. - - On the consuming side: - - Enforce the fields to be absent, reject otherwise. - - Propagate only if the fields are absent, reject otherwise. - - A `message_id` function will not be able to use the above fields, and should instead rely on the `data` field. A commonplace strategy is to calculate a hash. - -In gossipsub v1.0, a legacy "lax" signing policy could be configured, to only verify signatures when present. For security reasons, this is strategy is discarded in subsequent versions, but MAY still be supported for backwards-compatibility. If so, its use should be discouraged through prominent deprecation warnings. These strategies will be entirely dropped in the future. -- `LaxSign`: *this was never an original gossipsub 1.0 option, but it's defined here for completeness, and considered insecure*. Always sign, and verify incoming signatures, and but accept unsigned messages. - - On the producing side: - - Build messages with the `signature`, `key` (`from` may be enough), `from` and `seqno` fields. - - On the consuming side: - - `signature` may be absent, and not verified. - - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid. -- `LaxNoSign`: *Previous default for no-verification*. Do not sign nor origin-stamp, but verify incoming signatures, and accept unsigned messages. - - On the producing side: - - Build messages without the `signature`, `key`, `from` and `seqno` fields. - - On the consuming side: - - Accept and propagate messages with above fields. 
- - Verify `signature`, iff the `signature` is present, then reject if `signature` is invalid. - - ### Flood Publishing In gossipsub v1.0, peers publish new messages to the members of their mesh if they are subscribed to From b3c498ebadd14df6cad4d87996e7737530af88e2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Fri, 25 Sep 2020 16:00:06 +0100 Subject: [PATCH 11/13] pubsub: define 'origin-stamped' term. --- pubsub/README.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index 31ed627f8..4ee038bc9 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -124,18 +124,20 @@ initially authored the message, and NOT the peer who propagated it. Thus, as the message is routed through a swarm of pubsubbing peers, the original authorship is preserved. -The `data` field is an opaque blob of data representing the payload. It can -contain any data that the publisher wants it to. - The `seqno` field is a 64-bit big-endian uint that is a linearly increasing number that is unique among messages originating from each given peer. No two messages on a pubsub topic from the same peer should have the same `seqno` value, however messages from different peers may (and likely will) have the same -sequence number, so this number alone cannot be used to address messages by -**origin-stamping**. In other words, this number is not globally unique. It is -used in conjunction with `from` to derive a unique `message_id` (in the default +sequence number. In other words, this number is not globally unique. It is used +in conjunction with `from` to derive a unique `message_id` (in the default configuration). +Henceforth, we define the term **origin-stamped messaging** to refer to messages +whose `from` and `seqno` fields are populated. + +The `data` field is an opaque blob of data representing the payload. It can +contain any data that the publisher wants it to. 
+ The `topicIDs` field specifies a set of topics that this message is being published to. From 948c901feac2e3aa05f284e3ca951ee328b065a1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Fri, 25 Sep 2020 16:17:30 +0100 Subject: [PATCH 12/13] fix typo. --- pubsub/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pubsub/README.md b/pubsub/README.md index 4ee038bc9..edfc5035c 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -159,7 +159,7 @@ and ## Message Identification Pubsub requires to uniquely identify messages via a message ID. This enables -a wide range of processes like for de-duplication, tracking, scoring, +a wide range of processes like de-duplication, tracking, scoring, circuit-breaking, and others. **The `message_id` is calculated from the `Message` struct.** From 402b8c3c4ad8376fb692c298e0b6cbf476810b96 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Kripalani?= Date: Tue, 29 Sep 2020 15:27:27 +0100 Subject: [PATCH 13/13] minor edits. --- pubsub/README.md | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/pubsub/README.md b/pubsub/README.md index edfc5035c..b672c2431 100644 --- a/pubsub/README.md +++ b/pubsub/README.md @@ -119,30 +119,30 @@ The `optional` fields may be omitted, depending on the [signature policy](#message-signing) and [message ID function](#message-identification). -The `from` field denotes the author of the message. This is the peer who -initially authored the message, and NOT the peer who propagated it. Thus, as +The `from` field (optional) denotes the author of the message. This is the peer +who initially authored the message, and NOT the peer who propagated it. Thus, as the message is routed through a swarm of pubsubbing peers, the original authorship is preserved. -The `seqno` field is a 64-bit big-endian uint that is a linearly increasing -number that is unique among messages originating from each given peer. 
No two -messages on a pubsub topic from the same peer should have the same `seqno` -value, however messages from different peers may (and likely will) have the same -sequence number. In other words, this number is not globally unique. It is used -in conjunction with `from` to derive a unique `message_id` (in the default +The `seqno` field (optional) is a 64-bit big-endian uint that is a linearly +increasing number that is unique among messages originating from each given +peer. No two messages on a pubsub topic from the same peer should have the same +`seqno` value, however messages from different peers may have the same sequence +number. In other words, this number is not globally unique. It is used in +conjunction with `from` to derive a unique `message_id` (in the default configuration). Henceforth, we define the term **origin-stamped messaging** to refer to messages whose `from` and `seqno` fields are populated. -The `data` field is an opaque blob of data representing the payload. It can -contain any data that the publisher wants it to. +The `data` (optional) field is an opaque blob of data representing the payload. +It can contain any data that the publisher wants it to. The `topicIDs` field specifies a set of topics that this message is being published to. -The `signature` and `key` fields are used for message signing, if such feature -is enabled, as explained below. +The `signature` and `key` fields (optional) are used for message signing, if +such feature is enabled, as explained below. The size of the `Message` should be limited, say to 1 MiB, but could also be configurable, for more information see @@ -187,8 +187,7 @@ inputs may benefit performance. on the [signature policy](#signature-policy) configured for the topic.** Whichever the choice, it is crucial that **all peers** participating in a topic -implement identical message ID calculation logic, or the topic may function -suboptimally. 
+implement identical message ID calculation logic, or the topic will malfunction. > **[[ Implementation note ]]:** At the time of writing this section, > go-libp2p-pubsub (reference implementation of this spec) only allows @@ -302,7 +301,7 @@ On the consuming side: **`LaxNoSign` option** -_Previous default_. +_Previous default for 'no signature verification' mode_. Do not sign nor origin-stamp, but verify incoming signatures, and accept unsigned messages.
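The consuming-side rules that this series defines for the four signature policy options, together with the two message ID flavours, can be sketched roughly as follows. This is a minimal illustration in Python, not the go-libp2p-pubsub implementation. The names `Policy`, `Message`, `accept`, `origin_stamped_id` and `content_addressed_id` are hypothetical. The `verify` callback stands in for real signature verification over the `libp2p-pubsub:`-prefixed marshalled message, and SHA-256 is only an assumed choice of hash for the content-addressed case.

```python
from dataclasses import dataclass
from enum import Enum
from hashlib import sha256
from typing import Callable, Optional


class Policy(Enum):
    """The four signature policy options named in the spec text."""
    STRICT_SIGN = "StrictSign"
    STRICT_NO_SIGN = "StrictNoSign"
    LAX_SIGN = "LaxSign"
    LAX_NO_SIGN = "LaxNoSign"


@dataclass
class Message:
    """Hypothetical in-memory view of the protobuf `Message`.

    None models a field that is absent from the marshalled message,
    as opposed to present but empty.
    """
    data: bytes
    from_peer: Optional[bytes] = None  # the `from` field (`from` is a Python keyword)
    seqno: Optional[bytes] = None      # 64-bit big-endian uint, kept as raw bytes
    signature: Optional[bytes] = None
    key: Optional[bytes] = None


def accept(msg: Message, policy: Policy,
           verify: Callable[[Message], bool]) -> bool:
    """Return True if the message may be accepted and propagated."""
    if policy is Policy.STRICT_NO_SIGN:
        # All four fields must be absent, reject otherwise.
        stamped = (msg.from_peer, msg.seqno, msg.signature, msg.key)
        return all(field is None for field in stamped)
    if policy is Policy.STRICT_SIGN:
        # `key` may be omitted when it is inlined in `from`, so it is not required here.
        if msg.from_peer is None or msg.seqno is None or msg.signature is None:
            return False
        return verify(msg)
    # Lax options: verify iff a signature is present, accept unsigned messages.
    if msg.signature is not None:
        return verify(msg)
    return True


def origin_stamped_id(msg: Message) -> bytes:
    """Default message ID: concatenation of `from` and `seqno`."""
    return msg.from_peer + msg.seqno


def content_addressed_id(msg: Message) -> bytes:
    """Assumed content-addressed ID: a hash of `data` (SHA-256 chosen for illustration)."""
    return sha256(msg.data).digest()
```

Under `StrictNoSign`, only `content_addressed_id` (or another function of `data`) is usable, since `from` and `seqno` are absent. Whichever function is chosen, every peer on the topic would need to use the same one.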