From 720348a51e351c5bb2f2a13159a1c8007e17d042 Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 11:55:00 +0000 Subject: [PATCH 01/10] feat: add IPNI spec Describes the `ipni/offer` capability and how to merge inclusion claims with IPNI Advertisements License: MIT Signed-off-by: Oli Evans --- w3-ipni.md | 237 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 237 insertions(+) create mode 100644 w3-ipni.md diff --git a/w3-ipni.md b/w3-ipni.md new file mode 100644 index 0000000..419b0c9 --- /dev/null +++ b/w3-ipni.md @@ -0,0 +1,237 @@ +# W3 IPNI Protocol + +![status:wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) + +## Authors + +- [olizilla], [Protocol Labs] + +# Abstract + +For IPNI we assert that we can provide batches of multihashes by signing "Advertisements". + +With an inclusion claim, a user asserts that a CAR contains a given set of multihashes via a car index. + +This spec describes how to merge these two concepts by adding an `ipni/offer` capability to submit an inclusion claim as an IPNI Advertisement. + +## Language + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). + +## Introduction + +**What this unlocks** (tl;dr) + +- Create 1 or more IPNI Adverts per user uploaded CAR and set the ContextID to be the CAR CID (instead of arbitrary batches with no ContextId) + - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. + - We can **delete** the IPNI records by CAR CID if the CAR is deleted. 
+- Make IPNI advertising an explicit UCAN capability that clients can invoke rather than a side-effect of bucket events + - With this we are free to write CARs anywhere. The users agent invokes a `ipni/offer` capability to ask us to publish and IPNI ad for the blocks in their CAR. + - This empowers the user to opt-in or out as they need, and allows us to bill for the (small) cost of running that service. +- Put the lime in the coconut. Put an inclusion claim in the IPNI advert metadata. + - We show the source of our provider claim is a user signed inclusion content claim. + - We have to sign IPNI Adverts as the provider, so we can warn folks that this ad is as good as the user provided content claim it includes. + +### Quick IPNI primer + +IPNI ingests and replicates billions of signed provider claims for where individual block CIDs can be retrieved from. + +Users can query IPNI servers for any CID, and it provides a set of provider addresses and transport info, along with a provider specific ContextID and optional metadata. + +http://cid.contact hosts an IPNI server that Protocol Labs maintains. 
*(at time of writing)* + +```bash +$ curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aapwa2twnc4 -sS | jq +``` + +```json +{ + "MultihashResults": [ + { + "Multihash": "EiBAsLcLTWQSx+WydZNGhjs75z18AfE0r1OPYAPsDU7NFw==", + "ProviderResults": [ + { + "ContextID": "YmFndXFlZXJheTJ2ZWJsZGNhY2JjM3Z0em94bXBvM2NiYmFsNzV3d3R0aHRyamhuaDdvN2o2c2J0d2xmcQ==", + "Metadata": "gBI=", + "Provider": { + "ID": "QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC", + "Addrs": [ + "/dns4/elastic.dag.house/tcp/443/wss" + ] + } + }, + { + "ContextID": "YmFndXFlZXJheTJ2ZWJsZGNhY2JjM3Z0em94bXBvM2NiYmFsNzV3d3R0aHRyamhuaDdvN2o2c2J0d2xmcQ==", + "Metadata": "oBIA", + "Provider": { + "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp", + "Addrs": [ + "/dns4/dag.w3s.link/tcp/443/https" + ] + } + } +``` + +web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to S3 as an `Advertisement`, addressed by it's CID. + +An `Advertisement` includes `Provider` info which claims that a the batch of multihashes are available via bitswap or HTTP, and are signed by the provider PeerId private key; Each advert is a claim that this peer will provide that batch of multihashes. + +Advertisements also include a CID link to any previous ones from the same provider forming a hash linked list. + +The latest `head` CID of the ad list can be broadcast over gossipsub, to be replicated and indexed by all listeners, or POSTed over HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience. + +The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same ContextID. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response. 
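
As a rough illustration of the point above, the `ContextID` in a query response is just base64-encoded bytes that a client can decode for itself. The sketch below uses the value from the `cid.contact` response shown earlier; the helper name `decode_context_id` is hypothetical, and printing the bytes as ASCII only works because this particular record happens to contain printable text.

```python
import base64

# ContextID copied verbatim from the cid.contact query response shown earlier.
CONTEXT_ID_B64 = (
    "YmFndXFlZXJheTJ2ZWJsZGNhY2JjM3Z0em94bXBvM2NiYmFsNzV3"
    "d3R0aHRyamhuaDdvN2o2c2J0d2xmcQ=="
)


def decode_context_id(value: str) -> bytes:
    """Return the raw ContextID bytes from the base64 string in a response.

    Hypothetical helper: IPNI treats these bytes as opaque, so any further
    interpretation (for example as a CAR CID) is up to the provider.
    """
    return base64.b64decode(value, validate=True)


context_id = decode_context_id(CONTEXT_ID_B64)

# For this record the bytes are printable ASCII, so they can be shown as text.
print(context_id.decode("ascii"))
```

Under this proposal the decoded bytes would identify the CAR, so a client could follow up with a content-claims lookup for that CAR's index.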
+ +A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear (http headers? bitswap what now?). Regardless it is more space for provider specified bytes... like maybe... a content claim! *(foreshadowing!)* + +### How web3.storage integrates IPNI today + +w3s publishes IPNI advertisements as a side-effect of the e-ipfs car block indexer. + +Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to S3 as json. + +The lambda makes an http request to the cid.contact to inform it when the head CID of the Advertisement linked list changes. + +The cid.contact IPNI server fetches new head Advertisement from our s3 bucket, and any others in the chain it hasn't read yet, and updates it's indexes. + +Our `Advertisement`s contain arbitrary batches of multihashes defined by SQS queue batching config. The ContextID is set to opaque bytes (a custom hash of the hashes). 
+ +#### Diagram + +```mermaid +flowchart TD + A[(dotstorage\nbucket)] -->|ObjectCreated fa:fa-car| B(bucket-to-indexer ƛ) + B -->|region/bucket/cid/cid.car| C[/indexer queue/] + C --> indexer(Indexer ƛ) + indexer --> |zQmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn| E[/multihash queue/] + E --> F(ipni Advertisement content ƛ) + F --> |PUT /advertCid|I + F --> |advert CID| G[/Advertisement queue/] + G --> H(ipni publish ƛ) + H --> |PUT /head|I[(Advert Bucket)] + H --> |POST head|IPNI[["`**IPNI**`"]] + + carpark[(carpark\nbucket)] --> |ObjectCreated fa:fa-car|w3infra-carpark-consumer(carpark-consumer ƛ) + w3infra-carpark-consumer -->|region/bucket/cid/cid.car| C[/indexer queue/] + + indexer ---> dynamo[Dynamo\nblocks index] +``` + +## Proposal + +Provide a `ipni/offer` ucan ability to sign and publish an IPNI Advertisement for the set of multihashes in a CAR a user has stored with w3s, to make them discoverable via IPFS implementations and other IPNI consumers. + +```mermaid +sequenceDiagram + actor Alice + Alice->>w3s: ipni/offer (inclusion proof) + activate w3s + w3s-->>w3s: fetch & verify index + w3s-->>w3s: write advert + w3s-->>Alice: OK (advertisement CID) + w3s-->>ipni: publish head (CID) + deactivate w3s + ipni-->>w3s: fetch advert + activate ipni + ipni-->>ipni: index entries + deactivate ipni + Alice->>ipni: query (CID) +``` + + +Invoke it with the CID for an [inclusion-claim] that associates a CAR CID wth [MultihashIndexSorted CARv2 Index] CID. + +:::info +Other CAR index forms may be supported in the future. A more convenient external CAR index format would provide the offset byte and block byteLength for a multihash from the start of the CAR file. 
+::: + + +```json +{ + "iss": "did:key:zAlice", + "aud": "did:web:web3.storage", + "att": [{ + "can": "ipni/offer", + "with": "did:key:space", // users space DID + "nb": { + "inclusion": CID // inclusion claim CID + } + }] +} +``` + +**Inclusion claim** +```json +{ + "content": CID, // CAR CID + "includes": CID // CARv2 Index CID +} +``` + +When `ipni/offer` is invoked the service must fetch the inclusion claim. The encoded claim block may be sent with the invocation. + +The service must fetch he CARv2 index and parse it to find the set of multihashes included in the CAR. see: [Verifying the CARv2 Index](#verifying-the-carv2-index) + +The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. + +```ipldsch +type Advertisement struct { + PreviousID optional Link + Provider String + Addresses [String] + Signature Bytes + Entries Link + ContextID Bytes + Metadata Bytes + IsRm Bool + ExtendedProvider optional ExtendedProvider +} +``` + +- `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR. +- `ContextID` must be the byte encoded form of the CAR CID. +- `Metadata` must be the bytes of the inclusion claim. + +See: [Encoding the IPNI Advertisement](#encoding-the-ipni-advertisement) + +The Advertisement CID should be POSTed to an IPNI server. `cid.contact` is assumed initially. + +The Advertisement CID should be gossiped on the `/indexer/ingest/mainnet` topic so they can be replicated by other IPNI servers, to ensure many nodes can answer queries for the blocks we host. + + +### Verifying the CARv2 Index + +The service must fetch the CARv2 Index and may verify 1 or more multihashes from the index exist at the specified offsets in the associated CAR. + +The verifier should pick a set of multihashes at random and fetch the bytes from the CAR identified by the index entry and verify it's multihash. The invocation must return an error if any entry is found to be invalid. 
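
The random spot-check described above could be sketched as follows. This is a minimal sketch under loud assumptions: it models each index entry as a sha2-256 digest plus the offset and byte length of the block payload, which is the "more convenient" index form mentioned earlier rather than the actual MultihashIndexSorted layout. `IndexEntry`, `InvalidIndexError` and `verify_sample` are hypothetical names, and the CAR is held in memory instead of being fetched by byte range.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index entry: digest plus location of the block payload."""

    digest: bytes  # sha2-256 digest the index claims for the block
    offset: int    # byte offset of the block payload within the CAR
    length: int    # byte length of the block payload


class InvalidIndexError(Exception):
    """Raised when a sampled index entry does not match the CAR bytes."""


def verify_sample(
    car: bytes,
    entries: Sequence[IndexEntry],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> int:
    """Check a random sample of entries and return how many were checked.

    The work done is bounded by sample_size, not by the number of blocks.
    """
    rng = rng or random.Random()
    count = min(sample_size, len(entries))
    for entry in rng.sample(list(entries), count):
        block = car[entry.offset : entry.offset + entry.length]
        # A short read means the offset or length points outside the CAR.
        if len(block) != entry.length:
            raise InvalidIndexError(f"entry at offset {entry.offset} is out of range")
        if hashlib.sha256(block).digest() != entry.digest:
            raise InvalidIndexError(f"entry at offset {entry.offset} has wrong digest")
    return count
```

Raising on the first mismatch corresponds to the invocation returning an error, and `sample_size` is the knob for how much work the service is willing to do per index.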
+ +Random validation of a number of blocks allows us to detect invalid indexes and lets us tune how much work we are willing to do per car index. + +Full validation of every block is not recommended as it opens us up to performing unbounded work. *We have seen CAR files with millions of tiny blocks.* + + +### Encoding the IPNI Advertisement + +> The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. + +Where the IPLD encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p) the set of multihashes must be split into multiple `EntryChunk` blocks + +```ipldsch +type EntryChunk struct { + Entries [Bytes] + Next optional Link +} +``` + +It is possible to create long chains of `EntryChunk` blocks by setting the `Next` field to the CID to another `EntryChunk`, but this requires an entire EntryChunk to be fetched and decoded, before the IPNI server can determine the next chunk to fetch. + +The containing CAR CID provides a useful `ContextID` for grouping multiple (light weight) Advertisement blocks so it is recommended to split the set across multiple `Advertisement` blocks each pointing to an `EntryChunk` with a partition of the set of multihashes in, and the `ContextId` set to the CAR CID. 
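
One way to partition the multihashes so that each `EntryChunk` stays under the 4MiB limit is sketched below. The size accounting is an assumption, not a measurement: `PER_ENTRY_OVERHEAD` and `CHUNK_OVERHEAD` are made-up allowances for encoding framing, and a real implementation should check the actual IPLD encoded size. `partition_multihashes` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List

MAX_CHUNK_BYTES = 4 * 1024 * 1024  # 4MiB block size limit from the text above
PER_ENTRY_OVERHEAD = 3  # assumed framing bytes per multihash (not measured)
CHUNK_OVERHEAD = 64     # assumed allowance for the struct wrapper (not measured)


def partition_multihashes(
    multihashes: Iterable[bytes], max_bytes: int = MAX_CHUNK_BYTES
) -> Iterator[List[bytes]]:
    """Yield lists of multihashes whose estimated encoded size fits max_bytes.

    Each yielded list is intended to become the Entries of one EntryChunk.
    """
    chunk: List[bytes] = []
    size = CHUNK_OVERHEAD
    for mh in multihashes:
        cost = len(mh) + PER_ENTRY_OVERHEAD
        if CHUNK_OVERHEAD + cost > max_bytes:
            raise ValueError("a single multihash does not fit in one chunk")
        # Start a new chunk when adding this entry would exceed the limit.
        if chunk and size + cost > max_bytes:
            yield chunk
            chunk = []
            size = CHUNK_OVERHEAD
        chunk.append(mh)
        size += cost
    if chunk:
        yield chunk
```

Following the recommendation above, each partition would get its own `EntryChunk` and its own `Advertisement` carrying the CAR CID as `ContextID`, rather than being linked through `Next`.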
+ + +[MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted + +[inclusion-claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim + +[IPNI Advertisements]: https://github.com/ipni/specs/blob/main/IPNI.md#advertisements + +[olizilla]: https://github.com/olizilla From 05caee25e1ccbbe84686aadb602b01939b3ec47f Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 14:35:18 +0000 Subject: [PATCH 02/10] chore: lint License: MIT Signed-off-by: Oli Evans --- .github/workflows/words-to-ignore.txt | 2 ++ w3-ipni.md | 30 ++++++++++++++------------- 2 files changed, 18 insertions(+), 14 deletions(-) diff --git a/.github/workflows/words-to-ignore.txt b/.github/workflows/words-to-ignore.txt index 3c54472..1445af3 100644 --- a/.github/workflows/words-to-ignore.txt +++ b/.github/workflows/words-to-ignore.txt @@ -134,3 +134,5 @@ Irakli Gozalishvili Vasco invoker +IPNI +multihash multihashes diff --git a/w3-ipni.md b/w3-ipni.md index 419b0c9..ea9ea51 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -6,7 +6,7 @@ - [olizilla], [Protocol Labs] -# Abstract +## Abstract For IPNI we assert that we can provide batches of multihashes by signing "Advertisements". @@ -18,19 +18,21 @@ This spec describes how to merge these two concepts by adding an `ipni/offer` ca The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119](https://datatracker.ietf.org/doc/html/rfc2119). -## Introduction +## Introduction -**What this unlocks** (tl;dr) +We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align our usage of IPNI with content-claims, by publishing an advert per inclusion claim, and include the source claim in the IPNI advert. 
+ +**What this Unlocks** - Create 1 or more IPNI Adverts per user uploaded CAR and set the ContextID to be the CAR CID (instead of arbitrary batches with no ContextId) - - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. - - We can **delete** the IPNI records by CAR CID if the CAR is deleted. + - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. + - We can **delete** the IPNI records by CAR CID if the CAR is deleted. - Make IPNI advertising an explicit UCAN capability that clients can invoke rather than a side-effect of bucket events - - With this we are free to write CARs anywhere. The users agent invokes a `ipni/offer` capability to ask us to publish and IPNI ad for the blocks in their CAR. - - This empowers the user to opt-in or out as they need, and allows us to bill for the (small) cost of running that service. + - With this we are free to write CARs anywhere. The users agent invokes a `ipni/offer` capability to ask us to publish and IPNI ad for the blocks in their CAR. + - This empowers the user to opt-in or out as they need, and allows us to bill for the (small) cost of running that service. - Put the lime in the coconut. Put an inclusion claim in the IPNI advert metadata. - - We show the source of our provider claim is a user signed inclusion content claim. - - We have to sign IPNI Adverts as the provider, so we can warn folks that this ad is as good as the user provided content claim it includes. + - We show the source of our provider claim is a user signed inclusion content claim. 
+ - We have to sign IPNI Adverts as the provider, so we can warn folks that this ad is as good as the user provided content claim it includes. ### Quick IPNI primer @@ -38,10 +40,10 @@ IPNI ingests and replicates billions of signed provider claims for where individ Users can query IPNI servers for any CID, and it provides a set of provider addresses and transport info, along with a provider specific ContextID and optional metadata. -http://cid.contact hosts an IPNI server that Protocol Labs maintains. *(at time of writing)* +<http://cid.contact> hosts an IPNI server that Protocol Labs maintains. *(at time of writing)* ```bash -$ curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aapwa2twnc4 -sS | jq +curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aapwa2twnc4 -sS | jq ``` ```json @@ -70,6 +72,7 @@ $ curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3a ] } } +]}]} ``` web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to S3 as an `Advertisement`, addressed by it's CID. @@ -227,11 +230,10 @@ It is possible to create long chains of `EntryChunk` blocks by setting the `Next The containing CAR CID provides a useful `ContextID` for grouping multiple (light weight) Advertisement blocks so it is recommended to split the set across multiple `Advertisement` blocks each pointing to an `EntryChunk` with a partition of the set of multihashes in, and the `ContextId` set to the CAR CID. +
[MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted - [inclusion-claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim - [IPNI Advertisements]: https://github.com/ipni/specs/blob/main/IPNI.md#advertisements - [olizilla]: https://github.com/olizilla +[Protocol Labs]: https://protocol.ai From f01552e1b440c25b9909101f6afc3563f337875f Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 14:47:50 +0000 Subject: [PATCH 03/10] chore: lint License: MIT Signed-off-by: Oli Evans --- .github/workflows/words-to-ignore.txt | 7 ++++++- w3-ipni.md | 21 +++++++++------------ 2 files changed, 15 insertions(+), 13 deletions(-) diff --git a/.github/workflows/words-to-ignore.txt b/.github/workflows/words-to-ignore.txt index 1445af3..0e86c0c 100644 --- a/.github/workflows/words-to-ignore.txt +++ b/.github/workflows/words-to-ignore.txt @@ -135,4 +135,9 @@ Gozalishvili Vasco invoker IPNI -multihash multihashes +multihash like multihashes +tl +dr +S3 +bitswap +PeerId like PeerID \ No newline at end of file diff --git a/w3-ipni.md b/w3-ipni.md index ea9ea51..c7f649a 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -8,7 +8,7 @@ ## Abstract -For IPNI we assert that we can provide batches of multihashes by signing "Advertisements". +For IPNI we assert that we can provide batches of multihashes by signing "Advertisements". With an inclusion claim, a user asserts that a CAR contains a given set of multihashes via a car index. @@ -22,7 +22,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align our usage of IPNI with content-claims, by publishing an advert per inclusion claim, and include the source claim in the IPNI advert. 
-**What this Unlocks** +**What this Unlocks** _(tl;dr)_ - Create 1 or more IPNI Adverts per user uploaded CAR and set the ContextID to be the CAR CID (instead of arbitrary batches with no ContextId) - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. @@ -38,7 +38,7 @@ We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align ou IPNI ingests and replicates billions of signed provider claims for where individual block CIDs can be retrieved from. -Users can query IPNI servers for any CID, and it provides a set of provider addresses and transport info, along with a provider specific ContextID and optional metadata. +Users can query IPNI servers for any CID, and it provides a set of provider addresses and transport info, along with a provider specific `ContextID` and optional metadata. hosts an IPNI server that Protocol Labs maintains. *(at time of writing)* @@ -75,9 +75,9 @@ curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aap ]}]} ``` -web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to S3 as an `Advertisement`, addressed by it's CID. +web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to S3 as an `Advertisement`, addressed by it's CID. -An `Advertisement` includes `Provider` info which claims that a the batch of multihashes are available via bitswap or HTTP, and are signed by the provider PeerId private key; Each advert is a claim that this peer will provide that batch of multihashes. 
+An `Advertisement` includes `Provider` info which claims that a the batch of multihashes are available via bitswap or HTTP, and are signed by the provider PeerID private key; Each advert is a claim that this peer will provide that batch of multihashes. Advertisements also include a CID link to any previous ones from the same provider forming a hash linked list. @@ -141,14 +141,13 @@ sequenceDiagram Alice->>ipni: query (CID) ``` - Invoke it with the CID for an [inclusion-claim] that associates a CAR CID wth [MultihashIndexSorted CARv2 Index] CID. :::info Other CAR index forms may be supported in the future. A more convenient external CAR index format would provide the offset byte and block byteLength for a multihash from the start of the CAR file. ::: - +**UCAN invocation** example ```json { "iss": "did:key:zAlice", @@ -163,7 +162,7 @@ Other CAR index forms may be supported in the future. A more convenient external } ``` -**Inclusion claim** +**Inclusion claim** example ```json { "content": CID, // CAR CID @@ -175,7 +174,7 @@ When `ipni/offer` is invoked the service must fetch the inclusion claim. The enc The service must fetch he CARv2 index and parse it to find the set of multihashes included in the CAR. see: [Verifying the CARv2 Index](#verifying-the-carv2-index) -The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. +The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. ```ipldsch type Advertisement struct { @@ -201,7 +200,6 @@ The Advertisement CID should be POSTed to an IPNI server. `cid.contact` is assum The Advertisement CID should be gossiped on the `/indexer/ingest/mainnet` topic so they can be replicated by other IPNI servers, to ensure many nodes can answer queries for the blocks we host. - ### Verifying the CARv2 Index The service must fetch the CARv2 Index and may verify 1 or more multihashes from the index exist at the specified offsets in the associated CAR. 
@@ -212,12 +210,11 @@ Random validation of a number of blocks allows us to detect invalid indexes and Full validation of every block is not recommended as it opens us up to performing unbounded work. *We have seen CAR files with millions of tiny blocks.* - ### Encoding the IPNI Advertisement > The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. -Where the IPLD encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p) the set of multihashes must be split into multiple `EntryChunk` blocks +Where the IPLD encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p) the set of multihashes must be split into multiple `EntryChunk` blocks. ```ipldsch type EntryChunk struct { From 8e5ee5a4d2b59b9e8da3a7b9cdeaf4b164bf4d89 Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 15:00:19 +0000 Subject: [PATCH 04/10] chore: lint License: MIT Signed-off-by: Oli Evans --- .github/workflows/words-to-ignore.txt | 5 +++-- w3-ipni.md | 15 +++++++-------- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/.github/workflows/words-to-ignore.txt b/.github/workflows/words-to-ignore.txt index 0e86c0c..32b51ee 100644 --- a/.github/workflows/words-to-ignore.txt +++ b/.github/workflows/words-to-ignore.txt @@ -135,9 +135,10 @@ Gozalishvili Vasco invoker IPNI -multihash like multihashes +multihash +multihashes tl dr S3 bitswap -PeerId like PeerID \ No newline at end of file +PeerID diff --git a/w3-ipni.md b/w3-ipni.md index c7f649a..3c6fa53 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -8,11 +8,11 @@ ## Abstract -For IPNI we assert that we can provide batches of multihashes by signing "Advertisements". +For [IPNI] we assert that we can provide batches of multihashes by signing "Advertisements". 
-With an inclusion claim, a user asserts that a CAR contains a given set of multihashes via a car index. +With an [inclusion claim], a user asserts that a CAR contains a given set of multihashes via a car index. -This spec describes how to merge these two concepts by adding an `ipni/offer` capability to submit an inclusion claim as an IPNI Advertisement. +This spec describes how to merge these two concepts by adding an `ipni/offer` capability to publish an inclusion claim as [IPNI Advertisements]. ## Language @@ -141,7 +141,7 @@ sequenceDiagram Alice->>ipni: query (CID) ``` -Invoke it with the CID for an [inclusion-claim] that associates a CAR CID wth [MultihashIndexSorted CARv2 Index] CID. +Invoke it with the CID for an [inclusion claim] that associates a CAR CID wth [MultihashIndexSorted CARv2 Index] CID. :::info Other CAR index forms may be supported in the future. A more convenient external CAR index format would provide the offset byte and block byteLength for a multihash from the start of the CAR file. @@ -196,9 +196,7 @@ type Advertisement struct { See: [Encoding the IPNI Advertisement](#encoding-the-ipni-advertisement) -The Advertisement CID should be POSTed to an IPNI server. `cid.contact` is assumed initially. - -The Advertisement CID should be gossiped on the `/indexer/ingest/mainnet` topic so they can be replicated by other IPNI servers, to ensure many nodes can answer queries for the blocks we host. +The Advertisement should then be available for consumption by indexer nodes per the [Advertisement Transfer](https://github.com/ipni/specs/blob/main/IPNI.md#advertisement-transfer) section of the IPNI spec. ### Verifying the CARv2 Index @@ -229,8 +227,9 @@ The containing CAR CID provides a useful `ContextID` for grouping multiple (ligh
+[IPNI]: https://github.com/ipni/specs/blob/main/IPNI.md [MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted -[inclusion-claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim +[inclusion claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim [IPNI Advertisements]: https://github.com/ipni/specs/blob/main/IPNI.md#advertisements [olizilla]: https://github.com/olizilla [Protocol Labs]: https://protocol.ai From 458f660ba11392d74403656c58ac150245120552 Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 15:15:48 +0000 Subject: [PATCH 05/10] chore: lint License: MIT Signed-off-by: Oli Evans --- w3-ipni.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/w3-ipni.md b/w3-ipni.md index 3c6fa53..d9793a0 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -22,17 +22,16 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align our usage of IPNI with content-claims, by publishing an advert per inclusion claim, and include the source claim in the IPNI advert. -**What this Unlocks** _(tl;dr)_ +### Motivation -- Create 1 or more IPNI Adverts per user uploaded CAR and set the ContextID to be the CAR CID (instead of arbitrary batches with no ContextId) +- Align IPNI advert entries with CAR block sets and setting the ContextID to be the CAR CID. - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. - We can **delete** the IPNI records by CAR CID if the CAR is deleted. 
- Make IPNI advertising an explicit UCAN capability that clients can invoke rather than a side-effect of bucket events - With this we are free to write CARs anywhere. The users agent invokes a `ipni/offer` capability to ask us to publish and IPNI ad for the blocks in their CAR. - This empowers the user to opt-in or out as they need, and allows us to bill for the (small) cost of running that service. -- Put the lime in the coconut. Put an inclusion claim in the IPNI advert metadata. - - We show the source of our provider claim is a user signed inclusion content claim. - - We have to sign IPNI Adverts as the provider, so we can warn folks that this ad is as good as the user provided content claim it includes. +- Put the source inclusion claim in the IPNI advert metadata. + - We have to sign IPNI Adverts as the provider. Providing a signed source claim allows more nuanced reputation decisions. ### Quick IPNI primer @@ -85,7 +84,10 @@ The latest `head` CID of the ad list can be broadcast over gossipsub, to be repl The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same ContextID. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response. -A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear (http headers? bitswap what now?). Regardless it is more space for provider specified bytes... like maybe... a content claim! *(foreshadowing!)* +A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(http headers? bitswap?)_. 
+ +Regardless, it is space for provider specified bytes which we can use as to include the portable cryptographic proof that an end-user made the original claim that a set of blocks are included in a CAR and that as a large provider we have alerted IPNI on their behalf. + ### How web3.storage integrates IPNI today @@ -148,6 +150,7 @@ Other CAR index forms may be supported in the future. A more convenient external ::: **UCAN invocation** example + ```json { "iss": "did:key:zAlice", @@ -163,6 +166,7 @@ Other CAR index forms may be supported in the future. A more convenient external ``` **Inclusion claim** example + ```json { "content": CID, // CAR CID @@ -193,20 +197,20 @@ type Advertisement struct { - `Entries` must be the CID of an `EntryChunk` for a subset (or all) of multihashes in the CAR. - `ContextID` must be the byte encoded form of the CAR CID. - `Metadata` must be the bytes of the inclusion claim. - + See: [Encoding the IPNI Advertisement](#encoding-the-ipni-advertisement) The Advertisement should then be available for consumption by indexer nodes per the [Advertisement Transfer](https://github.com/ipni/specs/blob/main/IPNI.md#advertisement-transfer) section of the IPNI spec. ### Verifying the CARv2 Index -The service must fetch the CARv2 Index and may verify 1 or more multihashes from the index exist at the specified offsets in the associated CAR. +The service must fetch the CARv2 Index and may verify 1 or more multihashes from the index exist at the specified offsets in the associated CAR. The verifier should pick a set of multihashes at random and fetch the bytes from the CAR identified by the index entry and verify it's multihash. The invocation must return an error if any entry is found to be invalid. Random validation of a number of blocks allows us to detect invalid indexes and lets us tune how much work we are willing to do per car index. -Full validation of every block is not recommended as it opens us up to performing unbounded work. 
*We have seen CAR files with millions of tiny blocks.* +Full validation of every block is not recommended as it opens us up to performing unbounded work. _We have seen CAR files with millions of tiny blocks._ ### Encoding the IPNI Advertisement @@ -225,8 +229,6 @@ It is possible to create long chains of `EntryChunk` blocks by setting the `Next The containing CAR CID provides a useful `ContextID` for grouping multiple (light weight) Advertisement blocks so it is recommended to split the set across multiple `Advertisement` blocks each pointing to an `EntryChunk` with a partition of the set of multihashes in, and the `ContextId` set to the CAR CID. -
- [IPNI]: https://github.com/ipni/specs/blob/main/IPNI.md [MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted [inclusion claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim From e9ad45991998234b908415965d2380141cbc10d9 Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 15:30:25 +0000 Subject: [PATCH 06/10] chore: lint License: MIT Signed-off-by: Oli Evans --- .github/workflows/words-to-ignore.txt | 2 ++ w3-ipni.md | 23 ++++++++++++----------- 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/.github/workflows/words-to-ignore.txt b/.github/workflows/words-to-ignore.txt index 32b51ee..5fbb7ae 100644 --- a/.github/workflows/words-to-ignore.txt +++ b/.github/workflows/words-to-ignore.txt @@ -142,3 +142,5 @@ dr S3 bitswap PeerID +gossipsub +w3s diff --git a/w3-ipni.md b/w3-ipni.md index d9793a0..548e4f7 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -22,7 +22,7 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align our usage of IPNI with content-claims, by publishing an advert per inclusion claim, and include the source claim in the IPNI advert. -### Motivation +### Motivation - Align IPNI advert entries with CAR block sets and setting the ContextID to be the CAR CID. - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. @@ -39,7 +39,7 @@ IPNI ingests and replicates billions of signed provider claims for where individ Users can query IPNI servers for any CID, and it provides a set of provider addresses and transport info, along with a provider specific `ContextID` and optional metadata. - hosts an IPNI server that Protocol Labs maintains. 
*(at time of writing)* + hosts an IPNI server that Protocol Labs maintains. _(at time of writing)_ ```bash curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aapwa2twnc4 -sS | jq @@ -80,26 +80,25 @@ An `Advertisement` includes `Provider` info which claims that a the batch of mul Advertisements also include a CID link to any previous ones from the same provider forming a hash linked list. -The latest `head` CID of the ad list can be broadcast over gossipsub, to be replicated and indexed by all listeners, or POSTed over HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience. +The latest `head` CID of the ad list can be broadcast over [gossipsub], to be replicated and indexed by all listeners, or via HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience. The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same ContextID. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response. -A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(http headers? bitswap?)_. +A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(HTTP headers? bitswap?)_. Regardless, it is space for provider specified bytes which we can use as to include the portable cryptographic proof that an end-user made the original claim that a set of blocks are included in a CAR and that as a large provider we have alerted IPNI on their behalf. 
- ### How web3.storage integrates IPNI today -w3s publishes IPNI advertisements as a side-effect of the e-ipfs car block indexer. +web3.storage publishes IPNI advertisements as a side-effect of the E-IPFS car [indexer-lambda]. -Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to S3 as json. +Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to S3 as JSON. -The lambda makes an http request to the cid.contact to inform it when the head CID of the Advertisement linked list changes. +The lambda makes an HTTP request to the IPNI server at `cid.contact` to inform it when the head CID of the Advertisement linked list changes. -The cid.contact IPNI server fetches new head Advertisement from our s3 bucket, and any others in the chain it hasn't read yet, and updates it's indexes. +The IPNI server fetches new head Advertisement from our s3 bucket, and any others in the chain it hasn't read yet, and updates it's indexes. -Our `Advertisement`s contain arbitrary batches of multihashes defined by SQS queue batching config. The ContextID is set to opaque bytes (a custom hash of the hashes). +Our `Advertisement`s contain arbitrary batches of multihashes defined by SQS queue batching config. The `ContextID` is set to opaque bytes (a custom hash of the hashes). #### Diagram @@ -227,11 +226,13 @@ type EntryChunk struct { It is possible to create long chains of `EntryChunk` blocks by setting the `Next` field to the CID to another `EntryChunk`, but this requires an entire EntryChunk to be fetched and decoded, before the IPNI server can determine the next chunk to fetch. 
-The containing CAR CID provides a useful `ContextID` for grouping multiple (light weight) Advertisement blocks so it is recommended to split the set across multiple `Advertisement` blocks each pointing to an `EntryChunk` with a partition of the set of multihashes in, and the `ContextId` set to the CAR CID. +The containing CAR CID provides a useful `ContextID` for grouping multiple (light weight) Advertisement blocks so it is recommended to split the set across multiple `Advertisement` blocks each pointing to an `EntryChunk` with a partition of the set of multihashes in, and the `ContextID` set to the CAR CID. [IPNI]: https://github.com/ipni/specs/blob/main/IPNI.md [MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted [inclusion claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim [IPNI Advertisements]: https://github.com/ipni/specs/blob/main/IPNI.md#advertisements +[gossipsub]: https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md +[indexer-lambda]: https://github.com/elastic-ipfs/indexer-lambda/blob/a38d8074424d3f02845bac303a0d3fb3719dad82/src/lib/block.js#L22-L32 [olizilla]: https://github.com/olizilla [Protocol Labs]: https://protocol.ai From f3922bc12fd2bebd39f9b1ebacbd828c189bdcec Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 15:37:40 +0000 Subject: [PATCH 07/10] chore: lint License: MIT Signed-off-by: Oli Evans --- .github/workflows/words-to-ignore.txt | 10 ++++++++++ w3-ipni.md | 16 ++++++++-------- 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/.github/workflows/words-to-ignore.txt b/.github/workflows/words-to-ignore.txt index 5fbb7ae..62462c8 100644 --- a/.github/workflows/words-to-ignore.txt +++ b/.github/workflows/words-to-ignore.txt @@ -144,3 +144,13 @@ bitswap PeerID gossipsub w3s +E-IPFS +SQS +config +discoverable +MultihashIndexSorted +CARv2 +4MiB +verifier +libp2p +EntryChunk diff --git a/w3-ipni.md 
b/w3-ipni.md index 548e4f7..69671ca 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -24,7 +24,7 @@ We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align ou ### Motivation -- Align IPNI advert entries with CAR block sets and setting the ContextID to be the CAR CID. +- Align IPNI advert entries with CAR block sets and setting the `ContextID` to be the CAR CID. - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. - We can **delete** the IPNI records by CAR CID if the CAR is deleted. - Make IPNI advertising an explicit UCAN capability that clients can invoke rather than a side-effect of bucket events @@ -74,7 +74,7 @@ curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aap ]}]} ``` -web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to S3 as an `Advertisement`, addressed by it's CID. +web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to a bucket as an `Advertisement`, addressed by it's CID. An `Advertisement` includes `Provider` info which claims that a the batch of multihashes are available via bitswap or HTTP, and are signed by the provider PeerID private key; Each advert is a claim that this peer will provide that batch of multihashes. @@ -82,7 +82,7 @@ Advertisements also include a CID link to any previous ones from the same provid The latest `head` CID of the ad list can be broadcast over [gossipsub], to be replicated and indexed by all listeners, or via HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience. -The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. 
You can update or remove multiple adverts by specifying the same ContextID. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response. +The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same `ContextID`. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response. A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(HTTP headers? bitswap?)_. @@ -92,11 +92,11 @@ Regardless, it is space for provider specified bytes which we can use as to incl web3.storage publishes IPNI advertisements as a side-effect of the E-IPFS car [indexer-lambda]. -Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to S3 as JSON. +Each multihash in a CAR is sent to an SQS queue. The `publisher-lambda` takes batches from the queue, encodes and signs `Advertisement`s and writes them to a bucket as JSON. The lambda makes an HTTP request to the IPNI server at `cid.contact` to inform it when the head CID of the Advertisement linked list changes. -The IPNI server fetches new head Advertisement from our s3 bucket, and any others in the chain it hasn't read yet, and updates it's indexes. +The IPNI server fetches new head Advertisement from our bucket, and any others in the chain it hasn't read yet, and updates it's indexes. Our `Advertisement`s contain arbitrary batches of multihashes defined by SQS queue batching config. The `ContextID` is set to opaque bytes (a custom hash of the hashes). 
@@ -123,7 +123,7 @@ flowchart TD ## Proposal -Provide a `ipni/offer` ucan ability to sign and publish an IPNI Advertisement for the set of multihashes in a CAR a user has stored with w3s, to make them discoverable via IPFS implementations and other IPNI consumers. +Provide a `ipni/offer` UCAN ability to sign and publish an IPNI Advertisement for the set of multihashes in a CAR a user has stored with w3s, to make them discoverable via IPFS implementations and other IPNI consumers. ```mermaid sequenceDiagram @@ -142,10 +142,10 @@ sequenceDiagram Alice->>ipni: query (CID) ``` -Invoke it with the CID for an [inclusion claim] that associates a CAR CID wth [MultihashIndexSorted CARv2 Index] CID. +Invoke it with the CID for an [inclusion claim] that associates a CAR CID with a [MultihashIndexSorted CARv2 Index] CID. :::info -Other CAR index forms may be supported in the future. A more convenient external CAR index format would provide the offset byte and block byteLength for a multihash from the start of the CAR file. +Other CAR index forms may be supported in the future. A more convenient external CAR index format would provide the offset byte and block byte length for a given multihash from the start of the CAR file. ::: **UCAN invocation** example From ad9e3296c8590676e07fb893652040c796053f98 Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 15:53:40 +0000 Subject: [PATCH 08/10] docs: add more on entrychunk encoding License: MIT Signed-off-by: Oli Evans --- w3-ipni.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/w3-ipni.md b/w3-ipni.md index 69671ca..7f2c959 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -179,6 +179,8 @@ The service must fetch he CARv2 index and parse it to find the set of multihashe The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. 
+_Advertisement IPLD schema_ + ```ipldsch type Advertisement struct { PreviousID optional Link @@ -215,7 +217,11 @@ Full validation of every block is not recommended as it opens us up to performin > The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. -Where the IPLD encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p) the set of multihashes must be split into multiple `EntryChunk` blocks. +In IPNI, batches of multihashes are encoded as `EntryChunk` blocks, each batch includes an array of multihashes. + +A `MultihashIndexSorted` Index encodes a set of multihashes. Mapping from an index to an `EntryChunk` requires parsing the index and encoding the multihashes it contains with the EntryChunk IPLD schema. + +_EntryChunk IPLD schema_ ```ipldsch type EntryChunk struct { @@ -224,6 +230,8 @@ type EntryChunk struct { } ``` +Where the IPLD encoded size of an `EntryChunk` with the set of multihashes would exceed 4MiB (the upper limit for a block that can be transferred by libp2p) the set of multihashes must be split into multiple `EntryChunk` blocks. + It is possible to create long chains of `EntryChunk` blocks by setting the `Next` field to the CID to another `EntryChunk`, but this requires an entire EntryChunk to be fetched and decoded, before the IPNI server can determine the next chunk to fetch. The containing CAR CID provides a useful `ContextID` for grouping multiple (light weight) Advertisement blocks so it is recommended to split the set across multiple `Advertisement` blocks each pointing to an `EntryChunk` with a partition of the set of multihashes in, and the `ContextID` set to the CAR CID. 
From 2c212e63dff4e0ec2ac58be6fd6ed43ad30884be Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Wed, 13 Dec 2023 15:57:37 +0000 Subject: [PATCH 09/10] chore: lint License: MIT Signed-off-by: Oli Evans --- w3-ipni.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/w3-ipni.md b/w3-ipni.md index 7f2c959..01d6148 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -177,9 +177,7 @@ When `ipni/offer` is invoked the service must fetch the inclusion claim. The enc The service must fetch he CARv2 index and parse it to find the set of multihashes included in the CAR. see: [Verifying the CARv2 Index](#verifying-the-carv2-index) -The set of multihashes must be encoded as 1 or more [IPNI Advertisements]. - -_Advertisement IPLD schema_ +The set of multihashes must be encoded as 1 or more [IPNI Advertisements] per the IPLD Schema: ```ipldsch type Advertisement struct { @@ -221,8 +219,6 @@ In IPNI, batches of multihashes are encoded as `EntryChunk` blocks, each batch i A `MultihashIndexSorted` Index encodes a set of multihashes. Mapping from an index to an `EntryChunk` requires parsing the index and encoding the multihashes it contains with the EntryChunk IPLD schema. -_EntryChunk IPLD schema_ - ```ipldsch type EntryChunk struct { Entries [Bytes] From 40fb9981225dc8b6180e9e4586f6a31b970508e8 Mon Sep 17 00:00:00 2001 From: Oli Evans Date: Thu, 18 Jan 2024 14:08:12 +0000 Subject: [PATCH 10/10] chore: copy tweaks License: MIT Signed-off-by: Oli Evans --- w3-ipni.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/w3-ipni.md b/w3-ipni.md index 01d6148..0e8985c 100644 --- a/w3-ipni.md +++ b/w3-ipni.md @@ -20,13 +20,13 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "S ## Introduction -We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align our usage of IPNI with content-claims, by publishing an advert per inclusion claim, and include the source claim in the IPNI advert. 
+We publish ad-hoc batches of multihashes to IPNI. This proposal aims to align our usage of IPNI with [content-claims], by publishing an advert per [inclusion claim], and include the source claim in the IPNI Advertisement ### Motivation - Align IPNI advert entries with CAR block sets and setting the `ContextID` to be the CAR CID. - - With this we (or anyone, ipni is open access) can now use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. - - We can **delete** the IPNI records by CAR CID if the CAR is deleted. + - This exposes our block-to-car indexes. Anyone could use IPNI to find which CAR a block is in. The context id bytes provide the CAR CID for any block look up. The CAR CID can then be used to find the CAR index via our content-claims API. + - We could delete the IPNI records by CAR CID if the CAR is deleted. - Make IPNI advertising an explicit UCAN capability that clients can invoke rather than a side-effect of bucket events - With this we are free to write CARs anywhere. The users agent invokes a `ipni/offer` capability to ask us to publish and IPNI ad for the blocks in their CAR. - This empowers the user to opt-in or out as they need, and allows us to bill for the (small) cost of running that service. @@ -39,7 +39,9 @@ IPNI ingests and replicates billions of signed provider claims for where individ Users can query IPNI servers for any CID, and it provides a set of provider addresses and transport info, along with a provider specific `ContextID` and optional metadata. - hosts an IPNI server that Protocol Labs maintains. _(at time of writing)_ +For example: hosts an IPNI server that Protocol Labs maintains. 
+ +_Query IPNI for a cid_ ```bash curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aapwa2twnc4 -sS | jq @@ -74,19 +76,19 @@ curl https://cid.contact/cid/bafybeicawc3qwtlecld6lmtvsndimoz3446xyaprgsxvhd3aap ]}]} ``` -web3.storage publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to a bucket as an `Advertisement`, addressed by it's CID. +[web3.storage] publishes the blocks it can provide by encoding a batch of multihashes as an IPLD object and writing it to a bucket as an `Advertisement`, addressed by it's CID. -An `Advertisement` includes `Provider` info which claims that a the batch of multihashes are available via bitswap or HTTP, and are signed by the provider PeerID private key; Each advert is a claim that this peer will provide that batch of multihashes. +An `Advertisement` includes `Provider` info which claims that the batch of multihashes are available via bitswap or HTTP, and are signed by the providers PeerID private key; Each advert is a claim that this peer will provide that batch of multihashes. Advertisements also include a CID link to any previous ones from the same provider forming a hash linked list. -The latest `head` CID of the ad list can be broadcast over [gossipsub], to be replicated and indexed by all listeners, or via HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience. +The latest `head` CID of the advert list can be broadcast over [gossipsub], to be replicated and indexed by all listeners, or sent via HTTP to specific IPNI servers as a notification to pull and index the latest ads from you at their earliest convenience. The advert `ContextID` allows providers to specify a custom grouping key for multiple adverts. You can update or remove multiple adverts by specifying the same `ContextID`. The value is an opaque byte array as far as IPNI is concerned, and is provided in the query response. 
A `Metadata` field is also available for provider specific retrieval hints, that a user should send to the provider when making a request for the block, but the mechanism here is unclear _(HTTP headers? bitswap?)_. -Regardless, it is space for provider specified bytes which we can use as to include the portable cryptographic proof that an end-user made the original claim that a set of blocks are included in a CAR and that as a large provider we have alerted IPNI on their behalf. +Regardless, it is a field we can use to include the portable cryptographic proof of the content-claim that an end-user made that a set of blocks are included in a CAR. The provider has to sign the IPNI advert with the peerID key that should be used to secure the libp2p connection when retrieving the block. For upload services like web3.storage, ### How web3.storage integrates IPNI today @@ -234,9 +236,11 @@ The containing CAR CID provides a useful `ContextID` for grouping multiple (ligh [IPNI]: https://github.com/ipni/specs/blob/main/IPNI.md [MultihashIndexSorted CARv2 Index]: https://ipld.io/specs/transport/car/carv2/#format-0x0401-multihashindexsorted +[content-claims]: https://github.com/web3-storage/content-claims [inclusion claim]: https://github.com/web3-storage/content-claims?tab=readme-ov-file#inclusion-claim [IPNI Advertisements]: https://github.com/ipni/specs/blob/main/IPNI.md#advertisements [gossipsub]: https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/README.md [indexer-lambda]: https://github.com/elastic-ipfs/indexer-lambda/blob/a38d8074424d3f02845bac303a0d3fb3719dad82/src/lib/block.js#L22-L32 [olizilla]: https://github.com/olizilla [Protocol Labs]: https://protocol.ai +[web3.storage]: https://web3.storage
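Review note on the `EntryChunk` encoding text in `w3-ipni.md`: the rule that the multihash set must be split into multiple `EntryChunk` blocks once the encoded size would exceed 4MiB can be sketched as below. This is a minimal illustration, not the service implementation. The names `partitionMultihashes`, `encodedEntrySize`, `cborHeaderSize` and `CHUNK_OVERHEAD_BYTES` are hypothetical. The size estimate assumes each multihash is encoded as a CBOR byte string and that a fixed allowance covers the struct wrapper and the optional `Next` link. A real implementation would more likely measure the actual encoded block.

```typescript
// Hypothetical helpers for illustration; not part of any IPNI or web3.storage library.

// 4MiB upper limit for a block, as stated in the spec text.
const MAX_CHUNK_BYTES = 4 * 1024 * 1024

// Assumed conservative allowance for the EntryChunk wrapper:
// map header, the "Entries" and "Next" keys, the array header and one CID link.
const CHUNK_OVERHEAD_BYTES = 128

/** Bytes used by a CBOR header (major type plus length) for an item of the given length. */
function cborHeaderSize(length: number): number {
  if (length < 24) return 1
  if (length < 0x100) return 2
  if (length < 0x10000) return 3
  if (length < 0x100000000) return 5
  return 9
}

/** Estimated encoded size of one multihash as a CBOR byte string. */
function encodedEntrySize(multihash: Uint8Array): number {
  return cborHeaderSize(multihash.length) + multihash.length
}

/**
 * Greedily partition multihashes so that the estimated encoded size of each
 * partition stays at or under maxBytes. Each partition would become the
 * `Entries` of one EntryChunk.
 */
function partitionMultihashes(
  multihashes: Uint8Array[],
  maxBytes: number = MAX_CHUNK_BYTES
): Uint8Array[][] {
  const chunks: Uint8Array[][] = []
  let current: Uint8Array[] = []
  let currentSize = CHUNK_OVERHEAD_BYTES
  for (const mh of multihashes) {
    const size = encodedEntrySize(mh)
    // Start a new partition when adding this entry would cross the limit.
    if (current.length > 0 && currentSize + size > maxBytes) {
      chunks.push(current)
      current = []
      currentSize = CHUNK_OVERHEAD_BYTES
    }
    current.push(mh)
    currentSize += size
  }
  if (current.length > 0) chunks.push(current)
  return chunks
}
```

Following the recommendation in the spec text, each partition could be placed in its own `EntryChunk` and referenced from its own `Advertisement` with the `ContextID` set to the CAR CID, rather than chaining chunks through `Next`.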