New values for DHT Provider Record Republish and Expiration (22h/48h, RFM17) #451

Merged
merged 7 commits into from
Dec 12, 2022
Changes from 6 commits
82 changes: 70 additions & 12 deletions kad-dht/README.md
@@ -75,14 +75,22 @@ nodes, unrestricted nodes should operate in _server mode_ and restricted nodes,
e.g. those with intermittent availability, high latency, low bandwidth, low
CPU/RAM/Storage, etc., should operate in _client mode_.

As an example, running the libp2p Kademlia protocol on top of the Internet,
publicly routable nodes, e.g. servers in a datacenter, might operate in _server
As an example, publicly routable nodes running the libp2p Kademlia protocol,
e.g. servers in a datacenter, should operate in _server
mode_ and non-publicly routable nodes, e.g. laptops behind a NAT and firewall,
might operate in _client mode_. The concrete factors used to classify nodes into
should operate in _client mode_. The concrete factors used to classify nodes into
_clients_ and _servers_ depend on the characteristics of the network topology
and the properties of the Kademlia DHT . Factors to take into account are e.g.
and the properties of the Kademlia DHT. Factors to take into account are e.g.
network size, replication factor and republishing period.

For instance, setting the replication factor to a low value would require more
reliable peers, whereas a higher replication factor could allow for less
reliable peers at the cost of more overhead. Ultimately, peers that act as
servers should help the network (i.e., provide positive utility in terms of
availability, reachability, bandwidth). A node that slows down network
operations (e.g., because it is unreachable or overloaded) most of the times
it is contacted should instead operate as a client node.

Nodes, both those operating in _client_ and _server mode_, add another node to
their routing table if and only if that node operates in _server mode_. This
distinction allows restricted nodes to utilize the DHT, i.e. query the DHT,
@@ -228,7 +236,7 @@ Then we loop:
becomes the new best peer (`Pb`).
2. If the new value loses, we add the current peer to `Po`.
2. If successful with or without a value, the response will contain the
closest nodes the peer knows to the key `Key`. Add them to the candidate
closest nodes the peer knows to the `Key`. Add them to the candidate
list `Pn`, except for those that have already been queried.
3. If an error or timeout occurs, discard it.
4. Go to 1.
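
The candidate-list update in step 2 can be sketched as follows. This is a minimal illustration, not the reference implementation: `mergeCandidates` and the use of plain strings as peer identifiers are assumptions made for this example, and skipping peers that are already candidates is an extra safeguard not stated in the text.

```go
package main

import "fmt"

// mergeCandidates appends the peers returned in a response to the candidate
// list Pn. Peers that were already queried are skipped, as the text requires.
// Peers already present in Pn are skipped too (an assumption of this sketch).
func mergeCandidates(pn []string, closer []string, queried map[string]bool) []string {
	seen := make(map[string]bool, len(pn))
	for _, p := range pn {
		seen[p] = true
	}
	for _, p := range closer {
		if queried[p] || seen[p] {
			continue
		}
		seen[p] = true
		pn = append(pn, p)
	}
	return pn
}

func main() {
	queried := map[string]bool{"peerA": true}
	pn := []string{"peerB"}
	pn = mergeCandidates(pn, []string{"peerA", "peerB", "peerC"}, queried)
	fmt.Println(pn) // [peerB peerC]
}
```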
@@ -256,7 +264,7 @@ type Validator interface {
```

`Validate()` should be a pure function that reports the validity of a record. It
may validate a cryptographic signature, or else. It is called on two occasions:
may validate a cryptographic signature, or similar. It is called on two occasions:

1. To validate values retrieved in a `GET_VALUE` query.
2. To validate values received in a `PUT_VALUE` query before storing them in the
@@ -268,23 +276,71 @@ heuristic of the value to make the decision.

### Content provider advertisement and discovery

Nodes must keep track of which nodes advertise that they provide a given key
(CID). These provider advertisements should expire, by default, after 24 hours.
These records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
Two concerns are at play with regard to provider record (and therefore content)
liveness and reachability:

- Content needs to be reachable despite peer churn.
- Nodes that store and serve provider records should not serve records for stale content,
i.e., content that the original provider no longer wishes to make available.

The following two parameters address both concerns.
1. **Provider Record Republish Interval:** The content provider
needs to make sure that the nodes chosen to store the provider record
are still online when clients ask for the record. To ensure this
despite peer churn, content providers periodically
republish the records they want to provide. The appropriate value for the
republish interval is network-specific and depends on several parameters, such as
peer reliability and churn. For the IPFS network it is currently set to 22 hours.
2. **Provider Record Expiration Interval:** The network should only serve
content that content providers are still interested in providing. In other words,
nodes should not keep records for content that content providers have stopped
providing (i.e., stale records). To ensure this, provider records
should _expire_ after some interval, i.e., nodes should stop serving those records
unless the content provider has republished them. Again, the specific
setting depends on the characteristics of the network. In the IPFS DHT the expiration
interval is set to 48 hours.

The values chosen for those parameters should be subject to continuous monitoring
and investigation. Ultimately, the values of those parameters should balance
the tradeoff between provider record liveness (due to node churn) and traffic overhead
(to republish records).
The latest parameters are based on the comprehensive study published
in [provider-record-measurements].

Provider records are managed through the `ADD_PROVIDER` and `GET_PROVIDERS`
messages.

It is also worth noting that the keys for provider records are multihashes. This
is because:

- Provider records are used as a rendezvous point for all the parties who have
advertised that they store some piece of content.
- The same multihash can be in different CIDs (e.g. CIDv0 vs CIDv1 of a SHA-256 dag-pb object,
or the same multihash but with different codecs such as dag-pb vs raw).
- Therefore, the rendezvous point should converge on the minimal thing everyone agrees on,
which is the multihash, not the CID.
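
The following standard-library sketch illustrates the point: a CIDv0 and two CIDv1s with different codecs all reduce to the same multihash. The helper names (`sha256Multihash`, `cidV1`, `providerKey`) are made up for this example, and the parsing only handles the binary CID layouts constructed here; real implementations would use a CID library.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	codeSHA256 = 0x12 // multihash code for sha2-256
	codecDagPB = 0x70 // multicodec code for dag-pb
	codecRaw   = 0x55 // multicodec code for raw
)

// sha256Multihash builds <hash code><digest length><digest> for data.
func sha256Multihash(data []byte) []byte {
	digest := sha256.Sum256(data)
	return append([]byte{codeSHA256, sha256.Size}, digest[:]...)
}

// cidV1 builds the binary form <version=1><codec varint><multihash>.
func cidV1(codec uint64, mh []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, codec)
	out := append([]byte{0x01}, buf[:n]...)
	return append(out, mh...)
}

// providerKey returns the multihash that serves as the DHT key for a binary
// CID. A CIDv0 is a bare sha2-256 multihash, so it is returned unchanged.
func providerKey(cid []byte) ([]byte, error) {
	if len(cid) == 34 && cid[0] == codeSHA256 && cid[1] == sha256.Size {
		return cid, nil // CIDv0
	}
	version, n := binary.Uvarint(cid)
	if n <= 0 || version != 1 {
		return nil, errors.New("unsupported CID version")
	}
	_, m := binary.Uvarint(cid[n:]) // skip the codec
	if m <= 0 {
		return nil, errors.New("malformed codec")
	}
	return cid[n+m:], nil
}

func main() {
	mh := sha256Multihash([]byte("hello"))
	cids := [][]byte{mh, cidV1(codecDagPB, mh), cidV1(codecRaw, mh)}
	for _, c := range cids {
		key, err := providerKey(c)
		fmt.Println(bytes.Equal(key, mh), err) // true <nil>
	}
}
```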


#### Content provider advertisement

When the local node wants to indicate that it provides the value for a given
key, the DHT finds the closest peers to the key using the `FIND_NODE` RPC (see
key, the DHT finds the (`k` = 20) closest peers to the key using the `FIND_NODE` RPC (see
[peer routing section](#peer-routing)), and then sends an `ADD_PROVIDER` RPC with
its own `PeerInfo` to each of these peers.
its own `PeerInfo` to each of these peers. The study in [provider-record-measurements]
found that a replication factor of `k` = 20 is a good setting, although continuous
monitoring and investigation may change this recommendation in the future.
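
A simplified sketch of the advertisement step is shown below. It assumes the XOR-of-SHA-256 distance used by the libp2p Kademlia DHT, and it selects targets from a locally known peer list as a stand-in for the iterative `FIND_NODE` lookup. The functions `closestPeers` and `advertise` and the `send` callback are hypothetical names for this example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

const k = 20 // replication factor quoted in the text

// xorDistance returns sha256(a) XOR sha256(b).
func xorDistance(a, b []byte) []byte {
	ha, hb := sha256.Sum256(a), sha256.Sum256(b)
	d := make([]byte, sha256.Size)
	for i := range d {
		d[i] = ha[i] ^ hb[i]
	}
	return d
}

// closestPeers returns up to n peers sorted by increasing distance to key.
func closestPeers(key []byte, peers [][]byte, n int) [][]byte {
	sorted := append([][]byte(nil), peers...)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(xorDistance(sorted[i], key), xorDistance(sorted[j], key)) < 0
	})
	if len(sorted) > n {
		sorted = sorted[:n]
	}
	return sorted
}

// advertise calls send once for each of the k closest peers and returns the
// number of peers contacted. send stands in for the ADD_PROVIDER RPC.
func advertise(key []byte, known [][]byte, send func(peer, key []byte)) int {
	targets := closestPeers(key, known, k)
	for _, p := range targets {
		send(p, key)
	}
	return len(targets)
}

func main() {
	var known [][]byte
	for i := 0; i < 50; i++ {
		known = append(known, []byte(fmt.Sprintf("peer-%d", i)))
	}
	sent := advertise([]byte("some-multihash"), known, func(peer, key []byte) {
		// A real node would send an ADD_PROVIDER RPC with its PeerInfo here.
	})
	fmt.Println(sent) // 20
}
```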

Each peer that receives the `ADD_PROVIDER` RPC should validate that the received
`PeerInfo` matches the sender's `peerID`, and if it does, that peer should store
the `PeerInfo` in its datastore. Implementations may choose to not store the
addresses of the providing peer e.g. to reduce the amount of required storage or
to prevent storing potentially outdated address information.
to prevent storing potentially outdated address information. Implementations that choose
to keep the network address (i.e., the `multiaddress`) of the providing peer should do so only for
as long as they are confident that the address has not changed since the
provider record was (re-)published. As with the previous constants, this period depends
on the network's characteristics. A safe value is the Routing Table Refresh Interval,
which is set to 30 minutes in the kubo IPFS implementation. After that period, peers return
the provider's `peerID` only, in order to avoid pointing to stale network addresses
(i.e., the case where the peer has moved to a new network address).
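
The receiving side could look roughly like the sketch below: the record is stored only if the `PeerInfo` matches the sender, addresses are returned only while they are younger than the address TTL, and the record is dropped after the expiration interval. All type and function names (`peerInfo`, `providerStore`, `addProvider`, `getProviders`) are hypothetical; the 48-hour and 30-minute values are the ones quoted in the text.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	recordTTL  = 48 * time.Hour   // expiration interval quoted for the IPFS DHT
	addressTTL = 30 * time.Minute // routing table refresh interval quoted for kubo
)

type peerInfo struct {
	ID    string
	Addrs []string
}

type entry struct {
	info     peerInfo
	storedAt time.Time
}

// providerStore maps a multihash key to the providers that advertised it.
type providerStore struct {
	records map[string]map[string]entry
}

// addProvider stores info only if it describes the sender of the RPC.
// A republish from the same provider overwrites its previous entry.
func (s *providerStore) addProvider(sender, key string, info peerInfo, now time.Time) error {
	if info.ID != sender {
		return errors.New("PeerInfo does not match sender")
	}
	if s.records[key] == nil {
		s.records[key] = map[string]entry{}
	}
	s.records[key][info.ID] = entry{info: info, storedAt: now}
	return nil
}

// getProviders returns unexpired providers and omits addresses that are
// older than addressTTL, leaving only the peer ID.
func (s *providerStore) getProviders(key string, now time.Time) []peerInfo {
	var out []peerInfo
	for _, e := range s.records[key] {
		age := now.Sub(e.storedAt)
		if age >= recordTTL {
			continue
		}
		info := peerInfo{ID: e.info.ID}
		if age < addressTTL {
			info.Addrs = e.info.Addrs
		}
		out = append(out, info)
	}
	return out
}

func main() {
	s := &providerStore{records: map[string]map[string]entry{}}
	t0 := time.Date(2022, 12, 12, 0, 0, 0, 0, time.UTC)
	info := peerInfo{ID: "peerA", Addrs: []string{"/ip4/192.0.2.1/tcp/4001"}}
	fmt.Println(s.addProvider("peerB", "mh", info, t0))          // PeerInfo does not match sender
	fmt.Println(s.addProvider("peerA", "mh", info, t0))          // <nil>
	fmt.Println(s.getProviders("mh", t0.Add(10*time.Minute)))    // ID and address
	fmt.Println(s.getProviders("mh", t0.Add(2*time.Hour)))       // ID only
	fmt.Println(len(s.getProviders("mh", t0.Add(49*time.Hour)))) // 0
}
```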

#### Content provider discovery

@@ -470,3 +526,5 @@ multiaddrs are stored in the node's peerbook.
[ping]: https://github.com/libp2p/specs/issues/183

[go-libp2p-xor]: https://github.com/libp2p/go-libp2p-xor

[provider-record-measurements]: https://github.com/protocol/network-measurements/blob/master/results/rfm17-provider-record-liveness.md