Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Add a way to ask for a storage proof of just the hash of a value #10623

Open
tomaka opened this issue Jan 10, 2022 · 16 comments
Open

Add a way to ask for a storage proof of just the hash of a value #10623

tomaka opened this issue Jan 10, 2022 · 16 comments
Labels
J0-enhancement An additional feature request.

Comments

@tomaka
Copy link
Contributor

tomaka commented Jan 10, 2022

In the context of smoldot, I would like to store the latest runtime code in a cache on disk, and, when smoldot initializes, compare the runtime code in cache with the runtime code on chain.
The objective is to save bandwidth, as smoldot wouldn't have to download the full runtime code again if it already has it in cache.
cc paritytech/smoldot#1769

In order for this to be useful, the light client needs to be able to ask a full node for just the hash of :code.
As far as I know, this isn't possible with Substrate right now, and it would be nice to add this possibility.

Unfortunately I'm not really up to date with the various trie and trie proof formats so I don't really know what to suggest.

cc @cheme

@bkchr bkchr added the J0-enhancement An additional feature request. label Jan 10, 2022
@bkchr
Copy link
Member

bkchr commented Jan 10, 2022

That should be doable with #9732.

At least from the proof side, but this will still require some code path to only include the hash and not the data.

@cheme can correct me if I'm wrong. I think it should work and it should also work the way you want it for :code, but only "by accident". The point is that some data could be stored inside a node (because it isn't that big), but then we don't have the hash of the data in the trie. But since #9732 we are putting big data (bigger than the size of a hash) into its own "bucket" under the hash of the data. As the data in our case :code is probably always bigger than the size of a hash, it should work the way you want it to have.

@cheme
Copy link
Contributor

cheme commented Jan 10, 2022

👍 yes this is only doable if working on a migrated #9732 state (not on old/current state).

There is no specific api to query the hash of the value directly, but just producing the proof is the same as running one iteration of the key iterator for prefix ':code'.

If data is less than 33 byte, we would still have a proof of its hash (by hashing the value), but to read the proof a new get_value_hash function on the trie would be cleaner (then the function could also be use to create the proof).

@bkchr
Copy link
Member

bkchr commented Jan 11, 2022

If data is less than 33 byte, we would still have a proof of its hash (by hashing the value)

Ahh you mean we would have a proof with the data in it, but we can then just request the hash of this that is being found in the proof?

@cheme
Copy link
Contributor

cheme commented Jan 11, 2022

If data is less than 33 byte, we would still have a proof of its hash (by hashing the value)

Ahh you mean we would have a proof with the data in it, but we can then just request the hash of this that is being found in the proof?

yes.

Also I remember that without #9732, we can also use a trie proof as implemented in https://github.com/paritytech/trie/tree/master/trie-db/src/proof : basically we remove the value from the proof and pass it as a parameter during verification (so it is more costly to check (need to hash the wasm big trie node)).
If verification succeeds we know value did not change, if it fails we know it did probably change (could also be change of node structure).

@tomaka
Copy link
Contributor Author

tomaka commented Jul 6, 2022

#9732 has been merged, but none of the main relay chains has been updated to state_version = 2 yet, is that correct?

Assuming that we have a migrated state, we just need to add a new boolean flag to the request in the network protocol, so that the proof in the response doesn't include the unhashed values.

message RemoteReadRequest {

The protocol lets you request multiple keys at the same time. We could make it possible to configure, for each individual key, whether you want the data or just the hash. Or maybe this is overkill and a global setting is enough, I'm not sure.

@cheme
Copy link
Contributor

cheme commented Jul 6, 2022

#9732 has been merged, but none of the main relay chains has been updated to state_version = 2 yet, is that correct?

yes (update is 0 to 1), you can follow https://github.com/paritytech/devops/issues/1508

I did not think about mixing hash only and value request, does not sounds like a bad idea. Maybe I will still prefer have global setting for compactness of query, but it is only because I don't expect user to mix both query in practice. But can be done this way.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 25, 2022

I don't actually understand why this is blocked on the deployment on #9732.
Can we not generate and send back on the network a Merkle proof that contains hashes, even when the storage itself doesn't contain these hashes?

@bkchr
Copy link
Member

bkchr commented Oct 25, 2022

The problem for now is that if you want to proof that the hash of some data is part of the trie isn't that easy. We don't store the hash of the data in the trie directly. We could return a storage proof that a certain leaf is part of the trie. So, you would need to build the leaf, hash the leaf and then you could check that the leaf is part of the trie. However, when the data is for example inside a branch node, this gets even more complicated, because you would need to reconstruct the branch node, hash it etc.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 25, 2022

Ok I understood now.
If you modify the leaf nodes to contain the hashed value, you modify the hash of the leaf node. There's no way to prove that the proof is correct without including the unhashed value in it.

@bkchr
Copy link
Member

bkchr commented Oct 25, 2022

Exactly 👍 In the future the leaf/branch nodes will directly contain the hash and we can provide this proof easily.

@tomaka
Copy link
Contributor Author

tomaka commented Oct 29, 2022

Another use case for having only hashes in the storage proofs is when you want to implement state_getKeysPaged.
This JSON-RPC function returns the list of all keys starting with a certain prefix.
This is used for example to get the list of validators, I believe.

Having the values in the proof adds a lot of unnecessary bloat.

I commented this here as I don't think it's worth a separate issue.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 22, 2022

Coming back. I still don't think that this is blocked on the deployment on #9732.

What we can do when receiving a request for a "proof without values" is:

  • If the chain uses state_version = 0, then send the values anyway.
  • If the chain uses state_version = 1, then don't send the values.

The format of the nodes in the proof already lets us know the difference between the two situations, so there's no ambiguity.

I think we should implement it this way because we should preferably support this protocol even if state_version is still 0 or if the chain is in a hybrid state.

@tomaka
Copy link
Contributor Author

tomaka commented Nov 29, 2022

Would it be possible to start implementing this, at least the necessary trie/client changes?

@cheme
Copy link
Contributor

cheme commented Nov 29, 2022

I did implement it in the past, but my branches are not really up to date.
I can plan to do it start of next week, I mean if nobody else plan to do it, I will do it start of next week :)

@tomaka
Copy link
Contributor Author

tomaka commented Feb 21, 2023

Since I don't have access to https://github.com/paritytech/devops/issues/1508 anymore, can I ask what is the current progress of this update, now that 3 months have passed?

@cheme
Copy link
Contributor

cheme commented Feb 21, 2023

We did migrate westend (you did see it in the recent westend issue), I will prepare kusama and polkadot next (probably will wait to check that there is no other issue in westend and yours is fixed).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J0-enhancement An additional feature request.
Projects
None yet
Development

No branches or pull requests

3 participants