
RPC call returns "PrefixKeysForbidden" error code #272

Closed
skunert opened this issue Mar 7, 2023 · 11 comments · Fixed by #670

Comments

skunert (Contributor) commented Mar 7, 2023

Runtime calls of the persisted_validation_data method via state_call return a "PrefixKeysForbidden" error.

Previously I assumed this was related to #177, since this runtime call uses the state_root host function. However, shouldn't the execution proof that the full node delivers include all required state entries? Why do we need this prefix-key fetching here? Is it a bug after all?

tomaka (Contributor) commented Mar 10, 2023

The full node does indeed deliver all the required state entries. The error is returned simply because this feature isn't fully implemented in smoldot. Determining from a proof the list of entries that start with a prefix is a bit tricky logic-wise, and since it had so far been unneeded, I decided in the past to return an error instead of implementing it.

tomaka (Contributor) commented Mar 11, 2023

An unintended side effect of #259 is that the runtime code might request the list of all keys in the entire storage in order to build the trie root cache. This might be why you're getting this error, and this is definitely problematic. I might need to revert #259.

However, as a side note, I really feel like there's a problem somewhere with what you're doing. persisted_validation_data really isn't supposed to do anything but read the storage. Either there's a bug in what you're doing and you're not calling the right function, or there's a bug in the runtime.
Smoldot itself calls this function when you connect to a parachain, and everything works fine, and did so long before #259.

skunert (Contributor, Author) commented Mar 14, 2023

The problem can be reproduced by sending this JSON-RPC request to a Polkadot light client:

{"id": "id", "method": "state_call", "params": ["ParachainHost_persisted_validation_data", "0xe803000001"], "jsonrpc": "2.0"}

This is a valid JSON-RPC call for Polkadot and requests the persisted validation data for Statemint.
e803000001 = (ParaId(1000), TimedOut).
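
For reference, a minimal sketch of how that parameter blob can be produced, assuming the parity-scale-codec crate (with its derive feature); the type definitions below are simplified stand-ins for the real Polkadot primitives:

```rust
use parity_scale_codec::Encode;

// Hypothetical, simplified re-declarations, only to illustrate the SCALE
// encoding; the real types live in the Polkadot primitives crates.
#[derive(Encode)]
struct ParaId(u32);

#[derive(Encode)]
#[allow(dead_code)]
enum OccupiedCoreAssumption {
    Included, // variant index 0x00
    TimedOut, // variant index 0x01
    Free,     // variant index 0x02
}

fn main() {
    // 1000u32 in little-endian is e8 03 00 00, followed by the variant
    // index 01 for TimedOut.
    let params = (ParaId(1000), OccupiedCoreAssumption::TimedOut).encode();
    let hex: String = params.iter().map(|b| format!("{:02x}", b)).collect();
    assert_eq!(hex, "e803000001");
    println!("0x{hex}");
}
```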

Looking at the runtime call implementation here and here, the call is indeed read-only when the occupied core assumption is TimedOut.

Also, you are right that this is related to #259. Before that commit, the call works fine; after it, it is broken. This is also why the smoldot-internal calls succeed: they still use the read-only runtime host, if I see that correctly.

tomaka (Contributor) commented Mar 14, 2023

Ah OK, I had wrongly assumed that ParachainHost_persisted_validation_data was also failing when you reported #254.

> Also, you are right that this is related to #259. Before that commit, the call works fine; after it, it is broken. This is also why the smoldot-internal calls succeed: they still use the read-only runtime host, if I see that correctly.

The issue is that in the read-only case, when the runtime asks for the storage trie root, we can just provide the value that is in the header of the block, since we know that the storage hasn't been modified.

In the non-read-only case, however, we (re-)calculate this storage trie root.

Unfortunately, this calculation at the moment assumes that we have access to the whole storage. For example, when the calculation needs the trie node hash of a storage item, it just asks for the storage item. And because the state trie root calculation cache isn't populated, we try to access every single storage item in order to calculate the hash of the root.
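
To illustrate the difference in a toy form (hypothetical names, and a stand-in hash instead of the real base-16 Merkle trie computation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Read-only case: the storage wasn't modified, so the state root found in
/// the block header is still valid and can be returned as-is.
fn storage_root_read_only(header_state_root: u64) -> u64 {
    header_state_root
}

/// Non-read-only case with an empty calculation cache: the root has to be
/// recomputed from scratch, which means visiting every single storage entry,
/// which is exactly what a Merkle proof containing only the accessed entries
/// cannot provide. (A stand-in hash is used here; the real computation builds
/// a base-16 Merkle trie.)
fn storage_root_recomputed(storage: &BTreeMap<Vec<u8>, Vec<u8>>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (key, value) in storage {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}
```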

This obviously isn't really compatible with reading a Merkle proof. If I were to implement the "prefix keys" thing that the issue title mentions, the outcome would be that the Merkle proof is missing entries.

So indeed #259 made things worse.

The proper solution is to refactor the calculation a bit so that it only asks for what is needed and nothing more, and what is needed can be found in the Merkle proof. This is, however, non-trivial.

Another alternative would be for the runtime to not ask for the storage trie root. In principle it shouldn't, but for some reason I've noticed that many (if not all?) runtime calls try to read it.

skunert (Contributor, Author) commented Mar 14, 2023

Thanks for the explanation!

> If I were to implement the "prefix keys" thing that the issue title mentions, the outcome would be that the Merkle proof is missing entries.

You lost me on this part, can you rephrase maybe?

> Another alternative would be for the runtime to not ask for the storage trie root. In principle it shouldn't, but for some reason I've noticed that many (if not all?) runtime calls try to read it.

The relay chain storage root is part of the PersistedValidationData that is returned from the call, so in this case it makes sense to me that the runtime reads it.

tomaka (Contributor) commented Mar 14, 2023

> You lost me on this part, can you rephrase maybe?

The PrefixKeysForbidden error comes from the fact that the runtime call tries to access the list of all keys that start with a given prefix, and that this is not implemented in smoldot yet.
However, if I implemented this by reading from the Merkle proof, it is likely that the proof would be missing entries.
That's because the runtime call's request for the list of all keys that start with a certain prefix is "artificial": it's an API trick used by the code that performs the runtime call so that it can calculate the storage root.

tomaka (Contributor) commented Mar 23, 2023

It seems clear to me that the calculations cache system needs to be "refactored" so that the cache is maintained higher in the stack of layers, and the low-level code asks for node values (which would then come either from the cache or from the Merkle proof).

What is giving me a lot of trouble is that there are situations where it's really hard to determine the minimum amount of information required.

I am going to take two example situations that seem tricky to me, to get an idea of what's required:

If we imagine a runtime call that just sets a (potentially non-existing) random storage value and then gets the root hash, what you need in order to fulfill this demand is the node value of the new entry's parent, but also the node value of the child of that parent that lies in the same direction as the modified node, and potentially of other siblings depending on the structure of the trie, because you need to know where in the trie to insert the new entry.

If we imagine a runtime call that just clears a prefix and then gets the root hash, what you need is the node value of the closest ancestor of the prefix, but also, because the number of nodes being removed is bounded, you need to be able to walk down the trie. At the moment, we have a PrefixKeys variant asking for all the keys with a certain prefix, but this asks for more data than strictly necessary.

The naive solution would be a GetNodeValue variant; however, this alone doesn't let you walk down the trie, because a node value alone doesn't tell you the partial keys of the descendants.

I think the solution is a GetNodeValue variant where the responder must also indicate the partial key of each child, if known. If the partial key of a child is not known, the responder can indicate None.
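
A hypothetical sketch of what such a request/response pair could look like (an illustration of the idea only, not smoldot's actual API):

```rust
/// Key of a trie node, expressed in nibbles (half-bytes).
type NibbleKey = Vec<u8>;

/// Request emitted by the trie-root calculation while it walks the trie.
enum TrieRequest {
    /// Ask for the node value of the trie node at the given key.
    GetNodeValue { key: NibbleKey },
}

/// Answer, filled in either from a cache or from a Merkle proof.
struct NodeValueResponse {
    /// Node value of the requested node, or `None` if the node doesn't exist.
    node_value: Option<Vec<u8>>,
    /// For each of the 16 possible children, the partial key of that child
    /// if it is known, and `None` otherwise. This is what makes it possible
    /// to keep walking down the trie from a node value alone.
    children_partial_keys: [Option<NibbleKey>; 16],
}
```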

tomaka (Contributor) commented Mar 23, 2023

As a first step, we can add a GetNodeValue variant, then remove PrefixKeys and get the list of keys through GetNodeValue.

Then, add a cache to OptimisticSync and consensus_service.

Then, remove the existing cache from runtime_host.rs and just maintain a trie structure directly in runtime_host.rs that is used in order to calculate the root hash.

tomaka (Contributor) commented Apr 14, 2023

Roadmap:

  • Make consensus_service.rs use a TrieStructure instead of BTreeMap to store the chain state.
  • Add a GetNodeValue variant everywhere (but do not create it yet) and change the code reacting to these variants to respond to it.
  • Make the root calculation code in runtime_host.rs generate GetNodeValue variants, and ditch the cache from runtime_host.rs entirely.
  • Change the NextKey variant so that the next key must start with a certain prefix. This is necessary in order to implement clear_prefix on top of NextKey, as otherwise the code will think that the Merkle proof is incomplete when reaching the end of the prefix.
  • Make the clear_prefix host functions generate NextKey variants to walk through the trie rather than PrefixKeys (as sketched after this list).
  • Remove the PrefixKeys variant entirely.
  • Decide what to do with the CalculationsCache struct. Probably remove it entirely. (Remove CalculationCache type #662)
  • Re-add caching of node values in consensus_service.rs. This probably involves changing the storage diffs returned when verifying a block to include updates to node values. (Full node is extremely slow #661)
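
As a rough illustration of the prefix-bounded NextKey idea from the list above (hypothetical types and methods, not smoldot's actual API), clearing a prefix can be implemented by repeatedly asking for the next key within the prefix, so that reaching the end of the prefix range is a legitimate answer rather than a sign of an incomplete Merkle proof:

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

struct Storage(BTreeMap<Vec<u8>, Vec<u8>>);

impl Storage {
    /// Returns the first key strictly greater than `key` that still starts
    /// with `prefix`, or `None` once the end of the prefix range is reached.
    /// Bounding the lookup by `prefix` is what allows "no further key" to be
    /// a valid answer instead of a missing proof entry.
    fn next_key_with_prefix(&self, key: &[u8], prefix: &[u8]) -> Option<Vec<u8>> {
        self.0
            .range::<[u8], _>((Bound::Excluded(key), Bound::Unbounded))
            .map(|(k, _)| k.clone())
            .take_while(|k| k.starts_with(prefix))
            .next()
    }

    /// Removes every key starting with `prefix` by walking the trie with
    /// prefix-bounded "next key" lookups, instead of asking for all prefixed
    /// keys at once (the old `PrefixKeys` behaviour).
    fn clear_prefix(&mut self, prefix: &[u8]) {
        // The prefix itself might be a stored key; handle it first, since the
        // lookup below only returns keys strictly greater than the current one.
        self.0.remove(prefix);
        let mut current = prefix.to_vec();
        while let Some(next) = self.next_key_with_prefix(&current, prefix) {
            self.0.remove(&next);
            current = next;
        }
    }
}
```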

tomaka (Contributor) commented Jun 5, 2023

#639 was supposed to be the bulk of the changes. However, I've realized that the "requests" that are generated are still too loose to operate over proofs. ClosestDescendant + MerkleValue should be turned into ClosestDescendantMerkleValue, because we don't necessarily know what the closest descendant is.

tomaka (Contributor) commented Jun 5, 2023

This should now be fixed after #670. At least calling persisted_validation_data now works for me.

I can't be sure that all runtime calls work because this thing is insanely complicated. I've tried my best to make the runtime calls use as few proof entries as possible, and I think that I can't reduce this any further.
