
RPC call returns "PrefixKeysForbidden" error code #272

Closed
skunert opened this issue Mar 7, 2023 · 11 comments · Fixed by #670

Comments

skunert (Contributor) commented Mar 7, 2023

Runtime calls of the persisted_validation_data method via state_call return a "PrefixKeysForbidden" error.

Previously I assumed this was related to #177, since this runtime call uses the state_root host function. However, shouldn't the execution proof that the full node delivers include all required state entries? Why do we need this prefix-key fetching here? Is it a bug after all?

tomaka (Contributor) commented Mar 10, 2023

The full node does indeed deliver all the required state entries. The error is returned simply because this feature isn't fully implemented in smoldot. Determining from a proof the list of entries that start with a prefix is a bit tricky logic-wise, and since it had so far been unneeded, I decided in the past to return an error instead of implementing it.

tomaka (Contributor) commented Mar 11, 2023

An unintended side effect of #259 is that the runtime code might request the list of all keys in the entire storage in order to build the trie root cache. This might be why you're getting this error, and this is definitely problematic. I might need to revert #259.

However, as a side note, I really feel like there's a problem somewhere with what you're doing. persisted_validation_data really isn't supposed to do anything but read the storage. Either there's a bug in what you're doing and you're not calling the right function, or there's a bug in the runtime.
Smoldot itself calls this function when you connect to a parachain, and everything works fine, and did so long before #259.

skunert (Contributor, Author) commented Mar 14, 2023

The problem can be reproduced by sending this JSON-RPC request to a Polkadot light client:

{"id": "id", "method": "state_call", "params": ["ParachainHost_persisted_validation_data", "0xe803000001"], "jsonrpc": "2.0"}

This is a valid JSON-RPC call for Polkadot and requests the persisted validation data for Statemint.
e803000001 = (ParaId(1000), TimedOut).
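
For reference, a minimal sketch of how that parameter blob can be produced, assuming the parity-scale-codec crate (with its derive feature); the type definitions below are simplified stand-ins for the real Polkadot primitives:

```rust
use parity_scale_codec::Encode;

// Hypothetical, simplified re-declarations, only to illustrate the SCALE
// encoding; the real types live in the Polkadot primitives crates.
#[derive(Encode)]
struct ParaId(u32);

#[derive(Encode)]
#[allow(dead_code)]
enum OccupiedCoreAssumption {
    Included, // variant index 0x00
    TimedOut, // variant index 0x01
    Free,     // variant index 0x02
}

fn main() {
    // 1000u32 in little-endian is e8 03 00 00, followed by the variant
    // index 01 for TimedOut.
    let params = (ParaId(1000), OccupiedCoreAssumption::TimedOut).encode();
    let hex: String = params.iter().map(|b| format!("{:02x}", b)).collect();
    assert_eq!(hex, "e803000001");
    println!("0x{hex}");
}
```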

Looking at the runtime call implementation here and here, the call is indeed read-only when the occupied core assumption is TimedOut.

Also, you are right that this is related to #259. Before that commit, the call works fine; after it, it is broken. This is also why the smoldot-internal calls succeed: they still use the read-only runtime host, if I see that correctly.

tomaka (Contributor) commented Mar 14, 2023

Ah OK, I had wrongly assumed that ParachainHost_persisted_validation_data was also failing when you reported #254.

> Also, you are right that this is related to #259. Before that commit, the call works fine; after it, it is broken. This is also why the smoldot-internal calls succeed: they still use the read-only runtime host, if I see that correctly.

The issue is that in the read-only case, when the runtime asks for the storage trie root, we can just provide the value that is in the header of the block, since we know that the storage hasn't been modified.

In the non-read-only case, however, we (re-)calculate this storage trie root.

Unfortunately, this calculation at the moment assumes that we have access to the whole storage. For example, when the calculation needs the trie node hash of a storage item, it just asks for the storage item. And because the state trie root calculation cache isn't populated, we try to access every single storage item in order to calculate the hash of the root.
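
To illustrate the difference in a toy form (hypothetical names, and a stand-in hash instead of the real base-16 Merkle trie computation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Read-only case: the storage wasn't modified, so the state root found in
/// the block header is still valid and can be returned as-is.
fn storage_root_read_only(header_state_root: u64) -> u64 {
    header_state_root
}

/// Non-read-only case with an empty calculation cache: the root has to be
/// recomputed from scratch, which means visiting every single storage entry,
/// which is exactly what a Merkle proof containing only the accessed entries
/// cannot provide. (A stand-in hash is used here; the real computation builds
/// a base-16 Merkle trie.)
fn storage_root_recomputed(storage: &BTreeMap<Vec<u8>, Vec<u8>>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (key, value) in storage {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}
```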

This obviously isn't really compatible with reading a Merkle proof. If I were to implement the "prefix keys" thing that the issue title mentions, the outcome would be that the Merkle proof is missing entries.

So indeed #259 made things worse.

The proper solution is to refactor the calculation a bit so that it only asks for what is needed and nothing more, and what is needed can be found in the Merkle proof. This is, however, non-trivial.

Another alternative would be for the runtime to not ask for the storage trie root. In principle it shouldn't, but for some reason I've noticed that many (if not all?) runtime calls try to read it.

skunert (Contributor, Author) commented Mar 14, 2023

Thanks for the explanation!

> If I were to implement the "prefix keys" thing that the issue title mentions, the outcome would be that the Merkle proof is missing entries.

You lost me on this part, can you rephrase maybe?

> Another alternative would be for the runtime to not ask for the storage trie root. In principle it shouldn't, but for some reason I've noticed that many (if not all?) runtime calls try to read it.

The relay chain storage root is part of the PersistedValidationData that is returned from the call, so in this case it makes sense to me that the runtime reads it.

tomaka (Contributor) commented Mar 14, 2023

> You lost me on this part, can you rephrase maybe?

The PrefixKeysForbidden error comes from the fact that the runtime call tries to access the list of all keys that start with a given prefix, and that this is not implemented in smoldot yet.
However, if I implemented this by reading from the Merkle proof, it is likely that the proof would be missing entries.
That's because the runtime call's request for the list of all keys that start with a certain prefix is "artificial": it's an API trick used by the code that performs the runtime call so that it can calculate the storage root.

tomaka (Contributor) commented Mar 23, 2023

It seems clear to me that the calculations cache system needs to be "refactored" so that the cache is maintained higher in the stack of layers, and the low-level code asks for node values (which would then come either from the cache or from the Merkle proof).

What is giving me a lot of trouble is that there are situations where it's really hard to determine the minimum amount of information required.

I am going to take two example situations that seem tricky to me, to get an idea of what's required:

If we imagine a runtime call that just sets a (potentially non-existing) random storage value and then gets the root hash, what you need in order to fulfill this demand is the node value of the new entry's parent, but also the node value of the child of that parent that lies in the same direction as the modified node, and potentially of other siblings depending on the structure of the trie, because you need to know where in the trie to insert the new entry.

If we imagine a runtime call that just clears a prefix and then gets the root hash, what you need is the node value of the closest ancestor of the prefix, but also, because the number of nodes being removed is bounded, you need to be able to walk down the trie. At the moment, we have a PrefixKeys variant asking for all the keys with a certain prefix, but this asks for more data than strictly necessary.

The naive solution would be a GetNodeValue variant; however, this alone doesn't let you walk down the trie, because a node value alone doesn't tell you the partial keys of the descendants.

I think the solution is a GetNodeValue variant where the responder must also indicate the partial key of each child, if known. If the partial key of a child is not known, the responder can indicate None.
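
A hypothetical sketch of what such a request/response pair could look like (an illustration of the idea only, not smoldot's actual API):

```rust
/// Key of a trie node, expressed in nibbles (half-bytes).
type NibbleKey = Vec<u8>;

/// Request emitted by the trie-root calculation while it walks the trie.
enum TrieRequest {
    /// Ask for the node value of the trie node at the given key.
    GetNodeValue { key: NibbleKey },
}

/// Answer, filled in either from a cache or from a Merkle proof.
struct NodeValueResponse {
    /// Node value of the requested node, or `None` if the node doesn't exist.
    node_value: Option<Vec<u8>>,
    /// For each of the 16 possible children, the partial key of that child
    /// if it is known, and `None` otherwise. This is what makes it possible
    /// to keep walking down the trie from a node value alone.
    children_partial_keys: [Option<NibbleKey>; 16],
}
```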

tomaka (Contributor) commented Mar 23, 2023

As a first step, we can add a GetNodeValue variant, then remove PrefixKeys and get the list of keys through GetNodeValue.

Then, add a cache to OptimisticSync and consensus_service.

Then, remove the existing cache from runtime_host.rs and just maintain a trie structure directly in runtime_host.rs that is used in order to calculate the root hash.

tomaka (Contributor) commented Apr 14, 2023

Roadmap:

  • Make consensus_service.rs use a TrieStructure instead of BTreeMap to store the chain state.
  • Add a GetNodeValue variant everywhere (but do not create it yet) and change the code reacting to these variants to respond to it.
  • Make the root calculation code in runtime_host.rs generate GetNodeValue variants, and ditch the cache from runtime_host.rs entirely.
  • Change the NextKey variant so that the next key must start with a certain prefix. This is necessary in order to implement clear_prefix on top of NextKey, as otherwise the code will think that the Merkle proof is incomplete when reaching the end of the prefix.
  • Make the clear_prefix host functions generate NextKey variants to walk through the trie rather than PrefixKeys (as sketched after this list).
  • Remove the PrefixKeys variant entirely.
  • Decide what to do with the CalculationsCache struct. Probably remove it entirely. (Remove CalculationCache type #662)
  • Re-add caching of node values in consensus_service.rs. This probably involves changing the storage diffs returned when verifying a block to include updates to node values. (Full node is extremely slow #661)
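
As a rough illustration of the prefix-bounded NextKey idea from the list above (hypothetical types and methods, not smoldot's actual API), clearing a prefix can be implemented by repeatedly asking for the next key within the prefix, so that reaching the end of the prefix range is a legitimate answer rather than a sign of an incomplete Merkle proof:

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

struct Storage(BTreeMap<Vec<u8>, Vec<u8>>);

impl Storage {
    /// Returns the first key strictly greater than `key` that still starts
    /// with `prefix`, or `None` once the end of the prefix range is reached.
    /// Bounding the lookup by `prefix` is what allows "no further key" to be
    /// a valid answer instead of a missing proof entry.
    fn next_key_with_prefix(&self, key: &[u8], prefix: &[u8]) -> Option<Vec<u8>> {
        self.0
            .range::<[u8], _>((Bound::Excluded(key), Bound::Unbounded))
            .map(|(k, _)| k.clone())
            .take_while(|k| k.starts_with(prefix))
            .next()
    }

    /// Removes every key starting with `prefix` by walking the trie with
    /// prefix-bounded "next key" lookups, instead of asking for all prefixed
    /// keys at once (the old `PrefixKeys` behaviour).
    fn clear_prefix(&mut self, prefix: &[u8]) {
        // The prefix itself might be a stored key; handle it first, since the
        // lookup below only returns keys strictly greater than the current one.
        self.0.remove(prefix);
        let mut current = prefix.to_vec();
        while let Some(next) = self.next_key_with_prefix(&current, prefix) {
            self.0.remove(&next);
            current = next;
        }
    }
}
```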

tomaka (Contributor) commented Jun 5, 2023

#639 was supposed to be the bulk of the changes. However, I've realized that the "requests" that are generated are still too loose to operate over proofs. ClosestDescendant + MerkleValue should be turned into ClosestDescendantMerkleValue, because we don't necessarily know what the closest descendant is.

tomaka (Contributor) commented Jun 5, 2023

This should now be fixed after #670. At least calling persisted_validation_data now works for me.

I can't be sure that all runtime calls work because this thing is insanely complicated. I've tried my best to make the runtime calls use as few proof entries as possible, and I think that I can't reduce this any further.
