
Make runtime_host operate over the entire trie rather than only over the storage entries of the trie #639

Merged · Jun 3, 2023 · 62 commits

Conversation


tomaka commented May 30, 2023

Context

The runtime_host module calls the Wasm runtime. When the Wasm runtime wants to access a storage entry, a StorageGet is generated, and when it iterates over keys, a StorageNextKey is generated.
In parallel with the execution, the runtime_host maintains a "trie cache" that contains the structure of the storage trie. When the runtime writes to the storage, the runtime_host updates the structure contained in this cache and invalidates the hashes of the trie nodes that are cached.
At the end of the operation, the runtime_host walks this trie structure and recalculates all the hashes that have been invalidated.
The user can choose not to provide a cache, in which case the runtime_host iterates over the entire trie (using a StoragePrefixKeys) in order to populate one.
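To make this concrete, here is a minimal, self-contained sketch of that request/response flow. The `Request` and `Response` enums and the `BTreeMap`-backed answers are illustrative stand-ins, not the actual runtime_host types:

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// Illustrative model of the pre-PR interface (stand-in types, not smoldot's
// actual `runtime_host` API): execution surfaces as a sequence of requests
// that the API user must answer before resuming.
enum Request {
    /// The runtime reads one storage value.
    StorageGet(Vec<u8>),
    /// The runtime asks for the storage key immediately following `key`.
    StorageNextKey(Vec<u8>),
    /// Emitted when no trie cache was provided: list every key under the
    /// prefix. With an empty prefix, that means every key of the trie.
    StoragePrefixKeys(Vec<u8>),
}

enum Response {
    Value(Option<Vec<u8>>),
    NextKey(Option<Vec<u8>>),
    Keys(Vec<Vec<u8>>),
}

/// How a full node holding the whole state in memory could answer each
/// request. A light client cannot answer `StoragePrefixKeys` at all, which is
/// the second problem listed below.
fn answer(req: &Request, state: &BTreeMap<Vec<u8>, Vec<u8>>) -> Response {
    match req {
        Request::StorageGet(key) => Response::Value(state.get(key).cloned()),
        Request::StorageNextKey(key) => Response::NextKey(
            state
                .range::<Vec<u8>, _>((Bound::Excluded(key), Bound::Unbounded))
                .next()
                .map(|(k, _)| k.clone()),
        ),
        Request::StoragePrefixKeys(prefix) => Response::Keys(
            state
                .keys()
                .filter(|k| k.starts_with(prefix.as_slice()))
                .cloned()
                .collect(),
        ),
    }
}
```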

There are a few problems with this system:

  • The trie structure can in theory be very big, and you're not supposed to store it entirely in memory. I can't easily measure its size at the head of the Polkadot chain, but if the trie continues to grow, smoldot could eventually run out of memory. While this isn't a problem in practice yet, and while this PR doesn't solve the problem entirely (we still maintain a list of changes during the execution), it is a step towards a solution.

  • If no cache is provided, a StoragePrefixKeys is generated with a prefix of &[] in order to fetch the list of every single key in the trie. If the execution is performed by a light client, it is not possible to know the list of all keys, and we have no choice but to fail the execution. This is the error that happens in RPC call returns "PrefixKeysForbidden" error code #272.

  • In the context of a full node, when we're at the head of the chain we might have to verify blocks whose parents aren't the best block. This means that we have to maintain a copy of this cache (and thus of the entire trie structure) for every block that can potentially have children (meaning: the finalized block and every single descendant). If finality stalls for some reason, there can be several thousand such blocks, and thus several thousand copies of the trie structure, which can be quite disastrous.

  • The cache is not adapted to child trie support (Implement support for child tries #166).

What the PR does

This PR refactors this system by asking the user of the runtime_host (and thus all the layers on top of it) to return information about the trie structure rather than only about the storage values. In other words, it is the responsibility of the API user of the runtime_host to maintain the trie structure.

The precise API changes are as follows:

  • The main_trie_calculation_cache has been removed everywhere.
  • NextKey now provides a key() and prefix() as a list of nibbles rather than a list of bytes. The value that is provided back must also be a list of nibbles rather than a list of bytes (see the nibble sketch after this list).
  • NextKey now has a branch_nodes() function. If it returns true, then the API user must return the next trie node (no matter whether it is a branch node or a storage node). If it returns false, then the API user must return the next trie node that contains a storage value (i.e. the behavior before this PR).
  • The PrefixKeys variant has been removed entirely.
  • There is now a new MerkleValue variant that requires the API user to provide the Merkle value of a trie node that was previously provided to NextKey. The API user can use resume_unknown in order to indicate that it doesn't know this Merkle value and that it must be recalculated no matter what.
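To make the nibble representation concrete, here is a small sketch of the conversion in both directions. It assumes the usual most-significant-half-first ordering; the helpers are purely illustrative, since smoldot's trie code has its own primitives for this:

```rust
/// Converts a storage key (bytes) into the nibble string that `NextKey::key()`
/// and `prefix()` now use. Each byte becomes two nibbles, most significant
/// half first.
fn bytes_to_nibbles(key: &[u8]) -> Vec<u8> {
    key.iter().copied().flat_map(|b| [b >> 4, b & 0x0f]).collect()
}

/// Converts a nibble string back into bytes. This is only possible when the
/// length is even; an odd-length trie node position can never correspond to a
/// full storage key.
fn nibbles_to_bytes(nibbles: &[u8]) -> Option<Vec<u8>> {
    if nibbles.len() % 2 != 0 {
        return None;
    }
    Some(nibbles.chunks(2).map(|pair| (pair[0] << 4) | pair[1]).collect())
}
```

An odd-length position can only ever be a branch node without a storage value, which is part of why the API has to talk in nibbles rather than bytes.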

In the future, another change I'd like to make is: once the runtime call is successful, the list of all changes to the trie structure should be provided (including the branch nodes that have been inserted or deleted), rather than just the changes to the storage entries.

The consensus_service is directly impacted by this change. However, it doesn't store the trie structure in the database (as that change would be too big for one PR) but instead dynamically determines what the next branch node is by performing multiple database lookups.
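To illustrate why this is possible, note that in a radix-16 trie the set of node positions is exactly the set of storage keys (as nibbles) plus the longest common prefix of every pair of lexicographically consecutive keys. The sketch below is not the algorithm used in consensus_service.rs (which performs a few targeted lookups per request rather than materializing the whole node set); it just shows that branch-node positions are derivable from a plain ordered key-value store:

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Storage key (bytes) to nibble string, most significant half first.
fn bytes_to_nibbles(key: &[u8]) -> Vec<u8> {
    key.iter().copied().flat_map(|b| [b >> 4, b & 0x0f]).collect()
}

/// Longest common prefix of two nibble strings.
fn common_prefix(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter().zip(b).take_while(|(x, y)| x == y).map(|(x, _)| *x).collect()
}

/// All node positions of the trie built over `storage_keys`: the keys
/// themselves, plus the longest common prefix of every pair of consecutive
/// keys (those prefixes are exactly where the branch nodes sit).
fn trie_node_positions(storage_keys: &[Vec<u8>]) -> BTreeSet<Vec<u8>> {
    let mut keys: Vec<Vec<u8>> = storage_keys.iter().map(|k| bytes_to_nibbles(k)).collect();
    keys.sort();
    let mut nodes: BTreeSet<Vec<u8>> = keys.iter().cloned().collect();
    for pair in keys.windows(2) {
        nodes.insert(common_prefix(&pair[0], &pair[1]));
    }
    nodes
}

/// Answer to "next trie node after `after`", including branch nodes, as
/// `NextKey` with `branch_nodes() == true` now requires.
fn next_trie_node(nodes: &BTreeSet<Vec<u8>>, after: &[u8]) -> Option<Vec<u8>> {
    nodes
        .range::<[u8], _>((Bound::Excluded(after), Bound::Unbounded))
        .next()
        .cloned()
}
```

The real code instead derives each next node on the fly from a handful of database lookups, as described above, which is what keeps memory usage bounded.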

I have considered adding a system to the runtime_host so that it internally does what the consensus_service does. For example, there could be a configuration option called branch_nodes_unknown which, when set to true, would guarantee that NextKey::branch_nodes() is always false. However, I think that this would make for an awkward API, especially w.r.t. the future change mentioned above where the diff of the trie structure is also provided on success.

cc @xlc w.r.t. Chopsticks
I'm opening this as a draft PR in order to make you aware of this change and maybe gather an opinion. I think that you should be able to update Chopsticks to use the same approach as consensus_service.rs in this diff, which is to dynamically determine the branch nodes.

This PR is currently a draft and it's failing due to a storage root mismatch (in other words there's a bug somewhere).

cc #272 #126


xlc commented May 30, 2023

Thanks for the heads-up. Will keep an eye on this.


tomaka commented Jun 1, 2023

The state root mismatch is still there.
I would call the code in trie_root_calculator correct, given that the fuzzing test should cover every possible scenario and always passes.

This means the bug is most likely in the consensus_service code, but I can't find what it could be, and writing tests for this is rather difficult.


tomaka commented Jun 2, 2023

The full node now works, but has become insanely slow due to the removal of the cache. Verifying Polkadot block 1 takes 237 seconds on my machine for example. Blocks 2 to 6 take on average 350 seconds each.
This is because each call to Database::block_storage_main_trie_next_key takes around 1ms to 10ms, and we iterate over the entire trie with calls to next_key.

I believe that before this PR this wasn't as much of a problem, because we read all the keys at once with a single database call.

Most of the slowness might be caused by the fact that SQLite has to re-parse our SQL at every single call. Unfortunately, it's not possible to prepare statements in advance because the library we use has unnecessarily strict lifetime restrictions. cc #632
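For illustration, this is the kind of statement reuse that would avoid the re-parsing cost. The sketch uses the rusqlite crate and an invented storage(key, value) table purely as stand-ins; it is not the binding or schema that the full node actually uses:

```rust
use rusqlite::{params, Connection, Result};

/// Hypothetical replacement for `block_storage_main_trie_next_key` that avoids
/// re-parsing the SQL on every call. The table name and schema are invented
/// for the sake of the example.
fn next_key(conn: &Connection, after: &[u8]) -> Result<Option<Vec<u8>>> {
    // `prepare_cached` compiles the statement once and reuses it on subsequent
    // calls with the same SQL string, instead of re-parsing it every time.
    let mut stmt = conn.prepare_cached(
        "SELECT key FROM storage WHERE key > ?1 ORDER BY key ASC LIMIT 1",
    )?;
    let mut rows = stmt.query(params![after])?;
    Ok(rows.next()?.map(|row| row.get(0)).transpose()?)
}
```

Whether something equivalent is possible with the current binding depends on the lifetime issue tracked in #632.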


tomaka commented Jun 2, 2023

I'm still going to go ahead with this PR even if it "breaks" the full node, because #272 is more important to fix.
I know that the changes in this PR only make the full node slower because the caches have been removed, and that once they're back it should, at least in theory, run at a normal speed again.

tomaka marked this pull request as ready for review June 3, 2023 09:31