[Beyond EVM] - Implement an StateDB for Flow EVM #5129

ramtinms · 2023-12-11T20:08:36Z

Problem Definition

The current three-layer database structure in Geth has several issues:

- it leaves historical data behind and doesn't prune it
- trie computation adds significant overhead and might not be needed for Flow EVM (given that a higher-level trie already exists for the execution state)
- write order is not deterministic, which could cause issues with SPoCK secret capturing.

Proposed Solution

Following the benchmark results and previous conversations on performance and safety, it has been decided to implement a new database that satisfies the Geth StateDB interface and doesn't have the issues mentioned above.

ramtinms · 2023-12-14T20:00:48Z

I initially tried to follow the original design of the Geth StateDB (what the EVM expects): a single hot state (with dirty flags) backed by a journal to handle reverts.
With this approach, each update modifies the hot state and, depending on some conditions and states, writes revert functions to a journal. Every time a snapshot is taken, a pointer to a location in the journal is returned, and when revert is called with that index, a sequence of revert operations is applied until that index of the journal is reached.
My understanding is that this architecture expects reverts to be rare and expects a large number of sequential transactions on the hot state when building a block.
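As a rough illustration of the journal idea described above, here is a minimal sketch. All names (`journaledState`, `SetBalance`, and so on) are hypothetical and simplified; this is not Geth's code, and it journals unconditionally, which the real implementation does not.

```go
package main

import "fmt"

// journaledState is a hypothetical, simplified model of a single hot state
// backed by a journal of undo functions. It is not Geth's implementation.
type journaledState struct {
	balances map[string]uint64 // the single "hot" state
	journal  []func()          // undo functions, oldest first
}

func newJournaledState() *journaledState {
	return &journaledState{balances: make(map[string]uint64)}
}

// SetBalance updates the hot state and records how to undo the update.
func (s *journaledState) SetBalance(addr string, v uint64) {
	prev, existed := s.balances[addr]
	s.journal = append(s.journal, func() {
		if existed {
			s.balances[addr] = prev
		} else {
			delete(s.balances, addr)
		}
	})
	s.balances[addr] = v
}

// Snapshot returns the current journal length as the snapshot id.
func (s *journaledState) Snapshot() int { return len(s.journal) }

// RevertToSnapshot runs undo functions newest-first back to the given id.
func (s *journaledState) RevertToSnapshot(id int) {
	for i := len(s.journal) - 1; i >= id; i-- {
		s.journal[i]()
	}
	s.journal = s.journal[:id]
}

func main() {
	s := newJournaledState()
	s.SetBalance("A", 10)
	id := s.Snapshot()
	s.SetBalance("A", 25)
	s.SetBalance("B", 5)
	s.RevertToSnapshot(id)
	fmt.Println(s.balances["A"], len(s.balances)) // 10 1
}
```

Taking a snapshot is cheap here (just an index), while a revert costs one undo call per journaled change, which fits the assumption that reverts are rare.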

However, I found this approach very fragile. The original implementation in Geth makes these journal edits conditionally, and I'm worried that their approach might result in a revert not reconstructing the original state. One concern I have is the parity of revert and apply. For example, if the following sequence of actions happens on the state, the original implementation would misbehave:

snapshot(1), add address A to the allow list, snapshot(2), add address A to the allow list, revert(2)
This would result in A not being in the list, which means that reverting to the second snapshot affects the first snapshot as well.
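To make the concern concrete, here is a small sketch of a journal whose undo entry is not the exact inverse of the operation it records. The `allowList` type and its methods are hypothetical and only model the scenario above; they do not reproduce Geth's actual access list code.

```go
package main

import "fmt"

// allowList is a hypothetical model used only to illustrate the parity
// concern. It does not reproduce Geth's access list implementation.
type allowList struct {
	members map[string]bool
	journal []func()
}

func newAllowList() *allowList {
	return &allowList{members: make(map[string]bool)}
}

// Add inserts addr and always journals a removal, even when addr was
// already present, so the undo is not the exact inverse of the operation.
func (l *allowList) Add(addr string) {
	l.members[addr] = true
	l.journal = append(l.journal, func() { delete(l.members, addr) })
}

// Snapshot returns the current journal length as the snapshot id.
func (l *allowList) Snapshot() int { return len(l.journal) }

// RevertToSnapshot runs undo functions newest-first back to the given id.
func (l *allowList) RevertToSnapshot(id int) {
	for i := len(l.journal) - 1; i >= id; i-- {
		l.journal[i]()
	}
	l.journal = l.journal[:id]
}

func main() {
	l := newAllowList()
	_ = l.Snapshot() // snapshot(1)
	l.Add("A")
	second := l.Snapshot() // snapshot(2)
	l.Add("A")             // no change to the state, but still journaled
	l.RevertToSnapshot(second)
	fmt.Println(l.members["A"]) // false: the first Add was undone as well
}
```

Avoiding this in a journal design requires each operation to decide whether and what to journal based on the current state, which is the kind of conditional bookkeeping that makes the approach easy to get wrong.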

In another hypothetical example, when addBalance is called, a subBalance is captured in the journal, but this assumes that addBalance and subBalance always cancel each other out. However, if there is an underflow or something similar, the subBalance might not reconstruct the original state and might result in discrepancies.

This problem becomes more severe considering that appending to the journal is conditional; for example, for some operation calls nothing is added to the journal.

So that's why I switched to handling snapshots and reverts the same way we do in FVM.
We hold all the deltas as separate views with fallback lookup (view style). This way we don't need to keep a journal: we keep different delta views, and a revert just purges the deltas instead of applying another function. This guarantees that the original state is restored in case of a revert.

This also makes the design very clean and simple. The only catch is that as the number of delta views grows, performance suffers, but that does not seem to be a concern for us, for two reasons:

- the interface is designed to allow a sequence of snapshots and reverts, but in practice the EVM code always uses one level of snapshotting.
- we are not currently targeting running thousands of transactions in a Flow transaction context; right now it is one or a few EVM calls.
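The delta-view approach described above can be sketched roughly as follows. The types and method names here (`view`, `baseView`, `deltaView`, `state`) are hypothetical and simplified to a single key/value map; deletions and the other state categories are left out.

```go
package main

import "fmt"

// view is a hypothetical read interface shared by the base and delta layers.
type view interface {
	Get(key string) (uint64, bool)
}

// baseView stands in for the persistent storage layer.
type baseView map[string]uint64

func (b baseView) Get(key string) (uint64, bool) {
	v, ok := b[key]
	return v, ok
}

// deltaView holds only the writes made since it was created and falls back
// to its parent for everything else.
type deltaView struct {
	parent view
	writes map[string]uint64
}

func (d *deltaView) Get(key string) (uint64, bool) {
	if v, ok := d.writes[key]; ok {
		return v, true
	}
	return d.parent.Get(key)
}

// state keeps a stack of delta views on top of a base view.
type state struct {
	views []*deltaView
}

func newState(base baseView) *state {
	first := &deltaView{parent: base, writes: map[string]uint64{}}
	return &state{views: []*deltaView{first}}
}

func (s *state) top() *deltaView { return s.views[len(s.views)-1] }

func (s *state) Get(key string) (uint64, bool) { return s.top().Get(key) }

func (s *state) Set(key string, v uint64) { s.top().writes[key] = v }

// Snapshot pushes a fresh delta view and returns the id to revert to.
func (s *state) Snapshot() int {
	id := len(s.views)
	s.views = append(s.views, &deltaView{parent: s.top(), writes: map[string]uint64{}})
	return id
}

// RevertToSnapshot drops every view created at or after the snapshot.
func (s *state) RevertToSnapshot(id int) { s.views = s.views[:id] }

func main() {
	s := newState(baseView{"A": 10})
	s.Set("A", 20)
	id := s.Snapshot()
	s.Set("A", 30)
	s.Set("B", 1)
	s.RevertToSnapshot(id)
	a, _ := s.Get("A")
	_, hasB := s.Get("B")
	fmt.Println(a, hasB) // 20 false
}
```

A revert never runs any inverse operation; it only discards views, so the earlier state is untouched by construction. The cost is that a read may walk through every view in the stack, which is the performance trade-off mentioned above.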

ramtinms · 2023-12-20T19:42:26Z

Details of the architecture

StateDB
- it implements a type.StateDB interface (the gethVM.StateDB interface plus some other methods that are missing from gethVM.StateDB but are actually used by the VM code)
- under the hood, it uses a list of DeltaViews to track deltas between snapshots
- and a base view acts as the persistent layer of underlying storage
- Unfortunately, gethVM.StateDB methods don't return errors; implementations are expected to cache the error and only return it upon commit. That's what StateDB does as well. It also wraps the error as fatal or non-fatal (stateError) so we can handle them more easily later in the emulator
DeltaView
- holds changes and is where most of the ephemeral data is kept (e.g. logs, transient slots, ... )
- it uses a fallback to a parent view if a value is not found.
- method behaviours are designed to mimic the behaviour of the Geth.StateDB (so we minimize the diff of behaviour).
BaseView
- is responsible for storing accounts' metadata, codes and storage slots.
- under the hood, it uses collections (powered by Atree): one collection to store the accounts' metadata, one collection to store codes, and one collection for each account's storage slots.
- it also benefits from some internal read caches.
- in the future, archive nodes or any third-party app could implement their own BaseView.

How to review PRs:

I recommend looking at the types and then following the order of the PRs
The first PR holds the code changes related to the base view implementation
The second PR holds the code changes related to the delta view implementation
The third PR holds the code changes related to the stateDB
And the last PR updates the emulator and EVM package to use the stateDB and removes the old database from the code base.

ramtinms self-assigned this Dec 11, 2023

ramtinms added the Flow EVM label Dec 11, 2023

ramtinms changed the title ~~[Beyond EVM] - Implement StateDB~~ [Beyond EVM] - Implement an StateDB for Flow EVM Dec 14, 2023

This was referenced Dec 19, 2023

[Beyond EVM] performant StateDB implementation - part 1 #5166

Merged

[Beyond EVM] performant StateDB implementation - part 2 #5167

Merged

[Beyond EVM] performant StateDB implementation - part 3 #5168

Merged

ramtinms mentioned this issue Dec 20, 2023

[Beyond EVM] performant StateDB implementation - part 4 #5169

Merged

j1010001 mentioned this issue Jan 4, 2024

[Flow EVM] Patch the the trie storage to not keep historic tries #4963

Closed

ramtinms closed this as completed Jan 8, 2024

ramtinms added this to the Flow-EVM-M1 milestone Feb 20, 2024

j1010001 mentioned this issue Mar 11, 2024

[EPIC] Flow EVM Core Implementation #5241

Closed
