[Beyond EVM] - Implement an StateDB for Flow EVM #5129

Closed
ramtinms opened this issue Dec 11, 2023 · 2 comments

ramtinms commented Dec 11, 2023

Problem Definition

The current three-layer database structure in Geth has several performance issues:

  • it leaves historical data behind and doesn't prune it
  • trie computation is a big overhead and might not be needed for Flow EVM (given that a higher-level trie exists for the execution state)
  • write order is not deterministic, which could cause issues with capturing SPoCK secrets.

Proposed Solution

Following the benchmark results and previous conversations on performance and safety, it has been decided to implement a new database that satisfies the Geth StateDB interface and doesn't have the issues mentioned above.

@ramtinms ramtinms self-assigned this Dec 11, 2023

ramtinms commented Dec 14, 2023

I initially tried to follow the original design of the Geth StateDB (what the EVM expects): a single hot state (with dirty flags) backed by a journal to handle reverts.
With this approach, each update modifies the hot state and writes revert functions to a journal (depending on some conditions and states). Every time snapshot is called, a pointer to a location in the journal is returned, and when revert is called with that index, a sequence of revert operations is applied until that index of the journal is reached.
My understanding is that this architecture assumes reverts are rare and expects a large number of sequential transactions on the hot state when building a block.

However, I found this approach very fragile. The original implementation in Geth makes these journal edits conditionally, and I'm worried their approach might result in a revert not reconstructing the original state. One concern I have is the parity of revert and apply. For example, if the following sequence of actions happens on the state, the original implementation would misbehave:

  • snapshot(1), add address A to the allow list, snapshot(2), add address A to the allow list, revert(2)
    This would result in A not being in the list, which means that reverting the second snapshot impacts the first snapshot as well.
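
The parity concern can be illustrated with a toy journal. This is a minimal sketch, not Geth's code: `allowList`, `journal`, `naiveAdd`, and `guardedAdd` are hypothetical names. It shows that a journal which always records "remove A" as the undo loses A in the scenario above, and that avoiding this requires a conditional append, so correctness then depends on every such condition being right.

```go
package main

import "fmt"

// allowList is a toy set of addresses; it is NOT Geth's access list type.
type allowList map[string]bool

// journal records one undo function per state change.
type journal []func(allowList)

// naiveAdd always journals a "remove" undo, even if addr was already present.
func naiveAdd(l allowList, j *journal, addr string) {
	l[addr] = true
	*j = append(*j, func(l allowList) { delete(l, addr) })
}

// guardedAdd journals an undo only when the call actually changed the set.
func guardedAdd(l allowList, j *journal, addr string) {
	if l[addr] {
		return
	}
	l[addr] = true
	*j = append(*j, func(l allowList) { delete(l, addr) })
}

// snapshot returns the current journal length as the revert target.
func snapshot(j *journal) int { return len(*j) }

// revertTo applies undo entries in reverse order back to index id.
func revertTo(l allowList, j *journal, id int) {
	for i := len(*j) - 1; i >= id; i-- {
		(*j)[i](l)
	}
	*j = (*j)[:id]
}

// run replays the scenario from the issue and reports whether A survives.
func run(add func(allowList, *journal, string)) bool {
	l, j := allowList{}, &journal{}
	_ = snapshot(j) // snapshot(1)
	add(l, j, "A")
	s2 := snapshot(j) // snapshot(2)
	add(l, j, "A")
	revertTo(l, j, s2) // revert(2)
	return l["A"]
}

func main() {
	fmt.Println("naive journal keeps A:", run(naiveAdd))
	fmt.Println("guarded journal keeps A:", run(guardedAdd))
}
```

With the unconditional journal, the revert removes A even though it was added before the second snapshot; the guarded variant keeps it.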

In another hypothetical example, when addBalance is called, a subBalance is captured in the journal, but this assumes addBalance and subBalance always cancel each other out. If there is an underflow or something similar, the subBalance might not reconstruct the original state and might result in discrepancies.

This problem becomes more severe considering that appending to the journal is conditional; for example, for some operation calls nothing is added to the journal.

That's why I switched to dealing with snapshots and reverts the same way we do in FVM.
We hold all the deltas as separate views with fallback lookup (view style). This way we don't need to keep a journal: we keep separate delta views, and a revert just purges deltas instead of applying another function. This guarantees that the original state is restored in case of a revert.
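
The view-style approach can be sketched as a stack of write sets. This is a minimal illustration under assumed names (`view`, `baseView`, `deltaView`, `state` and their methods are hypothetical, not the types from the PRs): each snapshot pushes a new delta view, reads fall back through parents, and a revert only truncates the stack.

```go
package main

import "fmt"

// view is a hypothetical read interface shared by base and delta layers.
type view interface {
	get(key string) (uint64, bool)
}

// baseView stands in for the persistent storage layer.
type baseView map[string]uint64

func (b baseView) get(key string) (uint64, bool) {
	v, ok := b[key]
	return v, ok
}

// deltaView holds only the writes made since its snapshot and
// falls back to its parent for everything else.
type deltaView struct {
	parent view
	writes map[string]uint64
}

func newDeltaView(parent view) *deltaView {
	return &deltaView{parent: parent, writes: map[string]uint64{}}
}

func (d *deltaView) get(key string) (uint64, bool) {
	if v, ok := d.writes[key]; ok {
		return v, true
	}
	return d.parent.get(key)
}

// state keeps a stack of delta views on top of a base view.
type state struct {
	views []*deltaView
}

func newState(base baseView) *state {
	return &state{views: []*deltaView{newDeltaView(base)}}
}

func (s *state) top() *deltaView { return s.views[len(s.views)-1] }

func (s *state) set(key string, v uint64) { s.top().writes[key] = v }

func (s *state) get(key string) (uint64, bool) { return s.top().get(key) }

// snapshot pushes a fresh delta view and returns the id to revert to.
func (s *state) snapshot() int {
	id := len(s.views)
	s.views = append(s.views, newDeltaView(s.top()))
	return id
}

// revert discards every delta view created at or after the snapshot.
// No undo function runs, so the earlier state is untouched by construction.
func (s *state) revert(id int) {
	if id < 1 || id > len(s.views) {
		return // unknown snapshot id; ignored in this sketch
	}
	s.views = s.views[:id]
}

func main() {
	s := newState(baseView{"balance": 10})
	s.set("balance", 15)
	id := s.snapshot()
	s.set("balance", 99)
	s.revert(id)
	v, _ := s.get("balance")
	fmt.Println(v)
}
```

The cost of this design is visible in `get`: a miss walks one parent per open snapshot, which is the performance catch discussed next.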

This also makes the design very clean and simple. The only catch is that as the number of delta views grows, performance is affected, but that doesn't seem to be a concern for us, for two reasons:

  • the interface is designed to allow a sequence of snapshots and reverts, but in practice the EVM code always uses one level of snapshotting.
  • we are not currently targeting running thousands of transactions in a Flow transaction context; right now it is one or a few EVM calls.

@ramtinms ramtinms changed the title [Beyond EVM] - Implement StateDB [Beyond EVM] - Implement an StateDB for Flow EVM Dec 14, 2023

Details of the architecture

  • StateDB

    • it implements the type.StateDB interface (the gethVM.StateDB interface plus some other methods that are missing from gethVM.StateDB but are actually used by the VM code)
    • under the hood, it uses a list of DeltaViews to track deltas between snapshots
    • and a base view acts as the persistent layer of underlying storage
    • Unfortunately, gethVM.StateDB methods don't return errors; they expect the error to be cached and only returned upon commit. That's what StateDB does as well. It also wraps the error as fatal or non-fatal (stateError) so we can handle it more easily later in the emulator
  • DeltaView

    • holds changes and is where most of the ephemeral data is kept (e.g. logs, transient slots, ... )
    • it uses a fallback to a parent view if a value is not found.
    • method behaviours are designed to mimic the behaviour of the Geth.StateDB (so we minimize the diff of behaviour).
  • BaseView

    • is responsible for storing accounts' metadata, code and storage slots.
    • under the hood, it uses collections (powered by Atree): one collection to store the accounts' metadata, one collection to store code, and one collection for each account's slot storage.
    • it also benefits from some internal read caches.
    • in the future archive nodes or any 3rd party app could implement their own BaseView.

[Figure: Ramtin's scratchpad - Frame 1 (4)]

How to review PRs:

  • I recommend looking at the types and then following the order of the PRs
  • The first PR holds the code changes related to the base view implementation
  • The second PR holds the code changes related to the delta view implementation
  • The third PR holds the code changes related to the stateDB
  • And the last PR updates the emulator and EVM package to use the stateDB and removes the old database from the code base.
