Simplify LMDB resize #1804

Open · wants to merge 10 commits into dev from doyle-simplify-lmdb-resize

Conversation

Doy-lee
Collaborator

@Doy-lee Doy-lee commented Mar 6, 2025

Removes some unused headers, enforces the code path defined by ENABLE_AUTO_RESIZE (that macro has been defined for 10 years and isn't ever going away), and replaces the previous LMDB resize logic with something simpler that resembles dynamic array growth: always grow by 1 GiB when the map is 90% full, instead of checking the last 100 blocks to estimate the size of a batch of blocks being written to the DB, or using some other heuristic to grow the DB.

The meat of the changes:

The resize logic before this change was quite intricate: it read the
previous window of blocks to get an average block size, then applied a
fudge factor on top of it to try and overshoot the expected number of
bytes required so that writing the objects to the DB doesn't fail.

It was actually still possible for this to fail if a batch of blocks was
larger than the remaining space in the DB, because the DB inherently
could not be resized in the middle of a batch.

This is all nice machinery, but we don't need it if the resizing logic
itself is sensible:

1. We always increase the size of the DB by 1 GiB. Before this, we'd
enumerate the size of the payload, check whether it would overflow the
map, and then run some logic to choose a reasonable new size.

Since we now only grow in 1 GiB increments, to ensure this doesn't break
we need to ensure that no blockchain payload can exceed 1 GiB. The
current default syncing window is 100 blocks per payload (and increasing
it much above this is quite dicey, as the node can't support much more
without hitting internal errors), so for 1 GiB to be reasonable it must
not be possible for a 100-block payload to exceed 1 GiB. It's reasonable
to assume that we won't see a window of 100 blocks of 10 MiB each
without hitting the block weight limit or some fatal exploit that bloats
the blocks.

Even if this were possible, however, I've added an additional change
(point 2 below) which ensures that in practice writing to the DB will
always work.

2. Allow a batch to be paused mid-write to resize the DB before continuing the
batch. This was as simple as stopping the batch before resize and
restarting the batch afterwards.

This now means that the _only_ way to break a write to the DB is to
insert a _single_ object that is larger than 1 GiB. The only individual
item even remotely close to this is the SNL blob. Currently, with at
most 2 years' worth of 10k-block intervals and the last 60 SNL states,
that consumes ~140 MiB, which leaves plenty of leeway to grow.

As a result, under these assumptions we can drop the complicated resize
logic for a simpler resizing scheme (sketched below).
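
For concreteness, here is a minimal sketch of what that check could look like
against the stock LMDB C API. This is not the PR's actual code: the function
name `maybe_resize`, the placement of the 90%/1 GiB thresholds, and the
omission of error handling are illustrative assumptions.

```cpp
#include <lmdb.h>
#include <cstdint>

// Fixed growth step, matching the policy described above.
constexpr uint64_t GROW_STEP = uint64_t{1} << 30; // 1 GiB

// Sketch only: error handling on the mdb_* calls is omitted.
void maybe_resize(MDB_env* env)
{
    MDB_envinfo info{};
    MDB_stat    stat{};
    mdb_env_info(env, &info);
    mdb_env_stat(env, &stat);

    const uint64_t map_size = info.me_mapsize;
    // Approximate used space: pages used so far * page size.
    const uint64_t used = uint64_t{stat.ms_psize} * (info.me_last_pgno + 1);

    // Only grow once the map is ~90% full, and always by a fixed 1 GiB step.
    if (used < (map_size / 10) * 9)
        return;

    // The new size must be a multiple of the page size.
    uint64_t new_size = map_size + GROW_STEP;
    new_size = ((new_size + stat.ms_psize - 1) / stat.ms_psize) * stat.ms_psize;

    // mdb_env_set_mapsize requires no active transactions, so per point 2
    // above an in-flight batch would be stopped here and restarted after
    // the resize.
    mdb_env_set_mapsize(env, new_size);
}
```

The fixed step is what keeps the policy trivial: there is no per-batch size
estimation to get wrong, the map simply grows whenever it approaches full.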

--

Currently letting the node resync the chain from scratch to see if there are any issues. Will update when that is done successfully.

@Doy-lee Doy-lee requested a review from tewinget March 6, 2025 06:19
Doy-lee added 9 commits March 12, 2025 13:28
ENABLE_AUTO_RESIZE was set 10 years ago and hasn't ever changed, so we
can remove the macro.
There's nothing different between ARM and x86_64, that I'm aware of,
that would necessitate the default LMDB map size being different
((1 << 30) vs (1 << 31)).

Even though we're decreasing the map size on ARM to (1 << 30), the LMDB
map only ever grows, so this will only affect newly created DBs.
The previous logic used an incorrect operation to round the requested
map size up to a value aligned with the page size. LMDB's set-map-size
API notes that the size passed in should always be a multiple of the DB
page size (see the sketch below).
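
To illustrate the fix, a minimal sketch of the corrected rounding, assuming a
hypothetical helper named `round_up_to_page` (the real code may differ): the
requested size is rounded *up* to the next multiple of the page size before
being handed to the set-map-size call.

```cpp
#include <cstdint>

// Round `requested` up to the next multiple of `page_size` (page_size > 0).
uint64_t round_up_to_page(uint64_t requested, uint64_t page_size)
{
    return ((requested + page_size - 1) / page_size) * page_size;
}

// e.g. with a 4096-byte page size: round_up_to_page(1'000'000, 4096) == 1'003'520
```
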
Resizing on demand breaks the transactional nature of the batch: if you
commit a block blob but then fail for whatever reason to commit its TXs
(an IO error, or some other bug), the DB ends up with the block but not
the TXs.

This is problematic because, previously, aborting the batch would have
reverted all of it. We can preserve that behaviour by moving the resize
logic back to the start of the batch instead of resizing on demand while
writing to the DB.

This actually simplifies the code, as resizing only happens in one spot.
All writes to the DB happen when receiving a block over the P2P network,
so this is the only entry point where writes into the DB are permitted
(in other words, there's no other place this resize check needs to
happen); a sketch of this flow follows below.
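
To illustrate the flow this describes, here is a minimal sketch with
hypothetical names (`Db`, `check_and_resize`, and the `batch_*` helpers are
stand-ins, not the actual interface): the resize check runs once at the single
entry point where blocks are written, and the batch around the writes stays
fully transactional.

```cpp
#include <vector>

struct Block { /* block blob plus its transactions */ };

// Stand-in for the real blockchain DB interface; definitions elided.
struct Db {
    void check_and_resize();        // grow the map by 1 GiB if ~90% full
    void batch_start();             // open the write transaction
    void add_block(const Block& b); // write the block and all of its TXs
    void batch_commit();            // commit everything atomically
    void batch_abort();             // revert everything written in this batch
};

// The single entry point for DB writes: blocks arriving over the P2P network.
void handle_incoming_blocks(Db& db, const std::vector<Block>& blocks)
{
    db.check_and_resize(); // the only place the resize check needs to happen
    db.batch_start();
    try {
        for (const Block& b : blocks)
            db.add_block(b);
        db.batch_commit();
    } catch (...) {
        db.batch_abort();  // a failure reverts blocks and TXs together
        throw;
    }
}
```
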
@Doy-lee Doy-lee force-pushed the doyle-simplify-lmdb-resize branch from fb38806 to 3f5e65f on March 12, 2025 02:28