Block processing time slowdown following trie persistence #16674
The extra overhead may be introduced by leveldb compaction. Geth persists part of the state data to disk following certain rules. In other words, geth accumulates state data generated over the past several blocks in memory, and batch-writes a portion of it at certain moments. As the whole database grows, leveldb compactions occur more and more frequently. And when the compaction burden is heavy, normal database writes are blocked, which results in longer block processing times.
OP's intuition is most probably correct on this one. When we flush the cache on mainnet, we push out about 256MB worth of trie data to disk. However, a lot of that will probably be read back in for the next blocks. A good optimization would be to integrate some form of LRU cache and avoid flushing out everything, instead keeping the recently accessed nodes in memory. It's not a trivial thing to implement though, as flushing the data destroys the internal reference counters used by the garbage collector. @rjl493456442 You are also right that compaction might influence it, but if we were to keep some of the flushed data in memory, then compaction would have less of an impact.
Wonder why I am here, sorry to bother, went too far from real life.
Downgrade. Thank you all, I am leaving the net for real jobs. Tired and lost.
Sincerely, Serik.
I can look at some options for implementing this. @rjl493456442 Is there a way to manually turn compaction on and off? I only see API methods to initiate a compaction.
@maxgillett Unfortunately, I think it's difficult for us to adjust leveldb's compaction strategy enough to avoid the overhead.
As for the compaction trigger, you can change the trigger configuration to postpone compaction, but you cannot avoid it entirely. The overhead of compaction is inevitable for LSM-tree style databases.
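For reference, the trigger knobs referred to above live in goleveldb's `opt.Options` (the options struct geth passes when opening the database). This fragment is illustrative only — the values are not recommendations, and raising them postpones compaction at the cost of more read amplification and larger compaction bursts later:

```go
// Illustrative goleveldb tuning (github.com/syndtr/goleveldb/leveldb/opt).
o := &opt.Options{
	WriteBuffer:         64 * opt.MiB, // memtable size before a flush to level 0
	CompactionL0Trigger: 8,            // number of L0 tables that triggers compaction
	CompactionTableSize: 4 * opt.MiB,  // target size of each compacted table
}
db, err := leveldb.OpenFile("chaindata", o)
```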
System information
Geth version:
Geth/v1.8.3
OS & Version: Ubuntu 16.04
My block processing time is typically on the order of 250ms, but I have noticed occasional extended periods (lasting more than an hour) during which this time increases drastically, sometimes averaging between 2 and 6 seconds on my machine. This almost always happens immediately after a log message indicating that the in-memory state trie has been persisted to disk.
If I understand correctly, this makes sense: that part of the state trie is no longer located in RAM, and is thus slower to retrieve when verifying a block. Is there a way to avoid this slowdown in processing time? Could it be avoided by always making sure that the cache contains a substantial portion of the recent state, and not purging too much of it when it is persisted to disk?
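One operational mitigation (not a fix for the flush behavior itself) is to give geth more memory for its internal caches via the `--cache` flag, measured in megabytes, so trie data is evicted less often; the value below is illustrative:

```shell
# Illustrative: allocate 2 GB to geth's internal caches (the v1.8.x default
# is much lower). A larger cache delays trie flushes and keeps more recent
# state hot, at the cost of higher memory usage.
geth --cache 2048
```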