Block processing time slowdown following trie persistence #16674

Closed
maxgillett opened this issue May 4, 2018 · 5 comments · Fixed by #16810

Comments

@maxgillett

System information

Geth version: Geth/v1.8.3
OS & Version: Ubuntu 16.04

My block processing time is typically on the order of 250ms, but I have noticed occasional extended periods (sometimes lasting more than an hour) during which this time increases drastically, averaging between 2 and 6 seconds on my machine. This almost always happens immediately after a log message indicating that the in-memory state trie has been persisted to disk.

If I understand it correctly, this makes sense, as that part of the state trie is no longer located in RAM and is therefore slower to retrieve when verifying a block. Is there a way to avoid this slowdown in processing time? Could it be avoided by making sure the cache always retains a substantial portion of the recent state, and not purging too much of it when it is persisted to disk?

@rjl493456442
Member

The extra overhead may be introduced by leveldb compaction.

Geth persists part of the state data to disk according to certain rules. In other words, geth accumulates in memory the state data generated over the past several blocks, and batch-writes a portion of it at certain moments.

When the whole database becomes large, leveldb compaction occurs more and more frequently. When the compaction burden is heavy, normal database writes are blocked, which results in longer block processing times.
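
For illustration, here is a minimal sketch (not go-ethereum's actual flush code) of what such a batched write looks like with goleveldb; the database path, keys, and values are hypothetical:

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	// Open (or create) a leveldb database; the path is hypothetical.
	db, err := leveldb.OpenFile("/tmp/example-chaindata", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Accumulate a number of key/value pairs, then flush them to disk
	// in a single batch, similar in spirit to how geth batch-writes
	// accumulated trie nodes at certain moments.
	batch := new(leveldb.Batch)
	for i := 0; i < 1000; i++ {
		key := []byte{byte(i >> 8), byte(i)}          // hypothetical node key
		value := []byte("hypothetical trie node blob") // hypothetical payload
		batch.Put(key, value)
	}
	if err := db.Write(batch, nil); err != nil {
		log.Fatal(err)
	}
}
```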

@karalabe
Member

karalabe commented May 4, 2018

OP's intuition is most probably correct on this one. When we flush the cache on mainnet, we push out about 256MB worth of trie data to disk. However, a lot of that will probably be read back in for the next blocks.

A good optimization would be to integrate some form of LRU cache and avoid flushing out everything, instead keeping the recently accessed nodes in memory. It's not a trivial thing to implement though, as flushing the data destroys the internal reference counters used by the garbage collector.

@rjl493456442 You are also right that compaction might influence it, but if we were to keep some of the flushed data in memory, then compaction would have less of an impact.
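
As an illustration of the LRU idea above (not the fix that eventually landed in #16810), here is a minimal sketch of a recently-used node cache keyed by node hash; the names and payloads are hypothetical:

```go
package main

import (
	"container/list"
	"fmt"
)

// nodeLRU keeps recently accessed trie nodes in memory so that a flush
// to disk does not evict everything at once. Keys and values are
// simplified stand-ins for node hashes and encoded nodes.
type nodeLRU struct {
	capacity int
	order    *list.List               // most recently used at the front
	items    map[string]*list.Element // hash -> list element
}

type entry struct {
	hash string
	blob []byte
}

func newNodeLRU(capacity int) *nodeLRU {
	return &nodeLRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns a cached node and marks it as recently used.
func (c *nodeLRU) Get(hash string) ([]byte, bool) {
	if el, ok := c.items[hash]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).blob, true
	}
	return nil, false
}

// Put inserts a node, evicting the least recently used one if the
// cache is full.
func (c *nodeLRU) Put(hash string, blob []byte) {
	if el, ok := c.items[hash]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).blob = blob
		return
	}
	c.items[hash] = c.order.PushFront(&entry{hash: hash, blob: blob})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).hash)
	}
}

func main() {
	cache := newNodeLRU(2)
	cache.Put("node-a", []byte("blob-a"))
	cache.Put("node-b", []byte("blob-b"))
	cache.Get("node-a")                  // touch node-a so it stays resident
	cache.Put("node-c", []byte("blob-c")) // evicts node-b, the least recently used
	_, ok := cache.Get("node-b")
	fmt.Println("node-b still cached:", ok) // false
}
```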

@ghost

ghost commented May 4, 2018 via email

@maxgillett
Author

I can look at some options for implementing this.

@rjl493456442 Is there a way to manually turn compaction on and off? I only see API methods to initiate a compaction.

@rjl493456442
Member

@maxgillett Unfortunately, I think it's difficult for us to adjust leveldb's compaction strategy to avoid the overhead.
Leveldb performs compaction for the following reasons:

  1. Remove redundant data. Since leveldb is a typical LSM tree implementation, it keeps all versions of an entry. To avoid wasting disk space, it cleans up redundant data during compaction.

  2. Balance the difference between read and write speed. For leveldb, writing a data entry is very fast: it only involves one O(log(n)) in-memory insert and sequential file writes. Reads, on the other hand, are much more expensive, especially when the write rate and the amount of data are large. So leveldb balances the two by merging the level-0 files, and by slowing down or even pausing writes depending on the compaction backlog.

As for the compaction trigger: you can change the trigger configuration to postpone compaction, but you cannot avoid it.

Anyway, the overhead of compaction is inevitable for LSM-tree-based databases.
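
For reference, goleveldb exposes these triggers as options; a minimal sketch of postponing compaction by raising them (the values are arbitrary examples, not recommendations, and the database path is hypothetical):

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Raising these thresholds delays level-0 compaction and the
	// associated write slowdown/pause, but does not eliminate
	// compaction itself.
	options := &opt.Options{
		CompactionL0Trigger:    8,  // default 4: start compacting level 0 later
		WriteL0SlowdownTrigger: 16, // default 8: slow down writes later
		WriteL0PauseTrigger:    24, // default 12: pause writes later
	}
	db, err := leveldb.OpenFile("/tmp/example-chaindata", options) // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```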
