
core/state: improve update times #27

Closed
wants to merge 33 commits

Conversation

@holiman commented Feb 5, 2020

Work in progress to improve the account_update / storage_update.

Background: the chart below shows switching from snapshot-5 to master, and how account_update and storage_update eat up the gains we get from the account/storage-read operations.

[Screenshot: Dual Geth - Grafana]

holiman requested a review from karalabe as a code owner on February 5, 2020.
@holiman (Author) commented Feb 5, 2020

Benchmarking: https://geth-bench.ethdevops.io/d/Jpk-Be5Wk/dual-geth?orgId=1&var-exp=mon08&var-master=mon09&var-percentile=50&from=1580917100649&to=now

"master" = snapshot-5
"exp" = this PR

Feb 05 16:38:01 mon08.ethdevops.io geth INFO [02-05|15:38:01.543] Starting peer-to-peer node instance=Geth/v1.9.11-unstable-075e621a-20200205/linux-amd64/go1.13.7
Feb 05 16:38:09 mon09.ethdevops.io geth INFO [02-05|15:38:09.261] Starting peer-to-peer node instance=Geth/v1.9.11-unstable-4eeafc17-20200205/linux-amd64/go1.13.7 

Currently doing a fast-sync, so we can later compare the times for account and storage updates.

@holiman (Author) commented Feb 6, 2020

restarted

Feb 06 10:54:33 mon08.ethdevops.io geth INFO [02-06|09:54:33.331] Starting peer-to-peer node instance=Geth/v1.9.11-unstable-4695d38b-20200206/linux-amd64/go1.13.7
Feb 06 10:55:06 mon09.ethdevops.io geth INFO [02-06|09:55:06.622] Starting peer-to-peer node instance=Geth/v1.9.11-unstable-9ad532f9-20200206/linux-amd64/go1.13.7 

holiman force-pushed the improve_updates branch 2 times, most recently from 636bf14 to 89416e1 on February 7, 2020.
@holiman (Author) commented Feb 7, 2020

Trying out a wild idea. Updated the machines:

Feb 07 17:34:38 mon08.ethdevops.io geth INFO [02-07|16:34:38.293] Starting peer-to-peer node instance=Geth/v1.9.11-unstable-89416e1a-20200207/linux-amd64/go1.13.7
Feb 07 17:35:31 mon09.ethdevops.io geth INFO [02-06|09:55:06.622] Starting peer-to-peer node instance=Geth/v1.9.11-unstable-9ad532f9-20200206/linux-amd64/go1.13.7 

@holiman (Author) commented Feb 7, 2020

The change in 89416e1 is a test which, if it works, should improve account_update (the red part of the charts).
Restarting the snapshot-5 (master) machine, this is the graph:
[Screenshot: Dual Geth - Grafana]

Before the interruption, account_update sits at 6.6ms; the restart bumps it up, and after a while it settles back at 6.2ms.

The PR machine is at 6.9ms before the restart; the restart "bumps" it only to 5.9ms, and later it lands at 4.5ms.

[Screenshot: Dual Geth - Grafana]

Next up: I'll try the same trick for storage.

@holiman (Author) commented Feb 7, 2020

Here's the gist of it. First of all, scrap the existing precacher.

Background:

When we execute block n, we execute transactions serially.
After each tx, we call Finalize, which puts the address of each pending object into pending.
Similarly, for each of those objects, we call obj.Finalize, which does the same thing for each modified slot.

Idea

So, during finalize, we also send each address off onto a channel (non-blocking). On the other end of that channel sits a little precacher thread which loads it from the trie and warms up the cache.

So the snapshotter allows us to parallelize the trie reads instead of doing them serially. At the end of the block, when we finally reach the update phase, the trie caches will already be prepopulated, and the update will be quick.
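
Here's a minimal sketch of that idea (illustrative only; names like triePrefetcher, trieLoader and prefetch are hypothetical, not the actual code in this PR):

```go
// Illustrative sketch only: triePrefetcher, trieLoader and prefetch
// are hypothetical names, not the actual code in this PR.
package state

import "github.com/ethereum/go-ethereum/common"

// trieLoader abstracts the trie read we want to warm up
// (think Trie.TryGet on the account trie).
type trieLoader interface {
	TryGet(key []byte) ([]byte, error)
}

type triePrefetcher struct {
	requests chan common.Address // addresses to warm up
	quit     chan struct{}
}

func newTriePrefetcher(tr trieLoader) *triePrefetcher {
	p := &triePrefetcher{
		requests: make(chan common.Address, 1024),
		quit:     make(chan struct{}),
	}
	go p.loop(tr)
	return p
}

// loop resolves each requested account in the trie. The result is
// discarded: the point is the side effect of pulling the trie nodes
// into the cache, so the update phase later finds them hot.
func (p *triePrefetcher) loop(tr trieLoader) {
	for {
		select {
		case addr := <-p.requests:
			tr.TryGet(addr.Bytes()) // warm the cache, ignore the value
		case <-p.quit:
			return
		}
	}
}

// prefetch is called from Finalize for every dirtied address. The send
// is non-blocking: if the channel is full, the address is dropped
// rather than stalling transaction execution.
func (p *triePrefetcher) prefetch(addr common.Address) {
	select {
	case p.requests <- addr:
	default: // prefetcher busy, skip this one
	}
}

func (p *triePrefetcher) close() { close(p.quit) }
```

The drop-on-full behavior keeps execution from ever blocking on the prefetcher, at the cost of missing some entries (which shows up in the prefetch rate later in this thread).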

@holiman (Author) commented Feb 9, 2020

The docker stop timeout of 60s might be insufficient; one of the nodes got corrupted and I had to resync. Now that both are in sync again and snapshot generation is complete, snapshot-5 on mon09 is slightly faster overall.

However, here's the account_update/storage_update chart for snapshot-5:
[Screenshot: Dual Geth - Grafana]

The corresponding chart for this PR is a couple of milliseconds better:
[Screenshot: Dual Geth - Grafana]

mon09 seems a bit slower than mon08 in general, so I think the actual performance of this PR is a bit better than the charts show.

The new prefetcher seems to be loading about 50 entries per second, which is ~700 items per block. That seems a bit on the low side -- we're probably missing a lot of entries because the non-blocking sends get dropped.

[Screenshot: Dual Geth - Grafana]

@holiman (Author) commented Feb 11, 2020

After sync-up, mon08 (this PR) is slightly faster than mon09.
[Screenshot: Dual Geth - Grafana]

The account_update and storage_update times:
[Screenshot: Dual Geth - Grafana]

Also, looking at the clean-cache hits, one can see that mon08 (in yellow) consistently has a higher hit rate:

[Screenshot: Home - Grafana]

@holiman (Author) commented Feb 11, 2020

Oh, this is a nice graph as well:
[Screenshot: Home - Grafana]

holiman changed the title from "core/state: remove snapdb storage lock + lazily init snapshot storage…" to "core/state: improve update times" on Feb 13, 2020.
@holiman (Author) commented Feb 13, 2020

I am now testing with the nodes swapped around, so this PR runs on mon09 and snapshot-5 on mon08.

@holiman (Author) commented Feb 13, 2020

Ok, here are some stats from switching them:
[Screenshot: Dual Geth - Grafana]

The upper one is first snapshot-5, then this PR swapped in. The storage/account update time went from 15.5ms down to 11.1ms.

The lower one was this PR, then snapshot-5 swapped in. The storage/account update time went up from 12.9ms to 16.4ms.

TL;DR: this PR makes account/storage updates 27-39% faster (16.4ms/12.9ms ≈ 1.27; 15.5ms/11.1ms ≈ 1.39).

@holiman (Author) commented Feb 13, 2020

Interestingly, the dirty cache hit rate is better with this PR. I have no idea why:
[Screenshot: Dual Geth - Grafana]

holiman and others added 5 commits February 25, 2020 17:57
This makes eth_call and eth_estimateGas use the zero address
as sender when the "from" parameter is not supplied.

Co-authored-by: Felix Lange <[email protected]>
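
A minimal sketch of the defaulting behavior that commit describes (hypothetical helper, not the actual patch):

```go
// Hypothetical illustration of the described defaulting, not the
// actual patch: a nil "from" falls back to the zero address.
package ethapi

import "github.com/ethereum/go-ethereum/common"

// callSender picks the sender for eth_call / eth_estimateGas.
func callSender(from *common.Address) common.Address {
	if from == nil {
		return common.Address{} // zero address, 0x00...00
	}
	return *from
}
```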
* les: separate peer into clientPeer and serverPeer

* les: address comments
This was missing because I forgot to wrap it when bind.CallOpts.From was added.
@karalabe (Owner) commented

The measurements of this PR vs. without it are not comparable, because you modified the metric measuring account/storage updates. That change was never upstreamed into the original PR.

meowsbits and others added 26 commits March 10, 2020 10:55
…m#20746)

Includes difficulty tests for EIP2384 aka MuirGlacier.
This is supposed to fix the occasional failures in 
TestCancel* on Travis CI.
eth: fix transaction announce/broadcast goroutine leak
go.mod: update golang.org/x/crypto to fix a Go 1.14 race rejection
This revision of go-duktype fixes the following warning:

```
duk_logging.c: In function ‘duk__logger_prototype_log_shared’:
duk_logging.c:184:64: warning: ‘Z’ directive writing 1 byte into a region of size between 0 and 9 [-Wformat-overflow=]
  184 |  sprintf((char *) date_buf, "%04d-%02d-%02dT%02d:%02d:%02d.%03dZ",
      |                                                                ^
In file included from /usr/include/stdio.h:867,
                 from duk_logging.c:5:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:36:10: note: ‘__builtin___sprintf_chk’ output between 25 and 85 bytes into a destination of size 32
   36 |   return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   37 |       __bos (__s), __fmt, __va_arg_pack ());
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Fixes: Condition is always 'false' because 'err' is always 'nil'
This PR fixes issues in TableDatabase.

TableDatabase is a wrapper around an underlying ethdb.Database with an additional prefix. The prefix is applied to all entries it maintains. However, when we retrieve entries from it, we don't always handle the key properly: in theory the prefix should be truncated and only the user key returned, but we don't do that in some cases, e.g. in the iterator and batch replayer created from it. This PR fixes those issues (a minimal sketch of the idea follows below).
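
A minimal sketch of the kind of fix described, assuming a wrapper around ethdb.Iterator (the type name is hypothetical, not the actual core/rawdb code):

```go
// Illustrative sketch of prefix truncation in a table iterator; the
// type name is hypothetical, not the actual core/rawdb code.
package rawdb

import "github.com/ethereum/go-ethereum/ethdb"

// tableIterator wraps an iterator over the prefixed keyspace and
// hands back user keys with the table prefix stripped.
type tableIterator struct {
	iter   ethdb.Iterator
	prefix string
}

func (it *tableIterator) Next() bool   { return it.iter.Next() }
func (it *tableIterator) Error() error { return it.iter.Error() }

// Key strips the table prefix, so callers see the same key they
// originally stored through the table wrapper.
func (it *tableIterator) Key() []byte {
	key := it.iter.Key()
	if key == nil {
		return nil
	}
	return key[len(it.prefix):]
}

func (it *tableIterator) Value() []byte { return it.iter.Value() }
func (it *tableIterator) Release()      { it.iter.Release() }
```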
eth: when triggering a sync, check the head header TD, not block
core/rawdb: fix freezer table test error check
internal/web3ext: fix clique console apis to work on missing arguments
These tests occasionally fail on Travis.
…0776)

This just prevents a false negative ERROR warning when, for some unknown
reason, a user attempts to turn on the module rpc even though it's already going
to be on.
holiman closed this on Mar 23, 2020.
karalabe pushed a commit that referenced this pull request Oct 8, 2021
eth/downloader: restart the downloader after completion on new head