Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block sync hash traversal perf #3558

Merged
merged 2 commits into from
Feb 15, 2021

Conversation

antiochp
Copy link
Member

@antiochp antiochp commented Feb 3, 2021

Resolves #3554.

When we transition from "header sync" to "body sync" we identify missing blocks based on the delta between the chain of headers and the current chain of full blocks. i.e. We sync headers and then we go request the full blocks for each of those headers.
There is some additional complexity where we support a fork between the header chain and the full block chain as one is not always a subset of the other. And to do this we identify the "fork point" which is the last common header between header chain and full block chain before they begin to diverge.


Fork example -

Headers: A -> B -> C' -> D -> E

Full blocks: A -> B -> C

Header sync gave us a chain of headers building on C' while our local node was previously aware of C.
We need to "rewind" to the fork point B immediately prior to C and C' and request missing blocks [C', D, E].


The existing implementation rewinds back from the head of the header chain to identify both the fork point and a set of missing block hashes. This is fine if the delta between header chain and full block chain is small but is not suitable if the header chain must be rewound a significant number of blocks.

It fails badly in the "full archval node sync" scenario where we must rewind from height ~1,500,00 back to 1.


The PR introduces a more efficient approach -

  1. identify height of full block chain
  2. increase this height by the number of blocks we will request in parallel
  3. lookup this "max" header directly on header chain via get_header_by_height()
  4. Iterate back from this "max" header to the fork point.

This places an upper bound on the iteration required as we can lookup headers directly based on height.
This greatly reduces the amount of iteration required when identifying missing blocks to sync.


As part of this rework check_txhashset_needed() was refactored, splitting out get_fork_point() and check_txhashset_needed() into separate fns.
This allowed the implementation to be simplified significantly.

@antiochp antiochp marked this pull request as ready for review February 9, 2021 09:29
@antiochp antiochp force-pushed the block_sync_traversal_perf branch from 6c7700b to a69fd05 Compare February 9, 2021 12:34
@antiochp
Copy link
Member Author

Going to merge this to master.
We need to make sure we test sync thoroughly as part of the next release.

@antiochp antiochp merged commit 7649d36 into mimblewimble:master Feb 15, 2021
@antiochp antiochp deleted the block_sync_traversal_perf branch February 15, 2021 13:48
@antiochp antiochp mentioned this pull request May 6, 2021
bayk added a commit to mwcproject/mwc-node that referenced this pull request Jun 12, 2024
* sync traversal performance improvements
* rework how body_sync determines block hashes to request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issue when syncing historical blocks
1 participant