header_sync during reorg in excess of 512 headers (problematic behavior) #3615

Closed
antiochp opened this issue Mar 23, 2021 · 2 comments · Fixed by #3626

antiochp commented Mar 23, 2021

Related #3605

We need to investigate how header_sync behaves in the presence of a large reorg.
It appears we do not sync to the good chain correctly and peers are banned due to "fraud height".

Interestingly/coincidentally, the "sync MMR" was cleaned up on master in #3556.
Note: sync head and sync MMR are both still in use on the 5.0.x branch (so that cleanup is not to blame...)

The sync_head vs. header_head concept never worked 100% reliably and the cleanup was done to simplify this code.

There is a legitimate reason to keep track of sync_head during header_sync of a large reorg, though.
I don't think this warrants maintaining a full MMR structure (see the PR above for the cleanup), but we can simply "rewind and reapply" based on a sync_head in the db.
During header sync we want to request headers based on our local sync_head, which may be both different from and lower in cumulative difficulty than the best known header_head.

So currently we request the wrong headers, because we construct the "locator" relative to our local header_head rather than sync_head.
And then we ban the peer because they don't send us useful headers.
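
For reference, a "locator" here is just a list of header hashes at exponentially spaced heights walking back from a chosen head; the peer looks for the most recent header it recognises in the list and starts sending headers from there. A rough sketch of the idea (illustrative only, not the actual grin code - `HeaderEntry`, `locator_heights` and `build_locator` are made-up names); the key point for this issue is which head we build it from:

```rust
// Illustrative sketch: build a locator from whichever head we choose to sync from.
// The behavior described above builds it from header_head instead of sync_head.

struct HeaderEntry {
    hash: [u8; 32],
}

/// Heights to include in the locator: head, head-1, head-3, head-7, ...
/// back to genesis, spaced roughly exponentially.
fn locator_heights(height: u64) -> Vec<u64> {
    let mut heights = vec![];
    let mut current = height;
    let mut step: u64 = 1;
    loop {
        heights.push(current);
        if current == 0 {
            break;
        }
        current = current.saturating_sub(step);
        step = step.saturating_mul(2);
    }
    heights
}

/// Build the locator from a chosen head. To make progress along the fork we are
/// actually downloading, this should be based on sync_head, not header_head.
fn build_locator(
    head_height: u64,
    header_at_height: impl Fn(u64) -> HeaderEntry,
) -> Vec<[u8; 32]> {
    locator_heights(head_height)
        .into_iter()
        .map(|h| header_at_height(h).hash)
        .collect()
}
```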

antiochp self-assigned this Mar 23, 2021

antiochp commented Mar 23, 2021

Hmm - thinking a bit more about this, our current approach and the proposed "fix" are still broken at a fundamental level.
This has continued to bug us repeatedly without us finding a real fix.

We cannot track a single sync_head (call it whatever we want here) during header_sync as we do not know, for a batch of received headers, which fork we are progressing along.

We could have 3 peers all advertising different cumulative difficulty, each on an independent chain fork.
We can request a batch of headers from an arbitrary peer, via a "locator", itself based on a particular local fork.
But that locator may be mismatched with the chosen peer, and they will return headers starting from the common header identified by the locator. So we may repeatedly rewind back to a given common header and fail to actually make progress.

  • peer A is on fork A
  • peer B is on fork B
  • we are on chain C locally
  • for simplicity A, B and C share a single common header at height 1000 before diverging
    • 1001 has 3 competing headers 1001A, 1001B, 1001C

If we build a locator from 1100C locally and ask peer A for headers, we will receive -

  • 1001A
  • 1002A
  • 1003A
  • ...
  • 1512A

We can then update our sync_head locally.
We now ask peer B for another batch of headers.

If we build a locator from 1512A locally and ask peer B for headers, we will receive -

  • 1001B
  • 1002B
  • 1003B
  • ...
  • 1512B

The locator provides enough information to peer B to start sending headers from the common header.
But we don't make any progress like this.
We get the early headers at the start of each fork.

But we cannot progress along either fork unless we somehow track the "head" of each fork.
This is what we are not doing currently.
We incorrectly assume we only need to track a single head along a single header chain.

The above example uses two forks in parallel competing with a single local chain, but the same issue is also present with a single fork, just a bit more subtle.
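
One possible shape for the "track the head of each fork" idea, purely as an illustrative sketch (not the grin API - `SyncHeads`, `Tip` and the method names here are hypothetical): keep a sync head per peer (or per fork) and build each locator from the head of the fork that peer is actually serving, rather than from a single global sync_head.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

type Hash = [u8; 32];

/// Minimal view of a chain tip: enough to build a locator and compare work.
#[derive(Clone, Copy)]
struct Tip {
    height: u64,
    hash: Hash,
    total_difficulty: u128,
}

/// Hypothetical per-peer sync heads, instead of a single global sync_head.
#[derive(Default)]
struct SyncHeads {
    per_peer: HashMap<SocketAddr, Tip>,
}

impl SyncHeads {
    /// Head to build the locator from when requesting the next batch from `peer`:
    /// the tip we last reached with headers from that peer, falling back to our
    /// local header_head for a peer we have not synced from yet.
    fn locator_head(&self, peer: &SocketAddr, header_head: Tip) -> Tip {
        self.per_peer.get(peer).copied().unwrap_or(header_head)
    }

    /// After validating a batch of headers from `peer`, remember the new tip so
    /// the next locator continues from where that fork left off instead of
    /// rewinding to the common header again.
    fn update(&mut self, peer: SocketAddr, new_tip: Tip) {
        self.per_peer.insert(peer, new_tip);
    }
}
```

In the example above this would keep 1512A as the head for peer A and 1512B as the head for peer B, so the next batch from either peer continues from height 1512 rather than restarting at 1001.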


antiochp commented Mar 23, 2021

Maybe this is less of an issue than I was thinking, though, as we do choose a "most work peer" to sync headers from, so this logic should pick the "correct" chain given what we know about connected peers at a given point in time.

We do get into problems if we have competing chains with similar cumulative diff (total work).
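
For context, "most work peer" selection is essentially just picking the peer advertising the greatest total difficulty, something like the sketch below (illustrative types only, not the actual grin code):

```rust
use std::net::SocketAddr;

/// Hypothetical view of what a peer advertises to us.
struct PeerInfo {
    addr: SocketAddr,
    total_difficulty: u128,
}

/// Pick the peer claiming the most cumulative work. If two peers on different
/// forks advertise near-identical total difficulty, this choice can flip between
/// batches, which is where the problem described above still bites.
fn most_work_peer(peers: &[PeerInfo]) -> Option<&PeerInfo> {
    peers.iter().max_by_key(|p| p.total_difficulty)
}
```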
