header_sync during reorg in excess of 512 headers (problematic behavior) #3615

Closed
antiochp opened this issue Mar 23, 2021 · 2 comments · Fixed by #3626

antiochp commented Mar 23, 2021

Related #3605

We need to investigate how header_sync behaves in the presence of a large reorg.
It appears we do not sync to the good chain correctly and peers are banned due to "fraud height".

Interestingly/coincidentally, the "sync MMR" was cleaned up on master in #3556.
Note: sync head and sync MMR are both still in use on the 5.0.x branch (so that cleanup is not to blame...)

The sync_head vs. header_head concept never worked 100% reliably and the cleanup was done to simplify this code.

There is a legitimate reason to keep track of sync_head during header_sync of a large reorg, though.
I don't think this warrants maintaining a full MMR structure (see the PR above for the cleanup), but we can simply "rewind and reapply" based on a sync_head in the db.
During header sync we want to request headers based on our local sync_head, which may be both different from and lower in cumulative difficulty than the best known header_head.

So currently we request the wrong headers, because we construct the "locator" relative to our local header_head rather than sync_head.
And then we ban the peer because they don't send us useful headers.
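
For reference, a "locator" here is just a list of header hashes at exponentially spaced heights walking back from a chosen head; the peer looks for the most recent header it recognises in the list and starts sending headers from there. A rough sketch of the idea (illustrative only, not the actual grin code - `HeaderEntry`, `locator_heights` and `build_locator` are made-up names); the key point for this issue is which head we build it from:

```rust
// Illustrative sketch: build a locator from whichever head we choose to sync from.
// The behavior described above builds it from header_head instead of sync_head.

struct HeaderEntry {
    hash: [u8; 32],
}

/// Heights to include in the locator: head, head-1, head-3, head-7, ...
/// back to genesis, spaced roughly exponentially.
fn locator_heights(height: u64) -> Vec<u64> {
    let mut heights = vec![];
    let mut current = height;
    let mut step: u64 = 1;
    loop {
        heights.push(current);
        if current == 0 {
            break;
        }
        current = current.saturating_sub(step);
        step = step.saturating_mul(2);
    }
    heights
}

/// Build the locator from a chosen head. To make progress along the fork we are
/// actually downloading, this should be based on sync_head, not header_head.
fn build_locator(
    head_height: u64,
    header_at_height: impl Fn(u64) -> HeaderEntry,
) -> Vec<[u8; 32]> {
    locator_heights(head_height)
        .into_iter()
        .map(|h| header_at_height(h).hash)
        .collect()
}
```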

antiochp self-assigned this Mar 23, 2021

antiochp commented Mar 23, 2021

Hmm - thinking a bit more about this, our current approach and the proposed "fix" are still broken at a fundamental level.
This has continued to bug us repeatedly without us finding a real fix.

We cannot track a single sync_head (call it whatever we want here) during header_sync as we do not know, for a batch of received headers, which fork we are progressing along.

We could have 3 peers all advertising different cumulative difficulty, each on an independent chain fork.
We can request a batch of headers from an arbitrary peer, via a "locator", itself based on a particular local fork.
But that locator may be mismatched with the chosen peer, and they will return headers starting from the common header identified by the locator. So we may repeatedly rewind back to a given common header and fail to actually make progress.

  • peer A is on fork A
  • peer B is on fork B
  • we are on chain C locally
  • for simplicity A, B and C share a single common header at height 1000 before diverging
    • 1001 has 3 competing headers 1001A, 1001B, 1001C

If we build a locator from 1100C locally and ask peer A for headers, we will receive -

  • 1001A
  • 1002A
  • 1003A
  • ...
  • 1512A

We can then update our sync_head locally.
We now ask peer B for another batch of headers.

If we build a locator from 1512A locally and ask peer B for headers, we will receive -

  • 1001B
  • 1002B
  • 1003B
  • ...
  • 1512B

The locator provides enough information to peer B to start sending headers from the common header.
But we don't make any progress like this.
We get the early headers at the start of each fork.

But we cannot progress along either fork unless we somehow track the "head" of each fork.
This is what we are not doing currently.
We incorrectly assume we only need to track a single head along a single header chain.

The above example uses two forks in parallel competing with a single local chain, but the same issue is also present with a single fork, just a bit more subtle.
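
One possible shape for the "track the head of each fork" idea, purely as an illustrative sketch (not the grin API - `SyncHeads`, `Tip` and the method names here are hypothetical): keep a sync head per peer (or per fork) and build each locator from the head of the fork that peer is actually serving, rather than from a single global sync_head.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

type Hash = [u8; 32];

/// Minimal view of a chain tip: enough to build a locator and compare work.
#[derive(Clone, Copy)]
struct Tip {
    height: u64,
    hash: Hash,
    total_difficulty: u128,
}

/// Hypothetical per-peer sync heads, instead of a single global sync_head.
#[derive(Default)]
struct SyncHeads {
    per_peer: HashMap<SocketAddr, Tip>,
}

impl SyncHeads {
    /// Head to build the locator from when requesting the next batch from `peer`:
    /// the tip we last reached with headers from that peer, falling back to our
    /// local header_head for a peer we have not synced from yet.
    fn locator_head(&self, peer: &SocketAddr, header_head: Tip) -> Tip {
        self.per_peer.get(peer).copied().unwrap_or(header_head)
    }

    /// After validating a batch of headers from `peer`, remember the new tip so
    /// the next locator continues from where that fork left off instead of
    /// rewinding to the common header again.
    fn update(&mut self, peer: SocketAddr, new_tip: Tip) {
        self.per_peer.insert(peer, new_tip);
    }
}
```

In the example above this would keep 1512A as the head for peer A and 1512B as the head for peer B, so the next batch from either peer continues from height 1512 rather than restarting at 1001.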


antiochp commented Mar 23, 2021

Maybe this is less of an issue than I was thinking, though, as we do choose a "most work peer" to sync headers from, so this logic should pick the "correct" chain given what we know about connected peers at a given point in time.

We do get into problems if we have competing chains with similar cumulative diff (total work).
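
For context, "most work peer" selection is essentially just picking the peer advertising the greatest total difficulty, something like the sketch below (illustrative types only, not the actual grin code):

```rust
use std::net::SocketAddr;

/// Hypothetical view of what a peer advertises to us.
struct PeerInfo {
    addr: SocketAddr,
    total_difficulty: u128,
}

/// Pick the peer claiming the most cumulative work. If two peers on different
/// forks advertise near-identical total difficulty, this choice can flip between
/// batches, which is where the problem described above still bites.
fn most_work_peer(peers: &[PeerInfo]) -> Option<&PeerInfo> {
    peers.iter().max_by_key(|p| p.total_difficulty)
}
```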
