Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve handling of UnfinishedBlocks #17247

Merged
merged 7 commits into from
Jan 26, 2024
Merged

improve handling of UnfinishedBlocks #17247

merged 7 commits into from
Jan 26, 2024

Conversation

arvidn
Copy link
Contributor

@arvidn arvidn commented Jan 9, 2024

This PR is best reviewed one commit at a time.

Background

UnfinishedBlock

When a farmer find a valid proof-of-space, it builds a block and distributes it to the network (an UnfinishedBlock). This block only has a valid proof-of-space, in order to become complete and part of the blockchain (a FullBlock), it also needs to be infused with proof-of-time, by a timelord.

When propagating an UnfinishedBlock, full nodes notifies all of its peers about the new block and peers that haven't already seen it will request it. Full nodes don't request the same block multiple times. i.e. If a block has already been seen (and already propagated), we don't request it.

UnfinishedBlocks are (currently) identified by their proof-of-space. Specifically, the reward block hash.

FullNodeStore

When the full node receives an UnfinishedBlock, it's validated, stored in the full node store and then peers are notified of the new unfinished block.

The full node store currently treats the reward block hash as a unique identifier for UnfinishedBlocks, and only stores a single block for a given reward block hash.

Protocol

When we notify our peers about a new UnfinishedBlock, we send a NewUnfinishdeBlock message, which looks like this:

class NewUnfinishedBlock(Streamable):
    unfinished_reward_hash: bytes32

If the peer has not seen an UnfinishedBlock with this reward block hash, it requests it, by sending a RequestUnfinishedBlock message, which looks like this:

class RequestUnfinishedBlock(Streamable):
    unfinished_reward_hash: bytes32

Just like with the full node store, UnfinishedBlocks are identified by their reward block hash.

Scenario

When a farmer runs multiple (redundant) full node and farmer instances, harvesting the same plots, it's possible to produce multiple different UnfinishedBlocks with the same proof-of-space. Full nodes may have slightly different mempool, and therefore produce slightly different blocks. For example, consider a new transaction being propagated throughout the network that has reached one node but not the other when the block is created.

Since the network consider these blocks the same (because the reward block hash, i.e. proof-of-space) is the same, they will each propagate to nodes and which one is seen by which node is somewhat random. In a system with many redundant timelords, different timelords are likely to infuse different versions of the unfinished blocks. The timelords will not even be aware of the existence of the other variant, because it's never propagated in that part of the network.

These blocks, once infused, will form competing chains, eventually resolved by one side of the network re-orging to the other chain.

Purpose

The purpose of this change is to improve the determinism of which block is to become the next block in the chain by making the network propagate all variants of the UnfinishedBlocks. This means the timelords will have access to all and can pick which one to infuse deterministically.

Technically, the timelords are only infusing proofs-of-space, and it's the full node that the timelord is connected to that makes the decision of which variant of the UnfinishedBlock to be infused.

This change, once popular in the network, is believed to reduce the risk of short-lived competing chains and reorgs.

Changes

Propagating all variants of UnfinishedBlocks requires changes to:

  • The full node store, to allow storing multiple UnfinishedBlocks under the same reward block hash
  • The network protocol, to identify UnfinishedBlocks not just by their reward block hash, but also their foliage transaction block hash
  • some rule to, deterministically, pick the variant of the UnfinishedBlock to infuse

The commits in this PR make the above changes in that order.

Double requesting blocks

Currently, when we learn about an unfinished block we want to request, we record making the request in the full node store, by the reward block hash (requesting_unfinished_blocks). If some other peer tell us about an unfinished block with the same hash, we won't request it as long as we have an outstanding request already.

This logic needs some changes when adding requests for unfinished blocks, not just by their reward block hash, but also foliage hash.

One commit add a second collection to the full node store, (requesting_unfinished_blocks2), recording both block reward hash and foliage hash of the outstanding requests. This means we can have multiple outstanding requests for blocks with the same reward block hash, as long as they have different foliage hashes.

There is one issue with this. If an old peer tells us about a block with reward block hash A (and we make a request). Then a new peer tells us about a block with reward block hash A and foliage hash B. We can't know that these are the same, and we request block (A, B) from the new peer.

However, if we first learn about the block from a new peer and request (A, B), if we then learn about a block with hash A (from an old peer), we assume it's the same block.

This will cause some duplicate unfinished blocks to be downloaded. At least in the beginning of the network upgrade period.

Full node store

The underlying data structure of the full node store for unfinished blocks is this:

unfinished_blocks: Dict[bytes32, Tuple[uint32, UnfinishedBlock, PreValidationResult]]

The value type of this dictionary was changed to include another dictionary, keyed by the foliage transaction block hash.

unfinished_blocks: Dict[bytes32, Dict[Optional[bytes32], UnfinishedBlockEntry]]

Along with new methods to query an UnfinishedBlock by both its reward block hash and its foliage transaction block hash. (get_unfinished_block2() and get_unfinished_block_result2()).

Network protocol

We need to remain backwards compatibility with the network protocol, so we can't change the existing messages. We need to add two new messages: NewUnfinishedBlock2 and RequestUnfinishedBlock2. These messages are like the existing ones except they both add an optional field for the foliage transaction block hash of the UnfinishedBlock.

When we propagate a new UnfinishedBlock to peers, we now have to send the new protocol message to new peers, and the old message to old peers. To know whether a peer supports the new messages, we bump the protocol version from 0.35.0 to 0.36.0.

Any peers that advertise protocol version 0.36.0 will receive a NewUnfinishedBlock2 message, and older peers will receive the current NewUnfinishedBlock message. This required adding a new variant of send_to_all() on the Service class. Specifically one where you can pass a predicate function to filter peers. This lets you send the new message only to new peers and vice versa.

Anti-DOS

In order to prevent a malicious farmer (that finds a valid proof-of-space) from flooding the network with different variants of UnfinishedBlocks, nodes will only forward 3 variants of a block (given the same proof-of-space). This is configurable in config.yaml as max_duplicate_unfinished_blocks. This limit is believed to be appropriate to support non-malicious cases of farmers running redundant nodes.

The Full Node Store exposes the number of blocks it stores under each reward block hash. When this number is too high, we stop propagating notifications of more UnfinishedBlocks.

Deterministic block selection

The rule for picking the UnfinishedBlock to infuse is simply the one with the lowest foliage transaction block hash. If the block is not a transaction block, it doesn't have such hash, and there is only one block to chose from.

This rule is implemented and tested in the last commit of this PR. It only affects the full node store, since full nodes are the ones converting UnfinishedBlocks to FullBlocks in response to the timelord. The timelord, and the timelord protocol, is agnostic to foliage. Only proofs of space are infused, and it's up to the full node to pick the block to infuse.

Therefore, the full node store returns the "best" unfinished block whenever it's asked for one just based on the reward block hash.

Current Behavior:

Only a single variant of UnfinishedBlocks are propagated by nodes, possibly splitting the networks into sections that see different variants.

New Behavior:

up to 3 variants of UnfinishedBlocks are propagated by nodes, making it very likely that all nodes (and all timelords) see all variants.

@arvidn arvidn added the Added Required label for PR that categorizes merge commit message as "Added" for changelog label Jan 9, 2024
@arvidn arvidn force-pushed the full-node-store branch 2 times, most recently from 6f6f468 to c5244a2 Compare January 10, 2024 13:18
@github-actions github-actions bot added the merge_conflict Branch has conflicts that prevent merge to main label Jan 10, 2024

This comment was marked as outdated.

@github-actions github-actions bot removed the merge_conflict Branch has conflicts that prevent merge to main label Jan 10, 2024

This comment was marked as outdated.

Copy link

coveralls-official bot commented Jan 12, 2024

Pull Request Test Coverage Report for Build 7611459731

Warning: This coverage report may be inaccurate.

We've detected an issue with your CI configuration that might affect the accuracy of this pull request's coverage report.
To ensure accuracy in future PRs, please see these guidelines.
A quick fix for this PR: rebase it; your next report should be accurate.

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.05%) to 90.602%

Totals Coverage Status
Change from base Build 7601805424: -0.05%
Covered Lines: 95087
Relevant Lines: 104904

💛 - Coveralls

@arvidn arvidn force-pushed the full-node-store branch 4 times, most recently from 8ef5606 to 8cbd204 Compare January 17, 2024 11:18
@arvidn arvidn changed the title UnfinishedBlocks with the same reward hash but different foliage improve handling of UnfinishedBlocks Jan 17, 2024
@arvidn arvidn marked this pull request as ready for review January 17, 2024 14:38
@arvidn arvidn requested a review from a team as a code owner January 17, 2024 14:38
@arvidn
Copy link
Contributor Author

arvidn commented Jan 17, 2024

I think it's fine to let through the missing coverage. there are two cases not covered by tests:

  1. receiving a new_unfinished_block2() message when a node is syncing
  2. a node receiving a request for an unfinished block that the node doesn't have, request_unfinished_block2()

Copy link
Contributor

@Rigidity Rigidity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take these comments with a grain of salt, way out of my depth here so they could be completely wrong. But I've taken a look anyways, and worst case I learn something new 😄

@arvidn arvidn force-pushed the full-node-store branch 2 times, most recently from 129d3fb to ed112f5 Compare January 19, 2024 08:38
@arvidn arvidn requested a review from Rigidity January 19, 2024 14:21
Copy link
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions github-actions bot added the merge_conflict Branch has conflicts that prevent merge to main label Jan 19, 2024
@github-actions github-actions bot removed the merge_conflict Branch has conflicts that prevent merge to main label Jan 20, 2024
Copy link
Contributor

Conflicts have been resolved. A maintainer will review the pull request shortly.

@arvidn
Copy link
Contributor Author

arvidn commented Jan 23, 2024

I think it's reasonable to not add coverage to the case where we're syncing in this PR.

Copy link
Contributor

@Rigidity Rigidity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise

…partial hash; return the highest ranking (lowest foliage tx block hash)
Copy link
Contributor

File Coverage Missing Lines
chia/full_node/full_node_api.py 91.2% lines 477, 486, 489
Total Missing Coverage
591 lines 3 lines 99%

@emlowe
Copy link
Contributor

emlowe commented Jan 26, 2024

Is there any material difference in a return of (None, 0, x) and (None, 1 or more, x) in the get_unfinished_block2 return?
That is if the first entry in the Tuple is None, does any of the rest matter?
Is Optional[Tuple[UnfinishedBlock, int, bool]] an alternative?

@arvidn
Copy link
Contributor Author

arvidn commented Jan 26, 2024

@emlowe the count return value does matter when the block is None. The block being None means we don't have it already, which normally would mean we're interested. But if count is too high, it means we have too many alternatives for this reward block hash and we should stop downloading and forwarding any more candidates.

Here's the main case: https://github.com/Chia-Network/chia-blockchain/pull/17247/files#r1464906269

@arvidn arvidn added ready_to_merge Submitter and reviewers think this is ready and removed coverage-diff labels Jan 26, 2024
Copy link
Contributor

@wjblanke wjblanke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aok

@Starttoaster Starttoaster merged commit bdfffca into main Jan 26, 2024
266 of 267 checks passed
@Starttoaster Starttoaster deleted the full-node-store branch January 26, 2024 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Added Required label for PR that categorizes merge commit message as "Added" for changelog ready_to_merge Submitter and reviewers think this is ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants