Skip to content
This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Don't add blocks to the datastore #571

Merged
merged 3 commits into from
Jul 28, 2022
Merged

Conversation

MichaelMure
Copy link
Contributor

This leave the responsibility and choice to do so to the caller, typically go-blockservice.

This has several benefit:

  • untangle the code
  • allow to use an exchange as pure block retrieval
  • avoid double add

Close ipfs/kubo#7956

bitswap.go Outdated
select {
case <-bs.process.Closing():
return errors.New("bitswap is closed")
default:
}

wanted := blks
// NOTE: There exists the possibility for a race condition here. If a user
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this PR split receiveBlocksFrom into two cases:

  • block added from the node
  • block added from the network

Not only this make the code much clearer, it also segregate the problem mentioned in this comment in a single code path.

@MichaelMure
Copy link
Contributor Author

cc @Jorropo

@Jorropo
Copy link
Contributor

Jorropo commented Jul 8, 2022

@MichaelMure i dont think its quite the right thingy to do here.

Afait bitswap was dedupping concurrent getblocks while what you propose would have a put per getblock.

I think bitswap should do puts but only when the block is receved from the network. Or am I missing something here ?

@MichaelMure
Copy link
Contributor Author

Afait bitswap was dedupping concurrent getblocks while what you propose would have a put per getblock.

I believe this is still happening, the same way it was done before:

The diff in the PR is quite unreadable, I'd suggest looking at the resulting code where it's much clearer.

I think bitswap should do puts but only when the block is receved from the network. Or am I missing something here ?

This PR removes the put done in HasBlock (that were unnecessary) and move the puts for blocks from the network to ipfs/go-blockservice#92. The former for performance reason, the latter for better separation of concerns and composability.

One thing that this PR does change is that Puts are no longer batched together, or rather no longer batched the same way. Before, there would be a PutMany for each bitswap message. Now, BlockService only get a channel of blocks and doesn't know how to batch Put together. At first it looks like it would reduce performance but I believe it makes sense in the long run, as batching with a time window in the BlockService side could result in a better batching, instead of following strictly the network messages. How often is there more than one block in a message?

@MichaelMure
Copy link
Contributor Author

Not sure what's happening with the CI, test are OK for me locally.

Stebalien
Stebalien previously approved these changes Jul 11, 2022
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a strict improvement. Thank you!

bitswap.go Outdated
Comment on lines 480 to 483
// creates a node, then adds it to the dagservice while another goroutine
// is waiting on a GetBlock for that object, they will receive a reference
// to the same node. We should address this soon, but i'm not going to do
// it now as it requires more thought and isn't causing immediate problems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. You're copying a comment. This isn't an issue, you can remove it.

bitswap.go Outdated
@@ -469,60 +469,68 @@ func (bs *Bitswap) GetBlocks(ctx context.Context, keys []cid.Cid) (<-chan blocks
func (bs *Bitswap) HasBlock(ctx context.Context, blk blocks.Block) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document that the block must exist in the blockstore.

@Stebalien Stebalien dismissed their stale review July 11, 2022 16:10

I need to re-review this. I missed a few things.

@Stebalien
Copy link
Member

Unfortunately, it looks like this will act like we have a block before that block actually gets written locally. I.e., on receive, we'll:

  1. Announce the block to the network.
  2. Tell the engine that we have it (enqueuing tasks for it).
  3. Then send the block to the blockservice without actually writing it to the blockstore.

This creates a race where we might, in some cases (performance, canceled requests, etc.) fail to send blocks to peer when we have them.

As discussed in person (written here so we're on the same page), a simple solution is to just not tell the network anything until the blockservice calls HasBlock. Basically:

  1. Blockservice asks the exchange for the block.
  2. Exchange does its thing, updates the engine to record receipt of the block, but doesn't actually enqueue any work.
  3. The blockservice then calls HasBlock on the exchange.
  4. The exchange then serves the block.

Basically, this is the same as this PR but taken to the extreme. The receive side is completely disconnected from the send side (except for the bandwidth ledger).

Changes required:

  1. Split Engine.ReceiveFrom into Engine.ReceivedBlocks (from network) and Engine.HaveBlocks (or something like that). The former will update ledgers, etc. The latter will enqueue work for sending blocks to peers.
  2. Make the blockservice put blocks read from the exchange into the blockstore, and notify the exchange by calling Has on read.

Tricky parts:

  • Ideally, bitswap wouldn't announce anything. The exchange should handle sending and deduplicating provider records, but that's a larger change. The new problem is that this change may cause us to repeatedly announce the same block.
  • This is a silent breaking change and will need to be advertised accordingly. I'd consider making breaking changes to the exchange interface at the same time (maybe add a HasBlocks method, and/or rename to NotifyNewBlocks?).

@MichaelMure
Copy link
Contributor Author

  • This is a silent breaking change and will need to be advertised accordingly. I'd consider making breaking changes to the exchange interface at the same time (maybe add a HasBlocks method, and/or rename to NotifyNewBlocks?).

I like that idea. Does that sounds good to you? Anything else?

 // Interface defines the functionality of the IPFS block exchange protocol.
 type Interface interface { // type Exchanger interface
        Fetcher
 
-       // TODO Should callers be concerned with whether the block was made
-       // available on the network?
-       HasBlock(context.Context, blocks.Block) error
+       // NotifyNewBlocks tells the exchange that a new block is available and can be served.
+       NotifyNewBlocks(ctx context.Context, block blocks.Block) error
 
        IsOnline() bool
 
        io.Closer
 }

@Stebalien
Copy link
Member

I'd make it take multiple blocks (or have a second function that takes multiple blocks), but yeah, something like that.

MichaelMure added a commit to MichaelMure/go-ipfs-exchange-interface that referenced this pull request Jul 13, 2022
The goal here is twofold:
- allows for batched handling of multiple blocks at once
- act as a noisy signal for the breaking change that Bitswap is not adding blocks in the blockstore itself anymore (see ipfs/go-bitswap#571)
@MichaelMure
Copy link
Contributor Author

@Stebalien let's make it variadic then: ipfs/go-ipfs-exchange-interface#23

MichaelMure added a commit to MichaelMure/go-ipfs-exchange-offline that referenced this pull request Jul 14, 2022
@MichaelMure MichaelMure force-pushed the no-block-add branch 3 times, most recently from 20d8197 to 2c2a461 Compare July 14, 2022 22:56
@MichaelMure
Copy link
Contributor Author

@Stebalien I think I'm done now, tests are green at least on some platform, but I don't fully know what I'm doing here.

Jorropo pushed a commit to MichaelMure/go-ipfs-exchange-offline that referenced this pull request Jul 27, 2022
Copy link
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not at all confident that this code works completely correctly,
However I think it's good enough to be dog-fooded in Kubo.
LGTM

MichaelMure and others added 3 commits July 28, 2022 04:44
This leave the responsibility and choice to do so to the caller, typically go-blockservice.

This has several benefit:
- untangle the code
- allow to use an exchange as pure block retrieval
- avoid double add

Close ipfs/kubo#7956
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BlockService: AddBlock & AddBlocks write twice to datastore when used with Bitswap
3 participants