Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bitswap sessions don't treat independent gets as independent #4674

Open
Stebalien opened this issue Feb 8, 2018 · 13 comments
Open

Bitswap sessions don't treat independent gets as independent #4674

Stebalien opened this issue Feb 8, 2018 · 13 comments
Labels
kind/bug A bug in existing code (including security flaws) topic/bitswap Topic bitswap

Comments

@Stebalien
Copy link
Member

Scenario:

  1. Construct a session.
  2. Call Get in one thread for key K with ctxA
  3. Call Get in another thread for key K with ctxB.

Canceling ctxA will cause the second Get to block forever.

We need to reference count interest.

(Note: By inspection, I haven't confirmed this)

@Stebalien Stebalien added kind/bug A bug in existing code (including security flaws) topic/bitswap Topic bitswap labels Feb 8, 2018
@Stebalien
Copy link
Member Author

Also, it looks like a timeout on a context passed to GetBlocks will not remove the blocks from the wantlist.

@whyrusleeping
Copy link
Member

Hrm... Youre right. I didnt account for that, I generally didn't put a lot of thought into multiple callers into a single session.

@Stebalien
Copy link
Member Author

Also, it looks like canceling a get won't actually cancel the active wants.

@Stebalien
Copy link
Member Author

So, fixing this and the related bugs is starting to look like a hopeless never-ending pile of wasted work. Given that we're going to be pushing graphsync next quarter, I vote to punt and replace. Thoughts @whyrusleeping?

Lessons learned:

  • Be very careful of how the exchange interacts with the provider system.
  • The exchange should build on the dagservice, not the blockservice.
  • The exchange should not be entangled with any of the storage services (dag, block or otherwise). As far as I can tell, we're currently writing every block to disk twice (no wasted disk space, just IO).
  • Handling the edge case of requesting a block from the network while adding it locally is really tricky. Handle it with care. IMO, the best solution would be to handle this at a higher layer (i.e., not bitswap).
  • Separate session reactors are elegant in theory but add complexity in practice as they aren't truly independent: they want the same blocks, should not be issuing duplicate provider requests, etc.
  • Beware of premature abstractions. Separating out the networking layer sounds good in theory but, due to a lack any motivating reasons to do this, our abstractions have leaked into each other quite a bit.

@whyrusleeping
Copy link
Member

@Stebalien unsure what you mean with "punt and replace". I agree with your lessons learned, but on the duplicate providers bit, I think that what we're missing is a smarter layer between the DHT and bitswap. A properly implemented providers layer should coalesce and cache identical requests, and also handle multiple outgoing provides correctly (and not make multiple DHT provide calls for the same object within a given time span).

I also agree that bitswap should likely not interface with any sort of storage. Except where it has to (for getting blocks to send to peers to fulfill their requests). On the request side, the wantmanager correctly treats multiple independent requests for the same blocks as a single external request. The hard part is figuring out where to make it put that block to the blockstore, the current method of bitswap putting each block to disk as it receives them is bad. It makes the add local vs fetch remote problem hard, and makes storing multiple incoming blocks using a single batch put nearly impossible. The batching problem is becoming significantly more pressing, its causing a bottleneck when fetching data over the network.

@whyrusleeping
Copy link
Member

@Stebalien also, the problem described in the issue comment is not an issue. This is properly handled (as evidenced by the TestMultipleSessions test).

@Stebalien
Copy link
Member Author

Present @whyrusleeping,

also, the problem described in the issue comment is not an issue. This is properly handled (as evidenced by the TestMultipleSessions test).

Meet past @whyrusleeping,

Hrm... Youre right. I didnt account for that, I generally didn't put a lot of thought into multiple callers into a single session.

(this issue is about multiple Get calls on a single session)


unsure what you mean with "punt and replace".

I mean that, given that we will need to do a large refactor anyways to make graphsync work, we might as well write that from scratch (keeping the lessons learned from bitswap in mind) as bitswap has grown quite complicated. Basically, I'm wondering if sinking any more time into bitswap is really worth it at this point. There's a lot of technical debt there.

but on the duplicate providers bit, I think that what we're missing is a smarter layer between the DHT and bitswap

You're absolutely right about that. That'll actually be important when we have multiple exchanges. Actually, in light of multiple exchanges..., having separate sessions kind of makes sense.

The hard part is figuring out where to make it put that block to the blockstore.

So, I'd kind of like to drive this from the top down with a dispatching DAGService that treats the exchange as a slow DAG and the local DAG as a cache (recall ipfs/notes#255).

@whyrusleeping
Copy link
Member

(this issue is about multiple Get calls on a single session)

Ah, right. Multiple gets on the same session. Though to be clear, my comment you quoted was referring to the wantlist issue.

I think that if we were to rewrite everything from scratch, we would end up with roughly what we have here. The code as it exists now has been rewritten from scratch several times already, Aside from some sessions issues, the core bitswap code works as intended. Though I will admit with the current structure its difficult to do some things we want to do, like multiple blocks per message, and batching puts to disk.

@Stebalien
Copy link
Member Author

Though to be clear, my comment you quoted was referring to the wantlist issue.

Got it.

I think that if we were to rewrite everything from scratch, we would end up with roughly what we have here. The code as it exists now has been rewritten from scratch several times already, Aside from some sessions issues, the core bitswap code works as intended.

I may have rushed to conclusions out of frustration when trying to track down the issues in #4658. I'll take another look. I noticed we did a lot of bookkeeping in different places and it looks like a lot of this bookkeeping could be simplified and consolidated (but, I haven't tried writing anything yet so...).

@magik6k
Copy link
Member

magik6k commented May 14, 2019

cc @hannahhoward should this issue be closed given all the recent fixes?

@Stebalien
Copy link
Member Author

It looks like this is still an issue, as far as I can tell.

@hannahhoward
Copy link
Contributor

I am moving this issue to the bitswap backlog since it clearly is a thing :)

@hannahhoward
Copy link
Contributor

And will prioritize

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) topic/bitswap Topic bitswap
Projects
None yet
Development

No branches or pull requests

4 participants