Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

refactor: ⎆ Migration to pull-streams (from Node.js Streams) #403

Closed
27 of 28 tasks
dignifiedquire opened this issue Aug 9, 2016 · 19 comments
Closed
27 of 28 tasks

Comments

@dignifiedquire
Copy link
Member

dignifiedquire commented Aug 9, 2016

As noted in #362 I did the due diligence of confirming that using pull-streams all the way through solves our issues with secio. The result of this can be seen in libp2p/js-libp2p-switch#67. (Warning lots of module linking required to get this running)

The next step is to move from the prototype into a fully tested version for all modules.
Below is a list of the modules that will need work.

ipfs

libp2p

Documentation

  • Add a note about pull-streams to every readme
  • Create a FAQ document collecting all important information on how to use pull-streams, how to convert them, why we use them etc.
@daviddias
Copy link
Member

Thank you for captaining this @dignifiedquire :) :) Let's make sure not to forget documentation :)

@dignifiedquire
Copy link
Member Author

Draft for the readme note.

This module uses pull-streams for all stream based interfaces. You can find details of why we use them in this document. If you need to convert them to node-streams, you can do so easily with pull-stream-to-stream like this:

const toStream = require('pull-stream-to-stream')
const pullStream = getPullStream()
const myNodeStream = toStream(pullStream)

@jbenet
Copy link
Member

jbenet commented Aug 11, 2016

before doing the work of chaning every module, i highly recommend you pick a small set and work with them-- see how they go. You wont understand how an abstraction fits a project until you play with it for a while. and def better to find that out with a small set than after changing everything.

@dignifiedquire
Copy link
Member Author

@jbenet we already did that as outlined at the beginning of this issue. I ported the minimal needed set to get secio in swarm working and see how things are working out, and they worked out great. Afterwards I talked with @diasdavid and we decided to move forward. The result of this is this endeveaour of moving all things into high quality ports.

@RichardLitt
Copy link
Member

Sub. Let me know if I can help with READMEs.

@dignifiedquire
Copy link
Member Author

Thanks @RichardLitt ❤️ will ping you as soon as I have a consistent message ready.

@daviddias
Copy link
Member

daviddias commented Aug 18, 2016

Missing from the list:

@dignifiedquire
Copy link
Member Author

dignifiedquire commented Aug 18, 2016

Missing from the list:

js-multistream

added

@daviddias
Copy link
Member

One more: libp2p/interface-transport#8

@daviddias daviddias changed the title Migration to pull-streams refactor: Migration to pull-streams (from Node.js Streams) Aug 20, 2016
@daviddias daviddias changed the title refactor: Migration to pull-streams (from Node.js Streams) refactor: ⎆ Migration to pull-streams (from Node.js Streams) Aug 28, 2016
@daviddias
Copy link
Member

daviddias commented Aug 28, 2016

Status Update - The enchanted adventure of the missing backpressure mechanism.

The migration to pull-streams is almost complete, we currently have PR's across every single module that required to be updated in order for the migration to be complete. These PR include the code, documentation is still lacking.

There is one major blocker though, although we've been sucessful porting almost every single module to pull-streams, our stream-muxing stress tests are failing (fortunately we had them before this endeavour). We are still investigating the issue, having ruled out a bunch of the possible causes, we now have a strong conviction that the backpressure mechanism is failing when we convert back and forth pull and Node.js streams.

Me and @dignifiedquire have been spending a considerable amount of time debugging, if you would like to help, I've created 3 PR with different experiments and have added a way to test it, so you can observe the failures by yourself. All of these are documented:

tests stress tests mega stress tests
Mode 0 (Current implementation, just Node.js Streams ✔️ ✔️ ✔️
Mode A (Wrap pull streams into Node.js Streams) ✔️ 𝙓 (passes if throttled) 𝙓 (and OOM)
Mode B (Using Node.js Streams for transport) ✔️ ✔️ OOM
Mode C (Mimics real use) ✔️ 𝙓 (passes if throttled) 𝙓 (and OOM)

In addition to the fact that (apparently), backpressure stops working properly when too much wrapping happens, we also see the process running Out Of Memory when tested exhaustively. This can be because the backpressure mechanism doesn't work as it should (keeping a lot of memory floating around) or simply because pull-streams have a high memory fingerprint, making the memory run out just because of having too many instances (less probable).

@dignifiedquire
Copy link
Member Author

I finally had a breakthrough just now! I figured out why the pausing happens. The reason for it is that at some point the pull-stream side actually has too much data, and says "hey slow down" to the spdy-transport stream (the framer to be exact). This is happens via the .write call returning false (https://github.com/pull-stream/pull-stream-to-stream/blob/master/index.js#L36).

After this everything comes to a halt quickly due to the fact that the framer is never resumed.

The only somewhat viable solution I found so far was to add add

if (this.isPaused()) {
  this.resume()
}

to the read call in in the (Scheduler.prototype._read](https://github.com/indutny/spdy-transport/blob/master/lib/spdy-transport/protocol/base/scheduler.js#L91) which is the base class of the framer. The explanation as to why this makes sense, is that the framer is piped to the underlying transport. In the case the underlying transport exerts backpressure onto the framer it calls .pause. When it is ready to continue writing it will call .read, triggering the ._read method on the source, i.e. this is the sign that the socket is ready to receive more data and as such the data flow can be resumed.

I am not certain in any way that this is the correct fix, so I'm very happy to receive more input here. One important step I have not done yet is to create a test case using only node streams to recreate this problem with backpressure on the underlying socket with spdy-transport.

@dignifiedquire
Copy link
Member Author

I've created the above mentioned test and fix here: dignifiedquire/spdy-transport@0e9c847

@daviddias
Copy link
Member

@dignifiedquire where are the PR's for:

- [ ] js-libp2p-ipfs (Only testing)
- [ ]  js-libp2p-ipfs-browser (Only testing)

@daviddias
Copy link
Member

@dignifiedquire are there PR for the ipld-service?

@dignifiedquire
Copy link
Member Author

@diasdavid no there is not, I would suggest changing that when we actually integrate it.

@daviddias
Copy link
Member

daviddias commented Sep 7, 2016

@dignifiedquire we've been directing some groups to check it out, we have to make sure it continues to work

@dignifiedquire
Copy link
Member Author

we've been directing some groups to check it out

We did? Sure if people are actually using it we can update it now. I did not expect that

@daviddias
Copy link
Member

@dignifiedquire yep :)

I've made my CR to all the remaining repos, once all of those points are taken care (very little stuff), I'll do the merge and release \o/ :D

@daviddias
Copy link
Member

Woooo! All the Pull Streams are in :D

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants