Skip to content
This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

fix: streaming cat over http api #1760

Merged
merged 2 commits into from
Dec 11, 2018
Merged

fix: streaming cat over http api #1760

merged 2 commits into from
Dec 11, 2018

Conversation

alanshaw
Copy link
Member

@alanshaw alanshaw commented Dec 7, 2018

/api/v0/cat calls ipfs.cat, but ipfs.cat returns a buffer of file output. This changes /api/v0/cat to actually stream output by calling ipfs.catPullStream instead.

It uses pull-pushable in order to catch an initial error in the stream before it starts flowing and instead return a plain JSON response with an appropriate HTTP status.

This isn't ideal as it means there's no backpressure - if the consumer doesn't consume fast enough then data will start to get buffered into memory. However this is significantly better than buffering everything into memory before replying.

This reduces memory usage and time to first byte significantly. e.g.

$ jsipfs object stat QmTtBrkubV9M914kgYrjTDCWTkN4TjnwYDJYb3dwK7ErXR
NumLinks: 22
BlockSize: 1110
LinksSize: 992
DataSize: 118
CumulativeSize: 965374843 # i.e. ~965MB

# I modified jsipfs cat to print out the time it takes before it receives the first byte:
# (note we are talking to the daemon here so are going over the HTTP API)

# Before:
$ jsipfs cat QmTtBrkubV9M914kgYrjTDCWTkN4TjnwYDJYb3dwK7ErXR
time to first chunk: 2961.353ms

# After:
$ jsipfs cat QmTtBrkubV9M914kgYrjTDCWTkN4TjnwYDJYb3dwK7ErXR
time to first chunk: 127.899ms

resolves #1755

`/api/v0/cat` calls `ipfs.cat`, but `ipfs.cat` returns a buffer of file output. This changes `/api/v0/cat` to actually stream output by calling `ipfs.catPullStream` instead.

It uses `pull-pushable` in order to catch an initial error in the stream before it starts flowing and instead return a plain JSON response with an appropriate HTTP status.

This isn't ideal as it means there's no backpressure - if the consumer doesn't consume fast enough then data will start to get buffered into memory. However this is significantly better than buffering _everything_ into memory before replying.

License: MIT
Signed-off-by: Alan Shaw <[email protected]>
@ghost ghost assigned alanshaw Dec 7, 2018
@ghost ghost added the status/in-progress In progress label Dec 7, 2018
@daviddias
Copy link
Member

🤯🤯🤯🤯🤯🤯

const content = file.content
if (!content && file.type === 'dir') {

if (!file.content && file.type === 'dir') {
Copy link
Member

@achingbrain achingbrain Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These conditions might be simpler as:

Suggested change
if (!file.content && file.type === 'dir') {
if (file.type !== 'file') {
return d.abort(new Error('this dag node was not a file')
}
if (!file.content) {
return d.abort(new Error('this dag node had no content')
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that companion depends on the error message "this dag node is a directory" (it is the same for go ipfs) - it is cheaper to call cat and receive this error message than do two calls, one to figure it out. I'd rather not introduce another breaking change for this release (it's really breaky already)!

The interesting bit is a null file.content. When type is file, is that possible? I'll add this test in because that could break the pull pipeline if we resolve the deferred with something that isn't a pull stream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I guess once all the errors have codes we can stop depending on message content.

If file.type === 'file' then file.content should always have a value. It's weird that there is a check for that in the first place 🤷‍♂️.

License: MIT
Signed-off-by: Alan Shaw <[email protected]>
@alanshaw
Copy link
Member Author

Merging as test failures seem to be because of unrelated IPNS tests.

@alanshaw alanshaw merged commit 3ded576 into master Dec 11, 2018
@ghost ghost removed the status/in-progress In progress label Dec 11, 2018
@alanshaw alanshaw deleted the fix/streaming-http-cat branch December 11, 2018 23:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

/api/v0/cat buffers output
3 participants