
Add concurrency.iterator_to_async, tests and docs #457

Closed
wants to merge 1 commit

Conversation

tiangolo
Member

Add concurrency.iterator_to_async.

This is another utility function similar to run_in_threadpool (and uses it internally).

It takes a normal (non-async) iterator and returns an async generator.

There are some libraries that return a stream object that can be iterated normally but not with async.

Wrapping those objects with iterator_to_async would allow returning them in StreamingResponses.

For example, file-like and similar objects, especially when they don't have a physical path that can be passed to FileResponses.

A notable example is the kinds of objects from cloud storage providers (Minio, Amazon S3, Google Cloud Storage, Azure Storage).
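As an illustration of the idea (not the PR's actual code), here is a minimal sketch of such a helper, using asyncio's default thread-pool executor in place of Starlette's run_in_threadpool:

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")

_SENTINEL = object()  # marks exhaustion of the underlying sync iterator


async def iterator_to_async(iterator: Iterator[T]) -> AsyncIterator[T]:
    # Each next() call may block (network or disk I/O), so run it in a
    # worker thread instead of on the event loop thread.
    loop = asyncio.get_running_loop()
    while True:
        item = await loop.run_in_executor(None, next, iterator, _SENTINEL)
        if item is _SENTINEL:
            break
        yield item
```

Because every `next()` call runs in a worker thread, a blocking source never stalls the event loop; Starlette's `run_in_threadpool` would play the same role as `run_in_executor` here.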


Then you can use it with a `StreamingResponse`.

This is especially useful for synchronous [file-like](https://docs.python.org/3/glossary.html#term-file-like-object) or streaming objects, like those provided by cloud storage providers.
@tiangolo
Member Author

@tomchristie This line might be too verbose, let me know if you want me to remove it (or anything else).

@tomchristie
Member

Okay - that's really interesting.

At this point in time I'd like to close it off - we don't use it internally, and it's not yet sufficiently clear to me exactly what kind of third party libraries I'd expect it to integrate against. I'd be a bit wary of folks not understanding the use-case for it.

Do you have something in particular that you're integrating against that you're using this approach with?

tomchristie closed this Apr 2, 2019
@tiangolo
Member Author

tiangolo commented Apr 2, 2019

> Do you have something in particular that you're integrating against that you're using this approach with?

Yes, there are two specific use cases that I'm already using it in:

I'm integrating it with Minio (an AWS S3-like object store): I get an object that I can iterate over, but it is not async. I want to return the stream directly to the user, without saving it to disk first, passing everything through RAM (as disk is several times slower than RAM).

I checked the docs for Google Cloud Storage, AWS S3, and Azure; all of them allow saving the stream to a file-like object. That file-like object could then be returned as a stream, without having to save it on disk. But these file-like objects are standard iterators, not async iterators. I think S3, Google Cloud Storage, and Azure Storage are probably going to be the main use cases for this functionality.

So, returning one of these iterators (or file-like objects) in a StreamingResponse wouldn't work.

The second use case is doing computer vision over images and sending them as a stream. All these computer vision tools work with standard (non-async) streams/iterators, so I have to convert them to async iterators. In this case, I can't even do the "hack" of saving the file locally to return it later, as it's an infinite stream of frames in memory.

A third use case that I have thought about (but haven't implemented in a real project yet):

Storing some binary in a database, maybe a NoSQL, like Couchbase or Mongo. Or even Redis. Then getting that binary data and streaming it directly. As the drivers/SDKs/connections for these tools still tend to be synchronous, it wouldn't be possible to take the stream in memory and return it in a streaming response directly.


Let me propose something: what if I implement a SyncStreamingResponse that uses it and wraps StreamingResponse?

Then, if you prefer, I can just document SyncStreamingResponse and remove iterator_to_async from the docs.

This would also work for returning file-like objects.

About aiofiles: it uses open() internally, so it only works with files on disk that have a physical path. There's no way to convert a file-like object (already in memory, in a temporary location, or simply already open()ed) to an aiofiles async object. So there's no way to return a FileResponse or StreamingResponse with a file-like object without saving it to disk first. SyncStreamingResponse would be able to receive those objects.


I'm using Starlette's run_in_threadpool in several places, not only inside of FastAPI, but also directly in projects. I see there's no official way to convert an iterator to an async iterator, not in the standard Python library, nor in curio/trio, etc.

I see this iterator_to_async as a "sibling" to run_in_threadpool for cases a bit more complex, so I would love to have it directly in Starlette, I think it could benefit other people/projects apart from FastAPI.

Let me know what you think.

@tomchristie
Member

Okay, so something I'd be more inclined to accept would be...

Have StreamingResponse use this implementation, so that it can accept either an async iterator or a standard sync iterator. (Inspect the object itself to determine what kind it is, and wrap it in iterator_to_async if required.)
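That inspection could hinge on the async-iteration protocol; a minimal sketch (the function name here is illustrative, not from this PR):

```python
from typing import Any


def needs_async_wrapping(content: Any) -> bool:
    # Objects implementing __aiter__ can already be consumed with
    # "async for"; anything else is treated as a plain sync iterable
    # that would need wrapping (e.g. with iterator_to_async) first.
    return not hasattr(content, "__aiter__")
```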

(I'd also be interested in whether there's any nice way we could use something like this and drop aiofiles - since it's also just a threadpool-based wrapper, right? But that's probably a different question.)

@tomchristie
Member

It makes more sense to me to have it in, if we're also using it within StreamingResponse when required.

@tiangolo
Member Author

tiangolo commented Apr 2, 2019

Awesome! I'm on it 😁 🤓
