
Add concurrency.iterator_to_async, tests and docs #457

Closed
wants to merge 1 commit

Conversation

tiangolo
Member

Add concurrency.iterator_to_async.

This is another utility function similar to run_in_threadpool (and uses it internally).

It takes a normal (non-async) iterator and returns an async generator.

There are some libraries that return a stream object that can be iterated normally but not with async.

Wrapping those objects with iterator_to_async would allow returning them in StreamingResponses.

For example, file-like and similar objects, especially when they don't have a physical path that can be passed to FileResponses.

A notable example is the kinds of objects from cloud storage providers (Minio, Amazon S3, Google Cloud Storage, Azure Storage).
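As an illustration of the idea (not the PR's actual code), here is a minimal sketch of such a helper, using asyncio's default thread-pool executor in place of Starlette's run_in_threadpool:

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")

_SENTINEL = object()  # marks exhaustion of the underlying sync iterator


async def iterator_to_async(iterator: Iterator[T]) -> AsyncIterator[T]:
    # Each next() call may block (network or disk I/O), so run it in a
    # worker thread instead of on the event loop thread.
    loop = asyncio.get_running_loop()
    while True:
        item = await loop.run_in_executor(None, next, iterator, _SENTINEL)
        if item is _SENTINEL:
            break
        yield item
```

Because every `next()` call runs in a worker thread, a blocking source never stalls the event loop; Starlette's `run_in_threadpool` would play the same role as `run_in_executor` here.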


Then you can use it with a `StreamingResponse`.

This is especially useful for synchronous [file-like](https://docs.python.org/3/glossary.html#term-file-like-object) or streaming objects, like those provided by cloud storage providers.
@tiangolo
Member Author

@tomchristie This line might be too verbose, let me know if you want me to remove it (or anything else).

@tomchristie
Member

Okay - that's really interesting.

At this point in time I'd like to close it off - we don't use it internally, and it's not yet sufficiently clear to me exactly what kind of third party libraries I'd expect it to integrate against. I'd be a bit wary of folks not understanding the use-case for it.

Do you have something in particular that you're integrating against that you're using this approach with?

tomchristie closed this Apr 2, 2019
@tiangolo
Member Author

tiangolo commented Apr 2, 2019

> Do you have something in particular that you're integrating against that you're using this approach with?

Yes, there are two specific use cases that I'm already using it in:

I'm integrating it with Minio (an AWS S3-like object store): I get an object that I can iterate over, but it is not async. I want to return the stream directly to the user, without saving it to disk first, passing everything through RAM (as disk is several times slower than RAM).

I checked the docs for Google Cloud Storage, AWS S3, and Azure; all of them allow saving the stream to a file-like object. That file-like object could then be returned as a stream, without having to save it on disk. But these file-like objects are standard iterators, not async iterators. I think S3, Google Cloud Storage, and Azure Storage are probably going to be the main use cases for this functionality.

So, returning one of these iterators (or file-like objects) in a StreamingResponse wouldn't work.

The second use case is doing computer vision over images and sending them as a stream. All these computer vision tools work with standard (non-async) streams/iterators, so I have to convert them to async iterators. In this case, I can't even do the "hack" of saving the file locally to return it later, as it's an infinite stream of frames in memory.

A third use case that I have thought about (but haven't implemented in a real project yet):

Storing some binary in a database, maybe a NoSQL, like Couchbase or Mongo. Or even Redis. Then getting that binary data and streaming it directly. As the drivers/SDKs/connections for these tools still tend to be synchronous, it wouldn't be possible to take the stream in memory and return it in a streaming response directly.


Let me propose something: what if I implement a SyncStreamingResponse that uses it and wraps StreamingResponse?

Then, if you prefer, I can just document SyncStreamingResponse and remove iterator_to_async from the docs.

This would also work for returning file-like objects.

About aiofiles: it uses open() internally, so it only works with files on disk that have a physical path. There's no way to convert a file-like object (already in memory, in a temporary location, or simply already open()ed) to an aiofiles async object. So there's no way to return a FileResponse or StreamingResponse with a file-like object without saving it to disk first. SyncStreamingResponse would be able to receive those objects.


I'm using Starlette's run_in_threadpool in several places, not only inside of FastAPI, but also directly in projects. I see there's no official way to convert an iterator to an async iterator, not in the standard Python library, nor in curio/trio, etc.

I see this iterator_to_async as a "sibling" to run_in_threadpool for cases a bit more complex, so I would love to have it directly in Starlette, I think it could benefit other people/projects apart from FastAPI.

Let me know what you think.

@tomchristie
Member

Okay, so something I'd be more inclined to accept would be...

Have StreamingResponse use this implementation, so that it can accept either an async iterator or a standard sync iterator. (Inspect the object itself to determine what kind it is, and wrap it in iterator_to_async if required.)
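That inspection could hinge on the async-iteration protocol; a minimal sketch (the function name here is illustrative, not from this PR):

```python
from typing import Any


def needs_async_wrapping(content: Any) -> bool:
    # Objects implementing __aiter__ can already be consumed with
    # "async for"; anything else is treated as a plain sync iterable
    # that would need wrapping (e.g. with iterator_to_async) first.
    return not hasattr(content, "__aiter__")
```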

(I'd also be interested in whether there's any nice way we could use something like this and drop aiofiles - since it's also just a threadpool-based wrapper, right? But that's probably a different question.)

@tomchristie
Member

It makes more sense to me to have it in, if we're also using it within StreamingResponse when required.

@tiangolo
Member Author

tiangolo commented Apr 2, 2019

Awesome! I'm on it 😁 🤓
