Support text-to-speech (TTS) streaming functionality #177

bqm · 2024-01-08T19:50:01Z

Problem

The current speech(...) and transcribe(...) functions in the Audio implementation do not support a streaming mode.

Streaming is particularly useful for real-time applications that need to simulate interactivity.

Proposal

Implement speech_stream and transcribe_stream methods that mimic the existing create_stream functionality.

Is someone already working on this? If not, I can give it a go.


64bit · 2024-01-08T21:11:47Z

OpenAI doesn't seem to have native streaming APIs for Audio: https://platform.openai.com/docs/api-reference/audio

Can you elaborate more on what you mean by streaming functionality for TTS & STT?

bqm · 2024-01-08T21:45:41Z

From what I can tell, this might be a documentation oversight? https://help.openai.com/en/articles/8555505-tts-api mentions:

Is it possible to stream audio?
Yes! By setting stream=True, you can chunk the returned audio file.

People also report that it is indeed working for them: openai/openai-python#864. It also matches the conversational behavior of the iOS app, if you have tried that.

I made a couple of Postman requests, and the responses carried a "Transfer-Encoding: chunked" header, so streaming might work out of the box without any specific "stream" key set to true. If that is the case, the Rust library would need to expose it.
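For context on what that header implies: with "Transfer-Encoding: chunked", the body arrives as a series of size-prefixed chunks rather than one block of known length. HTTP clients normally decode this for you, so the following is only a minimal standard-library sketch of the framing, not something the library would need to implement. The names decode_chunked and collect_chunks are hypothetical and are not part of async-openai or any OpenAI SDK; trailer headers after the final chunk are ignored.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Decodes an HTTP/1.1 chunked body, calling `on_chunk` with each chunk's
/// payload as soon as it has been read. Returns the total payload size.
/// (Hypothetical helper for illustration; trailers are not parsed.)
fn decode_chunked<R: Read, F: FnMut(&[u8])>(body: R, mut on_chunk: F) -> io::Result<usize> {
    let mut reader = BufReader::new(body);
    let mut total = 0;
    loop {
        // Each chunk starts with its size in hex, optionally followed by
        // ";extension" data, and is terminated by CRLF.
        let mut size_line = String::new();
        if reader.read_line(&mut size_line)? == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "missing chunk size line",
            ));
        }
        let hex = size_line.trim().split(';').next().unwrap_or("").trim();
        let size = usize::from_str_radix(hex, 16)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

        // A zero-sized chunk marks the end of the body.
        if size == 0 {
            return Ok(total);
        }

        // Read exactly `size` payload bytes and hand them to the caller.
        let mut chunk = vec![0u8; size];
        reader.read_exact(&mut chunk)?;
        on_chunk(&chunk);
        total += size;

        // The payload is followed by a CRLF that is not part of the data.
        let mut crlf = [0u8; 2];
        reader.read_exact(&mut crlf)?;
        if &crlf != b"\r\n" {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "chunk not terminated by CRLF",
            ));
        }
    }
}

/// Convenience wrapper that collects every chunk into a vector.
fn collect_chunks(raw: &[u8]) -> io::Result<Vec<Vec<u8>>> {
    let mut chunks = Vec::new();
    decode_chunked(raw, |c| chunks.push(c.to_vec()))?;
    Ok(chunks)
}
```

Calling collect_chunks on the bytes "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n" yields two chunks, "Wiki" and "pedia". In a real client the callback would forward each chunk to an audio sink instead of collecting it.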

I have not tried the transcriptions endpoint so cannot comment on that yet.

I can do some further research and share back unless you are tackling it.

64bit · 2024-01-09T00:51:43Z

Thank you for sharing the additional information. The help article suggests that they have officially made it public, so I assume it's safe to treat it as a public feature rather than an internal feature flag in their API.

I'm not tackling it, as I only learned about it from your comment.

Given that, you're welcome to send a PR! A stream suffix in the method name does sound reasonable, as it's consistent with the other streaming APIs.

In addition, a working example for this would be very helpful for me to test and for other folks to use.

64bit · 2024-01-09T00:58:30Z

The upstream spec was updated after I released v0.18.0, so it may or may not include this, but it is worth a look.

bqm · 2024-01-09T22:14:28Z

OK, great. I think I should be able to push a PR tomorrow, initially for the speech_stream endpoint. I can do a second PR for the transcribe endpoint after that?

64bit · 2024-01-10T04:51:35Z

Sounds like a good plan to me, thank you for offering to contribute!

bqm · 2024-01-10T21:33:00Z

Just added a PR. I also had a quick look at the STT use case: judging by the OpenAI documentation and the openai-python code, I don't think streaming is actually supported for the transcription endpoint.

I will drop it for now and reduce the scope to TTS streaming.

Boscop · 2024-01-14T02:14:30Z

Thanks for adding this, looking forward to using this :)

Do you know by any chance how to set stream to true when using the OpenAI API from TypeScript (either with the openai-node package or otherwise)?
I don't see a stream param here:
https://github.com/openai/openai-openapi/blob/f4a2833d00e92c4b1cb531d437da88a03de997d8/openapi.yaml#L6860-L6894
or here:
https://github.com/openai/openai-node/blob/d67c11b40deee82110d8bef18931ebafbe58bf8a/src/resources/audio/speech.ts#L17-L47

bqm · 2024-01-14T17:00:43Z

@Boscop I am not familiar with openai-node, but the files you linked are consistent with what I observed: there is actually no stream parameter, and the /audio/speech response is always streamed from OpenAI no matter what.

There is actually an example of that behavior in openai-node in the examples folder: https://github.com/openai/openai-node/blob/master/examples/audio.ts#L19C16-L19C33 (I found that via openai/openai-node#487).
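To illustrate what "always streamed" means for a caller, here is a minimal sketch, assuming the response body is exposed as something implementing std::io::Read. The function name forward_pieces and the sink callback are hypothetical; an async client would expose an async byte stream instead, but the consumption pattern is the same: handle each piece as it arrives rather than waiting for the whole file.

```rust
use std::io::{self, Read};

/// Reads `body` in pieces of at most `buf_size` bytes and passes each piece
/// to `sink` as soon as it is available, instead of buffering the whole
/// response first. Returns the total number of bytes forwarded.
/// (Hypothetical helper for illustration only.)
fn forward_pieces<R: Read>(
    mut body: R,
    buf_size: usize,
    mut sink: impl FnMut(&[u8]),
) -> io::Result<usize> {
    let mut buf = vec![0u8; buf_size];
    let mut total = 0;
    loop {
        match body.read(&mut buf) {
            // A read of zero bytes means the body is finished
            // (or that `buf_size` was zero).
            Ok(0) => return Ok(total),
            Ok(n) => {
                // Forward only the bytes that were actually read.
                sink(&buf[..n]);
                total += n;
            }
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

With a network body, the sink could write to a file or feed an audio player, so playback can begin before the last byte has been received.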

bqm · 2024-01-17T22:23:21Z

It is unclear how to move forward at this point, as the feedback on the pull request cannot be actioned. Marking this as won't fix for now; happy to restart the thread if conditions change.

64bit · 2024-01-17T23:02:40Z

Thank you for your contributions.

I'll update the contribution guidelines with minimum expectations, including testing, documentation, etc., for basic hygiene. That should fill the communication gap in the project.

It's easy to adopt an "if it compiles, it works" philosophy in Rust, but as we found in the PR for this issue, that's not always the case.

I'm sorry that you had a poor experience here. I agree that my last comment on the PR was not actionable, and I'm sorry about that.

If you wish to, you're very welcome to continue. To help get your work shipped, I gave it another review and left a comment. Of the options you listed, I think (3) is the most appropriate.

I hope you continue, and I'd be happy to see your work get shipped. Thank you again for your contributions!

64bit · 2024-01-18T00:12:38Z

Updated guidelines: https://github.com/64bit/async-openai#contributing

This issue falls outside the official docs API Reference and OpenAPI spec, but since you already worked on it before the guidelines were in place, you're welcome to get it shipped.

Please feel free to reach out if you have any concerns.

64bit added the enhancement label Jan 9, 2024

bqm mentioned this issue Jan 10, 2024

Add support for streaming TTS response from the /audio/speech endpoint #179

Closed

bqm changed the title ~~Support text-to-speech (TTS) and speech-to-text (STT) streaming functionality~~ Support text-to-speech (TTS) streaming functionality Jan 10, 2024

bqm closed this as not planned Jan 17, 2024
