[Feature Request] OpenAI Realtime API #5672

Open
lloydzhou opened this issue Oct 15, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

@lloydzhou
Contributor

🥰 Feature description

https://openai.com/index/introducing-the-realtime-api/

https://platform.openai.com/docs/api-reference/realtime

https://github.com/openai/openai-realtime-console/blob/main/readme/realtime-console-demo.png

image

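🧐 Proposed solution

Logic

  1. The Realtime API is accessed over WebSocket.
  2. The API has built-in concepts such as sessions and conversations. A session can be configured with modalities, instructions, voice, input_audio_format, output_audio_format, turn_detection, input_audio_transcription, tools, and so on, and it supports function calls.
  3. Audio is uploaded with input_audio_buffer.append and input_audio_buffer.commit, and response.create then starts generating the result (if turn_detection is enabled, these do not need to be called manually).
  4. The client can send conversation.item.create to add context directly to the current conversation. For history records, status=completed must be set.
  5. conversation.item.truncate supports interrupting input.
  6. Listen for response.audio.delta to receive base64 audio data, and for response.text.delta to receive the text in sync.
  7. Listen for response.output_item.added to find out whether the item is a function call, and for response.function_call_arguments.delta to receive the function call arguments. Or should the function call information be read directly from response.done?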
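
As a rough illustration of the steps above, here is a minimal Python sketch that builds a few client events and accumulates streamed server events. It does not open a WebSocket. The helper names (`session_update`, `audio_append`, `history_item`, `ResponseCollector`) are hypothetical, and the payload field names (`audio`, `delta`, `call_id`, `item`), the `pcm16` format and the `alloy` voice are assumptions based on my reading of the API reference linked above, so check them against the reference before relying on them.

```python
import base64
import json


def session_update(voice="alloy"):
    """Client event that sets a few of the session fields listed above."""
    return {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,  # assumed placeholder voice name
            "input_audio_format": "pcm16",  # assumed format name
            "output_audio_format": "pcm16",
        },
    }


def audio_append(pcm_chunk):
    """Client event carrying one base64-encoded chunk of input audio."""
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    }


def history_item(role, text):
    """Client event that replays one history message into the conversation."""
    content_type = "input_text" if role == "user" else "text"
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": role,
            "status": "completed",  # history records are marked completed
            "content": [{"type": content_type, "text": text}],
        },
    }


class ResponseCollector:
    """Accumulates streamed server events for one response."""

    def __init__(self):
        self.audio = bytearray()  # decoded audio bytes
        self.text_parts = []      # text deltas in arrival order
        self.call_args = {}       # call_id -> partial JSON argument string

    def handle(self, raw):
        """Process one JSON-encoded server event; unknown types are ignored."""
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "response.audio.delta":
            self.audio.extend(base64.b64decode(event["delta"]))
        elif kind == "response.text.delta":
            self.text_parts.append(event["delta"])
        elif kind == "response.function_call_arguments.delta":
            call_id = event.get("call_id", "")
            self.call_args[call_id] = self.call_args.get(call_id, "") + event["delta"]

    @property
    def text(self):
        return "".join(self.text_parts)
```

In a real client, each dict would be sent as `json.dumps(event)` over the socket and every incoming message passed to `ResponseCollector.handle`. Whether function call data is better taken from the deltas or from response.done is still the open question in item 7.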

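Interaction

  1. A voice interaction page like the one in the OpenAI client may be added that calls the Realtime API directly.
  2. The current voice interaction UI is full screen by default and can be shrunk to the size of the input box (taking the input box's place). The voice input UI and the chat history page are both kept (keeping the history makes it possible to show intermediate results produced by plugins; for example, if a plugin call generates an image midway, voice alone cannot describe it).
  3. The results generated during a voice call (audio buffer), together with the text received alongside them, need to be persisted in sessions.
  4. Voice calls support choosing the voice, format, detection mode, tools, and so on (these buttons need to be kept, or laid out again in the voice UI).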

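Discussion

  1. realtime is a new model, but it is clearly not on an equal footing with the earlier models. Where should it be placed?
  2. The Realtime API also allows modalities to contain only text, which turns the audio off (only the audio is disabled; the model is still called through the same WebSocket flow).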
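
To make the text-only case in point 2 concrete, a minimal sketch of such a session configuration could look like this. The `session.update` event shape is an assumption taken from the API reference linked above, and `text_only_session` and `wants_audio` are hypothetical helper names, not part of any library.

```python
def text_only_session(instructions):
    """Client event asking for text output only; audio is left out of modalities."""
    return {
        "type": "session.update",  # assumed event name, see the API reference
        "session": {
            "modalities": ["text"],
            "instructions": instructions,
        },
    }


def wants_audio(session_event):
    """True if the session configuration still asks for audio output."""
    return "audio" in session_event["session"].get("modalities", [])
```

A client could use a check like `wants_audio` to decide whether to set up audio playback at all, while still talking to the model over the same WebSocket connection.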

📝 Additional information

Pricing
image

@Dogtiti
Member

Dogtiti commented Nov 7, 2024

#5786

@Dogtiti
Member

Dogtiti commented Nov 11, 2024

Configure the parameters in the settings panel
image

@coderabbitai coderabbitai bot mentioned this issue Nov 11, 2024
10 tasks
@Dogtiti
Member

Dogtiti commented Nov 11, 2024

@kitaev-chen

Is there a free model available for this? I had been chatting for less than a minute and it had already cost $0.10.

@dustookk
Contributor

dustookk commented Nov 29, 2024

I configured my own parameters using the configuration method above.

Realtime will not start: the microphone stays disabled and cannot be enabled.

update:

  It turned out that the Azure deployment name had an extra leading space. After fixing that, it worked when tested on my computer.

  On my phone it still does not work, though. A packet capture shows no request to Azure or OpenAI over the wss:// protocol.

#5825

@qq1456680570

I would like to be able to customize the endpoint URL for realtime chat.

@jayjayhust

I would like to be able to customize the endpoint URL for realtime chat.

Yes, minimax has also opened a realtime interface. I hope the endpoint URL can be customized so that different Realtime API services can be selected: https://platform.minimaxi.com/document/Realtime?key=640e0c9c5f918b4f6c4e2d58

9 participants