[Azure] Support Realtime API - Standalone client #1283

Closed
27 changes: 26 additions & 1 deletion README.md
@@ -499,7 +499,7 @@ const credential = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(credential, scope);

-const openai = new AzureOpenAI({ azureADTokenProvider });
+const openai = new AzureOpenAI({ azureADTokenProvider, apiVersion: "<The API version, e.g. 2024-10-01-preview>" });

const result = await openai.chat.completions.create({
model: 'gpt-4o',
@@ -509,6 +509,31 @@ const result = await openai.chat.completions.create({
console.log(result.choices[0]!.message?.content);
```

### Realtime API
This SDK provides real-time streaming capabilities for Azure OpenAI through the `AzureOpenAIRealtimeWS` and `AzureOpenAIRealtimeWebSocket` classes. These classes parallel the `OpenAIRealtimeWS` and `OpenAIRealtimeWebSocket` clients described previously, but they are specifically adapted for Azure OpenAI endpoints.

To use the real-time features, create a fully configured `AzureOpenAI` client and pass it to either `AzureOpenAIRealtimeWS` or `AzureOpenAIRealtimeWebSocket`. For example:

```ts
const cred = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const deploymentName = 'gpt-4o-realtime-preview-1001';
const azureADTokenProvider = getBearerTokenProvider(cred, scope);
const client = new AzureOpenAI({
azureADTokenProvider,
apiVersion: '2024-10-01-preview',
deployment: deploymentName,
});
const rt = new AzureOpenAIRealtimeWS(client);
```

Once the real-time client has been created, open its underlying WebSocket connection by calling the `open` method:
```ts
await rt.open();
```

With the connection established, you can then begin sending requests and receiving streaming responses in real time.
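
As a rough illustration of receiving a streamed response, the sketch below gathers `response.text.delta` fragments into one string and resolves when `response.done` arrives. The event names are the ones used in this PR's example files. The `collectText` helper is hypothetical, and a plain Node `EventEmitter` stands in for the realtime client so the snippet runs without an Azure resource; it assumes the real client exposes a compatible `on` method.

```ts
import { EventEmitter } from 'node:events';

// Only the field this sketch reads; real server events carry more fields.
type TextDelta = { type: 'response.text.delta'; delta: string };

// Hypothetical helper: collects text deltas and resolves with the joined
// text once `response.done` fires. Rejects if an `error` event is emitted.
function collectText(rt: Pick<EventEmitter, 'on'>): Promise<string> {
  return new Promise((resolve, reject) => {
    const parts: string[] = [];
    rt.on('response.text.delta', (event: TextDelta) => parts.push(event.delta));
    rt.on('response.done', () => resolve(parts.join('')));
    rt.on('error', (err: Error) => reject(err));
  });
}

// Stand-in for the realtime client (an assumption, for illustration only).
const fake = new EventEmitter();
const text = collectText(fake);

// Simulate the server streaming two fragments and then finishing.
fake.emit('response.text.delta', { type: 'response.text.delta', delta: 'Hello, ' });
fake.emit('response.text.delta', { type: 'response.text.delta', delta: 'world' });
fake.emit('response.done', { type: 'response.done' });

text.then((full) => console.log(full));
```

With a real client, the same helper would be called before sending `response.create` so that no delta is missed.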

### Retries

Certain errors will be automatically retried 2 times by default, with a short exponential backoff.
3 changes: 2 additions & 1 deletion examples/azure.ts → examples/azure/chat.ts
@@ -2,6 +2,7 @@

import { AzureOpenAI } from 'openai';
import { getBearerTokenProvider, DefaultAzureCredential } from '@azure/identity';
import 'dotenv/config';
Collaborator

nit: I think it'd be better to reduce the amount of deps a user would need to add if they copy-paste this example file.

I don't feel super strongly here though, and dotenv is very standard... maybe @kwhinnery-openai has thoughts?
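
One dependency-free option, sketched here under assumptions rather than taken from the PR, is to read variables straight from `process.env` and fail fast when one is missing. The `requireEnv` helper name is hypothetical; `AZURE_OPENAI_ENDPOINT` is the variable the example's comments already mention. On Node 20.6 or later, a `.env` file can be loaded with the built-in `--env-file` flag instead of importing `dotenv`.

```ts
// Hypothetical helper: returns the named variable or throws a clear error.
// The `env` parameter defaults to process.env and exists so the lookup
// can be exercised with an explicit map.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Demonstration with an explicit map so the snippet runs anywhere.
// The endpoint value is a placeholder, not a real resource.
const endpoint = requireEnv('AZURE_OPENAI_ENDPOINT', {
  AZURE_OPENAI_ENDPOINT: 'https://example.invalid',
});
console.log(endpoint);
```

In the example file this would be called as `requireEnv('AZURE_OPENAI_ENDPOINT')`, with the script started via something like `node --env-file=.env` on the compiled output.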


// Corresponds to your Model deployment within your OpenAI resource, e.g. gpt-4-1106-preview
// Navigate to the Azure OpenAI Studio to deploy a model.
@@ -13,7 +14,7 @@ const azureADTokenProvider = getBearerTokenProvider(credential, scope);

// Make sure to set AZURE_OPENAI_ENDPOINT with the endpoint of your Azure resource.
// You can find it in the Azure Portal.
-const openai = new AzureOpenAI({ azureADTokenProvider });
+const openai = new AzureOpenAI({ azureADTokenProvider, apiVersion: '2024-10-01-preview' });

async function main() {
console.log('Non-streaming:');
61 changes: 61 additions & 0 deletions examples/azure/websocket.ts
Collaborator

nit: I think these examples should mention realtime in the path somewhere, e.g.

  • examples/realtime/azure/websocket.ts
  • examples/azure/realtime/websocket.ts
  • examples/azure/realtime-websocket.ts
    (applies to the ws.ts example as well)

@@ -0,0 +1,61 @@
import { AzureOpenAIRealtimeWebSocket } from 'openai/beta/realtime/websocket';
import { AzureOpenAI } from 'openai';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import 'dotenv/config';

async function main() {
const cred = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const deploymentName = 'gpt-4o-realtime-preview-1001';
const azureADTokenProvider = getBearerTokenProvider(cred, scope);
const client = new AzureOpenAI({
azureADTokenProvider,
apiVersion: '2024-10-01-preview',
deployment: deploymentName,
});
const rt = new AzureOpenAIRealtimeWebSocket(client);
await rt.open();

// access the underlying `ws.WebSocket` instance
Collaborator

Suggested change
-// access the underlying `ws.WebSocket` instance
+// access the underlying `WebSocket` instance

rt.socket.addEventListener('open', () => {
console.log('Connection opened!');
rt.send({
type: 'session.update',
session: {
modalities: ['text'],
model: 'gpt-4o-realtime-preview',
},
});

rt.send({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
},
});

rt.send({ type: 'response.create' });
});

rt.on('error', (err) => {
// in a real world scenario this should be logged somewhere as you
// likely want to continue processing events regardless of any errors
throw err;
});

rt.on('session.created', (event) => {
console.log('session created!', event.session);
console.log();
});

rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
rt.on('response.text.done', () => console.log());

rt.on('response.done', () => rt.close());

rt.socket.addEventListener('close', () => console.log('\nConnection closed!'));
}

main();
68 changes: 68 additions & 0 deletions examples/azure/ws.ts
@@ -0,0 +1,68 @@
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import { AzureOpenAIRealtimeWS } from 'openai/beta/realtime/ws';
import { AzureOpenAI } from 'openai';
import 'dotenv/config';

async function main() {
const cred = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const deploymentName = 'gpt-4o-realtime-preview-1001';
const azureADTokenProvider = getBearerTokenProvider(cred, scope);
const client = new AzureOpenAI({
azureADTokenProvider,
apiVersion: '2024-10-01-preview',
deployment: deploymentName,
});
const rt = new AzureOpenAIRealtimeWS(client);
await rt.open();

// access the underlying `ws.WebSocket` instance
rt.socket.on('open', () => {
console.log('Connection opened!');
rt.send({
type: 'session.update',
session: {
modalities: ['text'],
model: 'gpt-4o-realtime-preview',
},
});
rt.send({
type: 'session.update',
session: {
modalities: ['text'],
model: 'gpt-4o-realtime-preview',
},
});

rt.send({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
},
});

rt.send({ type: 'response.create' });
});

rt.on('error', (err) => {
// in a real world scenario this should be logged somewhere as you
// likely want to continue processing events regardless of any errors
throw err;
});

rt.on('session.created', (event) => {
console.log('session created!', event.session);
console.log();
});

rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
rt.on('response.text.done', () => console.log());

rt.on('response.done', () => rt.close());

rt.socket.on('close', () => console.log('\nConnection closed!'));
}

main();
1 change: 1 addition & 0 deletions examples/package.json
@@ -7,6 +7,7 @@
"private": true,
"dependencies": {
"@azure/identity": "^4.2.0",
"dotenv": "^16.4.7",
"express": "^4.18.2",
"next": "^14.1.1",
"openai": "file:..",
105 changes: 104 additions & 1 deletion src/beta/realtime/websocket.ts
Collaborator

@kwhinnery-openai curious if you have any opinions on the location of the azure classes, is it fine / good for them to be in the same file as the OpenAI ones?

It does feel a bit verbose to have to write `import { AzureOpenAIRealtimeWebSocket } from 'openai/beta/realtime/azure/websocket';`, but I can see a couple of advantages to splitting them up:

  • very small reduction in bundle size for non-azure users
  • it would be easier to add optional peer dependencies for azure specific things down the line if needed

@@ -1,4 +1,4 @@
-import { OpenAI } from '../../index';
+import { AzureOpenAI, OpenAI } from '../../index';
import { OpenAIError } from '../../error';
import * as Core from '../../core';
import type { RealtimeClientEvent, RealtimeServerEvent } from '../../resources/beta/realtime/realtime';
@@ -95,3 +95,106 @@ export class OpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
}
}
}

export class AzureOpenAIRealtimeWebSocket extends OpenAIRealtimeEmitter {
socket: _WebSocket;
Collaborator

@RobertCraigie RobertCraigie Jan 24, 2025

question: does the Azure API support ephemeral session tokens? Asking as the OpenAI API client requires dangerouslyAllowBrowser: true if you're not using an ephemeral session token


constructor(
private client: AzureOpenAI,
private options: {
deploymentName?: string;
} = {},
) {
super();
}

async open(): Promise<void> {
async function getUrl({
apiVersion,
Collaborator

nit: can you move this function outside of this class? would be much easier to read

baseURL,
deploymentName,
apiKey,
token,
}: {
baseURL: string;
deploymentName: string;
apiVersion: string;
apiKey: string;
token: string | undefined;
}): Promise<URL> {
const path = '/realtime';
const url = new URL(baseURL + (baseURL.endsWith('/') ? path.slice(1) : path));
url.protocol = 'wss';
url.searchParams.set('api-version', apiVersion);
url.searchParams.set('deployment', deploymentName);
if (apiKey !== '<Missing Key>') {
url.searchParams.set('api-key', apiKey);
} else {
if (token) {
url.searchParams.set('Authorization', `Bearer ${token}`);
} else {
throw new Error('AzureOpenAI is not instantiated correctly. No API key or token provided.');
}
}
return url;
}
const deploymentName = this.client.deploymentName ?? this.options.deploymentName;
if (!deploymentName) {
throw new Error('No deployment name provided');
}
const url = await getUrl({
apiVersion: this.client.apiVersion,
baseURL: this.client.baseURL,
deploymentName,
apiKey: this.client.apiKey,
token: await this.client.getAzureADToken(),
});
// @ts-ignore
this.socket = new WebSocket(url, ['realtime', 'openai-beta.realtime-v1']);

this.socket.addEventListener('message', (websocketEvent: MessageEvent) => {
const event = (() => {
try {
return JSON.parse(websocketEvent.data.toString()) as RealtimeServerEvent;
} catch (err) {
this._onError(null, 'could not parse websocket event', err);
return null;
}
})();

if (event) {
this._emit('event', event);

if (event.type === 'error') {
this._onError(event);
} else {
// @ts-expect-error TS isn't smart enough to get the relationship right here
this._emit(event.type, event);
}
}
});

this.socket.addEventListener('error', (event: any) => {
this._onError(null, event.message, null);
});
}

send(event: RealtimeClientEvent) {
if (!this.socket) {
throw new Error('Socket is not open, call open() first');
}
try {
this.socket.send(JSON.stringify(event));
} catch (err) {
this._onError(null, 'could not send data', err);
}
}

close(props?: { code: number; reason: string }) {
try {
this.socket?.close(props?.code ?? 1000, props?.reason ?? 'OK');
} catch (err) {
this._onError(null, 'could not close the connection', err);
}
}
}
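
For the review nit about moving the URL helper out of the class, one possible shape is a module-level function. The sketch below keeps the logic from the diff but is not the PR author's final code: the name `buildRealtimeUrl`, the `RealtimeUrlParams` interface, and the `MISSING_KEY` constant are hypothetical, and the function is made synchronous because nothing in it awaits. The sample endpoint and key are placeholders.

```ts
// Sentinel string copied from the diff; the check there treats it as "no API key set".
const MISSING_KEY = '<Missing Key>';

interface RealtimeUrlParams {
  baseURL: string;
  deploymentName: string;
  apiVersion: string;
  apiKey: string;
  token: string | undefined;
}

// Builds the wss:// realtime URL from client settings. Lives at module
// level so open() only has to gather the values and call it.
function buildRealtimeUrl(params: RealtimeUrlParams): URL {
  const { baseURL, deploymentName, apiVersion, apiKey, token } = params;
  const path = '/realtime';
  // Avoid a double slash when baseURL already ends with '/'.
  const url = new URL(baseURL + (baseURL.endsWith('/') ? path.slice(1) : path));
  url.protocol = 'wss';
  url.searchParams.set('api-version', apiVersion);
  url.searchParams.set('deployment', deploymentName);
  if (apiKey !== MISSING_KEY) {
    url.searchParams.set('api-key', apiKey);
  } else if (token) {
    // Mirrors the diff, which passes the bearer token as a query parameter.
    url.searchParams.set('Authorization', `Bearer ${token}`);
  } else {
    throw new Error('AzureOpenAI is not instantiated correctly. No API key or token provided.');
  }
  return url;
}

// Placeholder values for illustration only.
const sampleUrl = buildRealtimeUrl({
  baseURL: 'https://example.invalid/openai',
  deploymentName: 'gpt-4o-realtime-preview-1001',
  apiVersion: '2024-10-01-preview',
  apiKey: 'placeholder-key',
  token: undefined,
});
console.log(sampleUrl.toString());
```

Pulling the function out also makes it straightforward to unit test without opening a socket.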