feat: release voice agent #357

Open
wants to merge 3 commits into
base: main
Conversation

naomi-lgbt
Collaborator

@naomi-lgbt naomi-lgbt commented Jan 27, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced Voice Agent functionality with WebSocket-based communication
    • Added support for configurable audio processing and agent interactions
    • Implemented agent event handling for various communication scenarios
  • Technical Improvements

    • Enhanced client configuration options for agent connections
    • Added robust event management for agent interactions
    • Expanded type definitions for agent-related schemas and events
  • Documentation

    • Added type definitions and interfaces for agent communication
    • Introduced new constants and configuration options for agent interactions

feat: finish it up!

fix: oops

Co-authored-by: Luke Oliff <[email protected]>

feat: use agent directly, no live prop

feat: configurable websocket url

fix: can't import with extensions 😩

wip: browser example

feat: i feel like i'm getting nowhere here

The fact that the websocket doesn't say *anything* back is really
not helpful.

chore: add warning about browser

feat: update for ga

fix: expand example
Contributor

coderabbitai bot commented Jan 27, 2025

Walkthrough

This pull request introduces comprehensive support for Deepgram's Voice Agent API across multiple files. The changes include adding a new AgentLiveClient class, defining agent-related types and events, and creating an example implementation. The modifications enable developers to interact with voice agents through a structured WebSocket-based communication model, with support for configuring audio input/output, managing conversation flow, and handling various agent events.

Changes

| File | Change Summary |
| --- | --- |
| examples/node-agent-live/.gitignore | Added .gitignore entries for chatlog.txt and output-*.wav files |
| examples/node-agent-live/index.js | Added agent() async function for Deepgram Voice Agent interaction |
| src/DeepgramClient.ts | Added agent() method to access AgentLiveClient |
| src/lib/constants.ts | Added DEFAULT_AGENT_URL and DEFAULT_AGENT_OPTIONS constants |
| src/lib/enums/AgentEvents.ts | Created new AgentEvents enum for agent-related events |
| src/lib/types/AgentLiveSchema.ts | Added multiple types and interfaces for agent configuration |
| src/packages/AgentLiveClient.ts | Implemented AgentLiveClient class for WebSocket-based agent communication |

Sequence Diagram

sequenceDiagram
    participant Client
    participant AgentLiveClient
    participant DeepgramAPI
    
    Client->>AgentLiveClient: Initialize with options
    AgentLiveClient->>DeepgramAPI: Establish WebSocket connection
    AgentLiveClient->>DeepgramAPI: Configure agent settings
    DeepgramAPI-->>AgentLiveClient: Connection established
    
    loop Agent Interaction
        Client->>AgentLiveClient: Send audio/instructions
        AgentLiveClient->>DeepgramAPI: Transmit data
        DeepgramAPI-->>AgentLiveClient: Process and respond
        AgentLiveClient-->>Client: Emit events (audio, text, etc.)
    end

Possibly related PRs

Suggested reviewers

  • lukeocodes
  • jpvajda
@naomi-lgbt naomi-lgbt requested a review from lukeocodes January 27, 2025 23:30
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (11)
src/packages/AgentLiveClient.ts (3)

14-19: Ensure error handling for .connect() call.
If .connect() fails or the connection cannot be established, consider adding error handling or retries.


35-36: Use a more specific type for the close event.
Instead of using event: any, consider using event: CloseEvent to improve type safety and clarity.


74-74: Switch to a more robust logging strategy for unknown data.
Replace console.log with an abstraction or logger method to maintain consistent logging across environments.
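
The first nitpick above (error handling or retries around .connect()) could be approached roughly as follows. This is only a sketch under stated assumptions: `backoffDelays` and `connectWithRetry` are hypothetical helpers that are not part of the SDK, and the `connect` argument is assumed to be a caller-supplied function that resolves once the socket is open and rejects on failure.

```typescript
// Hypothetical helper, not part of the SDK.
// Returns the wait before each retry: base, 2*base, 4*base, ...
function backoffDelays(maxAttempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(baseDelayMs * Math.pow(2, retry));
  }
  return delays;
}

// Hypothetical helper: calls `connect` until it resolves or attempts run out.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  const delays = backoffDelays(maxAttempts, baseDelayMs);
  let lastError: unknown = new Error("connect was never attempted");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait after every failure except the last one.
      if (attempt < delays.length) {
        await new Promise<void>((resolve) => setTimeout(resolve, delays[attempt]));
      }
    }
  }
  throw lastError;
}
```

Keeping the delay schedule in its own pure function makes the retry policy easy to test without opening a socket.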

src/lib/constants.ts (1)

48-53: Consider extracting common headers configuration.

The DEFAULT_AGENT_OPTIONS duplicates the headers configuration from DEFAULT_GLOBAL_OPTIONS. Consider extracting this common configuration to reduce duplication.

+const COMMON_TRANSPORT_OPTIONS = {
+  headers: DEFAULT_HEADERS
+};

 export const DEFAULT_AGENT_OPTIONS: Partial<DefaultNamespaceOptions> = {
-  fetch: { options: { url: DEFAULT_URL, headers: DEFAULT_HEADERS } },
+  fetch: { options: { url: DEFAULT_URL, ...COMMON_TRANSPORT_OPTIONS } },
   websocket: {
-    options: { url: DEFAULT_AGENT_URL, _nodeOnlyHeaders: DEFAULT_HEADERS },
+    options: { url: DEFAULT_AGENT_URL, _nodeOnlyHeaders: COMMON_TRANSPORT_OPTIONS.headers },
   },
 };
src/lib/enums/AgentEvents.ts (2)

9-11: Improve documentation clarity for Audio event.

The comment for the Audio event is unclear and ends with a question mark. Consider providing a more detailed description of the event's purpose and payload structure.

-  /**
-   * Audio event?
-   */
+  /**
+   * Represents an audio data event.
+   * { type: "Audio", data: AudioData }
+   */

73-78: Consider adding error handling guidance for Unhandled event.

The Unhandled event's documentation could benefit from guidance on error handling and logging recommendations.

   /**
-   * Catch all for any other message event
+   * Catch all for any unrecognized message events.
+   * Recommended to log these events for debugging purposes.
+   * { type: string, [key: string]: unknown }
    */
examples/node-agent-live/index.js (1)

11-11: Make the audio file URL configurable.

Consider making the audio file URL configurable through environment variables or command-line arguments for better flexibility.

-  const url = "https://dpgr.am/spacewalk.wav";
+  const url = process.env.AUDIO_FILE_URL || "https://dpgr.am/spacewalk.wav";
examples/browser-agent-live/index.html (4)

1-4: Enhance the warning message visibility and information.

Consider converting the HTML comment into a visible warning banner for better visibility. Also, provide more specific information about browser compatibility.

-<!--
-WARNING: This example is currently non-functional. You may encounter issues
-with browser support during the beta release of the Voice Agent API.
--->
+<div class="warning-banner" style="background: #fff3cd; padding: 1rem; margin: 1rem; border: 1px solid #ffeeba;">
+  <strong>⚠️ Beta Warning:</strong>
+  <p>This example is currently in beta testing phase. Known limitations:</p>
+  <ul>
+    <li>Some features may be non-functional</li>
+    <li>Limited browser compatibility during beta release</li>
+    <li>Tested browsers: [list supported browsers]</li>
+  </ul>
+</div>

11-12: Add accessibility attributes to the button.

The button lacks proper accessibility attributes and meaningful text content.

-    Running test... check the developer console.
-    <button type="button">Start</button>
+    <p>Voice Agent Demo - Check the developer console for detailed logs.</p>
+    <button 
+      type="button" 
+      aria-label="Toggle voice recording"
+      id="recordButton"
+    >Start Recording</button>

24-110: Implement structured logging and error handling.

Replace console logs with a proper logging system and implement comprehensive error handling.

+    // Define a logging utility
+    const logger = {
+      info: (message, data) => {
+        console.log(`[${new Date().toISOString()}] INFO: ${message}`, data || '');
+      },
+      error: (message, error) => {
+        console.error(`[${new Date().toISOString()}] ERROR: ${message}`, error);
+        // Add error reporting service integration here
+      }
+    };
+
     connection.on(AgentEvents.Welcome, () => {
-      console.log("WS Connected");
+      logger.info("WebSocket connection established");
     });

1-147: Add documentation and usage examples.

As this is an example file, it would benefit from additional documentation:

  1. Add a description of the example's purpose
  2. Include setup instructions
  3. Document expected behavior and limitations
  4. Add comments explaining key code sections
+<!--
+  Deepgram Voice Agent Browser Example
+  
+  This example demonstrates how to:
+  - Initialize a Deepgram Voice Agent in a browser environment
+  - Handle real-time audio streaming
+  - Process agent responses and audio output
+  
+  Setup:
+  1. Set your Deepgram API key in the environment
+  2. Ensure you have a compatible browser
+  3. Grant microphone permissions when prompted
+  
+  Limitations:
+  - Requires modern browser with WebAudio support
+  - Network connectivity required
+  - May have high latency on slower connections
+-->
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77627d5 and a40604a.

📒 Files selected for processing (14)
  • examples/browser-agent-live/index.html (1 hunks)
  • examples/node-agent-live/.gitignore (1 hunks)
  • examples/node-agent-live/index.js (1 hunks)
  • src/DeepgramClient.ts (2 hunks)
  • src/lib/constants.ts (2 hunks)
  • src/lib/enums/AgentEvents.ts (1 hunks)
  • src/lib/enums/index.ts (1 hunks)
  • src/lib/types/AgentLiveSchema.ts (1 hunks)
  • src/lib/types/DeepgramClientOptions.ts (1 hunks)
  • src/lib/types/FunctionCallResponse.ts (1 hunks)
  • src/lib/types/index.ts (2 hunks)
  • src/packages/AbstractLiveClient.ts (1 hunks)
  • src/packages/AgentLiveClient.ts (1 hunks)
  • src/packages/index.ts (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • src/lib/enums/index.ts
  • examples/node-agent-live/.gitignore
🧰 Additional context used
🪛 ESLint
examples/node-agent-live/index.js

[error] 1-1, 2-2, 3-3, 4-4: A require() style import is forbidden. (@typescript-eslint/no-require-imports)
[error] 1-1, 2-2, 3-3, 4-4: 'require' is not defined. (no-undef)
[error] 6-6, 67-67: 'process' is not defined. (no-undef)
[error] 9-9, 77-77, 88-88, 89-89, 101-101: 'Buffer' is not defined. (no-undef)
[error] 14-14, 44-44, 47-47, 55-55, 62-62, 66-66, 76-76, 82-82, 86-86, 93-93, 94-94, 95-95, 99-99, 106-106: 'console' is not defined. (no-undef)
[error] 46-46: 'setInterval' is not defined. (no-undef)
[error] 71-71, 100-100: '__dirname' is not defined. (no-undef)

🪛 Biome (1.9.4)
src/packages/AgentLiveClient.ts

[error] 125-125: Avoid the delete operator which can impact performance.

Unsafe fix: Use an undefined assignment instead.

(lint/performance/noDelete)


[error] 127-127: Avoid the delete operator which can impact performance.

Unsafe fix: Use an undefined assignment instead.

(lint/performance/noDelete)

🔇 Additional comments (11)
src/lib/types/FunctionCallResponse.ts (1)

1-13: LGTM
The interface succinctly captures the properties needed for function call responses.

src/packages/index.ts (1)

4-4: Exporting AgentLiveClient is appropriate.
Great addition to ensure the class is publicly accessible.

src/lib/types/index.ts (1)

12-12: LGTM! Export addition is well-placed.

The export for FunctionCallResponse is correctly placed and aligns with the module's organization.

src/lib/constants.ts (2)

39-39: LGTM! WebSocket URL constant is well-defined.

The WebSocket URL for the agent API is correctly defined and follows the pattern of other URL constants in the file.


57-57: LGTM! Default options updated correctly.

The DEFAULT_OPTIONS object is correctly updated to include the new agent options.

src/lib/types/DeepgramClientOptions.ts (1)

67-67: LGTM! Interface extension is consistent.

The agent property is correctly added to the DeepgramClientOptions interface, following the established pattern of other namespace options.

src/lib/enums/AgentEvents.ts (1)

1-78: LGTM! Well-structured event enumeration.

The AgentEvents enum is well-organized and thoroughly documented. Each event type includes clear descriptions and payload structures where applicable.

src/DeepgramClient.ts (1)

84-92: LGTM! Well-documented beta feature.

The agent method implementation is clean, follows the existing patterns in the codebase, and is properly documented with beta status.

src/lib/types/AgentLiveSchema.ts (2)

1-184: LGTM! Well-structured type definitions.

The type definitions are comprehensive and well-documented with relevant links to the documentation.


117-117: Verify the empty model string for Groq provider.

The Groq provider has an empty model string. Please verify if this is intentional or if specific model identifiers should be added.

✅ Verification successful

Empty model string is valid for Groq provider type definition

The empty model string in the Groq provider type definition is intentional and follows typical TypeScript patterns where type definitions use empty defaults. The actual model value should be provided during runtime configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Groq model references in the codebase
rg -i "groq.*model"

Length of output: 21


Script:

#!/bin/bash
# Search for any Groq-related code
rg -i "groq" -A 3 -B 3

# Search for model type definitions
ast-grep --pattern 'interface $_Model {
  $$$
}'

# Check configuration files
fd -e json -e yaml -e yml -e ts -e js --exec cat {} \; | rg -i "groq"

Length of output: 1067

src/packages/AbstractLiveClient.ts (1)

248-249: LGTM! Improved null safety.

The updated condition !data?.byteLength provides better null safety than the previous check, and the warning message is more accurate.
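
To illustrate why a single `!data?.byteLength` test is enough, here is a minimal sketch. `hasPayload` and the `Sendable` type are hypothetical stand-ins, not the code in AbstractLiveClient.ts.

```typescript
// Hypothetical stand-in for the guard discussed above; not the SDK's code.
type Sendable = ArrayBuffer | ArrayBufferView | null | undefined;

function hasPayload(data: Sendable): boolean {
  // `data?.byteLength` is undefined for null/undefined and 0 for an empty
  // buffer, so one falsy check covers all three cases.
  return !!data?.byteLength;
}
```

With this shape, `hasPayload(null)` and `hasPayload(new ArrayBuffer(0))` are both false, while any non-empty buffer or typed array is true.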

@jpvajda jpvajda self-requested a review January 28, 2025 20:53
Contributor

@jpvajda jpvajda left a comment

I'm also seeing errors when running the agent example in the browser; I shared what I saw in Slack.

Contributor

jpvajda commented Jan 29, 2025

I tested the agent example and it works!

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/packages/AgentLiveClient.ts (1)

121-129: ⚠️ Potential issue

Fix mismatch between opts construction and final payload.

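The code copies options with a shallow spread, so writing to opts.audio.input and opts.audio.output also mutates the caller's options object, and it throws if either is undefined.

The diff below only replaces the delete operator with an undefined assignment; JSON.stringify omits undefined properties, so the serialized payload is unchanged: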

   const opts: Record<string, any> = { ...options };
   opts.audio.input["sample_rate"] = options.audio.input?.sampleRate;
-  delete opts.audio.input.sampleRate;
+  opts.audio.input.sampleRate = undefined;
   opts.audio.output["sample_rate"] = options.audio.output?.sampleRate;
-  delete opts.audio.output.sampleRate;
+  opts.audio.output.sampleRate = undefined;
   this.send(JSON.stringify({ type: "SettingsConfiguration", ...opts }));
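
An alternative that avoids both the delete operator and the mutation of the caller's object is to build a fresh payload. The following is a sketch only: `AudioSettings`, `AgentOptions`, `toWireAudio`, and `buildSettingsMessage` are hypothetical, simplified names, and the real AgentLiveSchema has more fields than shown here.

```typescript
// Hypothetical, simplified shapes; the real AgentLiveSchema has more fields.
interface AudioSettings {
  encoding?: string;
  sampleRate?: number;
}

interface AgentOptions {
  audio: { input?: AudioSettings; output?: AudioSettings };
  [key: string]: unknown;
}

// Copies one audio block, renaming sampleRate to sample_rate.
function toWireAudio(settings?: AudioSettings): Record<string, unknown> | undefined {
  if (!settings) return undefined;
  const { sampleRate, ...rest } = settings;
  return sampleRate === undefined ? { ...rest } : { ...rest, sample_rate: sampleRate };
}

// Builds the settings message without touching the caller's options object.
function buildSettingsMessage(options: AgentOptions): string {
  const { audio, ...rest } = options;
  return JSON.stringify({
    type: "SettingsConfiguration",
    ...rest,
    audio: { input: toWireAudio(audio.input), output: toWireAudio(audio.output) },
  });
}
```

Because every level that changes is copied, the caller can reuse the same options object for a second connection, and a missing input or output block is simply left out of the serialized message.
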
🧰 Tools
🪛 Biome (1.9.4)

[error] 125-125: Avoid the delete operator which can impact performance.

Unsafe fix: Use an undefined assignment instead.

(lint/performance/noDelete)


[error] 127-127: Avoid the delete operator which can impact performance.

Unsafe fix: Use an undefined assignment instead.

(lint/performance/noDelete)

🧹 Nitpick comments (5)
src/packages/AgentLiveClient.ts (5)

14-19: Consider validating options before connection.

The constructor immediately calls connect({}, endpoint) without validating the options or allowing users to configure the connection first. This could lead to unnecessary reconnections if users need to configure the connection after instantiation.

Consider this alternative approach:

 constructor(options: DeepgramClientOptions, endpoint: string = "/agent") {
   super(options);
   this.baseUrl = options.agent?.websocket?.options?.url ?? DEFAULT_AGENT_URL;
-  this.connect({}, endpoint);
 }

+/**
+ * Initiates the WebSocket connection.
+ */
+public connect(): void {
+  super.connect({}, this.endpoint);
+}

35-37: Use proper type for WebSocket close event.

The event parameter is typed as any. For better type safety, use the proper CloseEvent type.

-      this.conn.onclose = (event: any) => {
+      this.conn.onclose = (event: CloseEvent) => {

74-74: Replace console.log with proper error handling.

Using console.log for unknown data types is not ideal for production code. Consider removing it since you're already emitting an error event.

-      console.log("Received unknown data type", event.data);

94-100: Add type safety to handleTextMessage.

The data: any type could be more specific, and there's no validation of the data structure before accessing data.type.

-  protected handleTextMessage(data: any): void {
+  protected handleTextMessage(data: { type: string; [key: string]: unknown }): void {
     if (data.type in AgentEvents) {
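
The suggestion above narrows the parameter type but still trusts the parsed JSON. A runtime guard is one way to validate the shape before dispatching. The sketch below is hypothetical: the enum is a reduced stand-in whose string values are assumed, and `isAgentMessage` and `eventFor` are not SDK functions. It also uses an own-property check, because the `in` operator matches inherited names such as "toString".

```typescript
// Hypothetical subset of the enum; member values are assumed for illustration.
enum AgentEvents {
  Open = "Open",
  Close = "Close",
  Welcome = "Welcome",
  Unhandled = "Unhandled",
}

interface AgentMessage {
  type: string;
  [key: string]: unknown;
}

// Runtime check that parsed JSON is an object with a string `type`.
function isAgentMessage(data: unknown): data is AgentMessage {
  return (
    typeof data === "object" &&
    data !== null &&
    typeof (data as { type?: unknown }).type === "string"
  );
}

// Maps a parsed message to a known event, falling back to Unhandled.
function eventFor(data: unknown): AgentEvents {
  if (
    isAgentMessage(data) &&
    Object.prototype.hasOwnProperty.call(AgentEvents, data.type)
  ) {
    return AgentEvents[data.type as keyof typeof AgentEvents];
  }
  return AgentEvents.Unhandled;
}
```

A message like `{ type: "toString" }` passes a plain `in` check but is routed to Unhandled here.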

170-177: Consider implementing automatic keepAlive.

The documentation states that keepAlive should be sent every 8 seconds, but this is left to the user to implement. Consider adding an automatic keepAlive mechanism.

+  private keepAliveInterval?: ReturnType<typeof setInterval>; // works in Node and browsers
+
+  /**
+   * Start automatic keepAlive messages
+   */
+  private startKeepAlive(): void {
+    this.keepAliveInterval = setInterval(() => this.keepAlive(), 7000); // 7 seconds to be safe
+  }
+
+  /**
+   * Stop automatic keepAlive messages
+   */
+  private stopKeepAlive(): void {
+    if (this.keepAliveInterval) {
+      clearInterval(this.keepAliveInterval);
+      this.keepAliveInterval = undefined;
+    }
+  }
+
   public setupConnection(): void {
     if (this.conn) {
       this.conn.onopen = () => {
+        this.startKeepAlive();
         this.emit(AgentEvents.Open, this);
       };
 
       this.conn.onclose = (event: CloseEvent) => {
+        this.stopKeepAlive();
         this.emit(AgentEvents.Close, event);
       };
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a40604a and 9507447.

📒 Files selected for processing (1)
  • src/packages/AgentLiveClient.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Test / OS ubuntu-latest / Node 18
🔇 Additional comments (1)
src/packages/AgentLiveClient.ts (1)

1-13: LGTM! Clean imports and proper class setup.

The imports are well-organized, and the class properly extends AbstractLiveClient with appropriate namespace declaration.

@naomi-lgbt naomi-lgbt requested a review from jpvajda January 30, 2025 21:53