-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes for Agent API GA #497
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request introduces significant enhancements to the Deepgram SDK, focusing on agent functionality, WebSocket communication, and routing mechanisms. The changes include the addition of new router classes ( Changes
Suggested reviewers
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
I will revert back the FunctionCallingMessage > FunctionCalling to Align with how this works: https://developers.deepgram.com/docs/voice-agent-function-calling-message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (14)
deepgram/clients/agent_router.py (1)
25-28
: Update docstring references to reflect actual methods.
The docstring under "Methods" mentionsread
andasyncread
, but the class provideswebsocket
andasyncwebsocket
. Consider revising it to prevent confusion.- Methods: - read: (Preferred) Returns an Threaded AnalyzeClient... - asyncread: ... + Methods: + websocket: Returns an AgentWebSocketClient instance... + asyncwebsocket: Returns an AsyncAgentWebSocketClient instance...deepgram/clients/__init__.py (1)
324-330
: Consider grouping agent imports.
Currently, multiple imports for agent-related classes and events are spread over separate blocks. While not critical, grouping these imports might improve readability.deepgram/clients/agent/v1/websocket/options.py (4)
66-71
: Clarify usage of__getitem__
.Overriding
__getitem__
to transform internal data on the fly could confuse maintainers and end users, since it returns a modified dictionary. Consider introducing a dedicated method or property to retrieve transformed data, preserving__getitem__
for more conventional indexing behavior.
91-118
: Validate typed properties forFunction
.Currently, the
Function
class uses__getitem__
to instantiateParameters
andHeader
objects. Such dynamic transformations may be error-prone and less transparent. To improve readability and reliability, consider parsing or transforming these fields in the constructor or a separate factory method.
243-281
: Implement meaningful checks atcheck()
.The
check()
method inSettingsConfigurationOptions
silently returnsTrue
. If future deprecations or validations are required, add clear logging or raise exceptions to ensure the method fulfills its purpose.
312-320
: Confirm behavior for injection refusal inInjectAgentMessageOptions
.If a request arrives while the user is speaking, the server replies with
InjectionRefused
. It might be beneficial to offer callback hooks or event handlers for refused injections to provide clearer feedback to the client application.deepgram/clients/agent/v1/websocket/client.py (2)
180-300
: Refactor longstart()
method.The
start()
method is quite lengthy and covers multiple concerns (e.g., microphone initialization, threading, keep-alive, configuration checks). For better maintainability, consider splitting it into smaller helper methods (e.g.,_init_audio_devices()
,_apply_settings()
, etc.).
312-320
: Ensure handlers remain consistent on runtime.The
on()
method appends event handlers to_event_handlers
. Repeatedly registering the same handler could cause duplicates. Consider a policy for unique handler registration or the ability to remove handlers, ensuring consistent event dispatch.deepgram/clients/agent/v1/websocket/async_client.py (5)
70-88
: Clarify synchronous vs. asynchronous naming.Naming
_keep_alive_thread
might be misleading for an asyncioTask
. Consider renaming it to_keep_alive_task
or_keep_alive_coroutine
to reflect its nature accurately and reduce confusion for maintainers.
179-307
: Shorten or reorganizestart()
method.Like the synchronous counterpart,
start()
is performing multiple tasks: configuration, device setup, and WebSocket connection. Breaking it into smaller, composable async functions (_prepare_audio()
,_connect_websocket()
, etc.) can help with readability and testing.
310-317
: Duplicate handler registrations.Just like in
AgentWebSocketClient
, multiple calls toon()
may repeatedly append the same handler in_event_handlers
. If this behavior is unintentional, consider adding registration checks or an API to remove handlers.
543-590
: Consider backoff or retry strategy for keep-alive.The
_keep_alive()
method increments a counter and sends keep-alive messages at fixed intervals. If keep-alive fails repeatedly, consider a strategy to either reconnect or escalate the error, rather than silently closing the connection.
615-673
: Ensure resource cleanup withinfinish()
.In
finish()
, you cancel tasks and stop microphone/speaker usage. If other subsequent cleanup steps are needed (e.g., partial audio flush, leftover callbacks, etc.), ensure that they’re explicitly handled here or in a dedicated method to prevent leaks.deepgram/clients/agent/v1/websocket/response.py (1)
89-98
: Consider adding type field to AgentStartedSpeakingResponse.All other response classes include a
type
field, but this one doesn't. For consistency, consider adding it.@dataclass class AgentStartedSpeakingResponse(BaseResponse): """ The server will send an AgentStartedSpeaking message when it begins streaming an agent audio response to the client for playback. """ + type: str total_latency: float tts_latency: float ttt_latency: float
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
.gitignore
(2 hunks)deepgram/__init__.py
(2 hunks)deepgram/audio/microphone/microphone.py
(1 hunks)deepgram/audio/speaker/speaker.py
(3 hunks)deepgram/client.py
(4 hunks)deepgram/clients/__init__.py
(2 hunks)deepgram/clients/agent/__init__.py
(1 hunks)deepgram/clients/agent/client.py
(1 hunks)deepgram/clients/agent/enums.py
(1 hunks)deepgram/clients/agent/v1/__init__.py
(1 hunks)deepgram/clients/agent/v1/websocket/__init__.py
(1 hunks)deepgram/clients/agent/v1/websocket/async_client.py
(1 hunks)deepgram/clients/agent/v1/websocket/client.py
(1 hunks)deepgram/clients/agent/v1/websocket/options.py
(1 hunks)deepgram/clients/agent/v1/websocket/response.py
(1 hunks)deepgram/clients/agent_router.py
(1 hunks)deepgram/clients/listen_router.py
(1 hunks)deepgram/clients/read_router.py
(2 hunks)deepgram/clients/speak_router.py
(1 hunks)examples/agent/async_simple/main.py
(1 hunks)examples/agent/simple/main.py
(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- .gitignore
- deepgram/clients/agent/v1/websocket/init.py
- deepgram/clients/agent/client.py
🔇 Additional comments (26)
deepgram/clients/agent_router.py (1)
33-38
: ValidateDeepgramApiKeyError
handling.
Although the docstring states that aDeepgramApiKeyError
is raised if the API key is missing or invalid, there is no explicit check or raise for this error within the constructor. Confirm or add handling if needed.✅ Verification successful
API key validation is correctly implemented
TheDeepgramApiKeyError
is properly handled in theDeepgramClientOptions
class, which validates the API key before theAgentRouter
constructor is called. The validation checks both the constructor parameter and environment variable, raising the error if both are missing. No additional handling is needed in the constructor.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Confirm if DeepgramApiKeyError is handled elsewhere in the codebase. rg -A 3 'DeepgramApiKeyError'Length of output: 2396
Script:
#!/bin/bash # Check the implementation of DeepgramClientOptions ast-grep --pattern 'class DeepgramClientOptions { $$$ }'Length of output: 60
Script:
#!/bin/bash # Check the implementation of DeepgramClientOptions with Python syntax ast-grep --pattern 'class DeepgramClientOptions: $$$'Length of output: 7168
Script:
#!/bin/bash # Search for api_key property implementation rg -A 10 '@property.*api_key' deepgram/options.pyLength of output: 51
Script:
#!/bin/bash # Search for where DeepgramApiKeyError is raised rg -B 5 -A 5 'raise DeepgramApiKeyError' deepgram/options.pyLength of output: 472
deepgram/clients/__init__.py (1)
51-54
: No issues with router imports.
These router class imports align with the new modular approach. No further concerns noted.deepgram/__init__.py (2)
37-37
: No issues with router imports.
Importing the newAgentRouter
is consistent with the expanded architecture.
327-356
: Verify references to the revertedFunctionCallingMessage
.
You mentioned reverting “FunctionCallingMessage” to “FunctionCalling.” Ensure there are no leftover references to “FunctionCallingMessage” outside of these changes.✅ Verification successful
All references to
FunctionCallingMessage
have been properly updated
The codebase consistently uses the newFunctionCalling
name with no remaining references to the oldFunctionCallingMessage
.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Searching for any leftover references to FunctionCallingMessage in the repo rg -A 3 'FunctionCallingMessage'Length of output: 32
Script:
#!/bin/bash # Multiple search strategies to find any possible references echo "=== Case-insensitive search ===" rg -i "functioncallingmessage" echo -e "\n=== Partial match search ===" rg -i "calling.*message" echo -e "\n=== AST-based search for imports and type references ===" ast-grep --pattern 'import { $$$FunctionCallingMessage$$$ }' ast-grep --pattern 'from "$_" import { $$$FunctionCallingMessage$$$ }'Length of output: 882
deepgram/clients/agent/v1/websocket/options.py (1)
20-28
: Confirm default model forListen
.The
Listen
class setsmodel="nova-2"
by default, which might not always suit all use cases. Consider verifying with product owners or user documentation to ensure this default aligns with common usage.deepgram/clients/agent/v1/websocket/client.py (2)
70-86
: Check concurrency impact on class attributes.Multiple mutable fields like
_event_handlers
,_addons
, and_kwargs
are updated without thread synchronization. If these updates can occur concurrently (e.g., from different threads or user code), introduce locks or concurrency-safe data structures to avoid data races.
612-657
: Coordinate finishing between threads.In
finish()
, you join_keep_alive_thread
and_listen_thread
. If these threads trigger events that rely on_event_handlers
, ensure no references remain to data that was potentially torn down (like_microphone
or_speaker
), or you could encounter race conditions or exceptions.deepgram/clients/agent/__init__.py (1)
13-29
: 🛠️ Refactor suggestionRemove duplicate imports.
The following response types are imported twice:
OpenResponse
CloseResponse
ErrorResponse
UnhandledResponse
-from .client import ( - #### common websocket response - OpenResponse, - CloseResponse, - ErrorResponse, - UnhandledResponse, - #### unique +from .client import ( + #### unique WelcomeResponse, SettingsAppliedResponse, ConversationTextResponse, UserStartedSpeakingResponse, AgentThinkingResponse, FunctionCalling, FunctionCallRequest, AgentStartedSpeakingResponse, AgentAudioDoneResponse, )Likely invalid or redundant comment.
deepgram/clients/agent/enums.py (1)
10-37
: LGTM! Well-structured event enumeration.The
AgentWebSocketEvents
enum provides a comprehensive list of events, properly categorized into server and client events, with clear documentation.deepgram/clients/agent/v1/websocket/response.py (1)
70-77
: LGTM!FunctionCalling
class aligns with documentation.The implementation matches the Deepgram documentation for voice agent function calling message, which should resolve the error mentioned in the PR objectives.
deepgram/clients/read_router.py (1)
13-13
: LGTM! Class renaming and docstring updates align with router-based architecture.The changes consistently rename the class to
ReadRouter
and update docstrings to reflect the new terminology, aligning with the broader restructuring of the SDK to introduce router-based architecture.Also applies to: 42-42, 49-49
examples/agent/async_simple/main.py (2)
119-119
: Verify the model name "gpt-4o-mini".The model name "gpt-4o-mini" appears to be incorrect. OpenAI's model naming convention typically uses "gpt-4" or "gpt-3.5-turbo". Please verify the correct model name from Deepgram's documentation.
47-93
: LGTM! Well-structured event handlers with proper error handling.The event handlers are well-organized, properly typed, and include comprehensive error handling.
deepgram/clients/speak_router.py (1)
16-16
: LGTM! Class renaming aligns with router-based architecture.The renaming to
SpeakRouter
is consistent with the broader SDK restructuring.examples/agent/simple/main.py (2)
140-140
: Verify the model name "gpt-4o-mini".The model name appears to be incorrect. Please verify the correct model name from Deepgram's documentation.
88-98
: LGTM! Well-implemented function call request handler.The function call request handler is well-structured with proper error handling and response generation.
deepgram/clients/listen_router.py (3)
21-24
: LGTM! Well-structured router implementation.The renaming from
Listen
toListenRouter
and the class structure follows good design patterns:
- Clear separation of concerns with router functionality
- Comprehensive docstring with attributes and methods
- Proper initialization with configuration options
Line range hint
44-63
: Well-implemented deprecation notices.The deprecated properties are properly marked with:
- Clear deprecation version information
- Removal timeline
- Migration guidance in the details
Line range hint
89-117
: LGTM! Clean implementation of new router properties.The new properties (
rest
,asyncrest
,websocket
,asyncwebsocket
) follow a consistent pattern and provide clear documentation.deepgram/audio/microphone/microphone.py (1)
12-13
: LGTM! Good use of constants for configuration.Moving configuration values to constants improves maintainability and follows the DRY principle.
deepgram/audio/speaker/speaker.py (3)
267-267
: LGTM! Proper async message handling.The addition of
await self._push_callback(message)
ensures proper async message handling.
300-300
: LGTM! Consistent message handling in threaded context.The addition of
self._push_callback(message)
maintains consistency with the async implementation.
369-369
: LGTM! Smart microphone muting logic.The enhanced muting logic improves the user experience by automatically managing microphone state based on speaker activity.
deepgram/client.py (3)
311-363
: LGTM! Comprehensive agent-related imports.The additions include all necessary types and classes for the Agent API implementation:
- WebSocket events and responses
- Configuration options
- Function calling related types
454-454
: LGTM! Consistent router implementation.The router implementations for
listen
,read
, andspeak
follow a consistent pattern, returning their respective router instances.Also applies to: 461-461, 468-468
537-543
: LGTM! Well-documented agent router property.The new
agent
property:
- Follows the established router pattern
- Includes clear documentation
- Returns an
AgentRouter
instance for agent-related functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
examples/agent/async_simple/main.py (1)
19-20
: Avoid the use of a global variable
warning_notice
is declared global, which can complicate unit testing and maintainability. Consider storing it at the object or function level.examples/agent/simple/main.py (2)
33-34
: Avoid global state.Consider refactoring to use a class-based approach instead of global variables. This would make the code more maintainable and testable.
class DeepgramAgentExample: def __init__(self): self.warning_notice = True # ... rest of the implementation
88-98
: Improve error handling in function call request.The current error message is too generic. Consider:
- Adding specific error handling for different types of exceptions
- Including more context in the error message
- Logging the error instead of just printing
def on_function_call_request(self, function_call_request: FunctionCallRequest, **kwargs): print(f"\n\nFunction Call Request: {function_call_request}\n\n") try: response = FunctionCallResponse( function_call_id=function_call_request.function_call_id, output="Function response here" ) dg_connection.send_function_call_response(response) - except Exception as e: - print(f"Error in function call: {e}") + except ValueError as e: + print(f"Invalid function call request: {e}") + except ConnectionError as e: + print(f"Failed to send function call response: {e}") + except Exception as e: + print(f"Unexpected error in function call request handler: {type(e).__name__}: {e}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
deepgram/clients/agent/enums.py
(1 hunks)deepgram/clients/agent/v1/websocket/async_client.py
(1 hunks)deepgram/clients/agent/v1/websocket/client.py
(1 hunks)examples/agent/async_simple/main.py
(1 hunks)examples/agent/simple/main.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepgram/clients/agent/enums.py
🔇 Additional comments (10)
deepgram/clients/agent/v1/websocket/client.py (1)
547-594
: Keep-alive thread synchronization
This mirrors a previous concern about the_keep_alive()
method's reliance on_exit_event
to exit the loop. Ensure_exit_event
is triggered in all termination scenarios (both normal and abnormal) to prevent runaway threads or early thread termination.deepgram/clients/agent/v1/websocket/async_client.py (2)
102-104
: Potential restriction for self-hosted usage
Previously flagged: forcing_config.url = "agent.deepgram.com"
may limit or prevent self-hosted agent server configurations. Consider providing a way to override the URL for non-cloud deployments.
555-602
: Keep-alive task synchronization
Similar to the synchronous version,_keep_alive()
depends on_exit_event
to break its loop. Confirm that_exit_event
is consistently set to avoid lingering coroutines or abrupt termination.examples/agent/async_simple/main.py (1)
42-42
: Empty API key usage
Reiterating a previous warning: initializingDeepgramClient
with an empty API key will cause authentication errors in production.examples/agent/simple/main.py (6)
17-30
: Remove redundant import verification.The debug prints for import verification are redundant since the imports are already declared at the top of the file.
50-50
: Empty API key in DeepgramClient initialization.The DeepgramClient is initialized with an empty API key (""). This will likely cause authentication failures in production.
85-87
: Update event handler to use FunctionCallRequest.Based on the PR objectives to fix the
FunctionCalling
error, this event handler should useFunctionCallRequest
instead ofFunctionCalling
.
151-159
: LGTM! Comprehensive error handling.The error handling is well-structured with:
- Specific handling for ImportError with detailed context
- Generic exception handling as a fallback
- Informative error messages
161-162
: LGTM! Proper main entry point.Correctly uses the standard Python idiom for the main entry point.
7-15
: 🛠️ Refactor suggestionRemove duplicate imports.
FunctionCallRequest
andFunctionCallResponse
are imported twice - here and in the verification section below. Keep only one import statement.from deepgram import ( DeepgramClient, DeepgramClientOptions, AgentWebSocketEvents, SettingsConfigurationOptions, FunctionCalling, FunctionCallRequest, FunctionCallResponse, )Likely invalid or redundant comment.
I still need to add injectRejected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (8)
deepgram/clients/agent/v1/websocket/client.py (4)
100-104
: Consider allowing custom agent server URLs.Currently, the code unconditionally overrides
self._config.url
with"agent.deepgram.com"
. If self-hosted customers need to specify a custom URL, this hardcoded approach might limit flexibility.-# override the endpoint since it needs to be "wss://agent.deepgram.com/agent" -self._config.url = "agent.deepgram.com" +# override the endpoint by default but allow overriding for self-hosted deployments +if not self._config.url: + self._config.url = "agent.deepgram.com"
229-310
: Reduce broad exception handling and separate concerns for clarity.The
try/except
block instart()
catches all exceptions broadly. This obscures the root cause of potential errors. Consider narrowing the exception scope or separating out distinct concerns (e.g., separate speaker logic, connection logic, and configuration logic) to improve debuggability.
344-541
: Avoid using broadexcept Exception
in_process_text
.Catching all exceptions, including
Exception
, can mask critical errors and make debugging more difficult. Consider handling specific exception types or re-raising exceptions after logging them if they are unexpected.
606-676
: Consider promoting maintainability infinish()
.While the method closes threads and cleans up resources, adding clear docstring explanations and a more modular approach (e.g., separate private methods for stopping audio devices, keep-alive, listener threads, etc.) could reduce complexity and make future changes safer.
deepgram/clients/agent/v1/websocket/async_client.py (3)
100-104
: Allow custom agent server URLs for self-hosted scenarios.Similar to the sync client, this code sets
self._config.url = "agent.deepgram.com"
by default. Consider allowing users to override this for on-premise or custom deployments.-# override the endpoint since it needs to be "wss://agent.deepgram.com/agent" -self._config.url = "agent.deepgram.com" +# override the endpoint by default, but allow override in config for self-hosted setups +if not self._config.url: + self._config.url = "agent.deepgram.com"
180-308
: Separate business logic from connection logic.The
start()
method includes microphone, speaker, keep-alive, and error-handling logic. Extracting pieces into helper methods (e.g.,_init_audio_devices()
,_start_keep_alive_task()
) can improve readability and maintainability.
350-551
: Use more specific exception handling in_process_text
.Catching
Exception
broadly could suppress important error details. Either handle only known exceptions or re-raise them after logging to avoid masking critical failures.deepgram/clients/__init__.py (1)
322-374
: Consider grouping related imports.The agent-related imports are comprehensive but could be better organized. Consider grouping them by functionality (e.g., responses, options, models) using clear section comments, similar to how other imports are organized in this file.
Example organization:
# agent responses from .agent import ( WelcomeResponse, SettingsAppliedResponse, ... ) # agent options from .agent import ( SettingsConfigurationOptions, UpdateInstructionsOptions, ... ) # agent models from .agent import ( Listen, Speak, Header, ... )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
deepgram/__init__.py
(2 hunks)deepgram/client.py
(4 hunks)deepgram/clients/__init__.py
(2 hunks)deepgram/clients/agent/__init__.py
(1 hunks)deepgram/clients/agent/client.py
(1 hunks)deepgram/clients/agent/enums.py
(1 hunks)deepgram/clients/agent/v1/__init__.py
(1 hunks)deepgram/clients/agent/v1/websocket/__init__.py
(1 hunks)deepgram/clients/agent/v1/websocket/async_client.py
(1 hunks)deepgram/clients/agent/v1/websocket/client.py
(1 hunks)deepgram/clients/agent/v1/websocket/response.py
(1 hunks)examples/agent/simple/main.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- deepgram/clients/agent/v1/websocket/response.py
- examples/agent/simple/main.py
- deepgram/clients/agent/v1/init.py
- deepgram/clients/agent/init.py
- deepgram/clients/agent/client.py
🔇 Additional comments (13)
deepgram/clients/agent/v1/websocket/client.py (1)
558-605
: Keep-alive thread synchronization revisited.This implementation is similar to a previous review comment about ensuring
_exit_event
is set under all error-handling paths to avoid orphan threads. Double-check that every potential early exit (e.g.,_process_text
or_process_binary
failures) calls_signal_exit()
and that_exit_event
is always set. Otherwise,_keep_alive
could loop indefinitely.deepgram/clients/agent/v1/websocket/async_client.py (1)
568-615
: Confirm keep-alive cancellation and exception paths.Check that every exit condition sets
_exit_event
before_keep_alive
continues looping. Otherwise, the task could fail to exit in error scenarios. This logic nearly duplicates the sync client path.deepgram/clients/agent/v1/websocket/__init__.py (2)
5-6
: Exported classes are clearly distinguished.The import statements make the
AgentWebSocketClient
andAsyncAgentWebSocketClient
readily available for library consumers. No immediate issues noted.
8-50
: Good re-export of response and option classes.Centralizing these imports in
__init__.py
simplifies usage for downstream consumers. This approach helps ensure clients can access needed classes without diving into submodules.deepgram/clients/agent/enums.py (4)
1-8
: LGTM!The license header and import statement are correctly implemented.
10-14
: LGTM!The class definition and docstring clearly explain the purpose of the enumeration.
15-29
: LGTM!The server events section comprehensively covers all necessary events for the Agent API.
31-37
: LGTM!The client events section is complete and includes the previously missing
InjectionRefused
event.deepgram/clients/__init__.py (1)
51-54
: LGTM!The router imports are correctly organized and include the new
AgentRouter
.deepgram/__init__.py (2)
37-37
: LGTM!The router imports are correctly organized.
Line range hint
306-374
: Organization needs improvement.The agent-related imports need better organization, similar to the suggestion made for
deepgram/clients/__init__.py
.deepgram/client.py (2)
58-58
: LGTM!The router imports are correctly organized.
312-363
: Organization needs improvement.The agent-related imports need better organization, similar to the suggestion made for
deepgram/clients/__init__.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
README.md (2)
178-179
: Enhance environment variable setup instructions.Consider adding more specific instructions about setting up the environment variable, especially for different operating systems.
Add the following examples:
To run each example set the `DEEPGRAM_API_KEY` as an environment variable, then `cd` into each example folder and execute the example with: `python main.py` or `python3 main.py`. + +For Unix-based systems: +```bash +export DEEPGRAM_API_KEY='your-api-key' +``` + +For Windows Command Prompt: +```cmd +set DEEPGRAM_API_KEY=your-api-key +``` + +For Windows PowerShell: +```powershell +$env:DEEPGRAM_API_KEY='your-api-key' +```
181-184
: Add description for the Agent API section.To help users understand the purpose and capabilities of the Agent API, consider adding a brief description.
Add the following description:
### Agent + +The Agent API allows you to create conversational AI agents that can understand and respond to spoken input in real-time. These examples demonstrate: - Simple - [examples/agent/simple](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/agent/simple/main.py) - Async Simple - [examples/agent/async_simple](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/agent/async_simple/main.py)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md
(2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Check - mdlint
README.md
[warning] 180-180: Multiple consecutive blank lines found. Expected: 1; Actual: 2
[warning] 186-186: Trailing punctuation ':' found in heading
[warning] 191-191: Trailing punctuation ':' found in heading
[warning] 198-198: Trailing punctuation ':' found in heading
[warning] 207-207: Trailing punctuation ':' found in heading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (5)
deepgram/clients/agent/v1/websocket/options.py (5)
20-27
: Enhance Listen class documentation.The docstring should include information about the default model and its purpose.
""" - This class defines any configuration settings for the Listen model. + This class defines configuration settings for the Listen model. + + Attributes: + model: The name of the model to use for speech recognition. + Defaults to "nova-2". """
29-47
: Consider making model names configurable via constants.The hardcoded model names like "aura-asteria-en" and "nova-2" should ideally be defined as constants or pulled from a configuration file for better maintainability.
+# At the top of the file +DEFAULT_LISTEN_MODEL = "nova-2" +DEFAULT_SPEAK_MODEL = "aura-asteria-en" + @dataclass class Listen(BaseResponse): - model: Optional[str] = field(default="nova-2") + model: Optional[str] = field(default=DEFAULT_LISTEN_MODEL) @dataclass class Speak(BaseResponse): model: Optional[str] = field( - default="aura-asteria-en", + default=DEFAULT_SPEAK_MODEL, metadata=dataclass_config(exclude=lambda f: f is None) )
90-92
: Add type validation for Parameters class.The 'type' field should be validated against allowed JSON Schema types.
+ALLOWED_PARAMETER_TYPES = {"object", "array", "string", "number", "boolean", "null"} + @dataclass class Parameters(BaseResponse): - type: str + type: str = field(metadata={"validator": lambda x: x in ALLOWED_PARAMETER_TYPES})
199-200
: Define audio configuration constants.Audio encoding and sample rate defaults should be defined as constants.
+DEFAULT_AUDIO_ENCODING = "linear16" +DEFAULT_SAMPLE_RATE = 16000 +DEFAULT_CONTAINER = "none" + @dataclass class Input(BaseResponse): - encoding: Optional[str] = field(default="linear16") - sample_rate: int = field(default=16000) + encoding: Optional[str] = field(default=DEFAULT_AUDIO_ENCODING) + sample_rate: int = field(default=DEFAULT_SAMPLE_RATE) @dataclass class Output(BaseResponse): - encoding: Optional[str] = field(default="linear16") - sample_rate: Optional[int] = field(default=16000) + encoding: Optional[str] = field(default=DEFAULT_AUDIO_ENCODING) + sample_rate: Optional[int] = field(default=DEFAULT_SAMPLE_RATE) bitrate: Optional[int] = field( default=None, metadata=dataclass_config(exclude=lambda f: f is None) ) - container: Optional[str] = field(default="none") + container: Optional[str] = field(default=DEFAULT_CONTAINER)Also applies to: 209-214
303-304
: Add input validation for message fields.The message fields in various classes should be validated to ensure they're not empty when required.
+def validate_non_empty(value: str) -> bool: + return bool(value and value.strip()) + @dataclass class UpdateInstructionsOptions(BaseResponse): type: str = str(AgentWebSocketEvents.UpdateInstructions) - instructions: str = field(default="") + instructions: str = field(default="", metadata={"validator": validate_non_empty}) @dataclass class UpdateSpeakOptions(BaseResponse): type: str = str(AgentWebSocketEvents.UpdateSpeak) - model: str = field(default="") + model: str = field(default="", metadata={"validator": validate_non_empty}) @dataclass class InjectAgentMessageOptions(BaseResponse): type: str = str(AgentWebSocketEvents.InjectAgentMessage) - message: str = field(default="") + message: str = field(default="", metadata={"validator": validate_non_empty}) @dataclass class FunctionCallResponse(BaseResponse): type: str = "FunctionCallResponse" - function_call_id: str = field(default="") - output: str = field(default="") + function_call_id: str = field(default="", metadata={"validator": validate_non_empty}) + output: str = field(default="", metadata={"validator": validate_non_empty})Also applies to: 316-317, 329-330, 340-342
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deepgram/clients/agent/v1/websocket/options.py
(1 hunks)
🔇 Additional comments (1)
deepgram/clients/agent/v1/websocket/options.py (1)
1-16
: LGTM! Well-organized imports and proper license headers.The file starts with appropriate copyright notices and includes all necessary imports organized in a logical manner.
def __getitem__(self, key): | ||
_dict = self.to_dict() | ||
if "input" in _dict: | ||
_dict["input"] = [Input.from_dict(input) for input in _dict["input"]] | ||
if "output" in _dict: | ||
_dict["output"] = [Output.from_dict(output) for output in _dict["output"]] | ||
return _dict[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect list conversion in Audio.getitem.
The method incorrectly converts single instances to lists.
def __getitem__(self, key):
_dict = self.to_dict()
if "input" in _dict:
- _dict["input"] = [Input.from_dict(input) for input in _dict["input"]]
+ _dict["input"] = Input.from_dict(_dict["input"])
if "output" in _dict:
- _dict["output"] = [Output.from_dict(output) for output in _dict["output"]]
+ _dict["output"] = Output.from_dict(_dict["output"])
return _dict[key]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def __getitem__(self, key): | |
_dict = self.to_dict() | |
if "input" in _dict: | |
_dict["input"] = [Input.from_dict(input) for input in _dict["input"]] | |
if "output" in _dict: | |
_dict["output"] = [Output.from_dict(output) for output in _dict["output"]] | |
return _dict[key] | |
def __getitem__(self, key): | |
_dict = self.to_dict() | |
if "input" in _dict: | |
_dict["input"] = Input.from_dict(_dict["input"]) | |
if "output" in _dict: | |
_dict["output"] = Output.from_dict(_dict["output"]) | |
return _dict[key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
README.md (1)
178-179
: Add missing comma for better readability.Add a comma after "example" to improve the sentence structure.
-To run each example set the `DEEPGRAM_API_KEY` as an environment variable, then `cd` into each example folder and execute the example with: `python main.py` or `python3 main.py`. +To run each example, set the `DEEPGRAM_API_KEY` as an environment variable, then `cd` into each example folder and execute the example with: `python main.py` or `python3 main.py`.🧰 Tools
🪛 LanguageTool
[uncategorized] ~178-~178: A comma might be missing here.
Context: ...uirements-examples.txt ``` To run each example set theDEEPGRAM_API_KEY
as an enviro...(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
README.md
(2 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~178-~178: A comma might be missing here.
Context: ...uirements-examples.txt ``` To run each example set the DEEPGRAM_API_KEY
as an enviro...
(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)
🪛 GitHub Actions: Check - mdlint
README.md
[error] 180-180: Multiple consecutive blank lines found. Expected 1 blank line but found 2 (MD012/no-multiple-blanks)
🔇 Additional comments (3)
README.md (3)
181-184
: LGTM! Well-structured Agent section.The new Agent section is well-organized and consistent with the documentation style. The examples are properly linked and follow the established pattern.
180-180
: Multiple consecutive blank lines.This issue was previously identified. Please refer to the existing comment about markdown linting issues.
🧰 Tools
🪛 GitHub Actions: Check - mdlint
[error] 180-180: Multiple consecutive blank lines found. Expected 1 blank line but found 2 (MD012/no-multiple-blanks)
193-196
: Incorrect example paths in Analyze Text section.This issue was previously identified. Please refer to the existing comment about incorrect example paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (7)
deepgram/clients/agent/v1/websocket/client.py (1)
562-609
:⚠️ Potential issue[Duplicate Comment] Keep-alive thread synchronization.
The_keep_alive()
method runs an infinite loop, relying on_exit_event
. Ensure that_exit_event
is set in all termination paths and that any exceptions triggering exit also set this event, preventing the keep-alive thread from continuing indefinitely.deepgram/clients/agent/v1/websocket/async_client.py (1)
566-613
:⚠️ Potential issue[Duplicate Comment] Keep-alive task synchronization.
The_keep_alive()
method continuously sleeps and checks_exit_event
. Make sure_exit_event
is triggered in all error termination paths, avoiding an unintentional infinite loop in this coroutine.deepgram/clients/agent/v1/websocket/options.py (5)
76-80
:⚠️ Potential issue[Duplicate Comment] Fix mismatch between
item
definition and list usage inProperties.__getitem__()
.
Currently, the code converts"item"
into a list comprehension, butProperties
definesitem
as a singleItem
. Either changeitem
to a list type or remove the list comprehension logic to maintain internal consistency.- item: Item + items: List[Item] = field(default_factory=list) - if "item" in _dict: - _dict["item"] = [Item.from_dict(item) for item in _dict["item"]] + if "items" in _dict: + _dict["items"] = [Item.from_dict(i) for i in _dict["items"]]
116-127
:⚠️ Potential issue[Duplicate Comment] Adjust list conversion in
Function.__getitem__()
.
The method incorrectly treatsparameters
as a list. IfFunction.parameters
is defined as a single instance, you should remove the list comprehension. Similar action is needed forheaders
if it’s meant to always be a list.
180-189
:⚠️ Potential issue[Duplicate Comment] Resolve single vs. list mismatch in
Agent.__getitem__()
.
listen
,think
, andspeak
fields are single instances, but they’re being cast to lists in__getitem__()
. This causes a discrepancy in how the fields are handled.
224-231
:⚠️ Potential issue[Duplicate Comment] Validate
Audio.__getitem__()
logic.
input
andoutput
are single instances but interpreted as lists here. Fix to align with the underlyingAudio
class definition.
276-290
:⚠️ Potential issue[Duplicate Comment] Implement real checks or remove
check()
method.
Currently,check()
logs atERROR
and then exits without actual validation. Consider adding logic to confirm valid or deprecated options, or remove the method entirely to avoid confusion.
🧹 Nitpick comments (6)
deepgram/clients/agent/v1/websocket/client.py (3)
60-61
: Consider extracting shared logic into a helper class or module.
The classAgentWebSocketClient
contains extensive logic for both audio management and WebSocket event handling, which may be a bit large and harder to maintain. Splitting responsibilities into smaller, more focused classes or modules (e.g., a separate module for audio device setup) can improve readability and maintainability.
180-301
: Reduce code complexity instart()
method.
Thestart()
method is quite long and branches extensively for microphone, speaker, and keep-alive initialization. Consider refactoring some of the branching logic, such as device setup, into separate helper functions. This will make the code more modular, easier to test, and more readable.
311-321
: Add safeguards for unrecognized events inon
method.
When registering event handlers, you rely onAgentWebSocketEvents.__members__.values()
. While that prevents invalid events, consider adding a defensive check to inform the user if an unsupported event is passed (e.g., a typed mistake). This can help with debugging and user error transparency.deepgram/clients/agent/v1/websocket/async_client.py (2)
180-299
: Revisit error handling instart()
method.
Current implementation raises a broadDeepgramError
for invalid settings and logs other exceptions without immediate rethrows. Consider more granular exception types to provide better clarity, or at least consistent logging and rethrow patterns for critical issues.
350-549
: Asynchronous_process_text()
complexity.
Similar to the synchronous version,_process_text()
is large and handles multiple event types in a single match block. Consider extracting event-specific logic or callbacks into dedicated handlers to improve readability and maintainability.README.md (1)
178-178
: Add missing comma for better readability.Add a comma after "To run each example" to improve readability.
-To run each example set the `DEEPGRAM_API_KEY` as an environment variable +To run each example, set the `DEEPGRAM_API_KEY` as an environment variable🧰 Tools
🪛 LanguageTool
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...uirements-examples.txt ``` To run each example set theDEEPGRAM_API_KEY
as an enviro...(AI_HYDRA_LEO_MISSING_COMMA)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
README.md
(2 hunks)deepgram/clients/agent/v1/websocket/async_client.py
(1 hunks)deepgram/clients/agent/v1/websocket/client.py
(1 hunks)deepgram/clients/agent/v1/websocket/options.py
(1 hunks)deepgram/clients/agent/v1/websocket/response.py
(1 hunks)deepgram/options.py
(1 hunks)examples/agent/simple/main.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/agent/simple/main.py
- deepgram/clients/agent/v1/websocket/response.py
🧰 Additional context used
🪛 LanguageTool
README.md
[uncategorized] ~178-~178: Possible missing comma found.
Context: ...uirements-examples.txt ``` To run each example set the DEEPGRAM_API_KEY
as an enviro...
(AI_HYDRA_LEO_MISSING_COMMA)
🔇 Additional comments (6)
deepgram/options.py (1)
91-93
: LGTM! The multi-line formatting improves readability.The User-Agent header formatting follows the standard pattern and correctly includes both the SDK and Python version information.
deepgram/clients/agent/v1/websocket/client.py (1)
548-560
: Validate potential audio injection logic in_process_binary()
.
While you emit anAudioData
event with raw bytes, consider validating or filtering the audio data if future use cases might require it (e.g., ensuring correct sampling rate). This might reduce the risk of passing invalid audio data downstream.deepgram/clients/agent/v1/websocket/async_client.py (1)
311-318
: Check event registration inon
method.
When registering asynchronous callbacks, confirm all handlers are truly awaitable. If a handler is not async, you may need to wrap it in a coroutine or handle it differently. This prevents unexpected runtime errors or event dispatch slowdowns.deepgram/clients/agent/v1/websocket/options.py (1)
157-168
:⚠️ Potential issuePotential mismatch with
Think
properties.
InThink.__getitem__()
,provider
can also be incorrectly converted to a list instead of a single instance. Ensure consistent definitions or remove the list comprehension logic if only oneprovider
is expected.- if "provider" in _dict: - _dict["provider"] = [ - Provider.from_dict(provider) for provider in _dict["provider"] - ] + if "provider" in _dict: + _dict["provider"] = Provider.from_dict(_dict["provider"])Likely invalid or redundant comment.
README.md (2)
180-184
: LGTM! New Agent section is well-structured.The new Agent section is properly formatted and includes clear links to both simple and async examples, which aligns well with the PR objectives for Agent API GA.
192-195
: Fix incorrect example paths in Analyze Text section.The example paths for sentiment, summarization, and topic detection are incorrect.
- Intent Recognition - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/intent/main.py) -- Sentiment Analysis - [examples/sentiment/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/sentiment/main.py) -- Summarization - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/summary/main.py) -- Topic Detection - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/topic/main.py) +- Sentiment Analysis - [examples/analyze/sentiment](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/sentiment/main.py) +- Summarization - [examples/analyze/summary](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/summary/main.py) +- Topic Detection - [examples/analyze/topic](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/topic/main.py)
- From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/stream_file/main.py) | ||
- From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/async_url/main.py) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect example paths in Live Audio Transcription section.
The example paths for microphone and HTTP endpoint streaming are incorrect.
-- From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/stream_file/main.py)
-- From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/async_url/main.py)
+- From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/microphone/main.py)
+- From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/http/main.py)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/stream_file/main.py) | |
- From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/async_url/main.py) | |
- From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/microphone/main.py) | |
- From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/streaming/http/main.py) |
To add Agent API to Python SDK for GA.
Summary by CodeRabbit
Summary by CodeRabbit
Based on the comprehensive changes, here are the release notes:
New Features
Improvements
Bug Fixes
Chores