Changes for Agent API GA #497

Open
wants to merge 12 commits into
base: main
from
3 changes: 3 additions & 0 deletions .gitignore
@@ -7,6 +7,8 @@ venv/
venv.bak/
.vscode/
.DS_Store
Pipfile
Pipfile.lock

# python artifacts
__pycache__
@@ -18,3 +20,4 @@ dist/
# build
build/
poetry.lock

17 changes: 11 additions & 6 deletions README.md
@@ -175,19 +175,26 @@ Before running any of these examples, then you need to take a look at the README
pip install -r examples/requirements-examples.txt
```

Text to Speech:
To run each example, set `DEEPGRAM_API_KEY` as an environment variable, then `cd` into the example's folder and run it with `python main.py` or `python3 main.py`.
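
As a minimal sketch of the environment-variable step, assuming nothing beyond the Python standard library, an example's `main.py` could check that `DEEPGRAM_API_KEY` is set before doing any work. The helper name `require_api_key` is hypothetical and is not part of the SDK.

```python
import os


def require_api_key(env=None, name="DEEPGRAM_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or empty.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running main.py")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error returned later by the service.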

### Agent

- Simple - [examples/agent/simple](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/agent/simple/main.py)
- Async Simple - [examples/agent/async_simple](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/agent/async_simple/main.py)

### Text to Speech

- Asynchronous - [examples/text-to-speech](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/text-to-speech/rest/file/async_hello_world/main.py)
- Synchronous - [examples/text-to-speech](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/text-to-speech/rest/file/hello_world/main.py)

Analyze Text:
### Analyze Text

- Intent Recognition - [examples/analyze/intent](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/intent/main.py)
- Sentiment Analysis - [examples/analyze/sentiment](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/sentiment/main.py)
- Summarization - [examples/analyze/summary](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/summary/main.py)
- Topic Detection - [examples/analyze/topic](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/analyze/topic/main.py)

PreRecorded Audio:
### PreRecorded Audio

- Transcription From an Audio File - [examples/prerecorded/file](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/file/main.py)
- Transcription From a URL - [examples/prerecorded/url](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/url/main.py)
@@ -196,7 +203,7 @@ PreRecorded Audio:
- Summarization - [examples/speech-to-text/rest/summary](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/summary/main.py)
- Topic Detection - [examples/speech-to-text/rest/topic](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/topic/main.py)

Live Audio Transcription:
### Live Audio Transcription

- From a Microphone - [examples/streaming/microphone](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/stream_file/main.py)
- From an HTTP Endpoint - [examples/streaming/http](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/speech-to-text/rest/async_url/main.py)
@@ -211,8 +218,6 @@ Management API exercise the full [CRUD](https://en.wikipedia.org/wiki/Create,_re
- Scopes - [examples/manage/scopes](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/manage/scopes/main.py)
- Usage - [examples/manage/usage](https://github.com/deepgram/deepgram-python-sdk/blob/main/examples/manage/usage/main.py)

To run each example set the `DEEPGRAM_API_KEY` as an environment variable, then `cd` into each example folder and execute the example: `go run main.py`.

## Logging

This SDK provides logging as a means to troubleshoot and debug issues encountered. By default, this SDK will enable `Information` level messages and higher (i.e., `Warning`, `Error`, etc.) when you initialize the library as follows:
56 changes: 55 additions & 1 deletion deepgram/__init__.py
@@ -34,7 +34,7 @@
from .errors import DeepgramApiKeyError

# listen/read client
from .client import Listen, Read
from .client import ListenRouter, ReadRouter, SpeakRouter, AgentRouter

# common
from .client import (
@@ -302,6 +302,60 @@
AsyncSelfHostedClient,
)


# agent
from .client import AgentWebSocketEvents

# websocket
from .client import (
AgentWebSocketClient,
AsyncAgentWebSocketClient,
)

from .client import (
#### common websocket response
# OpenResponse,
# CloseResponse,
# ErrorResponse,
# UnhandledResponse,
#### unique
WelcomeResponse,
SettingsAppliedResponse,
ConversationTextResponse,
UserStartedSpeakingResponse,
AgentThinkingResponse,
FunctionCalling,
FunctionCallRequest,
AgentStartedSpeakingResponse,
AgentAudioDoneResponse,
InjectionRefusedResponse,
)

from .client import (
# top level
SettingsConfigurationOptions,
UpdateInstructionsOptions,
UpdateSpeakOptions,
InjectAgentMessageOptions,
FunctionCallResponse,
AgentKeepAlive,
# sub level
Listen,
Speak,
Header,
Item,
Properties,
Parameters,
Function,
Provider,
Think,
Agent,
Input,
Output,
Audio,
Context,
)

# utilities
# pylint: disable=wrong-import-position
from .audio import Microphone, DeepgramMicrophoneError
1 change: 1 addition & 0 deletions deepgram/audio/microphone/microphone.py
@@ -9,6 +9,7 @@
import logging

from ...utils import verboselogs

from .constants import LOGGING, CHANNELS, RATE, CHUNK

if TYPE_CHECKING:
4 changes: 3 additions & 1 deletion deepgram/audio/speaker/speaker.py
@@ -50,7 +50,6 @@ class Speaker: # pylint: disable=too-many-instance-attributes
# _asyncio_loop: asyncio.AbstractEventLoop
# _asyncio_thread: threading.Thread
_receiver_thread: Optional[threading.Thread] = None

_loop: Optional[asyncio.AbstractEventLoop] = None

_push_callback_org: Optional[Callable] = None
@@ -265,6 +264,7 @@ async def _start_asyncio_receiver(self):
await self._push_callback(message)
elif isinstance(message, bytes):
self._logger.verbose("Received audio data...")
await self._push_callback(message)
self.add_audio_to_queue(message)
except websockets.exceptions.ConnectionClosedOK as e:
self._logger.debug("send() exiting gracefully: %d", e.code)
@@ -297,6 +297,7 @@ def _start_threaded_receiver(self):
self._push_callback(message)
elif isinstance(message, bytes):
self._logger.verbose("Received audio data...")
self._push_callback(message)
self.add_audio_to_queue(message)
except Exception as e: # pylint: disable=broad-except
self._logger.notice("_start_threaded_receiver exception: %s", str(e))
@@ -365,6 +366,7 @@ def _play(self, audio_out, stream, stop):
"LastPlay delta is greater than threshold. Unmute!"
)
self._microphone.unmute()

data = audio_out.get(True, TIMEOUT)
with self._lock_wait:
self._last_datagram = datetime.now()
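
The two receiver hunks above add a `_push_callback(message)` call in the `bytes` branch, so raw audio is now handed to the callback as well as queued for playback. The sketch below illustrates only that dispatch shape under stated assumptions: `dispatch` is a hypothetical function, the text branch is inferred from the partially shown hunk, and the sample messages are placeholders rather than real service output.

```python
import queue


def dispatch(message, push_callback, audio_queue):
    """Forward every message to the callback; also queue raw audio for playback.

    Hypothetical helper that mirrors the branch structure in the hunks above.
    """
    if isinstance(message, str):
        # Text messages go to the callback only.
        push_callback(message)
    elif isinstance(message, bytes):
        # Audio goes to the callback first, then to the playback queue.
        push_callback(message)
        audio_queue.put(message)


seen = []
audio = queue.Queue()
for msg in ['{"type": "Welcome"}', b"\x00\x01"]:  # placeholder messages
    dispatch(msg, seen.append, audio)
```

After the loop, `seen` holds both messages while `audio` holds only the bytes payload, which is the behavioral difference this change appears to introduce.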
70 changes: 66 additions & 4 deletions deepgram/client.py
@@ -55,7 +55,7 @@
)

# listen client
from .clients import Listen, Read, Speak
from .clients import ListenRouter, ReadRouter, SpeakRouter, AgentRouter

# speech-to-text
from .clients import LiveClient, AsyncLiveClient # backward compat
@@ -308,6 +308,61 @@
AsyncSelfHostedClient,
)


# agent
from .clients import AgentWebSocketEvents

# websocket
from .clients import (
AgentWebSocketClient,
AsyncAgentWebSocketClient,
)

from .clients import (
#### common websocket response
# OpenResponse,
# CloseResponse,
# ErrorResponse,
# UnhandledResponse,
#### unique
WelcomeResponse,
SettingsAppliedResponse,
ConversationTextResponse,
UserStartedSpeakingResponse,
AgentThinkingResponse,
FunctionCalling,
FunctionCallRequest,
AgentStartedSpeakingResponse,
AgentAudioDoneResponse,
InjectionRefusedResponse,
)

from .clients import (
# top level
SettingsConfigurationOptions,
UpdateInstructionsOptions,
UpdateSpeakOptions,
InjectAgentMessageOptions,
FunctionCallResponse,
AgentKeepAlive,
# sub level
Listen,
Speak,
Header,
Item,
Properties,
Parameters,
Function,
Provider,
Think,
Agent,
Input,
Output,
Audio,
Context,
)


# client errors and options
from .options import DeepgramClientOptions, ClientOptionsFromEnv
from .errors import DeepgramApiKeyError
@@ -397,21 +452,21 @@ def listen(self):
"""
Returns a Listen dot-notation router for interacting with Deepgram's transcription services.
"""
return Listen(self._config)
return ListenRouter(self._config)

@property
def read(self):
"""
Returns a Read dot-notation router for interacting with Deepgram's read services.
"""
return Read(self._config)
return ReadRouter(self._config)

@property
def speak(self):
"""
Returns a Speak dot-notation router for interacting with Deepgram's speak services.
"""
return Speak(self._config)
return SpeakRouter(self._config)

@property
@deprecation.deprecated(
@@ -480,6 +535,13 @@ def asyncselfhosted(self):
"""
return self.Version(self._config, "asyncselfhosted")

@property
def agent(self):
"""
Returns an Agent dot-notation router for interacting with Deepgram's agent services.
"""
return AgentRouter(self._config)

# INTERNAL CLASSES
class Version:
"""
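
The `agent` property added in this file follows the same pattern as `listen`, `read`, and `speak`: each access builds a router object from the client's stored configuration. The following is a self-contained sketch of that pattern only. `FakeConfig`, `FakeAgentRouter`, and `FakeClient` are hypothetical stand-ins, not SDK classes, and the real `AgentRouter` implementation is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeConfig:
    """Hypothetical stand-in for the client's configuration object."""

    api_key: str


class FakeAgentRouter:
    """Hypothetical stand-in for AgentRouter; it only stores the config."""

    def __init__(self, config):
        self.config = config


class FakeClient:
    """Hypothetical stand-in for a client that exposes routers as properties."""

    def __init__(self, config):
        self._config = config

    @property
    def agent(self):
        # A new router is built on every access,
        # mirroring `return AgentRouter(self._config)` in the diff.
        return FakeAgentRouter(self._config)


client = FakeClient(FakeConfig(api_key="example-key"))
first, second = client.agent, client.agent
```

In this sketch `first` and `second` are distinct router objects that share one config, so a caller that needs a stable object would hold on to the returned router instead of re-reading the property.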
60 changes: 57 additions & 3 deletions deepgram/clients/__init__.py
@@ -48,9 +48,10 @@
)
from .errors import DeepgramModuleError

from .listen_router import Listen
from .read_router import Read
from .speak_router import Speak
from .listen_router import ListenRouter
from .read_router import ReadRouter
from .speak_router import SpeakRouter
from .agent_router import AgentRouter

# listen
from .listen import LiveTranscriptionEvents
@@ -318,3 +319,56 @@
SelfHostedClient,
AsyncSelfHostedClient,
)

# agent
from .agent import AgentWebSocketEvents

# websocket
from .agent import (
AgentWebSocketClient,
AsyncAgentWebSocketClient,
)

from .agent import (
#### common websocket response
# OpenResponse,
# CloseResponse,
# ErrorResponse,
# UnhandledResponse,
#### unique
WelcomeResponse,
SettingsAppliedResponse,
ConversationTextResponse,
UserStartedSpeakingResponse,
AgentThinkingResponse,
FunctionCalling,
FunctionCallRequest,
AgentStartedSpeakingResponse,
AgentAudioDoneResponse,
InjectionRefusedResponse,
)

from .agent import (
# top level
SettingsConfigurationOptions,
UpdateInstructionsOptions,
UpdateSpeakOptions,
InjectAgentMessageOptions,
FunctionCallResponse,
AgentKeepAlive,
# sub level
Listen,
Speak,
Header,
Item,
Properties,
Parameters,
Function,
Provider,
Think,
Agent,
Input,
Output,
Audio,
Context,
)