app.py Gemini/Twilio w/ robust error handling, faster image encoding, and UI status updates #88
…or handling, faster image encoding, and UI status updates

This commit improves the Gemini and Twilio integration with a focus on better error handling, UI feedback, connection stability, and faster image encoding.

- **Faster, Robust Image Encoding:** Enhanced `encode_image` with comprehensive input validation (NaN/Inf, shape), normalization, and faster JPEG encoding error handling using OpenCV.
- **Synchronous Twilio Check (Pre-UI):** Implemented synchronous Twilio TURN server availability check *before* Gradio initialization to avoid race conditions. Includes retry logic with exponential backoff. This ensures accurate status before the UI loads.
- **UI Status Updates:**
  - Added immediate Twilio status update on UI load.
  - Gemini connection status is displayed and updated to inform users.
- **Robust Gemini Connection:** Improved Gemini connection logic with more comprehensive error handling and UI feedback on connection failures.
- **Improved Shutdown:** The `GeminiHandler.shutdown` method is more robust to ensure proper cleanup and prevent lingering issues.
- **API key validation:** Added API key validation to improve the user experience.
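For readers unfamiliar with the retry pattern mentioned in the Twilio bullet, here is a minimal sketch of a synchronous availability check with exponential backoff. It is not the code from this PR: `check_with_backoff`, its parameters, and the injected `check` callable are hypothetical names, and the actual TURN probe is left to the caller.

```python
import time
from typing import Callable


def check_with_backoff(
    check: Callable[[], bool],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` until it returns True or the attempts run out.

    Hypothetical helper: waits base_delay * 2**attempt between failures.
    """
    for attempt in range(max_attempts):
        try:
            if check():
                return True
        except Exception as exc:
            # A failed probe is treated as "not available yet".
            print(f"attempt {attempt + 1} failed: {exc}")
        # Sleep only if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False
```

Running a check like this before building the UI means the status shown on first load reflects a finished probe. The `sleep` parameter is injectable only so the delays can be tested without waiting.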
Also, in the gradio_webrtc==0.0.28 version that worked best, I got a run of the app to succeed for a couple of minutes without crashes when on a network where packets drop much less often.
The infinite spinner should be fixed now if you install the latest version (0.0.9!)
Thanks @freddyaboulton, with the latest version (0.0.9) the infinite spinner is fixed. But there is a new bug: "fastrtc.utils.WebRTCError: timed out during handshake"
Hi @ahundt! Thanks for the contribution and sorry for the delay in getting this review to you.
There were some API improvements made to the library since you started working on the original gradio_webrtc demo that should make this code easier. Some are already present in the current `gemini_audio_video/app.py` file and I'd like them to be incorporated into this demo before merging. Namely:
- No need for `async def generator()` and `async def receive_audio()` anymore. The `async def generator()` becomes `async def startup()` and there's no need for `receive_audio` (or `generator()`) anymore.
- Instead of catching `Cancelled` errors in the emit functions you can use `wait_for_item`. Also, errors are automatically propagated to the UI now, so you should not need to catch and return None.
- The demo will not run on spaces if the twilio credentials are not set so I don't think you need to do the twilio set. And you don't need to use twilio locally. You can use this pattern for only calling them in spaces
- To close the connection in shutdown, you can do `await self.connection._websocket.close()`. `Shutdown` can now be async.
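To illustrate the second and fourth points, here is a rough standard-library sketch of the shape being suggested. It does not import the library: `wait_for_item_sketch` is a stand-in written on the assumption that `wait_for_item` behaves like a timed queue read returning None on timeout, and `HandlerSketch` with its `audio_queue` attribute is a hypothetical class, not the PR's `GeminiHandler`.

```python
import asyncio
from typing import Any, Optional


async def wait_for_item_sketch(queue: asyncio.Queue, timeout: float = 0.1) -> Optional[Any]:
    """Stand-in helper (assumed behavior): next item, or None on timeout."""
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None


class HandlerSketch:
    """Hypothetical handler used only to show the control flow."""

    def __init__(self) -> None:
        self.audio_queue: asyncio.Queue = asyncio.Queue()
        self.closed = False

    async def emit(self) -> Optional[Any]:
        # No cancellation handling here: an empty queue simply yields None.
        return await wait_for_item_sketch(self.audio_queue, timeout=0.01)

    async def shutdown(self) -> None:
        # With an async shutdown, an awaited close of the underlying
        # connection would go here; this sketch only records the call.
        self.closed = True
```

With this shape, `emit` stays a one-liner and any real exception is left to propagate rather than being swallowed.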
Separately, does `update_gemini_status_sync` work? 👀 I'd be surprised if you could update a gradio component like that from the stream.
Also, can you move the encode image/audio files to a separate utils page?
Thank you!
Note: while this code worked with gradio_webrtc==0.0.28 (modulo the previously discussed bugs googleapis/python-genai#380 and aiortc/aiortc#1258), it currently crashes with fastrtc==0.0.6 when run locally on an M3 Mac.

with this version info, while running on an m3 mac:
And the output doesn't betray any major errors:
However, the current app.py also fails similarly on fastrtc==0.0.6 when run locally, as did this suggested huggingface spaces version b88286b.
Continuing from this discussion:
https://huggingface.co/spaces/freddyaboulton/gemini-audio-video-chat/discussions/1
See also the bugs previously discussed:
googleapis/python-genai#380 and aiortc/aiortc#1258