app.py Gemini/Twilio w/ robust error handling, faster image encoding, and UI status updates #88

ahundt · 2025-02-26T21:57:55Z

Note: while this code worked with gradio_webrtc==0.0.28 (modulo the previously discussed bugs googleapis/python-genai#380 and aiortc/aiortc#1258), it currently crashes with fastrtc==0.0.6 when run locally on an M3 Mac.

This is the version info used, running on an M3 Mac:

[project]
name = "gemini-audio-video-chat"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "fastrtc[vad, tts]==0.0.6",
    "google-genai==0.3.0",
    "twilio",
    "opencv-python",
    "dotenv",
]

And the output doesn't betray any major errors:

athundt@Andrews2024MBP|~/source/gemini-audio-video-chat on ui_improvements!?
± uv run app.py
/Users/athundt/source/gemini-audio-video-chat/.venv/lib/python3.13/site-packages/google_crc32c/__init__.py:29: RuntimeWarning: As the c extension couldn't be imported, `google-crc32c` is using a pure python implementation that is significantly slower. If possible, please configure a c build environment and compile the extension
  warnings.warn(_SLOW_CRC32C_WARNING, RuntimeWarning)
2025-02-26 16:53:48,133 - INFO - Attempting to get Twilio credentials (attempt 1)...
2025-02-26 16:53:48,190 - INFO - -- BEGIN Twilio API Request --
2025-02-26 16:53:48,190 - INFO - POST Request: https://api.twilio.com/2010-04-01/Accounts//Tokens.json
2025-02-26 16:53:48,190 - INFO - Headers:
2025-02-26 16:53:48,190 - INFO - Content-Type : application/x-www-form-urlencoded
2025-02-26 16:53:48,190 - INFO - Accept : application/json
2025-02-26 16:53:48,190 - INFO - User-Agent : twilio-python/9.4.6 (Darwin x86_64) Python/3.13.2
2025-02-26 16:53:48,190 - INFO - X-Twilio-Client : python-9.4.6
2025-02-26 16:53:48,190 - INFO - Accept-Charset : utf-8
2025-02-26 16:53:48,190 - INFO - -- END Twilio API Request --
2025-02-26 16:53:48,499 - INFO - Response Status Code: 201
2025-02-26 16:53:48,499 - INFO - Response Headers: {'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '1192', 'Connection': 'keep-alive', 'Date': 'Wed, 26 Feb 2025 21:53:48 GMT', 'Twilio-Concurrent-Requests': '1', 'Twilio-Request-Id': 'RQ16281ff7d4de919554b87046cba1e036', 'Twilio-Request-Duration': '0.049', 'X-Home-Region': 'us1', 'X-API-Domain': 'api.twilio.com', 'Strict-Transport-Security': 'max-age=31536000', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 1fecb697c6f121d7ce54a35628ac154e.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P2', 'X-Amz-Cf-Id': '1dr27ZIYkNBQo-G61YyOD_cwC3txTht7xO5zdrFQMw5zbBtR-eGJFA==', 'X-Powered-By': 'AT-5000', 'X-Shenanigans': 'none', 'Vary': 'Origin'}
2025-02-26 16:53:48,499 - INFO - Twilio credentials response: {'iceServers': [{'url': 'stun:global.stun.twilio.com:3478', 'urls': 'stun:global.stun.twilio.com:3478'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:3478?transport=udp', 'urls': 'turn:global.turn.twilio.com:3478?transport=udp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:3478?transport=tcp', 'urls': 'turn:global.turn.twilio.com:3478?transport=tcp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}, {'credential': 'ZdosbIThoHiWTOOjDOt0T4wBygdWlfzjXjJOocGWu3Y=', 'url': 'turn:global.turn.twilio.com:443?transport=tcp', 'urls': 'turn:global.turn.twilio.com:443?transport=tcp', 'username': 'c9136edbb903bdf9a66799be17f23526e45f2b87155497dad4b9ba4ef97a44a1'}], 'iceTransportPolicy': 'relay'}
2025-02-26 16:53:48,499 - INFO - Twilio TURN server available.
2025-02-26 16:53:48,566 - INFO - -- BEGIN Twilio API Request --
2025-02-26 16:53:48,566 - INFO - POST Request: https://api.twilio.com/2010-04-01/Accounts/\/Tokens.json
2025-02-26 16:53:48,566 - INFO - Headers:
2025-02-26 16:53:48,566 - INFO - Content-Type : application/x-www-form-urlencoded
2025-02-26 16:53:48,566 - INFO - Accept : application/json
2025-02-26 16:53:48,566 - INFO - User-Agent : twilio-python/9.4.6 (Darwin x86_64) Python/3.13.2
2025-02-26 16:53:48,566 - INFO - X-Twilio-Client : python-9.4.6
2025-02-26 16:53:48,566 - INFO - Accept-Charset : utf-8
2025-02-26 16:53:48,566 - INFO - -- END Twilio API Request --
2025-02-26 16:53:48,689 - INFO - Response Status Code: 201
2025-02-26 16:53:48,689 - INFO - Response Headers: {'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '1192', 'Connection': 'keep-alive', 'Date': 'Wed, 26 Feb 2025 21:53:48 GMT', 'Twilio-Concurrent-Requests': '1', 'Twilio-Request-Id': 'RQ07d1dc4e5762a2408a3cbdd683b7513b', 'Twilio-Request-Duration': '0.058', 'X-Home-Region': 'us1', 'X-API-Domain': 'api.twilio.com', 'Strict-Transport-Security': 'max-age=31536000', 'X-Cache': 'Miss from cloudfront', 'Via': '1.1 7c52bc60e0da5f557ed6047264a41c18.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'IAD61-P2', 'X-Amz-Cf-Id': 'DzRq4ZHXRB2auwP9sAUGH160f8FYLqBIaRBDiLyxi0k8AmWMLymIRQ==', 'X-Powered-By': 'AT-5000', 'X-Shenanigans': 'none', 'Vary': 'Origin'}
* Running on local URL:  http://127.0.0.1:7860
2025-02-26 16:53:48,831 - INFO - HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-02-26 16:53:48,845 - INFO - HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
2025-02-26 16:53:48,855 - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"

However, the current app.py also fails in a similar way on fastrtc==0.0.6 when run locally, as did the suggested Hugging Face Spaces version b88286b.

Continuing from this discussion:
https://huggingface.co/spaces/freddyaboulton/gemini-audio-video-chat/discussions/1

See also the bugs previously discussed:
googleapis/python-genai#380 and aiortc/aiortc#1258

This commit improves the Gemini and Twilio integration with a focus on better error handling, UI feedback, connection stability, and faster image encoding.

- Faster, Robust Image Encoding: Enhanced encode_image with comprehensive input validation (NaN/Inf, shape), normalization, and faster JPEG encoding with error handling using OpenCV.
- Synchronous Twilio Check (Pre-UI): Implemented a synchronous Twilio TURN server availability check before Gradio initialization to avoid race conditions. It includes retry logic with exponential backoff, so the status is accurate before the UI loads.
- UI Status Updates:
  - Added an immediate Twilio status update on UI load.
  - Gemini connection status is displayed and updated to inform users.
- Robust Gemini Connection: Improved Gemini connection logic with more comprehensive error handling and UI feedback on connection failures.
- Improved Shutdown: The GeminiHandler.shutdown method is more robust to ensure proper cleanup and prevent lingering issues.
- API key validation: Added API key validation to improve the user experience.
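
The retry behavior described for the Twilio check can be sketched as follows. This is a minimal illustration under assumptions, not the code from this PR: `retry_with_backoff`, its parameters, and the injected `sleep` callable are hypothetical names, and `fetch` stands in for whatever call obtains the TURN credentials.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**(attempt - 1) seconds after each failure and
    returns None when every attempt raised. (Hypothetical helper.)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # a real check would catch narrower errors
            if attempt == max_attempts:
                print(f"attempt {attempt} failed ({exc}); giving up")
                return None
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    return None
```

Running a check like this synchronously, before the Gradio UI is constructed, means the TURN status is already known when the page first renders, which is the race condition the description says it avoids.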


ahundt · 2025-02-26T22:10:10Z

Also, with gradio_webrtc==0.0.28, the version that worked best, I got one run of the app to last a couple of minutes without crashing while on a network where packets drop much less often.

freddyaboulton · 2025-02-27T19:53:55Z

The infinite spinner should be fixed now if you install the latest version (0.0.9!)

tanquangduong · 2025-02-28T14:18:25Z

Thanks @freddyaboulton, with the latest version (0.0.9) the infinite spinner is fixed. But there is a new bug: "fastrtc.utils.WebRTCError: timed out during handshake"

freddyaboulton

Hi @ahundt! Thanks for the contribution, and sorry for the delay in getting this review to you.

There were some API improvements made to the library since you started working on the original gradio_webrtc demo that should make this code easier. Some are already present in the current gemini_audio_video/app.py file and I'd like them to be incorporated into this demo before merging. Namely:

- No need for async def generator() and async def receive_audio() anymore. The async def generator() becomes async def startup(), and there's no need for receive_audio (or generator()) anymore.
- Instead of catching Cancelled errors in the emit functions, you can use wait_for_item. Also, errors are automatically propagated to the UI now, so you should not need to catch them and return None.
- The demo will not run on Spaces if the Twilio credentials are not set, so I don't think you need to do the twilio set. And you don't need to use Twilio locally. You can use this pattern for only calling them in Spaces.
- To close the connection in shutdown, you can do await self.connection._websocket.close(). Shutdown can now be async.

Separately, does update_gemini_status_sync work? 👀 I'd be surprised if you could update a gradio component like that from the stream.

Also, can you move the image/audio encoding functions to a separate utils file?

Thank you!

freddyaboulton reviewed Feb 28, 2025
