The project consists of a FastAPI server (`server.app`) and a front-end component made up of Python scripts (`front-end`) for various audio-related operations. It also uses AWS S3 for storing audio files and integrates external services such as VAD (Voice Activity Detection) and ASR (Automatic Speech Recognition).
- Download a release from https://github.com/snakers4/silero-vad/releases
- Unzip it to `backend/app/ml_models/vad`
- Copy the files from `files/` to `vad/`
- Update `utils.py` from the VAD repo, if required.
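
The unzip and copy steps above can be scripted. The following is only a minimal sketch, assuming the release archive has already been downloaded and that it contains a directory named `files` somewhere inside it. The function name `install_vad_release` and the archive path are hypothetical and are not part of this project.

```python
import shutil
import zipfile
from pathlib import Path


def install_vad_release(archive: Path, vad_dir: Path) -> list[str]:
    """Unzip a downloaded release into vad_dir and copy its files/ contents there.

    Hypothetical helper: the archive layout (a "files" directory somewhere
    inside the release) is an assumption, not something this project defines.
    """
    vad_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(vad_dir)

    copied = []
    # Collect the matching directories first so that copying does not
    # interfere with the directory walk.
    files_dirs = sorted(p for p in vad_dir.rglob("files") if p.is_dir())
    for files_dir in files_dirs:
        for item in sorted(files_dir.iterdir()):
            if item.is_file():
                shutil.copy2(item, vad_dir / item.name)
                copied.append(item.name)
    return copied
```

A call might look like `install_vad_release(Path("release.zip"), Path("backend/app/ml_models/vad"))`, where `release.zip` stands for whichever archive was downloaded. Updating `utils.py` is left as a manual step, since whether it is needed depends on the release.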
Any ASR model can be plugged in by implementing the class `ASR` from `backend/app/src/asr.py`.
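
As an illustration of what implementing the class could look like, here is a minimal sketch. The real interface lives in `backend/app/src/asr.py` and is not reproduced here, so the stand-in base class below, its `transcribe` method, and the `DummyASR` subclass are all assumptions made for the example. Check the actual class for the methods it requires.

```python
from abc import ABC, abstractmethod


class ASR(ABC):
    """Stand-in for the project's ASR class (the interface is assumed)."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript for the audio file at audio_path."""


class DummyASR(ASR):
    """Hypothetical backend that returns a fixed transcript."""

    def __init__(self, text: str = "hello world") -> None:
        self.text = text

    def transcribe(self, audio_path: str) -> str:
        # A real backend would load the audio and run a model here.
        return self.text
```

With this shape, the server only depends on the base class, so swapping models means swapping the subclass that is instantiated, for example `DummyASR().transcribe("sample.wav")`.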
- The project can be run locally using Localstack to simulate the creation of the AWS resources: `docker-compose up`
- Open http://localhost:8501/ to see the website.
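
The containers may take a moment to start, so it can help to wait until the website responds before opening it. The sketch below polls the URL using only the standard library. The function name `wait_for_site` and the timeout values are illustrative assumptions, not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_site(url: str = "http://localhost:8501/",
                  timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll url until it answers with a 2xx status or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet (connection refused, HTTP error, timeout).
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_site()` after `docker-compose up` returns `True` once the page is reachable and `False` if it does not come up within the timeout.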
This project is open-source and available under the MIT License. You are free to use, modify, and distribute it as per the terms of the license.