SpeakSieve transcribes audio files and filters the resulting dialogue by speaker. The frontend is built with React and the backend with FastAPI.
- Python 3.7 or higher
- Node.js 12.0 or higher
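The two minimum versions above can be checked before installing anything. The following is a minimal sketch, not part of the project: the helper names `parse_version`, `meets_minimum`, and `check_prerequisites` are hypothetical, and it assumes `node` is on your `PATH` and prints a version such as `v18.17.0` when run with `--version`.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Extract up to three numeric components from a string such as 'v18.17.0'."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    # Missing components (e.g. '12' or '12.0') are treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the given (major, minor) minimum."""
    return parse_version(version_text)[: len(minimum)] >= tuple(minimum)


def check_prerequisites():
    """Compare the local Python and Node.js versions with the README minimums."""
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    results = {"python": meets_minimum(python_version, (3, 7))}
    node = shutil.which("node")
    if node is None:
        results["node"] = False  # Node.js was not found on PATH
    else:
        output = subprocess.run(
            [node, "--version"], capture_output=True, text=True
        ).stdout
        results["node"] = bool(output.strip()) and meets_minimum(output, (12, 0))
    return results


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'OK' if ok else 'missing or too old'}")
```

Run it with the same `python` you intend to use for the virtual environment, since it reports the version of the interpreter that executes it.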
- Clone the repository
- Create a virtual environment and activate it
cd CSE508_Winter2023_Group2_Project/
python -m venv env
source env/bin/activate # for Linux/Mac
env\Scripts\activate # for Windows
- Install the required Python packages
pip install -r requirements.txt
- Install the required Node packages
cd frontend
npm install
- Start the backend server
cd ../backend
python main.py
- In a separate terminal, launch the React app from the repository root
cd frontend
npm start
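If you would rather not keep two terminals open, the two start commands above can be wrapped in one script. This is only a sketch under stated assumptions: the file and its functions `launch_commands` and `start_all` are hypothetical and not part of the repository, it must be run from the repository root with the virtual environment active, and on Windows `npm` may need to be invoked as `npm.cmd`.

```python
import subprocess
import sys
from pathlib import Path


def launch_commands(repo_root):
    """Return (command, working_directory) pairs for the backend and frontend."""
    root = Path(repo_root)
    return [
        # sys.executable is the interpreter running this script, so the
        # backend uses the active virtual environment's Python.
        ([sys.executable, "main.py"], root / "backend"),
        (["npm", "start"], root / "frontend"),
    ]


def start_all(repo_root):
    """Start both processes and return their Popen handles."""
    return [
        subprocess.Popen(command, cwd=cwd)
        for command, cwd in launch_commands(repo_root)
    ]


if __name__ == "__main__":
    processes = start_all(Path.cwd())
    try:
        # Block until both servers exit.
        for process in processes:
            process.wait()
    except KeyboardInterrupt:
        # Ctrl+C stops both servers.
        for process in processes:
            process.terminate()
```

Both servers write to the same terminal in this setup, so separate terminals remain the easier choice when you need to read the logs.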
- Open your browser and navigate to http://localhost:3000/ to access the SpeakSieve app.
- Upload an audio file in a supported format (mp3).
- Choose a model size from the dropdown. The default is base.
- Choose the language of the audio (English/Any).
- Enter the number of speakers. The default is 1.
- Wait for the transcription to finish. This step may take a while, depending on the duration of the audio and the model size chosen.
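The form fields described above can be summarized as a small settings object. This is an illustrative sketch only, not the project's actual code: the `TranscriptionOptions` class and its field names are hypothetical, and the default language is an assumption, since the steps above state defaults only for the model size and the number of speakers.

```python
from dataclasses import dataclass
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3"}      # the only format listed above
LANGUAGE_CHOICES = {"English", "Any"}


@dataclass
class TranscriptionOptions:
    audio_path: str
    model_size: str = "base"     # default stated in the steps above
    language: str = "English"    # assumption: no default language is documented
    num_speakers: int = 1        # default stated in the steps above

    def validate(self):
        """Return a list of problems; an empty list means the options are usable."""
        errors = []
        if Path(self.audio_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            errors.append("audio file must be an mp3")
        if self.language not in LANGUAGE_CHOICES:
            errors.append("language must be 'English' or 'Any'")
        if self.num_speakers < 1:
            errors.append("number of speakers must be at least 1")
        return errors


if __name__ == "__main__":
    print(TranscriptionOptions("interview.mp3", num_speakers=2).validate())
```

The model size is not validated here because the list of sizes offered in the dropdown is not given above.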
├─ .git
├─ .gitignore
├─ backend
│ ├─ audio.wav
│ ├─ audio_files
│ ├─ extract_phrases.py
│ ├─ get_all_dialogues.py
│ ├─ main.py
│ ├─ speaker_tags_generator.py
│ ├─ transcript-word.csv
│ ├─ transcript-word_bleeped.csv
│ ├─ transcript.csv
│ ├─ transcript.txt
│ └─ voice_censoring_api.py
├─ Censoring
│ ├─ VOSK.ipynb
│ └─ vosk.py
├─ Extract_Phrase
│ └─ extract_phrases.py
├─ frontend
│ ├─ .gitignore
│ ├─ package-lock.json
│ ├─ package.json
│ ├─ public
│ │ ├─ favicon.ico
│ │ ├─ index.html
│ │ ├─ logo192.png
│ │ ├─ logo512.png
│ │ ├─ manifest.json
│ │ └─ robots.txt
│ ├─ README.md
│ └─ src
│   ├─ App.css
│   ├─ App.js
│   ├─ App.test.js
│   ├─ components
│   │ ├─ ConfirmedPage.css
│   │ ├─ ConfirmedPage.jsx
│   │ ├─ CustomNavbar.jsx
│   │ ├─ Home.css
│   │ ├─ Home.jsx
│   │ ├─ sample.mp3
│   │ └─ TranscriptionPage.jsx
│   ├─ index.css
│   ├─ index.js
│   ├─ logo.svg
│   ├─ reportWebVitals.js
│   └─ setupTests.js
├─ model-final
│ ├─ environment.yml
│ ├─ hailhydra1.mp3
│ ├─ speaker-separate.py
│ ├─ speakerTags.py
│ └─ speakerTags2.py
├─ model-testing
│ ├─ audio.wav
│ ├─ Baseline Results.ipynb
│ ├─ female-female-mixture.wav
│ ├─ female-female-mixture_est1.wav
│ ├─ female-female-mixture_est2.wav
│ ├─ female-male-mixture.wav
│ ├─ mono_audio.wav
│ ├─ mono_audio_est1.wav
│ ├─ mono_audio_est2.wav
│ ├─ single-source-transcribe.wav
│ ├─ transcript.txt
│ ├─ transcript2.txt
│ └─ transcripts_with_speaker_names.ipynb
├─ README.md
└─ requirements.txt