SpeakSieve

SpeakSieve is a web service that uses deep learning for audio tasks such as speaker separation, transcription, and voice censorship. It transcribes audio files and provides dialogue filtered by speaker. The frontend is built with React and the backend with FastAPI.

Getting Started

Prerequisites

  • Python 3.7 or higher
  • Node.js 12.0 or higher

Installation

  1. Clone the repository

     git clone https://github.com/utkar22/CSE508_Winter2023_Group2_Project.git

  2. Create a virtual environment and activate it

     cd CSE508_Winter2023_Group2_Project/
     python -m venv env
     source env/bin/activate   # for Linux/Mac
     env\Scripts\activate      # for Windows

  3. Install the required Python packages

     pip install -r requirements.txt

  4. Install the required Node packages

     cd frontend
     npm install

Running the project

  1. Start the backend server

     cd backend
     python main.py

  2. In a separate terminal, launch the React app

     cd frontend
     npm start

  3. Open your browser and navigate to http://localhost:3000/ to access the SpeakSieve app.
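Once both servers are running, the frontend presumably sends the uploaded audio plus the user's choices (model size, language, number of speakers; see Usage below) to the FastAPI backend. A minimal sketch of assembling those options as a request payload; the field names here are illustrative assumptions, not the backend's actual API:

```python
def build_transcription_options(model_size="base", language="Any", num_speakers=1):
    """Assemble the options the frontend collects before submitting audio.

    Field names are hypothetical; the real backend in main.py may
    expect different keys or send them as form fields.
    """
    return {
        "model_size": model_size,     # e.g. "base" (the default)
        "language": language,         # "English" or "Any"
        "num_speakers": num_speakers, # defaults to 1
    }

print(build_transcription_options(num_speakers=2))
# {'model_size': 'base', 'language': 'Any', 'num_speakers': 2}
```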

Usage

  1. Upload an audio file in a supported format (mp3).
  2. Choose a model size from the dropdown; the default is base.
  3. Choose the language of the audio (English/Any).
  4. Enter the number of speakers (default: 1).
  5. Wait for the transcription to finish. (This step may take time depending on the duration of the audio and the chosen model size.)
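The speaker-filtering step can be sketched in plain Python. This is a minimal illustration, assuming a word- or line-level transcript table like the backend's transcript.csv with hypothetical speaker and text columns; the actual schema produced by the backend may differ:

```python
import csv
import io

def dialogues_for_speaker(transcript_csv, speaker):
    """Return all dialogue lines attributed to the given speaker.

    Assumes a CSV with 'speaker' and 'text' columns; the real
    transcript.csv generated by the backend may use other headers.
    """
    reader = csv.DictReader(io.StringIO(transcript_csv))
    return [row["text"] for row in reader if row["speaker"] == speaker]

# Tiny in-memory example transcript
sample = (
    "speaker,text\n"
    "SPEAKER_00,hello there\n"
    "SPEAKER_01,hi how are you\n"
    "SPEAKER_00,doing well thanks\n"
)

print(dialogues_for_speaker(sample, "SPEAKER_00"))
# ['hello there', 'doing well thanks']
```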

Project Structure

CSE508_Winter2023_Group2_Project
├─ .git
├─ .gitignore
├─ backend
│  ├─ audio.wav
│  ├─ audio_files
│  ├─ extract_phrases.py
│  ├─ get_all_dialogues.py
│  ├─ main.py
│  ├─ speaker_tags_generator.py
│  ├─ transcript-word.csv
│  ├─ transcript-word_bleeped.csv
│  ├─ transcript.csv
│  ├─ transcript.txt
│  └─ voice_censoring_api.py
├─ Censoring
│  ├─ VOSK.ipynb
│  └─ vosk.py
├─ Extract_Phrase
│  └─ extract_phrases.py
├─ frontend
│  ├─ .gitignore
│  ├─ package-lock.json
│  ├─ package.json
│  ├─ public
│  │  ├─ favicon.ico
│  │  ├─ index.html
│  │  ├─ logo192.png
│  │  ├─ logo512.png
│  │  ├─ manifest.json
│  │  └─ robots.txt
│  ├─ README.md
│  └─ src
│     ├─ App.css
│     ├─ App.js
│     ├─ App.test.js
│     ├─ components
│     │  ├─ ConfirmedPage.css
│     │  ├─ ConfirmedPage.jsx
│     │  ├─ CustomNavbar.jsx
│     │  ├─ Home.css
│     │  ├─ Home.jsx
│     │  ├─ sample.mp3
│     │  └─ TranscriptionPage.jsx
│     ├─ index.css
│     ├─ index.js
│     ├─ logo.svg
│     ├─ reportWebVitals.js
│     └─ setupTests.js
├─ model-final
│  ├─ environment.yml
│  ├─ hailhydra1.mp3
│  ├─ speaker-separate.py
│  ├─ speakerTags.py
│  └─ speakerTags2.py
├─ model-testing
│  ├─ audio.wav
│  ├─ Baseline Results.ipynb
│  ├─ female-female-mixture.wav
│  ├─ female-female-mixture_est1.wav
│  ├─ female-female-mixture_est2.wav
│  ├─ female-male-mixture.wav
│  ├─ mono_audio.wav
│  ├─ mono_audio_est1.wav
│  ├─ mono_audio_est2.wav
│  ├─ single-source-transcribe.wav
│  ├─ transcript.txt
│  ├─ transcript2.txt
│  └─ transcripts_with_speaker_names.ipynb
├─ README.md
└─ requirements.txt
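The backend's voice_censoring_api.py and the transcript-word_bleeped.csv artifact suggest word-level censoring over the transcript. A minimal sketch of that idea, using a hypothetical banned-word list; the project's actual filtering logic may differ:

```python
BANNED = {"darn", "heck"}  # hypothetical word list, for illustration only

def bleep_words(text, banned=BANNED):
    """Replace banned words with [bleep], preserving word order.

    Case-insensitive match on whole words; punctuation handling is
    deliberately omitted to keep the sketch short.
    """
    return " ".join(
        "[bleep]" if word.lower() in banned else word
        for word in text.split()
    )

print(bleep_words("well darn that was close"))
# well [bleep] that was close
```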
