# Kokoro TTS Local

A powerful, offline Text-to-Speech (TTS) solution based on the Kokoro-82M model, featuring 44 high-quality voices across multiple languages and accents. This local implementation provides fast, reliable text-to-speech conversion with support for multiple output formats (WAV, MP3, AAC) and real-time generation progress display.
## Features

- 🎙️ 44 high-quality voices across American English, British English, and other languages
- 💻 Completely offline operation - no internet needed after initial setup
- 📚 Support for PDF and TXT file input
- 🎵 Multiple output formats (WAV, MP3, AAC)
- ⚡ Real-time generation with progress display
- 🎛️ Adjustable speech speed (0.5x to 2.0x)
- 📊 Automatic text chunking for optimal processing
- 🎯 Easy-to-use interactive CLI interface
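The automatic chunking feature splits long input into pieces small enough for the model to process. A minimal sketch of sentence-based chunking (the function name and the 500-character limit are illustrative, not the project's actual values):

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk if appending would exceed the limit
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence. Third one!", max_chars=20))
# ['First sentence.', 'Second sentence.', 'Third one!']
```

Breaking at sentence boundaries rather than at a hard character cut keeps each chunk prosodically natural, which matters for TTS output quality.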
## Prerequisites

Before installing Kokoro TTS Local, ensure the following prerequisites are installed (setup guides for each follow below):
- Python 3.10.0 or higher
- FFmpeg (for MP3/AAC conversion)
- CUDA-compatible GPU (optional, for faster generation)
- Git (for version control and package management)
### Git

#### Windows
1. Download and run the Git installer:
   ```powershell
   winget install --id Git.Git -e --source winget
   ```
   Alternatively, download it from [Git for Windows](https://gitforwindows.org/).
2. Verify the installation:
   ```powershell
   git --version
   ```
#### Linux
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install git

# Fedora
sudo dnf install git

# Arch Linux
sudo pacman -S git

# Verify installation
git --version
```

#### macOS
```bash
# Using Homebrew
brew install git

# Verify installation
git --version
```
### FFmpeg

#### Windows
1. Install via PowerShell:
   ```powershell
   iex (irm ffmpeg.tc.ht)
   ```
2. Verify the installation by opening a new Command Prompt:
   ```
   ffmpeg -version
   ```
#### Linux
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg
```

#### macOS
```bash
# Using Homebrew
brew install ffmpeg
```
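Once FFmpeg is on the PATH, generated WAV output can be converted to MP3 or AAC. A sketch of how such a conversion call can be assembled with Python's standard library (the helper name and the 192k bitrate are illustrative, not the project's actual conversion code):

```python
import subprocess

def build_ffmpeg_cmd(wav_path: str, out_path: str, bitrate: str = "192k") -> list[str]:
    """Assemble an ffmpeg command converting a WAV file to the format
    implied by out_path's extension (e.g. .mp3 or .aac)."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it exists
        "-i", wav_path,   # input file
        "-b:a", bitrate,  # audio bitrate
        out_path,
    ]

cmd = build_ffmpeg_cmd("outputs/speech.wav", "outputs/speech.mp3")
print(" ".join(cmd))
# To actually run the conversion (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```

FFmpeg infers the target codec from the output extension, so the same helper covers both MP3 and AAC.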
### CUDA (optional, for GPU acceleration)

#### Windows
1. Check your GPU compatibility:
   - Open Command Prompt and run:
     ```
     dxdiag
     ```
   - Go to the "Display" tab and note your GPU model.
2. Download the CUDA Toolkit:
   - Visit [NVIDIA CUDA Downloads](https://developer.nvidia.com/cuda-downloads)
   - Select Windows and your version
   - Choose the "exe (network)" installer
   - Download and run the installer
3. Installation steps:
   - Run the downloaded installer
   - Choose "Express Installation"
   - Wait for the installation to complete
   - Restart your computer
4. Verify the installation:
   ```
   nvidia-smi
   nvcc --version
   ```
#### Linux
1. Check your GPU compatibility:
   ```bash
   lspci | grep -i nvidia
   ```
2. Remove old NVIDIA drivers (if any):
   ```bash
   sudo apt-get purge nvidia*
   ```
3. Add the NVIDIA package repositories:
   ```bash
   # Ubuntu 22.04 LTS
   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
   sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
   wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
   sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
   sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
   ```
4. Install the CUDA drivers:
   ```bash
   sudo apt-get update
   sudo apt-get -y install cuda-drivers
   ```
5. Install the CUDA Toolkit:
   ```bash
   sudo apt-get install cuda
   ```
6. Add CUDA to your PATH:
   ```bash
   echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
   echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
   source ~/.bashrc
   ```
7. Verify the installation:
   ```bash
   nvidia-smi
   nvcc --version
   ```

> Note: CUDA is not supported natively on macOS. The model will run on CPU only.
## Installation

### Windows
1. Install Python 3.10.0:
   - Download the installer from the [official Python website](https://www.python.org/downloads/)
   - During installation, check "Add Python to PATH"
   - Verify the installation:
     ```
     python --version
     ```
2. Install espeak-ng:
   - Download the latest release from [espeak-ng releases](https://github.com/espeak-ng/espeak-ng/releases)
   - Run the installer and follow the prompts
   - Add espeak-ng to your system PATH if the installer does not do so automatically
3. Clone the repository:
   ```
   git clone https://github.com/solveditnpc/kokoro-tts-local.git
   cd kokoro-tts-local
   ```
4. Create and activate a virtual environment:
   ```
   python -m venv venv
   .\venv\Scripts\activate
   ```
5. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
### Linux
1. Install espeak-ng:
   ```bash
   sudo apt-get install espeak-ng
   ```
2. Install system dependencies:
   ```bash
   sudo apt-get update
   sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
     libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
     libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev \
     liblzma-dev python3-openssl git
   ```
3. Install pyenv:
   ```bash
   curl https://pyenv.run | bash

   # Add to ~/.bashrc
   echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
   echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
   echo 'eval "$(pyenv init -)"' >> ~/.bashrc

   # Reload the shell
   exec "$SHELL"
   ```
4. Install Python 3.10.0 and clone the repository:
   ```bash
   pyenv install 3.10.0
   git clone https://github.com/solveditnpc/kokoro-tts-local.git
   cd kokoro-tts-local
   pyenv local 3.10.0
   ```
5. Create and activate a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
6. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
### macOS
1. Install Homebrew (if not already installed) and system dependencies:
   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

   # System dependencies
   brew install openssl readline sqlite3 xz zlib tcl-tk git

   # espeak-ng
   brew install espeak-ng
   ```
2. Install pyenv:
   ```bash
   brew install pyenv

   # Add to ~/.zshrc (or ~/.bashrc if using bash)
   echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
   echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
   echo 'eval "$(pyenv init -)"' >> ~/.zshrc

   # Reload the shell
   exec "$SHELL"
   ```
3. Install Python 3.10.0 and clone the repository:
   ```bash
   pyenv install 3.10.0
   git clone https://github.com/solveditnpc/kokoro-tts-local.git
   cd kokoro-tts-local
   pyenv local 3.10.0
   ```
4. Create and activate a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
5. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
## Available Voices

The system includes 44 different voices across various categories:
### American English

#### Female (af_*)
- af_alloy: Alloy - Clear and professional
- af_aoede: Aoede - Smooth and melodic
- af_bella: Bella - Warm and friendly
- af_jessica: Jessica - Natural and engaging
- af_kore: Kore - Bright and energetic
- af_nicole: Nicole - Professional and articulate
- af_nova: Nova - Modern and dynamic
- af_river: River - Soft and flowing
- af_sarah: Sarah - Casual and approachable
- af_sky: Sky - Light and airy

#### Male (am_*)
- am_adam: Adam - Strong and confident
- am_echo: Echo - Resonant and clear
- am_eric: Eric - Professional and authoritative
- am_fenrir: Fenrir - Deep and powerful
- am_liam: Liam - Friendly and conversational
- am_michael: Michael - Warm and trustworthy
- am_onyx: Onyx - Rich and sophisticated
- am_puck: Puck - Playful and energetic

### British English

#### Female (bf_*)
- bf_alice: Alice - Refined and elegant
- bf_emma: Emma - Warm and professional
- bf_isabella: Isabella - Sophisticated and clear
- bf_lily: Lily - Sweet and gentle

#### Male (bm_*)
- bm_daniel: Daniel - Polished and professional
- bm_fable: Fable - Storytelling and engaging
- bm_george: George - Classic British accent
- bm_lewis: Lewis - Modern British accent

### French

#### Female (ff_*)
- ff_siwis: Siwis - French accent

### High-pitched

#### Female (hf_*)
- hf_alpha: Alpha - Higher female pitch
- hf_beta: Beta - Alternative high female pitch

#### Male (hm_*)
- hm_omega: Omega - Higher male pitch
- hm_psi: Psi - Alternative high male pitch
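The voice IDs follow a systematic naming scheme: the first letter of the prefix encodes the category ('a' American English, 'b' British English, 'f' French, 'h' high-pitched) and the second the gender. A small illustrative parser (not part of the project's code):

```python
# Category and gender codes as used by the voice ID prefixes
LANG = {"a": "American English", "b": "British English",
        "f": "French", "h": "High-pitched"}
GENDER = {"f": "Female", "m": "Male"}

def describe_voice(voice_id: str) -> str:
    """Decode a voice ID such as 'af_sarah' into a readable description."""
    prefix, _, name = voice_id.partition("_")
    return f"{name.capitalize()} ({LANG[prefix[0]]}, {GENDER[prefix[1]]})"

print(describe_voice("af_sarah"))   # Sarah (American English, Female)
print(describe_voice("bm_george"))  # George (British English, Male)
```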
## Project Structure

```
.
├── .cache/               # Cache directory for downloaded models
│   └── huggingface/      # Hugging Face model cache
├── .git/                 # Git repository data
├── .gitignore            # Git ignore rules
├── __pycache__/          # Python cache files
├── voices/               # Voice model files (downloaded on demand)
│   └── *.pt              # Individual voice files
├── venv/                 # Python virtual environment
├── outputs/              # Generated audio files directory
├── LICENSE               # Apache 2.0 License file
├── README.md             # Project documentation
├── models.py             # Core TTS model implementation
├── gradio_interface.py   # Web interface implementation
├── config.json           # Model configuration file
├── requirements.txt      # Python dependencies
└── tts_demo.py           # CLI implementation
```
## Model Details

The project uses the latest Kokoro model from Hugging Face:
- Repository: hexgrad/Kokoro-82M
- Model file: `kokoro-v1_0.pth` (downloaded automatically)
- Sample rate: 24 kHz
- Voice files: located in the `voices/` directory (downloaded automatically)
- Available voices: 44 voices across multiple categories
- Languages: American English ('a'), British English ('b')
- Model size: 82M parameters
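The 24 kHz sample rate matters when saving generated audio. A self-contained sketch that writes a 24 kHz mono 16-bit WAV using only Python's standard library (the silence payload and file name are placeholders for real model output, not the project's actual saving code):

```python
import wave

SAMPLE_RATE = 24000  # Kokoro outputs audio at 24 kHz

def save_wav(samples: bytes, path: str) -> None:
    """Write 16-bit mono PCM samples to a WAV file at the Kokoro sample rate."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples (2 bytes each)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(samples)

# One second of silence: 24000 frames * 2 bytes per frame
save_wav(b"\x00\x00" * SAMPLE_RATE, "silence.wav")
```

Writing with the wrong frame rate (e.g. the more common 22.05 or 44.1 kHz) would make the output play at the wrong speed and pitch, which is a typical symptom of a sample-rate mismatch.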
## Troubleshooting

Common issues and solutions:

1. **Model download issues**
   - Ensure a stable internet connection
   - Check that Hugging Face is accessible
   - Verify sufficient disk space
   - Try clearing the `.cache/huggingface` directory
2. **CUDA/GPU issues**
   - Verify the CUDA installation with `nvidia-smi`
   - Update your GPU drivers
   - Check PyTorch CUDA compatibility
   - Fall back to CPU if needed
3. **Audio output issues**
   - Check system audio settings
   - Verify output directory permissions
   - Install FFmpeg for MP3/AAC support
   - Try different output formats
4. **Voice file issues**
   - Delete voice files and let the system redownload them
   - Check `voices/` directory permissions
   - Verify voice file integrity
   - Try a different voice
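The cache-clearing step above can be scripted with the standard library; the path matches the project structure, and the models are simply re-downloaded on the next run:

```python
import shutil
from pathlib import Path

# Hugging Face model cache, relative to the project root
cache = Path(".cache/huggingface")

if cache.exists():
    shutil.rmtree(cache)  # models are re-downloaded on the next run

print("cache cleared" if not cache.exists() else "cache still present")
```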
## Contributing

Feel free to contribute by:
- Opening issues for bugs or feature requests
- Submitting pull requests with improvements
- Helping with documentation
- Testing different voices and reporting issues
- Suggesting new features or optimizations
- Testing on different platforms and reporting results
## License

Apache 2.0 - see the LICENSE file for details.