# Kokoro TTS Local

A powerful, offline Text-to-Speech (TTS) solution based on the Kokoro-82M model, featuring 44 high-quality voices across multiple languages and accents. This local implementation provides fast, reliable text-to-speech conversion with support for multiple output formats (WAV, MP3, AAC) and real-time generation progress display.
## Features

- 🎙️ 44 high-quality voices across American English, British English, and other languages
- 💻 Completely offline operation - no internet needed after initial setup
- 📚 Support for PDF and TXT file input
- 🎵 Multiple output formats (WAV, MP3, AAC)
- ⚡ Real-time generation with progress display
- 🎛️ Adjustable speech speed (0.5x to 2.0x)
- 📊 Automatic text chunking for optimal processing
- 🎯 Easy-to-use interactive CLI interface
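The automatic chunking feature splits long input into pieces small enough for the model to process. A minimal sketch of sentence-based chunking (the function name and the 500-character limit are illustrative, not the project's actual values):

```python
import re

def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk if appending would exceed the limit
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second sentence. Third one!", max_chars=20))
# ['First sentence.', 'Second sentence.', 'Third one!']
```

Breaking at sentence boundaries rather than at a hard character cut keeps each chunk prosodically natural, which matters for TTS output quality.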
## Prerequisites

Before installing Kokoro TTS Local, ensure the following prerequisites are installed (setup guides for each follow below):
- Python 3.10.0 or higher
- FFmpeg (for MP3/AAC conversion)
- CUDA-compatible GPU (optional, for faster generation)
- Git (for version control and package management)
### Git

#### Windows
1. Download and run the Git installer:
   ```powershell
   winget install --id Git.Git -e --source winget
   ```
   Alternatively, download it from [Git for Windows](https://gitforwindows.org/).
2. Verify the installation:
   ```powershell
   git --version
   ```
#### Linux
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install git

# Fedora
sudo dnf install git

# Arch Linux
sudo pacman -S git

# Verify installation
git --version
```

#### macOS
```bash
# Using Homebrew
brew install git

# Verify installation
git --version
```
### FFmpeg

#### Windows
1. Install via PowerShell:
   ```powershell
   iex (irm ffmpeg.tc.ht)
   ```
2. Verify the installation by opening a new Command Prompt:
   ```
   ffmpeg -version
   ```
#### Linux
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg
```

#### macOS
```bash
# Using Homebrew
brew install ffmpeg
```
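Once FFmpeg is on the PATH, generated WAV output can be converted to MP3 or AAC. A sketch of how such a conversion call can be assembled with Python's standard library (the helper name and the 192k bitrate are illustrative, not the project's actual conversion code):

```python
import subprocess

def build_ffmpeg_cmd(wav_path: str, out_path: str, bitrate: str = "192k") -> list[str]:
    """Assemble an ffmpeg command converting a WAV file to the format
    implied by out_path's extension (e.g. .mp3 or .aac)."""
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it exists
        "-i", wav_path,   # input file
        "-b:a", bitrate,  # audio bitrate
        out_path,
    ]

cmd = build_ffmpeg_cmd("outputs/speech.wav", "outputs/speech.mp3")
print(" ".join(cmd))
# To actually run the conversion (requires ffmpeg on PATH):
# subprocess.run(cmd, check=True)
```

FFmpeg infers the target codec from the output extension, so the same helper covers both MP3 and AAC.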
### CUDA (optional, for GPU acceleration)

#### Windows
1. Check your GPU compatibility:
   - Open Command Prompt and run:
     ```
     dxdiag
     ```
   - Go to the "Display" tab and note your GPU model.
2. Download the CUDA Toolkit:
   - Visit [NVIDIA CUDA Downloads](https://developer.nvidia.com/cuda-downloads)
   - Select Windows and your version
   - Choose the "exe (network)" installer
   - Download and run the installer
3. Installation steps:
   - Run the downloaded installer
   - Choose "Express Installation"
   - Wait for the installation to complete
   - Restart your computer
4. Verify the installation:
   ```
   nvidia-smi
   nvcc --version
   ```
#### Linux
1. Check your GPU compatibility:
   ```bash
   lspci | grep -i nvidia
   ```
2. Remove old NVIDIA drivers (if any):
   ```bash
   sudo apt-get purge nvidia*
   ```
3. Add the NVIDIA package repositories:
   ```bash
   # Ubuntu 22.04 LTS
   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
   sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
   wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
   sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.1-545.23.08-1_amd64.deb
   sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
   ```
4. Install the CUDA drivers:
   ```bash
   sudo apt-get update
   sudo apt-get -y install cuda-drivers
   ```
5. Install the CUDA Toolkit:
   ```bash
   sudo apt-get install cuda
   ```
6. Add CUDA to your PATH:
   ```bash
   echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
   echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
   source ~/.bashrc
   ```
7. Verify the installation:
   ```bash
   nvidia-smi
   nvcc --version
   ```

> Note: CUDA is not supported natively on macOS. The model will run on CPU only.
## Installation

### Windows
1. Install Python 3.10.0:
   - Download the installer from the [official Python website](https://www.python.org/downloads/)
   - During installation, check "Add Python to PATH"
   - Verify the installation:
     ```
     python --version
     ```
2. Install espeak-ng:
   - Download the latest release from [espeak-ng releases](https://github.com/espeak-ng/espeak-ng/releases)
   - Run the installer and follow the prompts
   - Add espeak-ng to your system PATH if the installer does not do so automatically
3. Clone the repository:
   ```
   git clone https://github.com/solveditnpc/kokoro-tts-local.git
   cd kokoro-tts-local
   ```
4. Create and activate a virtual environment:
   ```
   python -m venv venv
   .\venv\Scripts\activate
   ```
5. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
### Linux
1. Install espeak-ng:
   ```bash
   sudo apt-get install espeak-ng
   ```
2. Install system dependencies:
   ```bash
   sudo apt-get update
   sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
     libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
     libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev \
     liblzma-dev python3-openssl git
   ```
3. Install pyenv:
   ```bash
   curl https://pyenv.run | bash

   # Add to ~/.bashrc
   echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
   echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
   echo 'eval "$(pyenv init -)"' >> ~/.bashrc

   # Reload the shell
   exec "$SHELL"
   ```
4. Install Python 3.10.0 and clone the repository:
   ```bash
   pyenv install 3.10.0
   git clone https://github.com/solveditnpc/kokoro-tts-local.git
   cd kokoro-tts-local
   pyenv local 3.10.0
   ```
5. Create and activate a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
6. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
### macOS
1. Install Homebrew (if not already installed) and system dependencies:
   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

   # System dependencies
   brew install openssl readline sqlite3 xz zlib tcl-tk git

   # espeak-ng
   brew install espeak-ng
   ```
2. Install pyenv:
   ```bash
   brew install pyenv

   # Add to ~/.zshrc (or ~/.bashrc if using bash)
   echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
   echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
   echo 'eval "$(pyenv init -)"' >> ~/.zshrc

   # Reload the shell
   exec "$SHELL"
   ```
3. Install Python 3.10.0 and clone the repository:
   ```bash
   pyenv install 3.10.0
   git clone https://github.com/solveditnpc/kokoro-tts-local.git
   cd kokoro-tts-local
   pyenv local 3.10.0
   ```
4. Create and activate a virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate
   ```
5. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
## Available Voices

The system includes 44 different voices across various categories:
### American English

#### Female (af_*)
- af_alloy: Alloy - Clear and professional
- af_aoede: Aoede - Smooth and melodic
- af_bella: Bella - Warm and friendly
- af_jessica: Jessica - Natural and engaging
- af_kore: Kore - Bright and energetic
- af_nicole: Nicole - Professional and articulate
- af_nova: Nova - Modern and dynamic
- af_river: River - Soft and flowing
- af_sarah: Sarah - Casual and approachable
- af_sky: Sky - Light and airy

#### Male (am_*)
- am_adam: Adam - Strong and confident
- am_echo: Echo - Resonant and clear
- am_eric: Eric - Professional and authoritative
- am_fenrir: Fenrir - Deep and powerful
- am_liam: Liam - Friendly and conversational
- am_michael: Michael - Warm and trustworthy
- am_onyx: Onyx - Rich and sophisticated
- am_puck: Puck - Playful and energetic

### British English

#### Female (bf_*)
- bf_alice: Alice - Refined and elegant
- bf_emma: Emma - Warm and professional
- bf_isabella: Isabella - Sophisticated and clear
- bf_lily: Lily - Sweet and gentle

#### Male (bm_*)
- bm_daniel: Daniel - Polished and professional
- bm_fable: Fable - Storytelling and engaging
- bm_george: George - Classic British accent
- bm_lewis: Lewis - Modern British accent

### French

#### Female (ff_*)
- ff_siwis: Siwis - French accent

### High-pitched

#### Female (hf_*)
- hf_alpha: Alpha - Higher female pitch
- hf_beta: Beta - Alternative high female pitch

#### Male (hm_*)
- hm_omega: Omega - Higher male pitch
- hm_psi: Psi - Alternative high male pitch
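The voice IDs follow a systematic naming scheme: the first letter of the prefix encodes the category ('a' American English, 'b' British English, 'f' French, 'h' high-pitched) and the second the gender. A small illustrative parser (not part of the project's code):

```python
# Category and gender codes as used by the voice ID prefixes
LANG = {"a": "American English", "b": "British English",
        "f": "French", "h": "High-pitched"}
GENDER = {"f": "Female", "m": "Male"}

def describe_voice(voice_id: str) -> str:
    """Decode a voice ID such as 'af_sarah' into a readable description."""
    prefix, _, name = voice_id.partition("_")
    return f"{name.capitalize()} ({LANG[prefix[0]]}, {GENDER[prefix[1]]})"

print(describe_voice("af_sarah"))   # Sarah (American English, Female)
print(describe_voice("bm_george"))  # George (British English, Male)
```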
## Project Structure

```
.
├── .cache/               # Cache directory for downloaded models
│   └── huggingface/      # Hugging Face model cache
├── .git/                 # Git repository data
├── .gitignore            # Git ignore rules
├── __pycache__/          # Python cache files
├── voices/               # Voice model files (downloaded on demand)
│   └── *.pt              # Individual voice files
├── venv/                 # Python virtual environment
├── outputs/              # Generated audio files directory
├── LICENSE               # Apache 2.0 License file
├── README.md             # Project documentation
├── models.py             # Core TTS model implementation
├── gradio_interface.py   # Web interface implementation
├── config.json           # Model configuration file
├── requirements.txt      # Python dependencies
└── tts_demo.py           # CLI implementation
```
## Model Details

The project uses the latest Kokoro model from Hugging Face:
- Repository: hexgrad/Kokoro-82M
- Model file: `kokoro-v1_0.pth` (downloaded automatically)
- Sample rate: 24 kHz
- Voice files: located in the `voices/` directory (downloaded automatically)
- Available voices: 44 voices across multiple categories
- Languages: American English ('a'), British English ('b')
- Model size: 82M parameters
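The 24 kHz sample rate matters when saving generated audio. A self-contained sketch that writes a 24 kHz mono 16-bit WAV using only Python's standard library (the silence payload and file name are placeholders for real model output, not the project's actual saving code):

```python
import wave

SAMPLE_RATE = 24000  # Kokoro outputs audio at 24 kHz

def save_wav(samples: bytes, path: str) -> None:
    """Write 16-bit mono PCM samples to a WAV file at the Kokoro sample rate."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 16-bit samples (2 bytes each)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(samples)

# One second of silence: 24000 frames * 2 bytes per frame
save_wav(b"\x00\x00" * SAMPLE_RATE, "silence.wav")
```

Writing with the wrong frame rate (e.g. the more common 22.05 or 44.1 kHz) would make the output play at the wrong speed and pitch, which is a typical symptom of a sample-rate mismatch.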
## Troubleshooting

Common issues and solutions:

1. **Model download issues**
   - Ensure a stable internet connection
   - Check that Hugging Face is accessible
   - Verify sufficient disk space
   - Try clearing the `.cache/huggingface` directory
2. **CUDA/GPU issues**
   - Verify the CUDA installation with `nvidia-smi`
   - Update your GPU drivers
   - Check PyTorch CUDA compatibility
   - Fall back to CPU if needed
3. **Audio output issues**
   - Check system audio settings
   - Verify output directory permissions
   - Install FFmpeg for MP3/AAC support
   - Try different output formats
4. **Voice file issues**
   - Delete voice files and let the system redownload them
   - Check `voices/` directory permissions
   - Verify voice file integrity
   - Try a different voice
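The cache-clearing step above can be scripted with the standard library; the path matches the project structure, and the models are simply re-downloaded on the next run:

```python
import shutil
from pathlib import Path

# Hugging Face model cache, relative to the project root
cache = Path(".cache/huggingface")

if cache.exists():
    shutil.rmtree(cache)  # models are re-downloaded on the next run

print("cache cleared" if not cache.exists() else "cache still present")
```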
## Contributing

Feel free to contribute by:
- Opening issues for bugs or feature requests
- Submitting pull requests with improvements
- Helping with documentation
- Testing different voices and reporting issues
- Suggesting new features or optimizations
- Testing on different platforms and reporting results
## License

Apache 2.0 - see the LICENSE file for details.