Audio Transcriber

Table of Contents

Introduction
Features
- Whisper Integration
- Multi-backend Support
  - Vulkan
  - CUDA
  - HIPBLAS
  - Metal
Prerequisites
- Rust Toolchain
- FFmpeg
Installation
- Using Cargo
- Supported Backends Installation
  - Vulkan
  - CUDA
  - HIPBLAS
  - Metal
Usage
Dependencies
- Crate Dependencies
Contributing
License

Introduction

audio-transcriber is a command-line tool that transcribes audio files using the Whisper speech recognition model. It supports several GPU backends, so you can use hardware acceleration through Vulkan, CUDA, HIPBLAS, or Metal.

Features

- Whisper Integration: Utilizes the whisper-rs library for accurate transcription.
- Multi-backend Support:
  - Vulkan: Leverages GPU acceleration using the Vulkan API. Suitable for cross-platform applications and modern GPUs.
  - CUDA: Optimized for NVIDIA GPUs with CUDA support. Ideal for high-performance computing on NVIDIA hardware.
  - HIPBLAS: Utilizes AMD GPUs with HIPBLAS for high-performance linear algebra operations. Best suited for AMD GPU users.
  - Metal: Supports Apple's Metal API for optimized performance on macOS systems.

Prerequisites

Before installing and running the audio-transcriber, ensure you have the following prerequisites:

Rust Toolchain:
- Ensure Rust is installed. You can download it from https://www.rust-lang.org/tools/install.
FFmpeg:
- The program checks whether FFmpeg is present and downloads it if it is missing. You can also install it manually, as described below.
- Manual Installation:
  - Windows: Download a Windows build of FFmpeg and extract the binaries to a directory in your PATH.
  - macOS: Install via Homebrew:
```
brew install ffmpeg
```
  - Linux: Install via package manager, e.g., on Ubuntu:
```
sudo apt-get update && sudo apt-get install ffmpeg
```
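Before building, it can help to confirm that both prerequisites are reachable from your shell. The following is a minimal sketch and not part of the project: `have_cmd` and `check_prereqs` are hypothetical helper names, and the only assumption is that the Rust toolchain provides `cargo` and that FFmpeg provides `ffmpeg` on your PATH.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line per prerequisite.
# Returns non-zero only when cargo is missing; a missing ffmpeg is just
# a note, because the tool downloads FFmpeg when it is absent.
check_prereqs() {
    rc=0
    if have_cmd cargo; then
        echo "cargo: found"
    else
        echo "cargo: missing (install the Rust toolchain)"
        rc=1
    fi
    if have_cmd ffmpeg; then
        echo "ffmpeg: found"
    else
        echo "ffmpeg: missing (the tool downloads it if missing)"
    fi
    return "$rc"
}
```

Run `check_prereqs` after pasting the functions into your shell; it prints two lines and exits non-zero if Cargo cannot be found.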

Installation

Using Cargo

The recommended method to install the audio-transcriber is via Cargo, Rust's package manager.

Clone the Repository

```
git clone https://github.com/woutermans/audio-transcriber.git
cd audio-transcriber
```

Install Dependencies and Compile with Backend Features

To compile with specific backend support (e.g., Vulkan), enable the corresponding feature flag.
- For Vulkan
```
cargo build --release --features vulkan
```
- For CUDA
```
cargo build --release --features cuda
```
- For HIPBLAS
```
cargo build --release --features hipblas
```
- For Metal
```
cargo build --release --features metal
```

Using cargo install

Alternatively, you can install it globally using Cargo's install command with a specified backend:
```
cargo install audio-transcriber --features vulkan
```
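`cargo install` places the compiled binary in Cargo's bin directory (`$CARGO_HOME/bin`, which defaults to `~/.cargo/bin`), so that directory needs to be on your `PATH` before the installed command can be found. A minimal sketch of that check follows; `path_contains` is a hypothetical helper name, not something the project ships.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated PATH-style list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Cargo installs binaries into $CARGO_HOME/bin (default: ~/.cargo/bin).
cargo_bin="${CARGO_HOME:-$HOME/.cargo}/bin"
if path_contains "$cargo_bin" "$PATH"; then
    echo "$cargo_bin is on PATH"
else
    echo "add $cargo_bin to PATH to run installed binaries"
fi
```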

Supported Backends Installation

The audio-transcriber supports the following GPU backends for enhanced performance. Each feature flag enables the corresponding backend at compile time.

Vulkan
- Utilizes GPU acceleration using the Vulkan API.
- Requires system libraries and drivers compatible with Vulkan.
CUDA
- Optimized for NVIDIA GPUs.
- Requires CUDA toolkit installation on your system.
HIPBLAS
- Leverages AMD GPUs with HIPBLAS support.
- Requires ROCm or other compatible GPU drivers.
Metal
- Supports Apple's Metal API for optimized performance on macOS systems.

Additional Notes:

Ensure that your system meets the hardware and software requirements for each backend. Refer to the documentation provided by NVIDIA, AMD, or Apple for installation guides specific to each GPU architecture.
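If you are unsure which backend your machine can build, a rough heuristic is to look for each vendor's command-line tools. The sketch below is an illustration only, not the project's own detection logic: `pick_backend` is a hypothetical helper, and it assumes that `nvcc` (CUDA toolkit), `hipcc` (ROCm), and `vulkaninfo` (Vulkan tools) are on your PATH when the matching software is installed. Finding a tool suggests the backend may be usable; it does not guarantee a successful build.

```shell
# Hypothetical helper: suggest a --features value from the OS name ($1,
# as printed by `uname -s`) and the vendor tools found on PATH.
pick_backend() {
    if [ "$1" = "Darwin" ]; then
        echo metal            # Metal is only available on Apple platforms
    elif command -v nvcc >/dev/null 2>&1; then
        echo cuda             # nvcc ships with the CUDA toolkit
    elif command -v hipcc >/dev/null 2>&1; then
        echo hipblas          # hipcc ships with ROCm
    elif command -v vulkaninfo >/dev/null 2>&1; then
        echo vulkan           # vulkaninfo ships with the Vulkan tools
    else
        return 1              # nothing detected; choose a backend manually
    fi
}
```

One possible use is `backend=$(pick_backend "$(uname -s)") && cargo build --release --features "$backend"`, which builds only when a candidate backend was detected.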

Usage

Basic Transcription
```
cargo run --release --features vulkan /path/to/audio [model_path]
```
Replace /path/to/audio with the path to your audio file. You can optionally pass a custom model path as the second argument; the default is ggml-large-v3-turbo.bin.

Handling Multiple Backends

To switch between backends, rebuild with the desired feature flag, since each backend is enabled at compile time.
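Because each backend maps to exactly one feature flag, a small wrapper can keep the flag and the program arguments together. The sketch below only prints the command it would run and executes nothing; `transcribe_cmd` is a hypothetical name. The `--` it emits is Cargo's standard separator between its own options and the program's arguments; it is optional for plain paths but avoids ambiguity.

```shell
# Hypothetical helper: print the cargo command for a backend ($1), an
# audio file ($2) and an optional model path ($3). Nothing is executed.
# Paths are printed as-is, so quote them yourself if they contain spaces.
transcribe_cmd() {
    case "$1" in
        vulkan|cuda|hipblas|metal) ;;
        *) echo "unknown backend: $1" >&2; return 2 ;;
    esac
    if [ -z "${2:-}" ]; then
        echo "usage: transcribe_cmd BACKEND AUDIO [MODEL]" >&2
        return 2
    fi
    if [ -n "${3:-}" ]; then
        echo "cargo run --release --features $1 -- $2 $3"
    else
        echo "cargo run --release --features $1 -- $2"
    fi
}
```

For example, `transcribe_cmd cuda /path/to/audio` prints the CUDA variant of the basic transcription command, and an unsupported backend name is rejected before anything is built.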

Examples

Transcribe an audio file using CUDA:

```
cargo run --release --features cuda /path/to/audio
```

Transcribe an audio file using Metal on macOS:

```
cargo run --release --features metal /path/to/audio
```
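The same command can be repeated over a folder of recordings. The following is a sketch under stated assumptions: `run_one` and `transcribe_dir` are hypothetical helpers, the loop only picks up `*.wav` files (other formats may work too, but that is not assumed here), and each file is transcribed with the default model.

```shell
# Hypothetical helper: transcribe one file ($2) with the given backend ($1).
# `--` separates Cargo's options from the program's arguments.
run_one() {
    cargo run --release --features "$1" -- "$2"
}

# Hypothetical helper: transcribe every *.wav file in directory $1 using
# backend $2 (default: vulkan). Stops at the first failure.
transcribe_dir() {
    backend=${2:-vulkan}
    count=0
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue      # the glob matched nothing
        run_one "$backend" "$f" || return 1
        count=$((count + 1))
    done
    echo "transcribed $count file(s)"
}
```

Calling `transcribe_dir ~/recordings cuda` would process each WAV file in turn. Cargo only compiles on the first iteration; later runs reuse the release build.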

Dependencies

The project relies on several crates for functionality:

- hound: For reading WAV files.
- whisper-rs: Integration with the Whisper model.
- reqwest: Handles HTTP requests for downloading FFmpeg.
- tempfile and zip: Manage temporary files and compress/decompress archives.
- indicatif: Displays progress bars during transcription.

Contributing

Contributions are welcome! Please follow these steps:

- Fork the Repository
- Create a New Branch
- Make Your Changes
- Run Tests
```
cargo test
```
- Submit a Pull Request

Ensure your code adheres to Rust's best practices and the project's coding standards.
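One common way to check a Rust change before opening a pull request is to run the formatter, the linter, and the tests in sequence. The sketch below is a suggestion, not the project's documented policy: `pre_pr_checks` is a hypothetical helper, it assumes the `rustfmt` and `clippy` components are installed alongside Cargo, and the project's own CI checks may differ.

```shell
# Hypothetical helper: run formatting, lint and test checks in order,
# stopping at the first command that fails.
pre_pr_checks() {
    cargo fmt --all -- --check &&    # fail if any file is not formatted
    cargo clippy -- -D warnings &&   # treat lint warnings as errors
    cargo test                       # run the test suite
}
```

Run `pre_pr_checks` from the repository root; a zero exit status means all three steps passed.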

Code of Conduct

This project follows the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.

License

This project is released under the Unlicense - see the LICENSE file for details.
