woutermans/audio-transcriber

Audio Transcriber

Introduction

The audio-transcriber is a command-line tool that transcribes audio files with the Whisper speech-recognition model. It can be built with GPU acceleration through one of four backends: Vulkan, CUDA, HIPBLAS, or Metal.

Features

  • Whisper Integration: Utilizes the whisper-rs library for accurate transcription.
  • Multi-backend Support:
    • Vulkan: Leverages GPU acceleration using the Vulkan API. Suitable for cross-platform applications and modern GPUs.
    • CUDA: Optimized for NVIDIA GPUs with CUDA support. Ideal for high-performance computing on NVIDIA hardware.
    • HIPBLAS: Utilizes AMD GPUs with HIPBLAS for high-performance linear algebra operations. Best suited for AMD GPU users.
    • Metal: Supports Apple's Metal API for optimized performance on macOS systems.

Prerequisites

Before installing and running the audio-transcriber, ensure you have the following prerequisites:

  • Rust Toolchain: Required to build the project with Cargo. rustup is the usual way to install it.

  • FFmpeg:

    • The program checks whether FFmpeg is available and downloads it if it is missing. You can also install it manually.
    • Manual Installation:
      • Windows: Download an FFmpeg build for Windows and extract the binaries to a directory in your PATH.
      • macOS: Install via Homebrew:
        brew install ffmpeg
      • Linux: Install via package manager, e.g., on Ubuntu:
        sudo apt-get update && sudo apt-get install ffmpeg
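
The README does not say how the FFmpeg presence check is implemented. The following is a minimal sketch of one way such a check could work, assuming it simply tries to launch `ffmpeg -version` and treats a failed launch or a non-zero exit code as "not installed". The helper name `tool_available` is hypothetical and not taken from the project.

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper: returns true if `<program> -version` can be launched
/// and exits successfully. A program that is not on PATH makes `status()`
/// return an error, which is mapped to `false`.
fn tool_available(program: &str) -> bool {
    Command::new(program)
        .arg("-version")
        // Discard the version banner; only the exit status matters here.
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

Calling `tool_available("ffmpeg")` before transcription would tell the program whether it needs to fall back to downloading FFmpeg.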

Installation

Using Cargo

The recommended method to install the audio-transcriber is via Cargo, Rust's package manager.

  1. Clone the Repository

    git clone https://github.com/woutermans/audio-transcriber.git
    cd audio-transcriber
  2. Install Dependencies and Compile with Backend Features

    To compile with specific backend support (e.g., Vulkan), enable the corresponding feature flag.

    • For Vulkan

      cargo build --release --features vulkan
    • For CUDA

      cargo build --release --features cuda
    • For HIPBLAS

      cargo build --release --features hipblas
    • For Metal

      cargo build --release --features metal
  3. Using cargo install

    Alternatively, you can install it globally using Cargo's install command with a specified backend:

    cargo install audio-transcriber --features vulkan
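
Cargo feature flags such as the ones above are resolved at compile time, so the chosen backend is baked into the binary. As an illustration only, the sketch below shows how Rust code can observe which feature was enabled. The function `active_backend` and the `"cpu"` fallback are assumptions made for this example, not part of the project.

```rust
/// Hypothetical helper: reports which backend feature this binary was built
/// with. `cfg!` is evaluated at compile time, so each branch is a constant.
/// The "cpu" fallback for builds without any GPU feature is an assumption.
fn active_backend() -> &'static str {
    if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "hipblas") {
        "hipblas"
    } else if cfg!(feature = "metal") {
        "metal"
    } else if cfg!(feature = "vulkan") {
        "vulkan"
    } else {
        "cpu"
    }
}
```

Because the value is fixed when the crate is compiled, changing it requires a rebuild with a different `--features` value.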

Supported Backends Installation

The audio-transcriber supports the following GPU backends. Each backend is selected at compile time by enabling its Cargo feature flag, and each has its own system requirements.

  • Vulkan

    • Utilizes GPU acceleration using the Vulkan API.
    • Requires system libraries and drivers compatible with Vulkan.
  • CUDA

    • Optimized for NVIDIA GPUs.
    • Requires CUDA toolkit installation on your system.
  • HIPBLAS

    • Leverages AMD GPUs with HIPBLAS support.
    • Requires ROCm or other compatible GPU drivers.
  • Metal

    • Supports Apple's Metal API for optimized performance on macOS systems.

Additional Notes:

Ensure that your system meets the hardware and software requirements for each backend. Refer to the documentation provided by NVIDIA, AMD, or Apple for installation guides specific to each GPU architecture.

Usage

  1. Basic Transcription

    cargo run --release --features vulkan -- /path/to/audio [model_path]

    Replace /path/to/audio with the path to your audio file and optionally provide a custom model path (default is ggml-large-v3-turbo.bin).

  2. Handling Multiple Backends

    To switch backends, rebuild with the desired feature flag (or use environment variables, if the build supports them).
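
The command-line contract described in step 1 (a required audio path and an optional model path that defaults to ggml-large-v3-turbo.bin) can be sketched as follows. This is an illustrative sketch, not the project's actual argument handling. The names `parse_args` and `DEFAULT_MODEL` and the usage message are assumptions.

```rust
use std::path::PathBuf;

/// Default model file name, as stated in the usage notes.
const DEFAULT_MODEL: &str = "ggml-large-v3-turbo.bin";

/// Hypothetical parser for `<audio_path> [model_path]`.
/// `args` should contain only the arguments after the program name.
fn parse_args<I>(args: I) -> Result<(PathBuf, PathBuf), String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    // The audio path is mandatory.
    let audio = args
        .next()
        .ok_or_else(|| "usage: audio-transcriber <audio_path> [model_path]".to_string())?;
    // The model path is optional and falls back to the default file name.
    let model = args.next().unwrap_or_else(|| DEFAULT_MODEL.to_string());
    Ok((PathBuf::from(audio), PathBuf::from(model)))
}
```

In a real binary this would typically be called as `parse_args(std::env::args().skip(1))`.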

Examples

  • Transcribe an audio file using CUDA:

    cargo run --release --features cuda -- /path/to/audio
  • Transcribe an audio file using Metal on macOS:

    cargo run --release --features metal -- /path/to/audio

Dependencies

The project relies on several crates for functionality:

  • hound: For reading WAV files.
  • whisper-rs: Integration with the Whisper model.
  • reqwest: Handles HTTP requests for downloading FFmpeg.
  • tempfile and zip: Manage temporary files and compress/decompress archives.
  • indicatif: Displays progress bars during transcription.
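
The README does not describe how FFmpeg and hound fit together. One plausible arrangement, shown here purely as an assumption, is that FFmpeg converts the input to a 16 kHz mono 16-bit PCM WAV file, which a WAV reader can then load for Whisper. The sketch below only builds such an FFmpeg command. The function name `ffmpeg_to_wav_command` is hypothetical, while the flags (`-y`, `-i`, `-ar`, `-ac`, `-c:a pcm_s16le`) are standard FFmpeg options.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) an FFmpeg command that
/// converts `input` to 16 kHz, mono, signed 16-bit little-endian PCM WAV.
fn ffmpeg_to_wav_command(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("ffmpeg");
    cmd.arg("-y") // overwrite the output file if it already exists
        .arg("-i")
        .arg(input)
        // 16 kHz sample rate, one channel, 16-bit PCM samples
        .args(["-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le"])
        .arg(output);
    cmd
}
```

Calling `.status()` on the returned `Command` would run the conversion; keeping construction separate from execution makes the argument list easy to inspect.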

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the Repository
  2. Create a New Branch
  3. Make Your Changes
  4. Run Tests
    cargo test
  5. Submit a Pull Request

Ensure your code adheres to Rust's best practices and the project's coding standards.

Code of Conduct

This project follows the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code.

License

This project is released under the Unlicense - see the LICENSE file for details.
