Gemini AI Audio Transcription Python Module

ي edited this page Jan 28, 2025 · 1 revision

Overview

The gemini_audio_text.py module transcribes audio files with Google's Gemini API; the code shown on this page uses the gemini-1.5-flash model. It loads environment variables, configures the Google API client, and uploads an audio file for transcription.

Functions

1. load_environment()

Description: Loads environment variables from a .env file.

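from dotenv import load_dotenv
from loguru import logger

def load_environment():
    load_dotenv()  # reads key=value pairs from a .env file into os.environ
    logger.info("Environment variables loaded successfully.")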

2. configure_google_api()

Description: Configures the Google Gemini API for audio transcription.

Raises: ValueError if the GEMINI_API_KEY environment variable is not set.

import os
import google.generativeai as genai

def configure_google_api():
    api_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        error_message = "Google API key not found. Please set the GEMINI_API_KEY environment variable."
        logger.error(error_message)
        raise ValueError(error_message)

    genai.configure(api_key=api_key)
    logger.info("Google Gemini API configured successfully.")
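The fail-fast check above can be generalized to any required setting. The following is a minimal sketch using only the standard library; require_env is a hypothetical helper for illustration and is not part of gemini_audio_text.py.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise ValueError.

    Hypothetical helper: it mirrors the check in configure_google_api()
    but is not part of the documented module.
    """
    value = os.getenv(name)
    if not value:
        # An empty string is treated the same as a missing variable.
        raise ValueError(f"{name} is not set. Add it to your environment or .env file.")
    return value
```

Under this assumption, configure_google_api() could obtain its key with require_env("GEMINI_API_KEY") and keep the same ValueError behavior.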

3. transcribe_audio(audio_file_path)

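Description: Transcribes audio using Google's gemini-1.5-flash model.

Args:

  • audio_file_path (str): The path to the audio file to be transcribed.

Returns:

  • str: The transcribed text from the audio. Returns None if transcription fails.

Raises:

  • FileNotFoundError is raised internally if the audio file is not found. As written, the outer except Exception handler catches it (along with the ValueError from configure_google_api), so the caller receives None instead.
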
def transcribe_audio(audio_file_path):
    try:
        load_environment()
        configure_google_api()

        logger.info(f"Attempting to transcribe audio file: {audio_file_path}")

        if not os.path.exists(audio_file_path):
            error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
            logger.error(error_message)
            raise FileNotFoundError(error_message)

        model = genai.GenerativeModel(model_name="gemini-1.5-flash")

        try:
            audio_file = genai.upload_file(audio_file_path)
            logger.info(f"Audio file uploaded successfully: {audio_file=}")
        except FileNotFoundError:
            error_message = f"FileNotFoundError: The audio file at {audio_file_path} does not exist."
            logger.error(error_message)
            raise FileNotFoundError(error_message) 
        except Exception as e:
            logger.error(f"Error uploading audio file: {e}")
            return None

        try:
            response = model.generate_content([
                "Transcribe the following audio:",
                audio_file
            ])

            if response and hasattr(response, 'text'):
                transcript = response.text
                logger.info(f"Transcription successful:\n{transcript}")
                return transcript
            else:
                logger.warning("Transcription failed: Invalid or empty response from API.")
                return None

        except Exception as e:
            logger.error(f"Error during transcription: {e}")
            return None

    except Exception as e:
        logger.error(f"An unexpected error occurred: {e}")
        return None
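Because every failure path in the function above ends in None, a caller cannot tell a missing file from an API error without reading the log. One possible way to make that distinction explicit is sketched below. safe_transcribe is a hypothetical wrapper, and the transcriber argument stands in for transcribe_audio so the sketch runs without an API key.

```python
import os
from typing import Callable, Optional


def safe_transcribe(path: str, transcriber: Callable[[str], Optional[str]]) -> str:
    """Call a transcriber and turn its None result into an explicit error.

    Hypothetical wrapper, not part of the documented module. `transcriber`
    is any function with the same contract as transcribe_audio:
    it returns the transcript as a str, or None on failure.
    """
    # Check the path first so a missing file is reported as such.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Audio file not found: {path}")

    transcript = transcriber(path)
    if transcript is None:
        # The underlying function has already logged the detailed cause.
        raise RuntimeError(f"Transcription failed for {path}; check the log output.")
    return transcript
```

A call such as safe_transcribe("path/to/your/audio/file.wav", transcribe_audio) would then either return text or raise, rather than returning None.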

Usage

  1. Ensure you have a .env file with the following environment variables:
    • GEMINI_API_KEY: Your Google API key.
  2. Call the transcribe_audio function with the path to your audio file:
    transcript = transcribe_audio("path/to/your/audio/file.wav")
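For more than one file, the single call in step 2 can be wrapped in a loop. The sketch below is one way to do that, assuming the audio files sit in one folder and that a .txt file next to each input is an acceptable output. transcribe_folder and its parameters are hypothetical names, and the transcriber is passed in so the example does not depend on the API.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def transcribe_folder(
    folder: str,
    transcriber: Callable[[str], Optional[str]],
    pattern: str = "*.wav",
) -> Dict[str, Optional[str]]:
    """Transcribe every matching file in a folder.

    Hypothetical helper: `transcriber` is expected to behave like
    transcribe_audio (returns a str, or None on failure).
    Returns a mapping of file name to transcript (or None).
    """
    results: Dict[str, Optional[str]] = {}
    for audio_path in sorted(Path(folder).glob(pattern)):
        transcript = transcriber(str(audio_path))
        results[audio_path.name] = transcript
        if transcript is not None:
            # Save the text beside the audio file, e.g. talk.wav -> talk.txt
            audio_path.with_suffix(".txt").write_text(transcript, encoding="utf-8")
    return results
```

Files whose transcription fails are kept in the result with a value of None, so they can be retried later.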

Dependencies

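  • os (standard library)
  • sys (standard library)
  • google.generativeai (installed from the google-generativeai package)
  • dotenv (installed from the python-dotenv package)
  • loguru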
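To confirm that the modules listed above are importable before running the transcription, a quick check along these lines may help. It is a sketch using only the standard library; missing_modules is a hypothetical helper name.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment.

    Hypothetical helper for illustration; it does not import the
    target modules themselves (only parent packages of dotted names).
    """
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except ImportError:
            # Raised when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["os", "sys", "google.generativeai", "dotenv", "loguru"]
    print("Missing modules:", missing_modules(required))
```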

Logging

The module uses the loguru library for logging to the console with colorized and formatted messages.