Chat-With-Video is an application designed to allow users to ask questions about YouTube videos without having to watch them in full.

slavaheroes/chat-with-video

Chat-With-Video

Table of Contents

  1. Problem Statement
  2. System Architecture
  3. Reproducibility
  4. Evaluation of RAG Flow
  5. Demo
  6. Future Improvements

Problem Statement

Many of us often encounter long, informative YouTube videos that we'd like to explore, but lack the time to watch in their entirety. This common dilemma inspired the creation of Chat-With-Video.

Chat-With-Video is an application that lets users ask questions about YouTube videos without watching them in full. It retrieves the relevant parts of a video's transcript and passes them to a Large Language Model, which produces quick, relevant answers grounded in that transcript.

Key Features

The primary features of our app are:

  • Real-time question answering about YouTube video content
  • Efficient retrieval of relevant information from video transcripts
  • Responses generated by the GPT-4o-mini language model from the retrieved transcript segments
  • Continuous evaluation and improvement of system performance
  • User feedback collection for ongoing refinement
  • Comprehensive monitoring and analytics

System Architecture

Our system utilizes a Retrieval-Augmented Generation (RAG) approach, combining efficient information retrieval with powerful language models. The key components include:

  1. TF-IDF Retriever: For fast and efficient retrieval of relevant transcript segments
  2. GPT-4o-mini Language Model: To generate accurate and contextually appropriate responses
  3. Streamlit Frontend: For an intuitive user interface
  4. PostgreSQL Database: To store conversation history and evaluation metrics
  5. Airflow: For scheduling and running evaluation tasks
  6. Grafana: For monitoring system performance and user interactions
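The retrieval step listed above can be illustrated with a minimal, dependency-free sketch. This is not the project's actual retriever: the function names (`tokenize`, `build_index`, `to_vector`, `retrieve`), the smoothed IDF formula, and the sample transcript segments are all assumptions made for illustration. It ranks transcript segments by cosine similarity between TF-IDF vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, idf):
    """Build an L2-normalised TF-IDF vector (as a dict); unknown terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(segments):
    """Return (idf, vectors) for a list of transcript segments."""
    tokenized = [tokenize(s) for s in segments]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant): terms found in every segment keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def retrieve(query, segments, idf, vectors, k=3):
    """Return the k segments with the highest cosine similarity to the query."""
    q = to_vector(tokenize(query), idf)
    scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in vectors]
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    return [(segments[i], scores[i]) for i in ranked[:k]]


# Hypothetical transcript segments, for illustration only.
segments = [
    "Gradient descent updates the weights using the learning rate.",
    "The speaker thanks the sponsors of the channel.",
    "A large learning rate can make training diverge.",
]
idf, vectors = build_index(segments)
top = retrieve("what learning rate should I use", segments, idf, vectors, k=2)
```

Because both vectors are normalised, the dot product is the cosine similarity, so longer segments are not favoured merely for containing more words.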

Reproducibility

To set up and run the Chat-With-Video system, follow these steps:

  1. Clone the repository and navigate to the project directory.

    • Rename env.env to .env and update the API variables.
  2. Set up the repo:

    make setup
  3. Download the required CSV files from Drive and place them in the data folder.

  4. Start the Docker containers:

    make compose-up # or make compose-up-force
  5. To simulate user traffic:

    make attach-app
    python src/simulate_traffic.py
  6. Access the monitoring dashboard:

  7. View and manage Airflow DAGs:

  8. Manual Evaluation:

    • You can manually trigger the evaluation DAG in Airflow.
    • This process retrieves unevaluated conversations from the database, uses the LLM to assess answer and context relevance, and inserts the results into the evaluation table.
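The evaluation pass described above can be sketched roughly as follows. Everything here is an assumption for illustration: the table and column names (`conversations`, `evaluations`, `relevance`) are hypothetical, an in-memory SQLite database stands in for PostgreSQL so the example runs on its own, and `judge_relevance` is a word-overlap placeholder for the LLM judge. Only answer relevance is scored; context relevance would follow the same pattern.

```python
import sqlite3


def judge_relevance(question, answer):
    """Placeholder for the LLM judge: a crude word-overlap heuristic."""
    q_words = set(question.lower().split())
    a_words = set(answer.lower().split())
    return "RELEVANT" if q_words & a_words else "NON_RELEVANT"


def evaluate_pending(conn, judge=judge_relevance):
    """Score every conversation without an evaluation row; return how many were scored."""
    pending = conn.execute(
        """
        SELECT c.id, c.question, c.answer
        FROM conversations AS c
        LEFT JOIN evaluations AS e ON e.conversation_id = c.id
        WHERE e.conversation_id IS NULL
        """
    ).fetchall()
    for conv_id, question, answer in pending:
        conn.execute(
            "INSERT INTO evaluations (conversation_id, relevance) VALUES (?, ?)",
            (conv_id, judge(question, answer)),
        )
    conn.commit()
    return len(pending)


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE conversations (id INTEGER PRIMARY KEY, question TEXT, answer TEXT);
    CREATE TABLE evaluations (conversation_id INTEGER, relevance TEXT);
    """
)
conn.executemany(
    "INSERT INTO conversations (question, answer) VALUES (?, ?)",
    [
        ("what is rag", "rag combines retrieval with generation"),
        ("who is the host", "unknown"),
    ],
)
first_run = evaluate_pending(conn)
second_run = evaluate_pending(conn)
```

The `LEFT JOIN ... IS NULL` filter makes the task safe to re-run: a second trigger of the DAG would find no unevaluated conversations and insert nothing.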

Evaluation of RAG Flow

We conducted extensive experiments to optimize our RAG (Retrieval-Augmented Generation) flow. Our evaluation process covered various aspects of the system, including:

  1. Retrieval method performance
  2. Language model selection
  3. Context window size optimization

For detailed information about our evaluation process, results, and conclusions, please refer to ./evaluation/README.md.
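This README does not say which metrics were used to compare retrieval methods. As a hedged illustration only, two common choices are hit rate and mean reciprocal rank (MRR) at a cutoff k. The function names and the sample rankings below are hypothetical.

```python
def hit_rate(rankings, relevant_ids, k=3):
    """Fraction of queries whose relevant segment appears in the top k results."""
    hits = sum(1 for ranked, rel in zip(rankings, relevant_ids) if rel in ranked[:k])
    return hits / len(relevant_ids)


def mrr(rankings, relevant_ids, k=3):
    """Mean of 1/rank of the relevant segment, counting 0 when it is outside the top k."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant_ids):
        top_k = ranked[:k]
        if rel in top_k:
            total += 1.0 / (top_k.index(rel) + 1)
    return total / len(relevant_ids)


# Hypothetical retrieval results: one ranked list of segment ids per query.
rankings = [["s3", "s1", "s7"], ["s2", "s5", "s9"], ["s4", "s8", "s6"]]
gold = ["s1", "s2", "s0"]  # the segment that actually answers each query

hr = hit_rate(rankings, gold)
score = mrr(rankings, gold)
```

Hit rate only records whether the right segment was retrieved, while MRR also rewards ranking it higher, so the two can disagree when comparing retrievers.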

Key findings:

  • TF-IDF was selected as our retrieval method due to its balance of accuracy and efficiency.
  • GPT-4o-mini outperformed other tested language models in our specific use case.
  • Providing the top 3 retrieved documents as context yielded the best results.
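The "top 3 documents as context" finding can be pictured with a small prompt-building sketch. The prompt wording and the `build_prompt` helper are assumptions, not the project's actual prompt; the only detail taken from the findings above is the default cutoff of three segments.

```python
def build_prompt(question, ranked_segments, k=3):
    """Assemble an LLM prompt from the k best-ranked transcript segments."""
    context = "\n\n".join(
        f"[{i}] {segment}" for i, segment in enumerate(ranked_segments[:k], start=1)
    )
    return (
        "Answer the question using only the transcript excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical retriever output, already sorted from most to least relevant.
ranked = ["segment A", "segment B", "segment C", "segment D"]
prompt = build_prompt("What is the video about?", ranked)
```

Keeping `k` as a parameter makes it easy to repeat a context-size comparison by building prompts with different cutoffs and scoring the answers.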

Demo

For visual demonstrations of Chat-With-Video in action, please see our DEMO.md file. This includes screenshots of:

  • The main chat interface
  • Example conversations
  • Grafana monitoring dashboard
  • Airflow DAG view

These screenshots provide a quick overview of the system's functionality and user interface.

Future Improvements

While Chat-With-Video currently performs well, there are several areas for enhancement:

  1. Implement more advanced retrieval methods that maintain TF-IDF's efficiency while improving accuracy.
    • Chunking techniques and support for multi-language transcript formats.
    • Query re-writing.
  2. Make the code scalable to handle multiple videos and concurrent users.
  3. Explore cloud deployment options and develop a Telegram bot interface. This will make the service more accessible to users across different platforms and devices.
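As one possible shape for the chunking idea in item 1, the sketch below splits a transcript into overlapping word windows. It is a hypothetical illustration, not planned project code: the `chunk_words` name and the default `size` and `overlap` values are assumptions.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the transcript
    return chunks


# Ten placeholder words stand in for a transcript.
transcript = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(transcript, size=4, overlap=1)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some words twice.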

Claude Sonnet 3.5 was used to generate this README.md file.

This app was created as a course project for the LLM Zoomcamp course organized by DataTalksClub.
