This project is an experimentation platform for running locally hosted AI models that answer questions based on document embeddings. With support for Retrieval-Augmented Generation (RAG), it provides a hands-on environment to test tools, frameworks, and techniques for document-based AI tasks.
A web-based interface built using Streamlit complements the command-line tool, offering an intuitive way to interact with the system.
- Local Model Hosting: Utilize locally hosted large language models for quick and secure processing.
- Document Embedding: Convert PDFs and Markdown files into vector embeddings using Ollama.
- Customizable Models: Supports multiple LLMs and embedding frameworks.
- Interactive Web UI: Streamlit-powered interface for easy interaction with the system.
- Extensible Design: Built to be a flexible testing ground for new ideas and tools.
- Ollama: Version 0.1.26 or higher, for hosting LLMs and generating embeddings locally.
- Python: Version 3.8 or later.
- Pip: Installed for managing Python packages.
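Before installing, you can confirm that a local Ollama server is reachable. A quick check, assuming Ollama's default port of 11434:

```python
import urllib.request

# Probe the local Ollama server; 11434 is Ollama's default port.
try:
    with urllib.request.urlopen("http://localhost:11434", timeout=2) as resp:
        print(resp.read().decode())  # the server normally replies "Ollama is running"
except OSError as err:
    print("Could not reach Ollama:", err)
```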
Begin by cloning the project repository to your local machine:
git clone <repository_url>
cd <repository_directory>
Set up an isolated environment to manage dependencies:
python3 -m venv .venv
source .venv/bin/activate # For Unix/MacOS
.\.venv\Scripts\activate # For Windows
Install the required Python packages:
pip install -r requirements.txt
- Activate the virtual environment:
  source .venv/bin/activate # For Unix/MacOS
  .\.venv\Scripts\activate # For Windows
- Execute the main script:
  python app.py -m <model_name> -p <path_to_documents>
- Model: Specify the LLM to use with -m <model_name>. Defaults to mistral if not specified.
- Document Path: Specify the directory containing PDFs or Markdown files with -p <path_to_documents>. Defaults to the sample Research folder in the repository if not provided.
- Embedding Model: Optionally specify an embedding model with -e <embedding_model_name>. Defaults to nomic-embed-text.
- Once executed, the application processes documents, generates embeddings, and queries the collection to answer predefined or user-defined questions. For example:
python app.py -m mistral -p ./documents -e nomic-embed-text
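For reference, here is a minimal sketch of how app.py might parse these flags with argparse; the exact option names and defaults in the real script may differ:

```python
import argparse

def parse_args():
    # Hypothetical flag wiring mirroring the CLI described above.
    parser = argparse.ArgumentParser(description="Query local documents with a local LLM.")
    parser.add_argument("-m", "--model", default="mistral",
                        help="LLM served by Ollama (default: mistral)")
    parser.add_argument("-p", "--path", default="Research",
                        help="Directory of PDF/Markdown files (default: Research)")
    parser.add_argument("-e", "--embedding-model", default="nomic-embed-text",
                        help="Embedding model (default: nomic-embed-text)")
    return parser.parse_args()
```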
- Activate the virtual environment:
  source .venv/bin/activate # For Unix/MacOS
  .\.venv\Scripts\activate # For Windows
- Launch the Streamlit application:
  streamlit run ui.py
- This starts a local web server and opens the Streamlit UI in your default browser. Use the interface to:
  - Select a model and embedding technique.
  - Upload or select document directories.
  - Query the system interactively.
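As an illustration, a stripped-down version of such a Streamlit page might look like this; the widget labels and the final placeholder output are assumptions, not the actual ui.py:

```python
import streamlit as st

st.title("Local RAG Playground")  # hypothetical title

# Hypothetical controls; the real ui.py may expose different options.
model = st.selectbox("LLM", ["mistral", "llama2"])
embedding = st.selectbox("Embedding model", ["nomic-embed-text"])
doc_dir = st.text_input("Document directory", value="Research")
question = st.text_input("Ask a question about the documents")

if st.button("Ask") and question:
    # Placeholder: the real app runs the RAG pipeline here.
    st.write(f"Would query '{doc_dir}' with {model} + {embedding}: {question}")
```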
- Document Loading:
  - PDFs and Markdown files are loaded from the specified directory.
  - The system ensures that all relevant files are processed.
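A sketch of this step using LangChain's community document loaders (assuming the langchain_community package; the project's actual loading code may differ):

```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader

def load_documents(path: str):
    # Gather every PDF and Markdown file under the given directory.
    pdfs = DirectoryLoader(path, glob="**/*.pdf", loader_cls=PyPDFLoader).load()
    markdown = DirectoryLoader(path, glob="**/*.md", loader_cls=TextLoader).load()
    return pdfs + markdown
```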
- Embedding Creation:
  - Embeddings are generated using the chosen embedding model.
  - These embeddings are stored temporarily for querying.
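A sketch of this step with OllamaEmbeddings feeding an in-memory Chroma collection; the chunk sizes are illustrative assumptions:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

def build_index(documents, embedding_model: str = "nomic-embed-text"):
    # Split documents into overlapping chunks so each embedding stays focused.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)
    # No persist_directory is given, so the collection lives only for this session.
    return Chroma.from_documents(chunks, OllamaEmbeddings(model=embedding_model))
```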
- Query Processing:
  - User queries are matched against the embeddings to retrieve relevant information.
  - Responses are generated using the selected LLM.
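Roughly, retrieval followed by generation could look like the following; the prompt wording and the k value are assumptions:

```python
from langchain_community.llms import Ollama

def answer(vectorstore, question: str, model: str = "mistral") -> str:
    # Pull the chunks most similar to the question out of the vector store.
    docs = vectorstore.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    return Ollama(model=model).invoke(prompt)
```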
- Result Presentation:
  - Results are displayed either in the terminal (CLI) or the Streamlit interface (Web UI).
- LangChain: Framework for building applications powered by LLMs.
- Ollama: For hosting and embedding AI models locally.
- Chroma: Vector database for managing and querying embeddings.
- PyPDF2: Library for parsing and processing PDF files.
- Streamlit: Framework for building interactive web applications.
- Persistent Embeddings: Implement a mechanism to store embeddings between sessions (see the sketch after this list).
- Enhanced Model Support: Add compatibility with more LLMs and embedding techniques.
- Scalability Improvements: Optimize the workflow for larger datasets and concurrent queries.
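For the persistent-embeddings idea, one possible approach is Chroma's persist_directory option; a minimal sketch under that assumption (the chroma_db path is hypothetical):

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")

def save_index(chunks, path: str = "chroma_db"):
    # persist_directory tells Chroma to write the collection to disk.
    return Chroma.from_documents(chunks, embeddings, persist_directory=path)

def load_index(path: str = "chroma_db"):
    # Reopen the saved collection instead of re-embedding on every start.
    return Chroma(persist_directory=path, embedding_function=embeddings)
```

With something like this in place, the reloading limitation noted below would apply only to the first run.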
- Reloading Embeddings: Embeddings are reloaded each time the application starts. This is for simplicity in testing but can be optimized.