This repository is part of Bix Tech's "Semana de Dados" (Data Week). For a further explanation of what RAG is and an overview of this tutorial, watch the video below:
🎥 Watch the tutorial on YouTube
A question-answering system built with LangChain and ChromaDB that lets users query their documents in natural language. It uses OpenAI's language models to provide context-aware answers based on the content of the indexed documents.
This system allows you to:
- Index text documents
- Create and manage a vector store using ChromaDB
- Interact with your documents through a chat interface
- Get answers with source references
- View and manage the document store
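To make the workflow above concrete, here is a minimal sketch of the indexing and question-answering flow using the libraries listed in the dependencies. It is illustrative only: the actual implementation in core.py may be organized differently, and details such as chunk sizes and the query string are assumptions.

```python
# Illustrative sketch of the RAG flow (not the repository's core.py).
import os

from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

load_dotenv()  # OPENAI_API_KEY, MODEL_NAME, COLLECTION_NAME, PERSIST_DIRECTORY

# 1. Load the .txt files from documents/ and split them into chunks.
docs = DirectoryLoader("documents", glob="*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks and persist them in a ChromaDB collection.
store = Chroma.from_documents(
    chunks,
    OpenAIEmbeddings(),
    collection_name=os.getenv("COLLECTION_NAME", "my_documents"),
    persist_directory=os.getenv("PERSIST_DIRECTORY", "db"),
)

# 3. Build a retrieval chain that also returns the source documents.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model=os.getenv("MODEL_NAME", "gpt-3.5-turbo")),
    retriever=store.as_retriever(),
    return_source_documents=True,
)

answer = qa.invoke({"query": "What topics do these documents cover?"})
print(answer["result"])
```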
rag_system/
├── documents/ # Place your text files here
├── logs/ # System logs are stored here
├── db/ # Vector store database (created automatically)
├── core.py # Core RAG system implementation
├── interface.py # Interactive command-line interface
├── requirements.txt # Project dependencies
└── .env # Environment variables configuration
For Windows:
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
.venv\Scripts\activate
For Linux/Mac:
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Install all required packages
pip install -r requirements.txt
Create a .env file in the root directory with the following content:
OPENAI_API_KEY=your-api-key-here
MODEL_NAME=gpt-3.5-turbo
COLLECTION_NAME=my_documents
PERSIST_DIRECTORY=db
Replace your-api-key-here with your actual OpenAI API key.
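The application presumably reads these values through python-dotenv (one of the listed dependencies). A minimal sketch of how such loading typically looks; the exact handling in core.py may differ:

```python
import os

from dotenv import load_dotenv  # provided by python-dotenv

load_dotenv()  # reads the .env file from the current working directory

# The fallbacks below mirror the example .env; they are illustrative only.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL_NAME = os.getenv("MODEL_NAME", "gpt-3.5-turbo")
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "my_documents")
PERSIST_DIRECTORY = os.getenv("PERSIST_DIRECTORY", "db")

if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
```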
- Create a documents directory if it doesn't exist:
mkdir documents
- Place your text files (.txt) in the documents directory. These are the documents that will be indexed and used for answering questions.
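Before indexing, you can optionally check which files will be picked up and that they decode cleanly. This helper is not part of the repository; it is just a convenience snippet:

```python
# Quick sanity check: list the .txt files under documents/ and verify
# that each one can be read as UTF-8 before indexing.
from pathlib import Path

for path in sorted(Path("documents").glob("*.txt")):
    try:
        path.read_text(encoding="utf-8")
        print(f"OK        {path.name}")
    except UnicodeDecodeError:
        print(f"NOT UTF-8 {path.name}")
```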
# Run the interactive interface
python interface.py
Once running, the system provides the following options:
- Index documents: Processes and indexes all text files in the documents directory
- Check total number of documents: Shows how many documents are currently indexed
- Delete document store: Removes all indexed documents
- Start RAG chat: Begins an interactive Q&A session
- Exit: Closes the application
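As an illustration, a menu like the one above could be wired up with rich (the console library in the dependencies). This is a sketch, not the repository's actual interface.py, and the dispatch step is only a placeholder:

```python
# Sketch of a main-menu loop mirroring the options described above.
from rich.console import Console
from rich.prompt import Prompt

console = Console()

MENU = {
    "1": "Index documents",
    "2": "Check total number of documents",
    "3": "Delete document store",
    "4": "Start RAG chat",
    "5": "Exit",
}

while True:
    for key, label in MENU.items():
        console.print(f"[bold cyan]{key}[/bold cyan]. {label}")
    choice = Prompt.ask("Select an option", choices=list(MENU))
    if choice == "5":
        break
    # In the real interface, each choice would call the matching routine
    # in core.py; here we only echo the selection.
    console.print(f"Selected: {MENU[choice]}")
```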
When in chat mode:
- Type your questions normally and press Enter
- Type 'sources' to see detailed source documents for the last answer
- Type 'quit', 'exit', or 'q' to return to the main menu
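The commands above could be handled with a loop like the following sketch. It assumes a retrieval chain named qa built with return_source_documents=True (as in the earlier indexing sketch) and is not the repository's actual chat loop:

```python
# Sketch of chat-mode command handling, assuming `qa` is a RetrievalQA
# chain created with return_source_documents=True.
last_sources = []

while True:
    question = input("Question> ").strip()
    if question.lower() in {"quit", "exit", "q"}:
        break  # back to the main menu
    if question.lower() == "sources":
        for doc in last_sources:
            # Show where the previous answer came from.
            print(doc.metadata.get("source", "unknown"), "-", doc.page_content[:120])
        continue
    result = qa.invoke({"query": question})
    last_sources = result["source_documents"]
    print(result["result"])
```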
- Python 3.8 or higher
- OpenAI API key
- Sufficient disk space for document storage
- Internet connection for API access
Main libraries used:
- langchain
- langchain-openai
- langchain-community
- langchain-chroma
- chromadb
- python-dotenv
- rich
If you encounter any issues:
- Check the logs in the logs directory
- Ensure your OpenAI API key is valid
- Verify that your documents are text files (.txt)
- Make sure all required directories exist
- Check your internet connection
- The system creates necessary directories automatically
- Logs are timestamped and stored in the logs directory
- The vector store is persistent and stored in the db directory
- All text files should be in UTF-8 or compatible encoding
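For reference, timestamped log files of the kind described in the notes can be produced with a standard-library setup like this sketch; the repository's actual logging configuration may differ:

```python
import logging
from datetime import datetime
from pathlib import Path

# Write one timestamped log file per run into the logs/ directory.
Path("logs").mkdir(exist_ok=True)
log_file = Path("logs") / f"rag_{datetime.now():%Y%m%d_%H%M%S}.log"

logging.basicConfig(
    filename=log_file,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("rag_system").info("Session started")
```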
For any other issues or questions, please refer to the logs or create an issue in the repository.