The RAG QA Bot is a Streamlit-based application that allows users to upload documents, index their contents, and ask questions using Retrieval-Augmented Generation (RAG). It integrates the LlamaIndex library to connect large language models (LLMs) with document retrieval, enabling context-aware responses. The app uses Pinecone as the vector store for document embeddings and a Hugging Face LLM to generate responses.
- Document Upload: Users can upload files, which are then processed and split into smaller chunks for indexing.
- Vector Store Initialization: The app integrates with Pinecone to store document embeddings and enables efficient document retrieval.
- Generative AI Responses: A Hugging Face LLM generates responses based on user queries and the relevant document segments retrieved by LlamaIndex.
- Real-Time Q&A: Users can input questions, and the app retrieves relevant information from the documents and generates responses in real time.
- Conversation History: The chat history is maintained during the session, allowing users to scroll through their interactions.
- Python 3.8+
- Packages:
  - `streamlit`
  - `transformers`
  - `llama-index`
  - `torch`
  - `llama-index-embeddings-huggingface`
  - `pinecone[grpc]`
  - `llama-index-vector-stores-pinecone`
  - `accelerate`
  - `bitsandbytes`
  - `llama-index-llms-huggingface`
- API keys for Hugging Face and Pinecone:
  - Hugging Face API key for using the LLM
  - Pinecone API key for vector store management
Clone the application repository (or download the code):

```bash
git clone https://github.com/sultanasabiha/RAG
cd RAG
```

Install the required Python packages using pip:

```bash
pip install -r requirements.txt
```
You need to set up the following environment variables for the app to work:

- Hugging Face API key: `HF_TOKEN`
- Pinecone API key: `PC_API_KEY`

If you are running in Google Colab, you can add these as Colab secrets and grant the notebook access to them to set the environment readily.
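A minimal sketch of pulling those secrets into the environment, assuming the app reads both keys via `os.environ` (the fallback branch is illustrative for non-Colab setups):

```python
import os

# In Colab, pull the keys from the notebook's secrets store; elsewhere,
# fall back to values already exported in the shell.
try:
    from google.colab import userdata  # only available inside Colab
    os.environ["HF_TOKEN"] = userdata.get("HF_TOKEN")
    os.environ["PC_API_KEY"] = userdata.get("PC_API_KEY")
except ImportError:
    missing = [k for k in ("HF_TOKEN", "PC_API_KEY") if k not in os.environ]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {missing}")
```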
Run the Streamlit app by executing the following command:

```bash
streamlit run app.py
```
If you want to share the app with others remotely, you can expose the local server using LocalTunnel. First, ensure LocalTunnel is installed globally:

```bash
npm install -g localtunnel
```

Then run the app and start LocalTunnel in a second terminal (`streamlit run` keeps the first one busy):

```bash
streamlit run app.py
```

```bash
npx localtunnel --port 8501
```
You will get a public URL that can be shared with others to access the app remotely.
- The app starts by configuring the page with a header and options for users to clear the conversation.
- API keys are loaded from the environment for Hugging Face and Pinecone.
- Users can upload a document (e.g., a PDF or text file) via the file uploader.
- The uploaded file is saved on the server, and its content is read and chunked into smaller sections using `SimpleDirectoryReader`, as sketched below.
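A minimal sketch of this step, assuming the upload is persisted to a local `data/` directory first (the directory, widget label, and chunk sizes are illustrative, not taken from the app's source):

```python
import os
import streamlit as st
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

uploaded_file = st.file_uploader("Upload a document")
if uploaded_file is not None:
    # Persist the upload so SimpleDirectoryReader can read it from disk.
    os.makedirs("data", exist_ok=True)
    with open(os.path.join("data", uploaded_file.name), "wb") as f:
        f.write(uploaded_file.getbuffer())

    # Load the document and split it into smaller chunks for indexing.
    documents = SimpleDirectoryReader("data").load_data()
    splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
    nodes = splitter.get_nodes_from_documents(documents)
```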
- Once a file is uploaded, a Pinecone vector store is initialized.
- The app checks if a vector index with the given name exists in Pinecone. If not, a new index is created.
- Document embeddings are generated using a Hugging Face embedding model and stored in Pinecone, as sketched below.
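A sketch of the initialization, continuing from the `nodes` produced above and assuming a serverless Pinecone index plus the `BAAI/bge-small-en-v1.5` embedding model (the index name, cloud, and region are placeholders):

```python
import os
from pinecone import Pinecone, ServerlessSpec
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PC_API_KEY"])

index_name = "rag-qa-bot"  # placeholder name
# Create the index only if it does not already exist.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,  # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
vector_store = PineconeVectorStore(pinecone_index=pc.Index(index_name))
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embed the chunked document and store the vectors in Pinecone.
index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)
```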
- A Hugging Face model (`mistralai/Mistral-7B-Instruct`) is used for generating responses.
- The model is loaded in quantized mode (4-bit) to save memory, making it efficient for GPU use; a sketch follows.
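A sketch of loading the model in 4-bit via `bitsandbytes`, using LlamaIndex's `HuggingFaceLLM` wrapper (the context window and generation limits here are illustrative):

```python
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

# 4-bit NF4 quantization keeps a 7B model within a single consumer GPU.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct",  # id as given above; adjust if your HF revision differs
    tokenizer_name="mistralai/Mistral-7B-Instruct",
    context_window=4096,
    max_new_tokens=512,
    model_kwargs={"quantization_config": quantization_config},
    device_map="auto",
)
```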
- Users can input a question via the chat interface.
- The app retrieves relevant document segments from Pinecone and displays them in the sidebar.
- Using the selected LLM, the app generates a response, streaming it back to the user in real time (sketched below).
- Both the query and the generated response are stored in the session state for conversation history.
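A sketch of the retrieval and streaming step, continuing from the `index` and `llm` above (the `similarity_top_k` value, widget labels, and message format are assumptions):

```python
import streamlit as st

query_engine = index.as_query_engine(llm=llm, similarity_top_k=3, streaming=True)

if question := st.chat_input("Ask a question about the document"):
    response = query_engine.query(question)

    # Show the retrieved document segments in the sidebar.
    for source in response.source_nodes:
        st.sidebar.write(source.node.get_content())

    # Stream the answer back in real time, then keep the exchange in history.
    answer = st.write_stream(response.response_gen)
    if "messages" not in st.session_state:
        st.session_state.messages = []
    st.session_state.messages.append({"role": "user", "content": question})
    st.session_state.messages.append({"role": "assistant", "content": answer})
```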
- Users can clear the chat history using a button in the sidebar, resetting the stored messages (sketched below).
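And a minimal sketch of the clear-history control (the button label is an assumption):

```python
import streamlit as st

# Reset the stored conversation when the sidebar button is pressed.
if st.sidebar.button("Clear conversation"):
    st.session_state.messages = []
```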
- Upload a Document: Select a file to upload, and the app will process and index it.
- Ask a Question: Type a question in the input box, and the app will retrieve relevant information from the document and provide an AI-generated answer.
- Review Document Segments: The app will display the relevant document sections used to answer your question in the sidebar.
- GPU Support: The app is optimized for environments with GPU support to enable efficient model loading and inference.
- API Keys: Ensure that valid Hugging Face and Pinecone API keys are set up.
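A quick sanity check before launching, in case either requirement is in doubt (a sketch; the key names match the environment setup above):

```python
import os
import torch

# Confirm a GPU is visible and both API keys are present.
print("CUDA available:", torch.cuda.is_available())
for key in ("HF_TOKEN", "PC_API_KEY"):
    print(key, "is set" if os.environ.get(key) else "is MISSING")
```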
- This error typically occurs when the local server is not responding correctly through LocalTunnel, leading to a communication issue.
- Refresh the page: The issue may occur intermittently due to LocalTunnel connection issues. A simple page refresh might resolve it.
- Restart LocalTunnel: Close and reopen the LocalTunnel session to reset the connection.
- Verify that the environment variables `HF_TOKEN` and `PC_API_KEY` are correctly set.
- Make sure that the document is successfully uploaded and indexed.
- Verify that the Pinecone service is running and accessible.