Chat with the Document
GenAI-Chatbot is a Streamlit-based application that allows users to upload PDF and PNG files, process the text content, and interact with a generative AI model to ask questions about the uploaded documents.
- Upload PDF and PNG files: Users can upload multiple PDF and PNG files.
- Text Extraction: Extracts text from PDF files and images using PyPDF2 and pytesseract.
- Text Chunking: Splits extracted text into manageable chunks using RecursiveCharacterTextSplitter.
- Vector Store: Stores text chunks as vectors using FAISS and GoogleGenerativeAIEmbeddings.
- Conversational AI: Uses ChatGoogleGenerativeAI to answer questions based on the uploaded documents.
- Clear Chat: Allows users to clear the chat history and reset the context.
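The text chunking step can be illustrated with a small, dependency-free sketch. This is not RecursiveCharacterTextSplitter itself, which also tries to break on separators such as paragraphs and sentences; it is a simplified stand-in showing only the idea of fixed-size chunks with overlap. The function name `split_text` and the default sizes are assumptions chosen for illustration.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so text near a
    chunk boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

The overlap trades some duplicated storage for a better chance that a passage relevant to a question is not cut in half at a chunk boundary.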
- Streamlit
- PyPDF2
- pytesseract
- FAISS
- Google Generative AI
- langchain_google_genai
- langchain
- Pillow
- python-dotenv
pip install virtualenv
python -m venv env
In CMD
.\env\Scripts\activate.bat
- Install Dependencies:
pip install -r requirements.txt
Using pytesseract
- Ubuntu
sudo apt install tesseract-ocr
pytesseract.pytesseract.tesseract_cmd = r'/usr/bin/tesseract'
- Windows
- Download
- Set Environment Variable → System Variable → Path → C:\Program Files\Tesseract-OCR
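Instead of hard-coding a path per platform, the Tesseract executable can be located at startup. The sketch below uses only the standard library; the helper name `find_tesseract` and the `tesseract.exe` file name inside the Windows install directory are assumptions, not part of this project.

```python
import os
import shutil

# Locations taken from the setup notes above.
# The "tesseract.exe" file name is an assumption about the Windows install.
DEFAULT_CANDIDATES = [
    "/usr/bin/tesseract",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates=DEFAULT_CANDIDATES, name="tesseract"):
    """Return the path of the Tesseract executable, or None if not found."""
    # Prefer whatever is reachable through the PATH environment variable
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Otherwise fall back to the known install locations
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_tesseract())
```

If the function returns a path, it can be assigned to `pytesseract.pytesseract.tesseract_cmd` in the same way as the Ubuntu line above; a `None` result means Tesseract still needs to be installed or added to the Path.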
- Set up Google API Key:
- You can get a Google API key here: Get API key -> Generative Language Client -> Create API key in existing project.
- Obtain a Google API key and set it in the .env file.
GOOGLE_API_KEY=your_api_key_here
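A minimal sketch of how the key might be read at startup, assuming python-dotenv (listed in the dependencies) is used to load the .env file. The helper names `load_env_file` and `get_google_api_key` are hypothetical, and the check against the placeholder value is an added convenience rather than something the application is known to do.

```python
import os


def load_env_file():
    """Load variables from a .env file if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    # load_dotenv() reads .env and copies its entries into os.environ
    return load_dotenv()


def get_google_api_key(environ=os.environ):
    """Return GOOGLE_API_KEY, or raise if it is missing or still a placeholder."""
    key = environ.get("GOOGLE_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to the .env file")
    return key


if __name__ == "__main__":
    load_env_file()
    try:
        get_google_api_key()
        print("Google API key found")
    except RuntimeError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the model client.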
- Run the Application:
streamlit run app.py