This repository contains the code for a Content-Based Image Retrieval (CBIR) system, as described in the research paper "Content-Based Image Retrieval System Utilizing Vision Transformer" by Aniket Mahajan, Nirmiti Rane, and Pranav Patil.
The CBIR system presented here allows users to search for images using natural language queries. It combines several key techniques, each illustrated with a short example sketch after the list:
- Vision Transformer (ViT): A pre-trained ViT (specifically `google/vit-base-patch16-224` from Hugging Face Transformers) is used for feature extraction, capturing fine-grained image details. A quadtree-based approach divides images into quadrants for more granular feature extraction.
- Natural Language Processing (NLP): NLP is used extensively for:
  - Query processing: Tokenization, stemming, stop word removal, and negation handling are performed using libraries like `spaCy` and `NLTK`.
  - Synonym generation: WordNet is used to expand the query with synonyms, improving recall.
  - Textual feedback processing: User-provided sentences are analyzed using TF-IDF to extract keywords and update image feature probabilities.
- Vector Space Model (VSM): A VSM represents images and queries as vectors in a high-dimensional space. Cosine similarity is used to rank images based on their relevance to the query.
- User Relevance Feedback:
  - Binary Feedback: Users can provide "like" or "dislike" feedback on retrieved images. This directly adjusts the probabilities of associated image features.
  - Textual Feedback: Users can input sentences describing images. These sentences are processed using TF-IDF, and the extracted keywords are added as new features with their TF-IDF scores as probabilities. This allows the system to learn from user input and improve over time.
- Database Integration (SQLite): The system uses SQLite to store image features, metadata, user feedback, and probabilities. The `SQLmethods` directory contains the database interaction logic.
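A minimal sketch of the quadtree-style ViT feature extraction, assuming the whole image and each of its four quadrants are embedded separately via the checkpoint's `[CLS]` token (the helper names are illustrative, not the repository's actual API):

```python
# Quadtree ViT feature extraction sketch: embed the whole image plus its
# four quadrants with google/vit-base-patch16-224.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

MODEL_NAME = "google/vit-base-patch16-224"
processor = ViTImageProcessor.from_pretrained(MODEL_NAME)
model = ViTModel.from_pretrained(MODEL_NAME)
model.eval()

def extract_features(image: Image.Image) -> torch.Tensor:
    """Return the 768-d [CLS] embedding for one image."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze(0)  # [CLS] token

def quadtree_features(image: Image.Image) -> list:
    """Whole image plus its four quadrants -> five feature vectors."""
    w, h = image.size
    quadrants = [
        image.crop((0, 0, w // 2, h // 2)),   # top-left
        image.crop((w // 2, 0, w, h // 2)),   # top-right
        image.crop((0, h // 2, w // 2, h)),   # bottom-left
        image.crop((w // 2, h // 2, w, h)),   # bottom-right
    ]
    return [extract_features(image)] + [extract_features(q) for q in quadrants]
```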
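A minimal sketch of the query-processing pipeline using NLTK (the repository may use spaCy for some of these steps, and the negation lexicon below is an assumption):

```python
# Query processing sketch: tokenize, drop stop words, stem, expand with
# WordNet synonyms, and route terms following a negation word separately.
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words("english"))
NEGATIONS = {"not", "no", "never"}  # assumed negation lexicon
stemmer = PorterStemmer()

def synonyms(word: str) -> set:
    """All single-word WordNet lemmas for `word`."""
    return {
        lemma.name().lower()
        for synset in wordnet.synsets(word)
        for lemma in synset.lemmas()
        if "_" not in lemma.name()
    }

def process_query(query: str):
    """Return (positive_terms, negated_terms), stemmed and synonym-expanded."""
    positive, negated = set(), set()
    negate_next = False
    for tok in word_tokenize(query.lower()):
        if tok in NEGATIONS:
            negate_next = True
            continue
        if not tok.isalpha() or tok in STOP_WORDS:
            continue
        target = negated if negate_next else positive
        target.update(stemmer.stem(t) for t in {tok} | synonyms(tok))
        negate_next = False
    return positive, negated

print(process_query("a dog that is not running"))
```

This relies on the same NLTK corpora (`stopwords`, `punkt`, `wordnet`) downloaded during setup below.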
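A minimal sketch of vector-space ranking with cosine similarity, assuming one feature vector per image (illustrative, not the repository's implementation):

```python
# Rank images by cosine similarity between a query vector and stored
# image feature vectors.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def rank_images(query_vec: np.ndarray, image_vecs: dict) -> list:
    """image_vecs maps image_id -> vector; returns ids, most similar first."""
    scores = {i: cosine_similarity(query_vec, v) for i, v in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```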
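A minimal sketch of both feedback modes; the fixed step size for binary feedback and the way TF-IDF scores are merged into existing features are assumptions, not the paper's exact update rules:

```python
# Relevance feedback sketch: binary feedback nudges feature probabilities,
# textual feedback adds TF-IDF keywords as new weighted features.
from sklearn.feature_extraction.text import TfidfVectorizer

STEP = 0.1  # assumed step size for binary feedback

def binary_feedback(features: dict, liked: bool) -> None:
    """Raise (like) or lower (dislike) each feature probability, clamped to [0, 1]."""
    delta = STEP if liked else -STEP
    for name in features:
        features[name] = min(1.0, max(0.0, features[name] + delta))

def textual_feedback(features: dict, sentence: str, corpus: list) -> None:
    """Score the new sentence with TF-IDF over past feedback sentences and
    add its keywords as features weighted by their TF-IDF scores."""
    docs = corpus + [sentence]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(docs).toarray()
    for term, score in zip(vectorizer.get_feature_names_out(), matrix[-1]):
        if score > 0:
            features[term] = max(features.get(term, 0.0), float(score))
```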
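A minimal sketch of the SQLite layer; the schema below is an assumption for illustration, and the actual interaction logic lives in the `SQLmethods` directory:

```python
# SQLite storage sketch: one table mapping (image, feature) pairs to
# probabilities, with an upsert for feedback-driven updates.
import sqlite3

conn = sqlite3.connect("cbir.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS image_features (
           image_id    TEXT NOT NULL,
           feature     TEXT NOT NULL,
           probability REAL NOT NULL DEFAULT 0.5,
           PRIMARY KEY (image_id, feature)
       )"""
)

def upsert_feature(image_id: str, feature: str, probability: float) -> None:
    """Insert a feature or update its probability if it already exists."""
    conn.execute(
        "INSERT INTO image_features (image_id, feature, probability) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT(image_id, feature) DO UPDATE SET probability = excluded.probability",
        (image_id, feature, probability),
    )
    conn.commit()
```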
To set up and run the backend:

1. Clone the repository:

   ```bash
   git clone <repository_url>
   cd <repository_name>
   ```

2. Create and activate a virtual environment (recommended):

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Linux/macOS
   venv\Scripts\activate     # On Windows
   ```

3. Install backend dependencies:

   ```bash
   pip install -r backend/requirements.txt
   ```

4. Add your model path in `backend/system/methods/getFeaturesCNN`.

5. Download NLTK data:

   ```bash
   python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
   ```

6. Run the backend server:

   ```bash
   cd backend/system
   python main.py
   ```

   The backend server will run on `http://localhost:5000`.
To set up and run the frontend:

1. Navigate to the frontend directory:

   ```bash
   cd frontend/CBIR_frontend
   ```

2. Install frontend dependencies:

   ```bash
   npm install
   ```

3. Run the frontend development server:

   ```bash
   npm start
   ```

   The frontend will be accessible at `http://localhost:3000` (or a different port if 3000 is in use).
- Upload Images: Use the drag-and-drop area or click to upload images. The backend will process the images, extract features, and store them in the database.
- Search Images: Enter a natural language query in the search bar and click "Search." The system will process the query, retrieve relevant images, and display them in the grid.
- Provide Feedback:
  - Binary Feedback: Click the thumbs-up (like) or thumbs-down (dislike) button below each image to indicate its relevance to the query.
  - Textual Feedback: Enter a sentence describing an image in the text input field below the image and click "Submit."
- View All Images: Click the "Home" button (house icon) in the header to view all uploaded images.
- Switch Theme: Click the theme toggle button in the header to switch between light and dark mode.
This project implements the system described in the following research paper:
Aniket Mahajan, Nirmiti Rane, Pranav Patil. "Content-Based Image Retrieval System Utilizing Vision Transformer".