🔍 NSIO: Neural Search Indexing Optimization: Integrating Augmentation and PEFT for Efficient Retrieval

Copyright © 2025 Alessio Borgi, Eugenio Bugli, Damiano Imola

📌 Overview

This repository provides an optimized implementation of the Differentiable Search Index (DSI), integrating data augmentation techniques and parameter-efficient fine-tuning (PEFT) methods to improve retrieval accuracy and computational efficiency. All methods are evaluated on the MS MARCO dataset.

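For context, DSI casts retrieval as sequence-to-sequence generation: instead of scoring documents against a query, the model decodes a document identifier directly from the query text. Below is a minimal inference sketch, assuming a T5-style checkpoint fine-tuned for DSI; the checkpoint name and generation settings are placeholders, not this repository's exact setup:

```python
# Hedged sketch of DSI inference: the model generates candidate docid strings
# for a query; beam search yields a ranked list of identifiers.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")                   # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # would be the DSI-fine-tuned model

inputs = tok("what is neural search indexing", return_tensors="pt")
docids = model.generate(**inputs, max_new_tokens=16,
                        num_beams=10, num_return_sequences=5)
print(tok.batch_decode(docids, skip_special_tokens=True))      # top-5 candidate docids
```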

✅ Contributions

Data Augmentation Techniques (a sketch of all three appears after this list):

  • Num2Word Transformation: Converts numerical values into their word equivalents (e.g., "42" becomes "forty-two").
  • Stopword Removal: Eliminates redundant tokens to sharpen the semantic representation of each passage.
  • POS-MLM Augmentation: Combines part-of-speech tagging with masked language modeling to enhance context understanding.
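A minimal sketch of the three augmentations, assuming the `num2words`, `nltk`, and `transformers` packages. The function names, the BERT checkpoint, and the noun-only masking policy are illustrative assumptions, not necessarily this repository's exact implementation:

```python
import re

import nltk
from nltk.corpus import stopwords
from num2words import num2words
from transformers import pipeline

# One-time NLTK resources (newer NLTK releases may use the *_tab / *_eng names).
for resource in ("punkt", "stopwords", "averaged_perceptron_tagger"):
    nltk.download(resource, quiet=True)

STOPWORDS = set(stopwords.words("english"))
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def num2word(text: str) -> str:
    # Num2Word: rewrite standalone integers as words, e.g. "42" -> "forty-two".
    return re.sub(r"\b\d+\b", lambda m: num2words(int(m.group())), text)

def remove_stopwords(text: str) -> str:
    # Stopword Removal: drop high-frequency function words.
    return " ".join(tok for tok in text.split() if tok.lower() not in STOPWORDS)

def pos_mlm(text: str, target_tags=("NN", "NNS")) -> str:
    # POS-MLM: mask tokens bearing the target POS tags, then let a masked LM
    # refill each mask, producing a context-preserving variant of the passage.
    tokens = nltk.word_tokenize(text)
    out = list(tokens)
    for i, (_, tag) in enumerate(nltk.pos_tag(tokens)):
        if tag in target_tags:
            masked = list(tokens)
            masked[i] = fill_mask.tokenizer.mask_token
            out[i] = fill_mask(" ".join(masked))[0]["token_str"]  # top prediction
    return " ".join(out)

print(num2word("The 3 engines returned 42 results"))
print(remove_stopwords("the quick brown fox jumps over the lazy dog"))
print(pos_mlm("The engine retrieves relevant documents quickly"))
```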

Parameter-Efficient Fine-Tuning (PEFT) Methods (configuration sketches for LoRA and QLoRA follow the list):

  • LoRA: Low-Rank Adaptation for lightweight fine-tuning.
  • QLoRA: 4-bit quantization with LoRA for memory efficiency.
  • AdaLoRA: Adaptive LoRA that dynamically adjusts rank.
  • ConvLoRA (LoCon): LoRA extended with depthwise convolution for local feature modeling.
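
As a concrete example of the PEFT setup, the sketch below attaches LoRA adapters to a T5 backbone via the Hugging Face `peft` library. The `t5-base` checkpoint, rank, and target modules are illustrative assumptions rather than this repository's exact configuration:

```python
# Hedged sketch: LoRA adapters on a T5 encoder-decoder (DSI models are typically T5-based).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import T5ForConditionalGeneration

base = T5ForConditionalGeneration.from_pretrained("t5-base")  # assumed backbone

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                         # low-rank dimension (illustrative)
    lora_alpha=32,               # scaling factor
    lora_dropout=0.1,
    target_modules=["q", "v"],   # T5's query/value attention projections
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```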

🚀 Our approach enables faster fine-tuning, reduced memory consumption, and enhanced retrieval performance for large-scale search indexing tasks.
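
The QLoRA variant loads the frozen base model in 4-bit before attaching the same adapters. Below is a minimal sketch assuming `bitsandbytes` is installed, with the quantization settings shown as common defaults rather than this repository's exact choices:

```python
# Hedged sketch: QLoRA = 4-bit NF4-quantized base model + LoRA adapters.
import torch
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
from transformers import BitsAndBytesConfig, T5ForConditionalGeneration

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used during forward passes
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

base = T5ForConditionalGeneration.from_pretrained("t5-base", quantization_config=bnb_cfg)
base = prepare_model_for_kbit_training(base)  # casts norms/embeddings for training stability

lora_cfg = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32,
                      lora_dropout=0.1, target_modules=["q", "v"])
model = get_peft_model(base, lora_cfg)
```

AdaLoRA can be configured analogously through peft's AdaLoraConfig, which prunes and reallocates the rank budget across weight matrices during training.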
