MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension

This is the official implementation of MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension. The paper was accepted to the EMNLP 2024 main conference.

✨ Overview

In this paper, we perform an in-depth exploration of parameter-efficient transfer learning (PETL) methods for referring expression comprehension (REC) tasks. We introduce MaPPER, which aims to improve both the effectiveness and efficiency of visual-text alignment and to enhance visual perception by incorporating local visual semantics. We propose the novel Dynamic Prior Adapter (DyPA) and Local Convolution Adapter (LoCA). The former employs an aligned prior to dynamically adjust the language encoder, while the latter introduces local visual features to enhance the visual encoder. Extensive experiments demonstrate that our method can outperform state-of-the-art (SOTA) methods on REC tasks with only 1.41% tunable parameters within the pre-trained backbones.

👉 Installation

Clone this repository.

git clone https://github.com/liuting20/MaPPER.git

Prepare for the running environment.

conda env create -f environment.yml
pip install -r requirements.txt

👉 Getting Started

Please refer to GETTING_STARTED.md to learn how to prepare the datasets and pretrained checkpoints.

👉 Model Zoo

The models are available on [Gdrive].

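| RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg g-val | RefCOCOg u-val | RefCOCOg u-test |
|---|---|---|---|---|---|---|---|---|
| 86.03 | 88.90 | 81.19 | 74.92 | 81.12 | 65.68 | 74.60 | 76.32 | 75.81 |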

👉 Training and Evaluation

Training
```
bash train.sh
```
or
```
sbatch run.sh  # if you have multiple nodes
```
We recommend setting --max_query_len to 40 for RefCOCOg and to 20 for the other datasets. For RefCOCO+, we recommend --epochs 180 (with --lr_drop 120 accordingly); for the other datasets, we recommend --epochs 90 (with --lr_drop 60 accordingly).

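The recommended settings above can be kept in one place and turned into command-line flags. The following is a minimal sketch, not part of this repository: the `RECOMMENDED` table and the `recommended_flags` helper are hypothetical names, and the dictionary keys are the benchmark names as written in this README, which are not necessarily the values the training script expects for its dataset argument. Only the flag names and values come from the recommendation above.

```python
# Hypothetical helper (not part of the MaPPER codebase): collects the
# recommended values from this README and renders them as CLI flags.
RECOMMENDED = {
    "RefCOCO":  {"max_query_len": 20, "epochs": 90,  "lr_drop": 60},
    "RefCOCO+": {"max_query_len": 20, "epochs": 180, "lr_drop": 120},
    "RefCOCOg": {"max_query_len": 40, "epochs": 90,  "lr_drop": 60},
}


def recommended_flags(dataset):
    """Return the recommended flags for a benchmark as a list of strings."""
    if dataset not in RECOMMENDED:
        raise KeyError(f"no recommendation recorded for {dataset!r}")
    flags = []
    for name, value in RECOMMENDED[dataset].items():
        flags.extend([f"--{name}", str(value)])
    return flags


if __name__ == "__main__":
    # Prints: --max_query_len 20 --epochs 180 --lr_drop 120
    print(" ".join(recommended_flags("RefCOCO+")))
```

The resulting flags would then be added to the training command inside train.sh, alongside whatever other arguments that script already passes.
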
Evaluation
```
bash test.sh
```
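
This README does not state the evaluation metric. REC benchmarks such as RefCOCO are conventionally scored by the fraction of predicted boxes whose intersection over union (IoU) with the ground-truth box reaches 0.5. The sketch below illustrates that convention only, under the assumption that boxes are given as (x1, y1, x2, y2) corners; the function names `box_iou` and `accuracy_at_iou` are hypothetical and are not taken from this repository's evaluation code.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes have no overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the paired ground truth
    is at least `threshold`. Some implementations use a strict '>'."""
    if not predictions:
        return 0.0
    hits = sum(
        box_iou(p, g) >= threshold
        for p, g in zip(predictions, ground_truths)
    )
    return hits / len(predictions)


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
    gts = [(0, 0, 2, 2), (1, 1, 3, 3)]
    # First pair matches exactly (IoU 1.0); second overlaps with IoU 1/7.
    print(accuracy_at_iou(preds, gts))  # prints 0.5
```
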

👍 Acknowledgements

This codebase is partially based on TransVG and DARA.

📌 Citation

Please consider citing our paper in your publications if our findings help your research.

@inproceedings{liu2024mapper,
  title={MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension},
  author={Liu, Ting and Xu, Zunnan and Hu, Yue and Shi, Liangtao and Wang, Zhiqiang and Yin, Quanjun},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={4984--4994},
  year={2024}
}

📧 Contact

For any questions about our paper or code, please contact Ting Liu.
