This is the official implementation of MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension. The paper was accepted to the main conference of EMNLP 2024.
In this paper, we perform an in-depth exploration of parameter-efficient transfer learning (PETL) methods for referring expression comprehension (REC) tasks. We introduce MaPPER, which aims to improve both the effectiveness and the efficiency of visual-text alignment, and to enhance visual perception by incorporating local visual semantics. We propose the novel Dynamic Prior Adapter (DyPA) and Local Convolution Adapter (LoCA). The former uses an aligned prior to dynamically adjust the language encoder, while the latter introduces local visual features to enhance the visual encoder. Extensive experiments demonstrate that our method can outperform state-of-the-art (SOTA) methods on REC tasks while tuning only 1.41% of the parameters within the pre-trained backbones.
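To make the "tunable parameters" figure concrete, the following is a minimal sketch of how such a percentage can be computed from per-module parameter counts. The function name `tunable_percentage`, the module names, and the counts are all hypothetical and chosen for illustration only; they are not taken from the MaPPER code or paper, and the exact counting convention used in the paper is not specified here.

```python
from typing import Dict, Tuple


def tunable_percentage(groups: Dict[str, Tuple[int, bool]]) -> float:
    """Return the share of tunable parameters as a percentage.

    `groups` maps a module name to (parameter_count, is_tunable).
    """
    total = sum(count for count, _ in groups.values())
    if total == 0:
        raise ValueError("no parameters to count")
    tunable = sum(count for count, is_tunable in groups.values() if is_tunable)
    return 100.0 * tunable / total


# Hypothetical counts: a frozen backbone plus a small set of trainable adapters.
example_groups = {
    "frozen_backbone": (99_000_000, False),
    "adapters": (1_000_000, True),
}

if __name__ == "__main__":
    print(f"{tunable_percentage(example_groups):.2f}% of parameters are tunable")
```

With these made-up counts, the adapters account for 1,000,000 of 100,000,000 parameters. In a PETL setup, the backbone entries would be the ones marked as not tunable.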
-
Clone this repository.
git clone https://github.com/liuting20/MaPPER.git
-
Prepare for the running environment.
conda env create -f environment.yaml
pip install -r requirements.txt
Please refer to GETTING_STARTED.md to learn how to prepare the datasets and pretrained checkpoints.
The models are available in [Gdrive]
RefCOCO val | RefCOCO testA | RefCOCO testB | RefCOCO+ val | RefCOCO+ testA | RefCOCO+ testB | RefCOCOg g-val | RefCOCOg u-val | RefCOCOg u-test |
---|---|---|---|---|---|---|---|---|
86.03 | 88.90 | 81.19 | 74.92 | 81.12 | 65.68 | 74.60 | 76.32 | 75.81 |
-
Training
bash train.sh
or
sbatch run.sh (if you have multiple nodes)
We recommend setting --max_query_len to 40 for RefCOCOg and to 20 for other datasets. We recommend setting --epochs to 180 (with --lr_drop 120 accordingly) for RefCOCO+, and --epochs 90 (with --lr_drop 60 accordingly) for other datasets.
-
Evaluation
bash test.sh
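The recommended training settings above can be collected into a small lookup, which may help when launching runs for several datasets. This is a sketch only: the helper `recommended_flags` and the dictionary keys are hypothetical and are not part of this repository. The keys are display names and may differ from the dataset identifiers that train.sh expects. Only the flag names (--max_query_len, --epochs, --lr_drop) and their values come from the recommendations above.

```python
from typing import Dict, List

# Values recommended for "other datasets" in the Training notes.
DEFAULTS: Dict[str, int] = {"max_query_len": 20, "epochs": 90, "lr_drop": 60}

# Per-dataset overrides; keys are illustrative labels (an assumption),
# not necessarily the identifiers used by the training script.
OVERRIDES: Dict[str, Dict[str, int]] = {
    "RefCOCOg": {"max_query_len": 40},
    "RefCOCO+": {"epochs": 180, "lr_drop": 120},
}


def recommended_flags(dataset: str) -> List[str]:
    """Build the command-line flags recommended for the given dataset label."""
    settings = {**DEFAULTS, **OVERRIDES.get(dataset, {})}
    flags: List[str] = []
    for name, value in settings.items():
        flags += [f"--{name}", str(value)]
    return flags


if __name__ == "__main__":
    for label in ("RefCOCO", "RefCOCO+", "RefCOCOg"):
        print(label, " ".join(recommended_flags(label)))
```

The resulting flags would still need to be passed to the actual training command inside train.sh; how that script forwards arguments is not covered here.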
This codebase is partially based on TransVG and DARA.
Please consider citing our paper in your publications, if our findings help your research.
@inproceedings{liu2024mapper,
title={MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension},
author={Liu, Ting and Xu, Zunnan and Hu, Yue and Shi, Liangtao and Wang, Zhiqiang and Yin, Quanjun},
booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
pages={4984--4994},
year={2024}
}
For any question about our paper or code, please contact Ting Liu.