[EMNLP 2024 Main] MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension

MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension

This is the official implementation of MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension. The paper was accepted to the EMNLP 2024 main conference.

✨ Overview

In this paper, we perform an in-depth exploration of parameter-efficient transfer learning (PETL) methods for referring expression comprehension (REC). We introduce MaPPER, which aims to improve both the effectiveness and efficiency of visual-text alignment while enhancing visual perception through local visual semantics. We propose two novel modules: the Dynamic Prior Adapter (DyPA), which employs an aligned prior to dynamically adjust the language encoder, and the Local Convolution Adapter (LoCA), which introduces local visual features to enhance the visual encoder. Extensive experiments demonstrate that our method outperforms state-of-the-art (SOTA) methods on REC tasks while tuning only 1.41% of the parameters within the pre-trained backbones.
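For readers new to adapter-based PETL, the sketch below shows a generic bottleneck adapter (down-project, non-linearity, up-project, residual). This is an illustrative simplification, not the exact DyPA/LoCA implementation: DyPA additionally scales its output dynamically from the aligned prior, and LoCA adds local convolutions; see the paper and code for the real modules.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: the common building block that
    DyPA and LoCA extend (their prior-guided scaling and local
    convolution are omitted in this sketch)."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project to low rank
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # project back up
        # Zero-init the up-projection so the adapter starts as an
        # identity mapping and does not disturb the frozen backbone.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```

Only these small adapter modules are trained while the pre-trained backbones stay frozen, which is how the tunable-parameter count stays at 1.41%.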

👉 Installation

  1. Clone this repository.

    git clone https://github.com/liuting20/MaPPER.git
    
  2. Prepare for the running environment.

     conda env create -f environment.yaml      
     pip install -r requirements.txt
    

👉 Getting Started

Please refer to GETTING_STARTED.md to learn how to prepare the datasets and pretrained checkpoints.

👉 Model Zoo

The trained models are available at [Gdrive].

| RefCOCO (val) | RefCOCO (testA) | RefCOCO (testB) | RefCOCO+ (val) | RefCOCO+ (testA) | RefCOCO+ (testB) | RefCOCOg (g-val) | RefCOCOg (u-val) | RefCOCOg (u-test) |
|---|---|---|---|---|---|---|---|---|
| 86.03 | 88.90 | 81.19 | 74.92 | 81.12 | 65.68 | 74.60 | 76.32 | 75.81 |

👉 Training and Evaluation

  1. Training

    bash train.sh
    

    or

    sbatch run.sh (if you have multiple nodes)
    

    We recommend setting --max_query_len to 40 for RefCOCOg and to 20 for the other datasets. We also recommend setting --epochs to 180 (with --lr_drop 120 accordingly) for RefCOCO+, and --epochs to 90 (with --lr_drop 60 accordingly) for the other datasets.

  2. Evaluation

    bash test.sh
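The per-dataset recommendations above can be collected in a small helper. This is an illustrative sketch (the function name and dict keys are ours, not part of train.sh); the flag names and values come from the training notes above:

```python
def recommended_flags(dataset: str) -> dict:
    """Return the recommended training flags for a given dataset,
    following the per-dataset notes in the Training section."""
    # RefCOCOg has longer referring expressions, so allow more tokens.
    max_query_len = 40 if dataset == "refcocog" else 20
    # RefCOCO+ benefits from a longer schedule.
    epochs, lr_drop = (180, 120) if dataset == "refcoco+" else (90, 60)
    return {"max_query_len": max_query_len, "epochs": epochs, "lr_drop": lr_drop}
```

For example, `recommended_flags("refcocog")` yields `--max_query_len 40 --epochs 90 --lr_drop 60`, while `recommended_flags("refcoco+")` yields `--max_query_len 20 --epochs 180 --lr_drop 120`.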
    

👍 Acknowledgments

This codebase is partially based on TransVG and DARA.

📌 Citation

If our findings help your research, please consider citing our paper in your publications.

@inproceedings{liu2024mapper,
  title={MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension},
  author={Liu, Ting and Xu, Zunnan and Hu, Yue and Shi, Liangtao and Wang, Zhiqiang and Yin, Quanjun},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={4984--4994},
  year={2024}
}

📧 Contact

For any questions about our paper or code, please contact Ting Liu.
