
Semantic-Visual Graph Reasoning for Visual Dialog [ICME'24, Oral]

By Dongze Hao, Qunbo Wang and Jing Liu

Introduction

This is the official implementation of the paper. In this paper, we propose a Semantic-Visual Graph reasoning framework (SVG) for Visual Dialog (VisDial). Specifically, we first construct a semantic graph to capture the semantic relationships between entities in the current question and the dialog history. Second, we construct a semantics-aware visual graph to capture high-level visual semantics, including key objects in the image and their visual relationships. Extensive experiments on the VisDial v0.9 and v1.0 datasets show that our method achieves superior performance to state-of-the-art models across most evaluation metrics.
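
For intuition only, the sketch below shows a generic attention-based message-passing step over a graph of node features in PyTorch. The layer, dimensions, and single-step design are illustrative assumptions about how graph reasoning over object or entity nodes typically works, not the actual SVG modules from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionStep(nn.Module):
    """One attention-weighted message-passing update over graph nodes.
    Illustrative sketch only, not the SVG architecture itself."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, nodes, adj):
        # nodes: (batch, num_nodes, dim); adj: (batch, num_nodes, num_nodes) 0/1 mask
        q, k, v = self.query(nodes), self.key(nodes), self.value(nodes)
        scores = torch.bmm(q, k.transpose(1, 2)) / nodes.size(-1) ** 0.5
        scores = scores.masked_fill(adj == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)   # attend only to graph neighbors
        return nodes + torch.bmm(attn, v)  # residual update of node states

# e.g. 36 detected objects with 512-d features and a fully connected visual graph
nodes = torch.randn(2, 36, 512)
adj = torch.ones(2, 36, 36)
out = GraphAttentionStep(512)(nodes, adj)  # (2, 36, 512)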

Architecture

Setup and Dependencies

conda create -n svg python=3.8
conda activate svg
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install tqdm pyyaml nltk setproctitle
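
A quick sanity check (optional, not part of the original instructions) confirms the pinned PyTorch build and whether the CUDA 10.2 runtime is visible:

import torch
import torchvision

# Expect 1.7.0 and 0.8.0 per the pinned install above; CUDA availability
# depends on your local driver matching the cudatoolkit=10.2 build.
print(torch.__version__, torchvision.__version__, torch.cuda.is_available())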

Getting Started

  1. Download the data (a path-check sketch follows this list)
  • Download the VisDial v0.9 and v1.0 dialog JSON files from here and keep them under the $PROJECT_ROOT/data/v0.9 and $PROJECT_ROOT/data/v1.0 directories, respectively.
  • batra-mlp-lab provides the word counts for the VisDial v1.0 train split, visdial_1.0_word_counts_train.json. They are used to build the vocabulary. Keep this file under the $PROJECT_ROOT/data/v1.0 directory.
  • batra-mlp-lab provides Faster-RCNN image features pre-trained on Visual Genome. Keep them under the $PROJECT_ROOT/data/visdial_1.0_img directory and set the img_feature_type argument to faster_rcnn_x101 in config/hparams.py.
  • gicheonkang provides pre-extracted Faster-RCNN image features that include bounding-box information. Set the img_feature_type argument to dan_faster_rcnn_x101 in config/hparams.py.
  2. Preprocess the data
  • Download the GloVe pretrained word vectors from here, and keep glove.6B.300d.txt under the $PROJECT_ROOT/data/word_embeddings/glove directory. Then run
    python data/preprocess/init_glove.py
  • Preprocess the textual inputs
    python data/data_utils.py
  3. Train the model
    python main.py --model svg --version 1.0
  4. Evaluate the model
     python main.py --model svg --evaluate /path/to/checkpoint.pth --eval_split val --version 1.0
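
Before training, it can help to confirm that the data landed where the loaders expect it. The sketch below only checks the paths named in the steps above; $PROJECT_ROOT is assumed to be the repository root, and the contents of data/visdial_1.0_img depend on which feature files you downloaded, so only that directory itself is checked.

from pathlib import Path

# Assumes this is run from $PROJECT_ROOT (the repository root).
root = Path(".")
expected = [
    root / "data/v1.0/visdial_1.0_word_counts_train.json",
    root / "data/word_embeddings/glove/glove.6B.300d.txt",
    root / "data/visdial_1.0_img",  # Faster-RCNN image features
]
for path in expected:
    print(("OK     " if path.exists() else "MISSING"), path)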

Acknowledgements

This code is implemented as a fork of batra-mlp-lab/visdial-challenge-starter-pytorch and builds on yuleiniu/rva.
