mxingzhang90/MSAT

MSAT

This is the code for the paper "Multi-stage Aggregated Transformer Network for Temporal Language Localization in Videos". We appreciate the contribution of 2D-TAN.

Framework

[Figure: framework diagram]

Prerequisites

  • python 3
  • pytorch 1.6.0
  • torchvision 0.7.0
  • torchtext 0.7.0
  • easydict
  • terminaltables
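
The following is a minimal sketch, not part of the repository, for checking an environment against the list above. It assumes the packages are installed under their usual PyPI distribution names (`torch` for PyTorch); the `REQUIRED` table and the `check_versions` helper are hypothetical names introduced here for illustration.

```python
from importlib import metadata

# Pins copied from the prerequisites list; "torch" is the PyPI name for PyTorch.
REQUIRED = {
    "torch": "1.6.0",
    "torchvision": "0.7.0",
    "torchtext": "0.7.0",
    "easydict": None,        # no version given in the list
    "terminaltables": None,  # no version given in the list
}


def check_versions(required, get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Local build tags such as "1.6.0+cu101" are treated as 1.6.0.
        if wanted is not None and found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, README lists {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_versions(REQUIRED)
    print("\n".join(issues) if issues else "All listed prerequisites found.")
```

The check only compares version strings; it does not verify that other versions are incompatible with the code.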

Quick Start

Download the visual features from Box Drive and save them to the data/ folder.

Training

Use the following commands for training:

# For ActivityNet Captions
python moment_localization/train.py --cfg experiments/activitynet/MSAT-32.yaml --verbose

# For TACoS
python moment_localization/train.py --cfg experiments/tacos/MSAT-128.yaml --verbose

Testing

Our trained models are provided on Baidu Yun (access code: rc2m). Download them to the checkpoints folder.

Then, run the following commands for evaluation:

# For ActivityNet Captions
python moment_localization/test.py --cfg experiments/activitynet/MSAT-32.yaml --verbose --split test

# For TACoS
python moment_localization/test.py --cfg experiments/tacos/MSAT-128.yaml --verbose --split test
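
To evaluate both datasets in one go, the two commands above can be wrapped in a small driver. This is a sketch under assumptions, not a script shipped with the repository: `CONFIGS`, `build_test_command`, and `run_all` are hypothetical names, and the script is assumed to be launched from the repository root so the relative paths resolve.

```python
import subprocess
import sys

# Config paths copied from the evaluation commands above.
CONFIGS = {
    "activitynet": "experiments/activitynet/MSAT-32.yaml",
    "tacos": "experiments/tacos/MSAT-128.yaml",
}


def build_test_command(dataset, split="test"):
    """Assemble the evaluation command for one dataset as an argument list."""
    return [
        sys.executable, "moment_localization/test.py",
        "--cfg", CONFIGS[dataset],
        "--verbose",
        "--split", split,
    ]


def run_all():
    """Run the evaluation for every dataset in CONFIGS, one after another."""
    for dataset in CONFIGS:
        print(f"Evaluating {dataset} ...")
        # check=True raises CalledProcessError at the first failing run.
        subprocess.run(build_test_command(dataset), check=True)
```

Calling `run_all()` runs ActivityNet Captions first and then TACoS, using the same interpreter that runs the driver.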

Citation

If any part of our paper or code is helpful to your work, please cite:

@inproceedings{zhang2021multi,
  title={Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos},
  author={Zhang, Mingxing and Yang, Yang and Chen, Xinghan and Ji, Yanli and Xu, Xing and Li, Jingjing and Shen, Heng Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12669--12678},
  year={2021}
}
