Learning a Grammar Inducer by Watching Millions of Instructional YouTube Videos
Accepted to EMNLP 2022 as an oral presentation
Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu and Jiebo Luo.
Video-aided grammar induction aims to leverage video information to find more accurate syntactic grammars for the accompanying text. While previous work focuses on building systems on well-aligned video-text pairs, we train our model only on noisy YouTube videos, without fine-tuning on benchmark data, and achieve stronger performance across three benchmarks.
- [Oct 2022] Invited talk at the UM-IoS Workshop at EMNLP 2022. 😄
- [Oct 2022] Our paper has been accepted to EMNLP 2022 (Oral). ✨
We provide a Docker image for easier reproduction. Please install the following:
- NVIDIA driver (418+),
- Docker (19.03+),
- nvidia-container-toolkit.
We only support Linux with NVIDIA GPUs. We have tested on Ubuntu 18.04 with V100 cards.
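Before launching, you can sanity-check these prerequisites. The commands below are a suggested check rather than part of the repo, and the CUDA image tag is only an example of a public tag:

```bash
nvidia-smi          # driver visible? (expect 418+)
docker --version    # expect 19.03+
# Confirm nvidia-container-toolkit can expose GPUs inside a container
# (any available nvidia/cuda tag works here):
docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu18.04 nvidia-smi
```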
CUDA_VISIBLE_DEVICES=0,1 source launch_container.sh $PATH_TO_STORAGE/data $PATH_TO_STORAGE/checkpoints $PATH_TO_STORAGE/log
The launch script respects the $CUDA_VISIBLE_DEVICES environment variable.
Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without rebuilding the image.
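For reference, a launcher like this typically wraps a docker run of roughly the following shape. This is only a sketch inferred from the mounts described above: IMAGE is a placeholder for the actual image name, and every container path except /src is an assumption.

```bash
# Sketch only; $1/$2/$3 are the data, checkpoints, and log paths passed above.
docker run --rm -it \
  --gpus "\"device=${CUDA_VISIBLE_DEVICES}\"" \
  --mount "type=bind,src=$1,dst=/data" \
  --mount "type=bind,src=$2,dst=/checkpoints" \
  --mount "type=bind,src=$3,dst=/log" \
  --mount "type=bind,src=$(pwd),dst=/src" \
  IMAGE bash   # IMAGE is a placeholder, not the repo's actual image tag
```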
Please download the preprocessed data from here to data, and here to .cache.
[Optional] You can also preprocess data from raw captions. Details are described here.
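Putting the pieces together, the host-side layout implied by the launch command looks roughly like this (exact contents depend on the downloads above):

```
$PATH_TO_STORAGE/
├── data/          # preprocessed data (first download)
├── checkpoints/   # trained models (see the evaluation step below)
└── log/
```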
Run the following command for training:
sh scripts/train.sh
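For example, to train inside the container and keep a log (the tee redirection and the /log path are our suggestion based on the mounts above, not part of the script):

```bash
cd /src   # source is mounted here (see the Docker note above)
sh scripts/train.sh 2>&1 | tee /log/train.log
```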
Our trained models are provided here. Please download them to checkpoints.
Then, run the following command for evaluation:
sh scripts/test.sh
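Analogously, inside the container (the /checkpoints mount point is an assumption based on the launch arguments):

```bash
cd /src
ls /checkpoints   # confirm the downloaded models are visible
sh scripts/test.sh 2>&1 | tee /log/test.log
```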
We preprocess subtitles with the following scripts:
python tools/preprocess_captions.py
python tools/compute_gold_trees.py
python tools/generate_vocabularies.py
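To run the whole pipeline in one go and stop at the first failure, a small wrapper like this works (the three scripts and their order come from above; the set -e wrapper is our addition):

```bash
#!/bin/sh
set -e   # abort on the first failing step
python tools/preprocess_captions.py
python tools/compute_gold_trees.py
python tools/generate_vocabularies.py
```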
If you find this project useful, please consider citing our paper. 📣
@inproceedings{zhang2022training,
  title={Learning a Grammar Inducer by Watching Millions of Instructional YouTube Videos},
  author={Zhang, Songyang and Song, Linfeng and Jin, Lifeng and Mi, Haitao and Xu, Kun and Yu, Dong and Luo, Jiebo},
  booktitle={EMNLP},
  year={2022}
}
This repo is developed based on VPCFG, MMC-PCFG, and Punctuator2.