We provide the multilingual captions for the HowTo100M dataset in the following languages:
Language | Code | Link |
---|---|---|
English | en | link |
German | de | link |
French | fr | link |
Czech | cs | link |
Swahili | sw | link |
Russian | ru | link |
Vietnamese | vi | link |
Spanish | es | link |
Chinese | zh | link |
The how2_[lang].json file contains the captions for the HowTo100M videos. It can be read into a Python dictionary with video_id as the key. Each value of the dictionary is another dictionary with the keys ['text', 'start', 'end']. The value of 'text' is a list of all the captions from the given video_id, and 'start' and 'end' are arrays of the corresponding start and end timestamps of the captions (in seconds).
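For reference, here is a minimal sketch of loading and inspecting one of the caption files. The filename how2_en.json follows the how2_[lang].json pattern described above, and the snippet assumes the timestamps are numeric seconds; adjust the path and language code for your setup.

```python
import json

# Load the caption file: a dict mapping video_id -> {'text', 'start', 'end'}.
with open("how2_en.json", "r", encoding="utf-8") as f:
    captions = json.load(f)

# Pick an arbitrary video and print its captions with timestamps.
video_id = next(iter(captions))
entry = captions[video_id]
for text, start, end in zip(entry["text"], entry["start"], entry["end"]):
    print(f"[{start:.1f}s - {end:.1f}s] {text}")
```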
Please refer here for the list of HowTo100M videos and their metadata.
The translated VTT files in 9 languages for evaluation are available here.
If you use this dataset, please cite:

@inproceedings{huang2021multilingual,
  title = {Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models},
  author = {Huang, Po-Yao and Patrick, Mandela and Hu, Junjie and Neubig, Graham and Metze, Florian and Hauptmann, Alexander G},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages = {2443--2459},
  year = {2021},
  url = {https://arxiv.org/abs/2103.08849},
}
Please feel free to contact Bernie Huang ([email protected] or [email protected]) if you have any questions. Thanks for your interest!