
en-vi: add IWSLT'15 English-Vietnamese as new problem #611

Merged
merged 1 commit into from Feb 28, 2018

Conversation

stefan-it
Contributor

Hi,

with this PR a new problem is added: English-Vietnamese using the IWSLT'15 dataset from the Stanford NLP group.

I trained an English to Vietnamese model for 125k steps on a NVIDIA GTX 1060. Here are some nice comparisons of BLEU score on the tst2013 test set. Other BLEU scores are taken from the Towards Neural Phrase-based Machine Translation paper.

| Model | BLEU (beam search) |
| --- | --- |
| Luong & Manning (2015) | 23.30 |
| Sequence-to-sequence model with attention | 26.10 |
| Neural Phrase-based Machine Translation, Huang et al. (2017) | 27.69 |
| Neural Phrase-based Machine Translation + LM, Huang et al. (2017) | 28.07 |
| Transformer (Base), cased | 28.12 |
| Transformer (Base), uncased | 28.97 |
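
The reproduction sketch below shows roughly how such a run can be set up with the T2T command-line tools. Exact flag names vary between T2T versions, and the problem name `translate_envi_iwslt32k` is assumed from this PR, so treat it as a sketch rather than the exact commands:

```bash
# Sketch only: flag names vary between T2T versions and the problem name
# translate_envi_iwslt32k is assumed from this PR.
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/envi_transformer

# Download and preprocess the IWSLT'15 en-vi data.
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_envi_iwslt32k

# Train a base Transformer for 125k steps.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problem=translate_envi_iwslt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --train_steps=125000

# Decode the tst2013 source side with beam search.
t2t-decoder \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR \
  --problem=translate_envi_iwslt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=tst2013.en \
  --decode_to_file=tst2013.vi.decoded
```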

@lukaszkaiser
Contributor

lukaszkaiser left a comment

Wonderful, thanks!

@duyvuleo
Contributor

duyvuleo commented Apr 9, 2018

Hi, I trained the Transformer base for this dataset for 500K steps, evaluated with t2t-bleu, and got BLEU scores of around 28.47 (cased) and 29.32 (uncased). But when I evaluate this result with the multi-bleu.pl script from Moses, I get only 27.69 (you can see my training/evaluation signatures here). That's weird!

It seems that t2t-bleu tends to produce higher scores.
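
For reference, the two metrics are invoked roughly like this (file names are placeholders; t2t-bleu prints both cased and uncased BLEU, while multi-bleu.pl takes the reference as an argument and the hypothesis on stdin):

```bash
# Placeholder file names: tst2013.vi.decoded is the decoded output,
# tst2013.vi is the reference.
t2t-bleu \
  --translation=tst2013.vi.decoded \
  --reference=tst2013.vi

perl multi-bleu.pl tst2013.vi < tst2013.vi.decoded
```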

@martinpopel
Contributor

With multi-bleu.pl you need to tokenize the hypothesis and reference yourself first (and possibly normalize unicode punctuation). Depending on how you do it, you can get a big difference (e.g. ±5 BLEU), so multi-bleu.pl is not replicable in general.
It seems you have forgotten to do any tokenization.
The advantage of t2t-bleu and sacrebleu is that you don't need to care about tokenization; it is handled internally.
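
A sketch of the two options, with placeholder file names (tokenizer.perl refers to the Moses tokenizer script):

```bash
# Option 1: tokenize hypothesis and reference the same way, then run multi-bleu.pl.
perl tokenizer.perl -l vi < tst2013.vi         > ref.tok.vi
perl tokenizer.perl -l vi < tst2013.vi.decoded > hyp.tok.vi
perl multi-bleu.pl ref.tok.vi < hyp.tok.vi

# Option 2: use a metric with built-in tokenization on the untokenized files.
cat tst2013.vi.decoded | sacrebleu tst2013.vi
```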

@duyvuleo
Contributor

duyvuleo commented Apr 9, 2018

Ah, I forgot to mention: the decoded output from tensor2tensor for this dataset is already tokenized (look at this), and the provided reference is already tokenized as well. So the evaluation with multi-bleu.pl is fair.

@anhtuanvn

Hi Stefan,

Thank you very much for sharing your work.

I tried to reproduce the reported results with your pretrained model, but I got the error below:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key transformer/symbol_modality_20428_512/shared/weights_0 not found in checkpoint
[[node save/RestoreV2_1 (defined at /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py:627) ]]

Please help me check the log file Log.txt and give me some advice on how to solve this issue.

Thank you.

@stefan-it
Contributor Author

stefan-it commented Mar 19, 2019

@anhtuanvn Sorry for the late reply! Were you able to solve the problem? What version of T2T are you using for your experiments?
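
One thing you could check is which variable names are actually stored in the checkpoint, e.g. with TensorFlow's inspect_checkpoint utility (the checkpoint path below is a placeholder):

```bash
# Placeholder checkpoint path; prints the variable names stored in the
# checkpoint so they can be compared with the keys the restored graph expects
# (e.g. transformer/symbol_modality_20428_512/shared/weights_0).
# On older TF versions, use --all_tensors instead of --all_tensor_names.
python -m tensorflow.python.tools.inspect_checkpoint \
  --file_name=path/to/model.ckpt-125000 \
  --all_tensor_names
```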
