BERT model for Machine Translation #31

Is there a way to use any of the provided pre-trained models in the repository for a machine translation task?
Thanks

Comments
Hi Kerem, I don't think so. Have a look at the fairsep repo maybe.
@thomwolf hi there, I couldn't find anything about the fairsep repo. Could you post a link? Thanks!
Hi, I am talking about this repo: https://github.com/pytorch/fairseq.
I have conducted several MT experiments in which I fixed the embeddings using BERT; unfortunately, I find that it makes performance worse. @JasonVann @thomwolf
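For context, here is a minimal sketch of that kind of "fixed BERT embeddings" setup, i.e. extracting frozen BERT features to feed into a separate NMT model. It uses the current transformers API rather than whatever the experiments above used, and the multilingual checkpoint name is just an illustrative choice:

```python
# Sketch: extract frozen (fixed) BERT token representations for an NMT encoder.
# Assumes the `transformers` library; the checkpoint name is illustrative only.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
bert.eval()  # evaluation mode: disables dropout

sentence = "Das ist ein Test."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():  # keep the BERT weights and embeddings fixed
    outputs = bert(**inputs)

# (batch, seq_len, hidden) tensor of contextual token embeddings
# that a separate NMT encoder/decoder would then consume.
token_embeddings = outputs.last_hidden_state
```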
Hey! FAIR has demonstrated that using BERT for unsupervised translation greatly improves BLEU. Paper: https://arxiv.org/abs/1901.07291 Repo: https://github.com/facebookresearch/XLM Older papers showing that pre-training with an LM (not MLM) helps Seq2Seq: https://arxiv.org/abs/1611.02683 Hope this helps!
These links are useful. Does anyone know if BERT also improves things for supervised translation? Thanks.
Also interested
Because BERT is an encoder, I guess we need a decoder. I looked here: https://jalammar.github.io/
https://arxiv.org/pdf/1901.07291.pdf seems to suggest that it does improve the results for supervised translation as well. However, this paper is not about using BERT embeddings, but rather about pre-training the encoder and decoder on a masked language modelling objective. The biggest benefit comes from initializing the encoder with the weights from BERT, and surprisingly, using it to initialize the decoder also brings small benefits, even though, if I understand correctly, you still have to randomly initialize the weights of the encoder-decoder attention module, since it is not present in the pre-trained network. EDIT: of course, the pre-trained network needs to have been trained on multilingual data, as stated in the paper.
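To make that warm-start idea concrete, here is a hedged sketch using the generic EncoderDecoderModel wrapper from this library: encoder and decoder are both initialized from a BERT checkpoint, while the encoder-decoder (cross-)attention weights do not exist in BERT and therefore start from random values. The checkpoint names are illustrative; the paper itself uses XLM-style pre-training rather than this exact recipe.

```python
# Sketch: warm-start a seq2seq model from BERT checkpoints.
# The cross-attention layers in the decoder are new and randomly initialized.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased",  # encoder initialized from BERT
    "bert-base-multilingual-cased",  # decoder also initialized from BERT
)

# Minimal generation-related config before fine-tuning on parallel data.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```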
I have managed to replace the transformer's encoder with a pretrained BERT encoder, but the experimental results were very poor: it dropped the BLEU score by about 4 points. The source code is available here: https://github.com/torshie/bert-nmt , implemented as a fairseq user model. It may not work out of the box; some minor tweaks may be needed.
Yes. It is possible to use BERT as the encoder and GPT as the decoder and glue them together.
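A hedged sketch of that "glue" idea with the same wrapper, assuming a BERT encoder and a GPT-2 decoder: the cross-attention layers added to the GPT-2 decoder are randomly initialized, so the combined model still needs fine-tuning on parallel data (and you need both tokenizers, one per side).

```python
# Sketch: BERT encoder + GPT-2 decoder glued together.
# Cross-attention layers are added to GPT-2 and start from random weights.
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-cased",  # encoder
    "gpt2",             # decoder (cross-attention added automatically)
)
```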