BERT model for Machine Translation #31

Is there a way to use any of the provided pre-trained models in the repository for a machine translation task?
Thanks

Comments
Hi Kerem, I don't think so. Have a look at the fairsep repo maybe.
@thomwolf hi there, I couldn't find anything about the fairsep repo. Could you post a link? Thanks!
Hi, I am talking about this repo: https://github.com/pytorch/fairseq.
I have conducted several MT experiments in which I fixed the embeddings using BERT; unfortunately, I find that it makes performance worse. @JasonVann @thomwolf
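For context, here is a minimal sketch of that kind of "fixed BERT embeddings" setup, i.e. extracting frozen BERT features to feed into a separate NMT model. It uses the current transformers API rather than whatever the experiments above used, and the multilingual checkpoint name is just an illustrative choice:

```python
# Sketch: extract frozen (fixed) BERT token representations for an NMT encoder.
# Assumes the `transformers` library; the checkpoint name is illustrative only.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
bert.eval()  # evaluation mode: disables dropout

sentence = "Das ist ein Test."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():  # keep the BERT weights and embeddings fixed
    outputs = bert(**inputs)

# (batch, seq_len, hidden) tensor of contextual token embeddings
# that a separate NMT encoder/decoder would then consume.
token_embeddings = outputs.last_hidden_state
```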
Hey! FAIR has demonstrated that using BERT for unsupervised translation greatly improves BLEU. Paper: https://arxiv.org/abs/1901.07291 Repo: https://github.com/facebookresearch/XLM Older papers showing that pre-training with an LM (not MLM) helps Seq2Seq: https://arxiv.org/abs/1611.02683 Hope this helps!
These links are useful. Does anyone know if BERT also improves things for supervised translation? Thanks.
Also interested
Because BERT is an encoder, I guess we need a decoder. I looked here: https://jalammar.github.io/
https://arxiv.org/pdf/1901.07291.pdf seems to suggest that it does improve the results for supervised translation as well. However, this paper is not about using BERT embeddings, but rather about pre-training the encoder and decoder on a masked language modelling objective. The biggest benefit comes from initializing the encoder with the weights from BERT, and surprisingly, using it to initialize the decoder also brings small benefits, even though, if I understand correctly, you still have to randomly initialize the weights of the encoder-decoder attention module, since it is not present in the pre-trained network. EDIT: of course, the pre-trained network needs to have been trained on multilingual data, as stated in the paper.
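To make that warm-start idea concrete, here is a hedged sketch using the generic EncoderDecoderModel wrapper from this library: encoder and decoder are both initialized from a BERT checkpoint, while the encoder-decoder (cross-)attention weights do not exist in BERT and therefore start from random values. The checkpoint names are illustrative; the paper itself uses XLM-style pre-training rather than this exact recipe.

```python
# Sketch: warm-start a seq2seq model from BERT checkpoints.
# The cross-attention layers in the decoder are new and randomly initialized.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased",  # encoder initialized from BERT
    "bert-base-multilingual-cased",  # decoder also initialized from BERT
)

# Minimal generation-related config before fine-tuning on parallel data.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
```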
I have managed to replace the transformer's encoder with a pretrained BERT encoder, but the experimental results were very poor: it dropped the BLEU score by about 4 points. The source code is available here: https://github.com/torshie/bert-nmt , implemented as a fairseq user model. It may not work out of the box; some minor tweaks may be needed.
Yes. It is possible to use BERT as the encoder and GPT as the decoder and glue them together.
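A hedged sketch of that "glue" idea with the same wrapper, assuming a BERT encoder and a GPT-2 decoder: the cross-attention layers added to the GPT-2 decoder are randomly initialized, so the combined model still needs fine-tuning on parallel data (and you need both tokenizers, one per side).

```python
# Sketch: BERT encoder + GPT-2 decoder glued together.
# Cross-attention layers are added to GPT-2 and start from random weights.
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-cased",  # encoder
    "gpt2",             # decoder (cross-attention added automatically)
)
```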