Multilingual Language Models #614
Hi @stefan-it this is awesome, looking forward to integrating this!! Is there a paper on your results?
@alanakbik I was just thinking about submitting a workshop paper. But I just got an ACL rejection, because we just included the supplementary material inside the paper... this is a really demotivating factor 🤣
Yeah I think we all know the feeling :/ But there's always another conference on the horizon :) Any plans on putting the paper on arXiv?
Thank you very much @stefan-it. Coming here from #179. If you don't mind, could you please answer the following questions regarding the Arabic LM:
The reason I ask is that I am training an LM over a 1.5B-word Arabic corpus, and I would like some pointers on when to stop training. It's been 2 days on a K80 and after 1 epoch I am looking at a perplexity of roughly 3.6, and the learning rate has dropped to 5. Maybe @alanakbik could also share his experience with training over huge corpora. Thanks
Hello @zeeshansayyed we generally train for about 2 weeks. One thing I note is that your learning rate has already annealed to 5 after 2 days, which indicates that your patience may be too low. Try doubling the patience so that you train with a learning rate of 20 for a few more days.
Yes. On the 4th day, the learning rate fell to 1.25. In the code, the default patience seems to be 10, but elsewhere on the internet I have seen 3 or 5. Do you think a patience of 20 would do the trick? I will stop and restart training.
Yes, 20 will be better. You could even go higher if you have enough time (the higher the patience, the longer it trains).
Hi @zeeshansayyed :) to answer your questions: I trained all models for one epoch. The initial learning rate was 20. In my experiments I used a training corpus split size < 20, so that the learning rate never decreased.
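For reference, a minimal sketch of how these settings (initial learning rate, patience, corpus splits) map onto flair's LanguageModelTrainer; the corpus path, hidden_size and split layout here are illustrative assumptions rather than the exact setup used in this thread:

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# character-level dictionary shipped with flair
dictionary = Dictionary.load('chars')

# TextCorpus expects a folder with a train/ directory of split files,
# plus valid.txt and test.txt; validation (and with it learning-rate
# annealing) is typically checked after each split, so the number of
# splits interacts with the patience value below
corpus = TextCorpus('path/to/arabic_corpus', dictionary, True, character_level=True)

language_model = LanguageModel(dictionary, is_forward_lm=True, hidden_size=1024, nlayers=1)

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/language_models/ar-forward',
              sequence_length=250,
              mini_batch_size=100,
              learning_rate=20,  # initial learning rate, annealed on plateau
              patience=20,       # doubled from the default of 10, as suggested above
              max_epochs=10)
```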
@alanakbik @stefan-it I see that this issue has already been addressed with #761. However, do the old language models still remain available? For reproducibility, it would be good if one could choose between the different versions of pre-trained embeddings.
This was my fault 😅 I've talked with @alanakbik about that (I voted for overwriting the old models), but I think we then need to discuss a kind of "versioning schema". E.g. a more complex use case: 🤔
@stefan-it I think it would be worth opening an issue for this. I would be strongly in favor of keeping the "old" models for reproducibility. We might consider something similar to the spaCy model versioning scheme. They do in fact support the latter use case you mention (i.e.,
Hello @stefan-it @jantrienes yes this makes a lot of sense. Another question would be what happens if two groups independently contribute models for a language; for instance, we have LMs for Polish from different groups. In this case, one model is not an improved version of the other, but they were simply trained on different data with different parameters. How would we distinguish between the two, and also how would we choose which one to point to if
GH-614: re-added older LMs with version number
@stefan-it Congratulations on the great work. Is there a paper for your pre-trained models? How can I cite this if I want to? Thank you
Hi @abeermohamed1, unfortunately, there's no paper available. But you could cite the
@stefan-it Thank you. This is great work and effort. "But I just got an ACL rejection, because we just included the supplementary material inside the paper"
@stefan-it can you please tell me how to get a fast model? I have seen you added bg-X-fast
@stefan-it can you please let me know the size of the corpus required to fine-tune the news-forward model on social media data.
Hi @codemaster-22, I did not train the
@alanakbik can you please help me out with this ASAP? I am keen to start fine-tuning on English tweets.
Hi @codemaster-22 I trained the news model on the 1 billion word corpus, which in fact is about 800 million tokens of text.
@alanakbik I want to fine-tune the news-forward model on English tweets, so any suggestions on the number of tokens I should use?
If you have a large enough corpus of tweets, you might consider training a new model from scratch instead of fine-tuning the news model. The language is very different in style, and you might also need a different character dictionary for emojis etc. If you have around 100 million tokens of tweet text, that should be good, but more is obviously always better.
I have a lot of tweets, but I want to experiment with fine-tuning right away. Did you mean 100 million tokens for fine-tuning or for training from scratch?
100 million to train from scratch. I didn't do much fine-tuning of already trained flair embeddings, so I'm not really sure how much you need (probably less) or what the best parameters are. But be careful to set a low learning rate when fine-tuning.
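A rough sketch of what fine-tuning an existing flair character LM with a low learning rate could look like (the tweet corpus path, output path and hyperparameter values are illustrative assumptions):

```python
from flair.embeddings import FlairEmbeddings
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# start from the pre-trained news-forward language model
language_model = FlairEmbeddings('news-forward').lm

# the fine-tuning corpus must be built with the dictionary and direction
# of the loaded model
dictionary = language_model.dictionary
corpus = TextCorpus('path/to/tweet_corpus', dictionary,
                    language_model.is_forward_lm, character_level=True)

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/language_models/news-forward-tweets',
              sequence_length=250,
              mini_batch_size=100,
              learning_rate=5,  # much lower than the from-scratch default of 20
              patience=10)
```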
Hi @alanakbik @stefan-it, I started fine-tuning with learning rate 5, and below are the plots
Hi,
I trained language models for 16 languages on Wikipedia dumps + OPUS that can be integrated into flair :) This is the result of ~2 months of work.
Language models
Training data are (a) a recent Wikipedia dump and (b) corpora from OPUS. Training was done for one epoch over the full training corpus.
Download links:
Hyperparameters: hidden_size, nlayers, sequence_length, mini_batch_size
Instead of using common_chars, all characters from the training corpus are used as the vocabulary for language model training.
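As a hedged sketch of that vocabulary choice, the character dictionary could be built directly from the training corpus instead of loading flair's common character dictionary; the file paths below are placeholders:

```python
from flair.data import Dictionary

# collect every character that occurs in the training corpus,
# instead of relying on the pre-built common-characters dictionary
char_dictionary = Dictionary()
with open('path/to/full_training_corpus.txt', encoding='utf-8') as corpus_file:
    for line in corpus_file:
        for character in line:
            char_dictionary.add_item(character)

# the resulting dictionary is then passed to TextCorpus / LanguageModel
# in place of Dictionary.load('chars')
char_dictionary.save('resources/char_mappings/custom_char_dict')
```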
PoS Tagging on Universal Dependencies (v1.2)
To test the new language models on a downstream task, results for PoS tagging on Universal Dependencies (v1.2) are reported (with comparisons to other papers).
Hyperparameters:
hidden_size: 512
learning_rate: 0.1
mini_batch_size: 8
max_epochs: 500
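For orientation, a sketch of how such a PoS-tagging run could be set up in flair with these hyperparameters; UD_ENGLISH and the news-forward/backward embeddings are stand-in examples for the actual UD treebanks and the new multilingual LMs:

```python
from flair.datasets import UD_ENGLISH
from flair.embeddings import FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. load a Universal Dependencies corpus and build the UPOS tag dictionary
corpus = UD_ENGLISH()
tag_type = 'upos'
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 2. stack forward and backward character LM embeddings
embeddings = StackedEmbeddings([FlairEmbeddings('news-forward'),
                                FlairEmbeddings('news-backward')])

# 3. sequence tagger with the hidden size reported above
tagger = SequenceTagger(hidden_size=512,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 4. train with the reported learning rate, batch size and epoch budget
trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/upos',
              learning_rate=0.1,
              mini_batch_size=8,
              max_epochs=500)
```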
Results on Universal Dependencies show a new SOTA for all languages except Arabic and Indonesian.