term-integration-mt

The source code repository for the paper "Towards Precise Lexicon Integration in Neural Machine Translation"

Usage

Providing Multi-choice Lexical Constraints

Instead of creating:

{ 'text': 'This is a test .', 'constraints': ['constr@@ aint', 
                                              'multi@@ word constr@@ aint',
                                             ] }

Create:

{ 'text': 'This is a test .', 'constraints': [ ['constr@@ aint', 'constr@@ ain@@ ts', 'Cons@@ tr@@ aint'], 
                                               ['multi@@ word constr@@ aint', 'Multi@@ word constr@@ aint'], 
                                             ] }

Then let your trained NMT model decide which option fits best for each constraint.
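The two formats above differ only in nesting: each constraint becomes a list of alternative surface forms. A minimal sketch of building such an input object is shown below. The helper name `to_multi_choice` is hypothetical and not part of this repository, and writing one JSON object per line is an assumption about how the bundled Sockeye fork reads its input.

```python
import json


def to_multi_choice(text, constraint_options):
    """Build one input object whose constraints are lists of alternatives.

    Hypothetical helper for illustration; not part of this repository.
    """
    constraints = []
    for options in constraint_options:
        # Accept a bare string as a constraint with a single option.
        if isinstance(options, str):
            options = [options]
        # Drop duplicate options while keeping their original order.
        unique = list(dict.fromkeys(options))
        if unique:
            constraints.append(unique)
    return {"text": text, "constraints": constraints}


example = to_multi_choice(
    "This is a test .",
    [
        ["constr@@ aint", "constr@@ ain@@ ts", "Cons@@ tr@@ aint"],
        ["multi@@ word constr@@ aint", "Multi@@ word constr@@ aint"],
    ],
)

# Assumption: the decoder consumes one JSON object per input line.
line = json.dumps(example, ensure_ascii=False)
print(line)
```

Passing a plain string instead of a list yields a single-option constraint, so existing single-choice inputs can be wrapped without further changes.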

Benchmarks

EN -> RU newstest2017

Model	Term. Rate	Term. Prec.	Term. Recall	Term. F1	BLEU (Δ)
baseline _{(no constraints)}	57.43	78.20	81.16	79.65	33.2
Dinu et al., 2019 _{(with all lemmata matching factors)}	81.22	62.76	95.54	75.76	30.2 (-3.0)
Dinu et al., 2019 _{(with BERT selected factors)}	57.17	79.13	81.45	80.27	31.8 (-1.4)
Post and Vilar, 2018 _{(with all lemmata matching constraints)}	99.88	49.04	99.23	65.64	26.0 (-7.2)
Multi-Choice Lexical Constraints _{(with all lemmata matching constraints)}	99.68	50.82	99.54	67.29	28.2 (-5.0)
Post and Vilar, 2018 _{(with BERT selected constraints)}	61.67	75.02	87.30	80.69	31.1 (-2.1)
Multi-Choice Lexical Constraints _{(with random subset of lemmata matching constraints)}	70.71	66.92	86.55	75.48	31.7 (-1.5)
Multi-Choice Lexical Constraints* _{(with BERT selected constraints)}	61.62	77.35	87.30	82.03	32.5 (-0.7)

EN -> RU newstest2020 (extracted from ru-en wmt20/test-ts)

Model	Term. Rate	Term. Prec.	Term. Recall	Term. F1	BLEU (Δ)
baseline _{(no constraints)}	57.33	77.19	75.01	76.08	28.8
Dinu et al., 2019 _{(with all lemmata matching factors)}	81.42	64.72	92.72	76.23	26.4 (-2.4)
Dinu et al., 2019 _{(with BERT selected factors)}	58.27	79.09	77.88	78.48	27.8 (-1.0)
Post and Vilar, 2018 _{(with all lemmata matching constraints)}	99.79	51.13	99.32	67.51	24.6 (-4.2)
Multi-Choice Lexical Constraints _{(with all lemmata matching constraints)}	99.51	52.46	99.15	68.62	24.9 (-3.9)
Post and Vilar, 2018 _{(with BERT selected constraints)}	63.90	74.35	84.73	79.20	27.4 (-1.4)
Multi-Choice Lexical Constraints _{(with random subset of lemmata matching constraints)}	72.31	65.17	82.54	72.83	27.3 (-1.5)
Multi-Choice Lexical Constraints* _{(with BERT selected constraints)}	63.84	75.84	84.52	79.94	28.1 (-0.7)
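The README does not state how the derived columns are computed. The reported Term. F1 values are consistent with the harmonic mean of Term. Prec. and Term. Recall, and the Δ values with the BLEU difference to the baseline row. The sketch below reproduces two table entries under that assumption; the function names are illustrative only.

```python
def term_f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bleu_delta(system_bleu, baseline_bleu):
    """BLEU difference to the baseline, rounded to one decimal place."""
    return round(system_bleu - baseline_bleu, 1)


# newstest2017 baseline row: Term. Prec. 78.20, Term. Recall 81.16
baseline_f1 = round(term_f1(78.20, 81.16), 2)
print(baseline_f1)

# newstest2017, Dinu et al., 2019 (all lemmata matching factors): BLEU 30.2 vs. baseline 33.2
delta = bleu_delta(30.2, 33.2)
print(delta)
```

Recomputing from the rounded table values can differ from a reported F1 in the last digit, since the published figures were presumably derived from unrounded precision and recall.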

TODO

Update the sockeye.lexical_constraints script to produce constraints compatible with Multi-Choice Lexical Constraints (MLC).
