Code for ACL 2024 paper "Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features".

ictnlp/SemLing-MNMT

SemLing-MNMT

Code and scripts for the ACL 2024 Findings paper "Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features".

Code

The code is based on the open-source toolkit fairseq. Our model is implemented in transformer_disentangler_and_linguistic_encoder.py under "fairseq/fairseq/models", and our training criterion in label_smoothed_cross_entropy_with_disentangling.py under "fairseq/fairseq/criterions".

Get Started

Requirements and Installation

  • Python version == 3.9.12

  • PyTorch version == 1.12.1

  • Install fairseq from this repository:

    git clone https://github.com/ictnlp/SemLing-MNMT.git
    cd SemLing-MNMT
    pip install --editable ./
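
After installation, a quick check can confirm that the active environment matches the pinned versions above. The helper below is a sketch and is not part of the repository: it assumes the python on your PATH is the interpreter you installed fairseq into, and the function names check_version and check_env are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical post-install sanity check (not part of the SemLing-MNMT repository).
# Assumes `python` on PATH is the interpreter fairseq was installed into.

# Succeed if $3 equals the expected version $2, optionally followed by a
# local build suffix such as "+cu113" (PyTorch wheels often carry one).
check_version() {
  local name=$1 expected=$2 actual=$3
  if [[ "$actual" == "$expected" || "$actual" == "$expected+"* ]]; then
    echo "OK   $name $actual"
  else
    echo "WARN $name is '$actual', expected $expected"
    return 1
  fi
}

# Query the active interpreter and compare against the pinned versions.
check_env() {
  local py torch status=0
  py=$(python -c 'import platform; print(platform.python_version())' 2>/dev/null) || py=missing
  torch=$(python -c 'import torch; print(torch.__version__)' 2>/dev/null) || torch=missing
  check_version Python 3.9.12 "$py" || status=1
  check_version PyTorch 1.12.1 "$torch" || status=1
  # An editable install should resolve fairseq to the cloned checkout.
  python -c 'import fairseq; print("fairseq at", fairseq.__file__)' 2>/dev/null \
    || { echo "WARN fairseq is not importable"; status=1; }
  return "$status"
}
```

Source the file and call check_env; a non-zero return flags a mismatch. Accepting a "+cu113"-style suffix is deliberate, since PyTorch wheels built for a specific CUDA version report one.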

Data Pre-processing

We use the SentencePiece toolkit to pre-process the IWSLT2017, OPUS-7, and PC-6 datasets. For each dataset, we train a Unigram language model for tokenization and learn a joint vocabulary of 32K tokens shared across all of its languages.
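
The pre-processing step can be sketched with SentencePiece's command-line tools spm_train and spm_encode. This is an illustration under assumptions, not the authors' script: the helper names and input file names are hypothetical, and "32K" is taken as 32000. Passing every language's training file to a single spm_train call is what makes the vocabulary joint.

```bash
#!/usr/bin/env bash
# Sketch of joint Unigram SentencePiece pre-processing (not the authors' script).
# Assumes the spm_train / spm_encode CLIs from the sentencepiece package are on PATH.
# Function names are hypothetical.

# Join arguments with commas, the list format spm_train's --input expects.
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# Learn one joint Unigram model over all training files of a dataset.
# Usage: train_joint_spm <model_prefix> <train_file>...
train_joint_spm() {
  local prefix=$1
  shift
  spm_train --input="$(join_by_comma "$@")" \
    --model_prefix="$prefix" \
    --vocab_size=32000 \
    --model_type=unigram
}

# Segment one file into space-separated pieces with a trained model.
# Usage: encode_file <model_file> <input> <output>
encode_file() {
  spm_encode --model="$1" --output_format=piece < "$2" > "$3"
}
```

For example, train_joint_spm spm.iwslt2017 train.*.txt would write spm.iwslt2017.model and spm.iwslt2017.vocab, and each train/valid/test file would then go through encode_file with that model. For large corpora such as OPUS-7, spm_train's --input_sentence_size and --shuffle_input_sentence options can limit memory use by sampling sentences.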

Training and Inference

We provide training and inference scripts for IWSLT2017 in the "scripts" folder as examples. Fill in your own paths in the scripts and run them.

Some notes on the scripts:

  • In train.sh, --disentangler-lambda, --disentangler-reconstruction-lambda, and --disentangler-negative-lambda set the hyperparameters $\lambda$, $\lambda_1$, and $\lambda_2$ from our paper, respectively. --linguistic-encoder-layers sets the number of layers in the linguistic encoder.

  • generate.sh and generate_zero_shot.sh (the latter for zero-shot directions) generate translations and compute BLEU scores with SacreBLEU (version == 1.5.1).
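
As an illustration of the scoring step, the helpers below pull the detokenized hypotheses out of fairseq-generate output in sentence order and pass them to the sacrebleu 1.5.1 command line. They are a sketch, not the contents of generate.sh: they assume the D- lines are already SentencePiece-decoded (fairseq's --remove-bpe sentencepiece option does this), and the function and file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of BLEU scoring on fairseq-generate output (not the repository's generate.sh).
# Assumes `pip install sacrebleu==1.5.1` and that D- lines are already detokenized.
# Function names are hypothetical.

# Print hypotheses in original sentence order.
# fairseq-generate writes lines like "D-<id>\t<score>\t<text>" in batch order.
# Usage: extract_hyps <generate_output>
extract_hyps() {
  grep '^D-' "$1" | sed 's/^D-//' | sort -t $'\t' -k1,1n | cut -f3-
}

# Score a hypothesis file against a raw (untokenized) reference file.
# sacreBLEU reads hypotheses from stdin; -b prints only the score.
# Usage: score_bleu <hypothesis_file> <reference_file>
score_bleu() {
  sacrebleu -b "$2" < "$1"
}
```

A typical call would be extract_hyps gen.out > hyp.txt followed by score_bleu hyp.txt test.raw.de. Sorting by sentence id matters because fairseq-generate emits sentences in batch order rather than input order; since score_bleu only sees the two files, the same helper works for supervised and zero-shot directions.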