GitHub - Mark-Hopkins-at-Williams/thesis-enis

Neural Machine Translator

Code Credits: https://nlp.seas.harvard.edu/annotated-transformer/

A Python module for creating custom neural machine translators from a source language to a target language.

Installing the Package

To install the translation package, first make sure your virtual environment has all the required dependencies. TODO: add dependency management. Then install the package by running pip install -e translation/ in your terminal from the root of the Git repo.

Creating a Translator

To create a translator, use the following code:

from models import TranslationModel
translator = TranslationModel('{src_language}', '{tgt_language}', dataset)

where dataset is a HuggingFace-style dataset containing train, validation, and test splits. Each split has a single 'translation' feature whose entries are dictionaries pairing a source sentence with its parallel target sentence.
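As a rough illustration of the expected shape, the sketch below builds the three splits from plain Python data. The helper name make_split, the language codes 'en' and 'is', and the toy sentences are assumptions made for illustration. Wrapping the result with the HuggingFace datasets library is only noted in a comment, since how this repo builds its datasets is not shown here.

```python
def make_split(src_lang, tgt_lang, pairs):
    """Return one split in column form: a single 'translation' column
    holding one {src_lang: ..., tgt_lang: ...} dictionary per sentence pair."""
    return {"translation": [{src_lang: s, tgt_lang: t} for s, t in pairs]}


# Hypothetical language codes and toy sentence pairs, for illustration only.
SRC, TGT = "en", "is"
splits = {
    "train": make_split(SRC, TGT, [("Hello world", "Halló heimur"),
                                   ("Thank you", "Takk fyrir")]),
    "validation": make_split(SRC, TGT, [("Good day", "Góðan daginn")]),
    "test": make_split(SRC, TGT, [("Thank you", "Takk fyrir")]),
}

# With the HuggingFace `datasets` package installed, the splits could be wrapped as
#   DatasetDict({name: Dataset.from_dict(cols) for name, cols in splits.items()})
# This is one possible way to build the object, not necessarily what this repo does.

for name, cols in splits.items():
    print(name, len(cols["translation"]), cols["translation"][0])
```

Each split carries exactly one column, 'translation', and every entry in it holds both sides of a sentence pair keyed by language.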

The first time this is run, the model will train on your system's GPUs. After training, the results are cached, so the model will not need to be retrained on later runs.

To translate, use the .translate method on the TranslationModel with a source sentence argument: translated = translator.translate('{source sentence input}')

Creating a Dataset

If you have local parallel data files, you can use the load_datasets function in the src/data_utils.py module. Store your data in the following file layout:

parent/train/{src}.txt
parent/train/{tgt}.txt
parent/validation/{src}.txt
parent/validation/{tgt}.txt
parent/test/{src}.txt
parent/test/{tgt}.txt

Then call load_datasets('parent', '{src}-{tgt}') to generate a dataset in the proper form.
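For readers curious what such a loader involves, here is a minimal sketch that reads the directory layout above into 'translation' dictionaries. It is not the repo's own implementation. The names read_parallel_split and read_parallel_data are hypothetical, and the sketch assumes the files are UTF-8 encoded and line-aligned (line i of {src}.txt is the translation of line i of {tgt}.txt).

```python
from pathlib import Path

SPLITS = ("train", "validation", "test")


def read_parallel_split(parent, split, src, tgt):
    """Pair line i of {src}.txt with line i of {tgt}.txt for one split."""
    split_dir = Path(parent) / split
    src_lines = (split_dir / f"{src}.txt").read_text(encoding="utf-8").splitlines()
    tgt_lines = (split_dir / f"{tgt}.txt").read_text(encoding="utf-8").splitlines()
    # Misaligned files would silently produce wrong pairs, so fail loudly instead.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"{split}: {len(src_lines)} source lines vs {len(tgt_lines)} target lines"
        )
    return {"translation": [{src: s, tgt: t} for s, t in zip(src_lines, tgt_lines)]}


def read_parallel_data(parent, pair):
    """Read every split for a language pair written as '{src}-{tgt}'."""
    src, tgt = pair.split("-", 1)
    return {split: read_parallel_split(parent, split, src, tgt) for split in SPLITS}
```

Under these assumptions, read_parallel_data('parent', '{src}-{tgt}') returns a dictionary keyed by split name, with one source-target dictionary per line pair.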

