Assessing-MT

This repository assesses how well open-source machine translation (MT) models preserve the formality, politeness, and sentiment of the source text. See the notebook Assessing_Style_Preservation_with_MT.ipynb for details.

Data

The diagnostic test data used in the notebook, along with their translations, are stored in the data directory. They are sampled from the following sources:

Formality: sentences processed from the Enron corpus, as released by Madaan et al. (see this repo for details).
Politeness: Customer Support on Twitter for English, and Large-scale Cleaned Chinese Conversation (LCCC) for Chinese.
Sentiment: Amazon product data.

To comply with Twitter's terms of service, we do not include tweet contents in our data. Instead, we provide tweet IDs, which can be used to retrieve the contents from the original source.
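As a rough illustration of retrieving tweet contents from IDs, the sketch below looks up a list of IDs in a local copy of the original corpus. Everything specific here is an assumption rather than something this repository defines: the ID file is taken to hold one ID per line, the corpus is taken to be a CSV file, and the column names `tweet_id` and `text` as well as the file names in the usage example are hypothetical. Adjust them to match the files you actually have.

```python
import csv


def load_ids(id_path):
    """Read one tweet ID per line, skipping blank lines (assumed file layout)."""
    with open(id_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def hydrate(ids, corpus_path, id_col="tweet_id", text_col="text"):
    """Return {tweet ID: text} for every requested ID found in the corpus CSV.

    The column names are assumptions; IDs that do not appear in the corpus
    are simply absent from the result.
    """
    wanted = set(ids)
    found = {}
    with open(corpus_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row[id_col] in wanted:
                found[row[id_col]] = row[text_col]
    return found


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    ids = load_ids("politeness_ids.txt")
    texts = hydrate(ids, "customer_support_on_twitter.csv")
    print(f"Recovered {len(texts)} of {len(ids)} tweets")
```

Comparing the number of recovered tweets with the number of IDs is a quick way to spot tweets that are missing from your copy of the corpus.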

Classifiers

Pre-trained style classifiers used in the notebook can be downloaded from the following links:

Formality: Google Drive link
Politeness (English): Google Drive link
Politeness (Chinese): Google Drive link
Sentiment: nlptown/bert-base-multilingual-uncased-sentiment
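The sentiment classifier is hosted on the Hugging Face Hub, so it can be loaded with the `transformers` library. The following is a minimal sketch of one way to compare the predicted sentiment of a source sentence and its translation; it is not necessarily how the notebook does it. The helper names `stars_from_label`, `sentiment_shift`, and `load_sentiment_classifier` are hypothetical, and the example sentences are made up. The model predicts labels from "1 star" to "5 stars".

```python
def stars_from_label(label):
    """Convert a label such as '4 stars' or '1 star' to the integer rating."""
    return int(label.split()[0])


def sentiment_shift(source_label, translation_label):
    """Signed change in star rating from source to translation (0 = preserved)."""
    return stars_from_label(translation_label) - stars_from_label(source_label)


def load_sentiment_classifier():
    """Load the classifier; requires `transformers` and a backend such as PyTorch."""
    # Imported lazily so the helpers above work without the dependency.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model="nlptown/bert-base-multilingual-uncased-sentiment",
    )


if __name__ == "__main__":
    classifier = load_sentiment_classifier()
    # Made-up sentence pair, for illustration only.
    source = "This product is fantastic, I would buy it again."
    translation = "这个产品很棒，我会再买一次。"
    src_label = classifier(source)[0]["label"]
    tgt_label = classifier(translation)[0]["label"]
    print(src_label, tgt_label, sentiment_shift(src_label, tgt_label))
```

A shift of zero suggests the translation kept the predicted sentiment level, while a nonzero value points to a sentence pair worth inspecting by hand.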
