register is a toolkit for analyzing language use patterns that characterize registers, genres, and styles. It provides a wide range of features and covers various languages. Note that not all feature packages are supported for all languages (see the documentation).
register requires Python >= 3.6.
- Clone/download the repository
- In the current folder run:
pip install -r requirements.txt
- Download the spaCy language model for your language. If your language is not supported by spaCy, you can still use basic feature packages such as character n-grams or token n-grams. For example, for German:
python -m spacy download de_core_news_sm
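Before running register, it can help to confirm that the interpreter and the language model are in place. The following is a minimal sketch and not part of register itself; the helper names `python_is_supported` and `model_is_installed` are hypothetical. It assumes that a model fetched with `spacy download` is installed as an importable Python package of the same name.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this README


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def model_is_installed(model_name):
    """Return True if a top-level package with this name can be found.

    Assumption: spaCy models downloaded with `python -m spacy download`
    are importable packages named like the model (e.g. de_core_news_sm).
    """
    return importlib.util.find_spec(model_name) is not None


print("Python OK:", python_is_supported())
print("de_core_news_sm installed:", model_is_installed("de_core_news_sm"))
```

If the second line reports False, rerun the spaCy download command above for your language.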
Some features need further resources:
- For features based on constituency parse trees load the benepar model for your language:
import benepar
benepar.download('benepar_de2')
- The feature package emotion needs specific language data:
  - Download the MEmoLon lexicon (2.4 GB) and put the file for your language (e.g., de.tsv) into lang_data/emotion.
  - Load textblob requirements:
python -m textblob.download_corpora
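To verify that the emotion lexicon ended up where the instructions above say it should, a small check along these lines may be useful. This is a sketch under stated assumptions: the helper names are hypothetical, and the only layout it relies on is the one named above (a file such as de.tsv inside lang_data/emotion, relative to the folder you run it from).

```python
from pathlib import Path


def emotion_lexicon_path(lang, base_dir="lang_data/emotion"):
    """Build the expected lexicon path, e.g. lang_data/emotion/de.tsv."""
    return Path(base_dir) / f"{lang}.tsv"


def has_emotion_lexicon(lang, base_dir="lang_data/emotion"):
    """Return True if the lexicon file for `lang` exists as a regular file."""
    return emotion_lexicon_path(lang, base_dir).is_file()


# Example: check for the German lexicon relative to the current folder.
print("German lexicon present:", has_emotion_lexicon("de"))
```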
Run register from the src directory with:
python run_register.py path/to/your/configuration_file.json
If you don't specify your own JSON configuration file, the config.json file in the src directory is used.
Edit this file to your needs or create your configuration file following the documentation.
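The fallback described above (use the file given on the command line, otherwise config.json) can be sketched as follows. This is an illustration only and not the code in run_register.py; the function names `resolve_config_path` and `load_config` are hypothetical.

```python
import json
from pathlib import Path

DEFAULT_CONFIG = Path("config.json")  # default file in the src directory


def resolve_config_path(argv, default=DEFAULT_CONFIG):
    """Return the path passed as the first argument, else the default."""
    return Path(argv[1]) if len(argv) > 1 else Path(default)


def load_config(path):
    """Read a JSON configuration file and return its parsed content."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# No argument given: falls back to config.json.
print(resolve_config_path(["run_register.py"]))
# Argument given: that file is used instead.
print(resolve_config_path(["run_register.py", "my_config.json"]))
```

In a script, `resolve_config_path(sys.argv)` would then be passed to `load_config`.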
register provides many configuration options, such as the features to extract from your text and the machine learning models to use.
To reproduce the results for the feature-based models used in our paper 'A Question of Style: A Dataset for Analyzing Formality on Different Levels', use the configuration files config_pt16.json and config_c18.json in the src directory. Edit the path to point to your local copy of in_formal sentences.
(Attention: Constituency parsing for the PT16 model may take a while.)
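If you prefer not to edit the shipped configuration files by hand, the path can also be changed programmatically. The sketch below is hedged accordingly: the function name is hypothetical, the key that holds the data path is not stated here and must be looked up in the configuration file or the documentation, and the code assumes that key sits at the top level of the JSON object.

```python
import json


def write_config_with_path(src, dst, key, new_path):
    """Copy the JSON config at `src` to `dst`, replacing top-level `key`.

    Raises KeyError if `key` is missing, so a wrong key name does not
    silently add a new, unused entry.
    """
    with open(src, encoding="utf-8") as fh:
        config = json.load(fh)
    if key not in config:
        raise KeyError(f"{key!r} not found in {src}")
    config[key] = new_path
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2, ensure_ascii=False)
    return config


# Usage (the key name "data_path" is a placeholder, not register's real key):
# write_config_with_path("config_pt16.json", "my_pt16.json",
#                        "data_path", "/path/to/in_formal_sentences")
```

Writing to a new file keeps the original configuration untouched; pass the new file to run_register.py as shown above.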
When using register, please cite:
@inproceedings{eder-etal-2023,
title = "A Question of Style: A Dataset for Analyzing Formality on Different Levels",
author = "Eder, Elisabeth and
Krieg-Holz, Ulrike and
Wiegand, Michael",
booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-eacl.42",
pages = "580--593"
}
register builds on external resources. If you use them, please cite these resources appropriately (see the documentation).