This repository contains the data and code from our paper Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation. Meta4XNLI is a parallel dataset with annotations in English and Spanish for metaphor detection at token level and for metaphor interpretation framed within the natural language inference (NLI) task. We evaluated multilingual models such as mDeBERTa and XLM-RoBERTa-large to examine their performance in monolingual, cross-lingual, cross-domain and zero-shot scenarios for metaphor detection. We also evaluated whether these models struggle to identify inference relationships when the text contains metaphors.
The repository is organised as follows:
- cometa/: train and test partitions in tabulated format.
- vua/: train, dev and test partitions of VUA-20 version with all POS labeled [shared task 2020] in tabulated format.
- meta4xnli/
  - detection/: all files in tabulated format.
    - source_datasets/: files in tabulated format split by the original dataset, premises and hypotheses separated, in English (en/) and Spanish (es/).
    - splits/: train, dev and test splits to train and evaluate in English (en/) and Spanish (es/).
  - interpretation/: all files in .tsv format with the following fields:
    - {language}: en or es;
    - {gold_label}: inference label from original dataset: [entailment, neutral or contradiction];
    - {sentence1}: premise;
    - {sentence2}: hypothesis;
    - {promptID}: premise identifier number_{source_dataset};
    - {pairID}: premise and hypothesis pair identifier number_{source_dataset};
    - {genre}: text domain labeled from original dataset annotations;
    - {source_dataset}: original dataset to which the pair belongs.
    - source_datasets/: each file includes sentences in English and Spanish:
      - {source_dataset}_met.tsv: files with pairs with metaphors;
      - {source_dataset}_no_met.tsv: files with pairs without metaphors.
    - splits/: train, dev and test splits with and without metaphors, both languages in the same file. The language must be specified in the dataset_config_name argument of the fine-tuning-nli.sh file.
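Since the interpretation files keep both languages in the same .tsv, a small reader that filters on the language field can be handy. The following is only a minimal sketch: it assumes each file starts with a header row using the field names listed above, and the function name read_pairs and the two sample rows are invented for illustration (they are not taken from the dataset).

```python
import csv
import io

# Field names as listed in this README; the presence of a header row is an assumption.
FIELDS = ["language", "gold_label", "sentence1", "sentence2",
          "promptID", "pairID", "genre", "source_dataset"]


def read_pairs(handle, language=None):
    """Return the rows of a tab-separated file as dicts, optionally keeping one language."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [row for row in reader if language is None or row["language"] == language]


# Invented two-row sample, used only to show the call.
sample = io.StringIO(
    "\t".join(FIELDS) + "\n"
    "en\tentailment\tA premise.\tA hypothesis.\t1_example\t1_example\texample_genre\texample\n"
    "es\tentailment\tUna premisa.\tUna frase.\t1_example\t1_example\texample_genre\texample\n"
)
spanish_pairs = read_pairs(sample, language="es")
print(len(spanish_pairs), spanish_pairs[0]["sentence1"])
```

For a real file, open it with open(path, encoding="utf-8", newline="") and pass the handle to read_pairs; the path depends on where you placed the data.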
- generate_scripts.py: create scripts with different combinations of parameters.
- meta4xnli_det/int.py: script to process dataset files. You need to set your paths in variables _TRAIN_DATA_URL, _EVAL_DATA_URL, _TESTEVAL_DATA_URL (in meta4xnli_int.py); _TRAINING_FILE, _DEV_FILE, _TEST_FILE (in meta4xnli_det.py).
- interpretation/bsc_run_glue.py: script to fine-tune or evaluate models for the task of NLI.
- detection/bsc_run_ner.py: script to fine-tune or evaluate models for the task of sequence labeling.
- fine-tune-det/nli.sh: files to execute fine-tuning scripts with corresponding parameters.
- inference-nli.sh/inference-metaphor-detection.sh: files to evaluate trained models with corresponding parameters.
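To illustrate what "different combinations of parameters" can look like in practice, here is a hedged sketch that expands a small grid into one command line per combination. It is not the code of generate_scripts.py: build_commands is a hypothetical helper, the grid values are made up, and the flag names (other than dataset_config_name, mentioned above) are assumptions modeled on the Hugging Face example scripts, so check them against the actual scripts before use.

```python
import itertools
import shlex


def build_commands(script, grid, fixed):
    """Yield one shell command per combination of the values in `grid`."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        # Fixed arguments come first, followed by the varying ones.
        args = dict(fixed, **dict(zip(names, values)))
        flags = " ".join(f"--{key} {shlex.quote(str(val))}" for key, val in args.items())
        yield f"python {script} {flags}"


# Hypothetical search space; the hyperparameters from the paper are not reproduced here.
grid = {"learning_rate": [1e-5, 5e-5], "seed": [1, 2]}
# Assumed flag names; only dataset_config_name is confirmed by this README.
fixed = {"model_name_or_path": "xlm-roberta-large", "dataset_config_name": "en"}

commands = list(build_commands("interpretation/bsc_run_glue.py", grid, fixed))
for command in commands:
    print(command)
```

Two learning rates and two seeds give four commands; each line could be written to its own .sh file.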
- Install Transformers 4.20.
- Place the chosen train, dev and test files from data/ in the paths set in the meta4xnli_det/int.py scripts.
- Generate the scripts using generate_scripts.py from the detection/ or interpretation/ folder, depending on the desired task.
- Run the generated scripts (to run only the best models, use the hyperparameters specified in the paper).
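Before launching any run, it may help to confirm that the installed Transformers release is the one listed above. The check below is a small sketch using only the standard library; matches_required is a hypothetical helper, and treating any 4.20.x patch release as acceptable is an assumption.

```python
from importlib import metadata

REQUIRED_PREFIX = "4.20"  # version named in this README


def matches_required(installed, required_prefix=REQUIRED_PREFIX):
    """True if `installed` is the required release or one of its patch versions."""
    return installed == required_prefix or installed.startswith(required_prefix + ".")


try:
    installed = metadata.version("transformers")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed")
elif matches_required(installed):
    print(f"transformers {installed} matches {REQUIRED_PREFIX}")
else:
    print(f"transformers {installed} found, but {REQUIRED_PREFIX} is expected")
```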