I updated the code-base with code from my follow-up research, in which the bug is solved. Since I did not finish that project, I decided to include all of my progress; there are interesting things in there which might be useful to others. At a high level, the following has been added:
- Support for Adapter-Transformers
- Support for several auxiliary losses: triplet loss, a consistency loss in line with this paper (which does require augmented data), and a loss forcing the final feature space to be convex (i.e. interpolating in feature space should result in the same interpolation in output space)
- Multi-task learning loss components that can compensate for imbalance in the number of labels per task and for uncertainty inherent to a task (see the sketch after this list)
- A spin-off model from ProtoMAML which can use user-defined class descriptions as prototypes instead of the center of the embedded support set. This model also supports splitting the support set such that one half is used to initialize prototypes and the other one to perform inner-loop optimization.
- More extras to push performance, such as Stochastic Weight Averaging and ensemble techniques like majority voting (based on the new model described in the previous point)
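As an illustration of the multi-task weighting mentioned above, here is a minimal sketch of an uncertainty-weighted multi-task loss in the spirit of Kendall et al. (2018). The class name `UncertaintyWeightedLoss` is hypothetical and this is not the implementation in this code-base; it only shows the general idea of down-weighting noisier tasks via a learned log-variance per task.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Sketch: weigh per-task losses with a learned homoscedastic
    uncertainty (one log-variance per task)."""

    def __init__(self, num_tasks: int):
        super().__init__()
        # One learnable log-variance per task, initialised to 0 (i.e. weight 1).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: iterable of scalar losses, one per task.
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            # Down-weight uncertain tasks, with a regulariser on the log-variance.
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage: combined = UncertaintyWeightedLoss(num_tasks=2)([loss_task_a, loss_task_b])
```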
For a partial write-up of the above implementations, feel free to contact me.
This repository is based on the work of Antreas Antoniou, How To Train Your MAML.
Section | Description |
---|---|
Setup | How to set up a working environment |
Data and Preprocessing | How to prepare and use a (custom) dataset |
Supported meta-learning algorithms | Learning methods |
Supported base-learners | Base-learners |
Running an experiment | How to configure and run an experiment |
Citation | Citing our work |
[1] Install Anaconda. Instructions can be found here: https://www.anaconda.com/download/
[2] Create virtual environment:
conda create --name meta python=3.8
conda activate meta
[3] Install PyTorch (>1.5). Please refer to the PyTorch installation page for the specifics for your platform.
[4] Clone the repository:
git clone https://github.com/mrvoh/meta_learning_multilingual_doc_classification.git
cd meta_learning_multilingual_doc_classification
[5] Install the Ranger optimizer. Instructions can be found in the original repo on GitHub.
[6] Install the requirements:
pip install -r requirements.txt
Each document/sample is stored as a separate *.json file, formatted as follows:
{
"source_sentence": "dummy",
"target_sentence": "This is a test document",
"source": "MLDoc",
"teacher_encoding": [0, 0, 1, 0], # One-hot or continuous labels as learning signal
"teacher_name": "ground_truth",
"target_language": "en"
}
The whole dataset has to be stored in the datasets folder in the same directory as train_maml_system.py, with the following folder structure:
Dataset
├── train
│   ├── Dataset_1
│   │   ├── lang_1
│   │   │   ├── class_0        <- samples for class_0
│   │   │   ├── class_1        <- samples for class_1
│   │   │   └── ... class_N
│   │   ├── lang_2
│   │   └── ... lang_L
│   └── ... Dataset_D
├── val
└── test
So, for instance, the first sample from the MLDoc dataset, corresponding to class ECAT in French, would be located at datasets/Dataset/train/MLDoc/fr/ECAT/sample1.json.
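To make the expected layout concrete, below is a small illustrative helper that writes one sample to such a path. The function `write_sample` is hypothetical and not part of this repository; it only mirrors the folder structure and JSON format described above.

```python
import json
from pathlib import Path

def write_sample(root, split, dataset, lang, label, file_name, sample):
    """Sketch: store one sample at datasets/<root>/<split>/<dataset>/<lang>/<label>/<file>."""
    out_dir = Path("datasets") / root / split / dataset / lang / label
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / file_name, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False)

# Example matching the path above (label encoding is illustrative only):
sample = {
    "source_sentence": "dummy",
    "target_sentence": "Ceci est un document de test",
    "source": "MLDoc",
    "teacher_encoding": [0, 0, 1, 0],
    "teacher_name": "ground_truth",
    "target_language": "fr",
}
write_sample("Dataset", "train", "MLDoc", "fr", "ECAT", "sample1.json", sample)
```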
- MAML++
- Reptile
- Prototypical Network
- ProtoMAML
- ProtoMAMLn (see the prototype sketch below)
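As a rough illustration of what the prototype-based methods share (this is a sketch, not the actual implementation): class prototypes are the mean embeddings of the support examples per class, and ProtoMAML uses them to initialise the linear classification head. The `normalize` flag below is only a stand-in for the ProtoMAMLn / "centralize" option; the function name is hypothetical.

```python
import torch

def init_head_from_support(embeddings, labels, num_classes, normalize=False):
    """Sketch: build class prototypes from an embedded support set and derive
    a linear classification head from them (ProtoMAML-style initialisation)."""
    # embeddings: [num_support, hidden]; labels: [num_support] with values in [0, num_classes)
    prototypes = torch.stack(
        [embeddings[labels == k].mean(dim=0) for k in range(num_classes)]
    )
    if normalize:  # rough stand-in for the ProtoMAMLn variant
        prototypes = prototypes / prototypes.norm(dim=-1, keepdim=True)
    # Prototypical-network logits (negative squared Euclidean distance) are
    # equivalent, up to a per-sample constant, to a linear layer with:
    weight = 2.0 * prototypes                 # [num_classes, hidden]
    bias = -(prototypes ** 2).sum(dim=-1)     # [num_classes]
    return weight, bias
```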
All base-learners are based on the HuggingFace Transformers library, but in order to support learnable learning rates, the forward() method of the base-learner has to be implemented in a functional way (see meta_bert.py). Hence, base-learner support is limited to:
- BERT
- XLM-Roberta
- DistilBert
So, for instance, in order to use the base multilingual version of BERT as base-learner, the pretrained_weights option has to be set to bert-base-multilingual-cased.
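For illustration, here is a rough sketch of what "functional" means in this context: the forward pass receives its weights as explicit arguments instead of reading them from module attributes, so the inner loop can substitute fast weights computed with (learnable) per-parameter learning rates. The helper names below are hypothetical and not taken from meta_bert.py.

```python
import torch
import torch.nn.functional as F

def functional_linear(x, params, prefix):
    """Sketch: apply a linear layer using explicitly passed weights,
    e.g. fast weights produced during the inner loop."""
    return F.linear(x, params[f"{prefix}.weight"], params[f"{prefix}.bias"])

def inner_loop_step(loss, params, per_param_lr):
    """Sketch of one MAML-style inner step: compute new fast weights from the
    gradients and (possibly learnable) per-parameter learning rates."""
    # create_graph=True keeps the graph so second-order gradients remain possible.
    grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
    return {
        name: p - per_param_lr[name] * g
        for (name, p), g in zip(params.items(), grads)
    }
```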
In order to run an experiment, set up the hyperparameters as desired and run:
export DATASET_DIR="path/to/dataset/"
python train_maml_system.py --name_of_args_json_file path/to/config.json
The following options are configurable:
"batch_size":4, # number of tasks for one update
"gpu_to_use":0, # set to -1 to not use GPU if available
"num_dataprovider_workers":4,
"dataset_name":"eng_text_class", # Name of dataset as per Data and Preprocessing section
"dataset_path":"eng_text_class",
"reset_stored_paths":false,
"experiment_name":"eng_text_class-threeway",
"pretrained_weights":"distilbert-base-multilingual-cased", #pretrained weights of base-learner from HuggingFace Transformers
"teacher_dir": "teachers",
"meta_loss":"ce", # Loss to update base-learner with, KL divergence with continuous labels is also availabe (kl)
"num_freeze_epochs": 0, # number of epochs to only train inner-loop optimizer
"patience":3, # Number of epochs of no improvement before applying early stopping
"train_seed": 42,
"val_seed": 0,
"evaluate_on_test_set_only": false,
"eval_using_full_task_set": true,
"num_evaluation_seeds": 5,
"meta_update_method":"protomaml", # Options: maml, reptile, protomaml, protonet
"protomaml_do_centralize": true, # whether to use ProtoMAMln instead of regular ProtoMAML
"total_epochs": 50,
"total_iter_per_epoch":100, # number of update steps per epoch
"total_epochs_before_pause": 100,
"per_step_layer_norm_weights":true, # separate layer norm weights per inner-loop step
"evalute_on_test_set_only": false,
"num_evaluation_tasks":50,
"learnable_per_layer_per_step_inner_loop_learning_rate": true, # whether to train or freeze inner lr
"init_inner_loop_learning_rate": 1e-5,
"init_class_head_lr_multiplier": 10, # factor with which to increase the initial lr of the classification head of the model
"split_support_and_query": true,
"sample_task_to_size_ratio": false,
"shuffle_labels":true,
"min_learning_rate":0.000001,
"meta_learning_rate":3e-5, # learning rate applied to the base-learner
"meta_inner_optimizer_learning_rate":6e-5, # learning rate applied to the inner-loop optimizer
"number_of_training_steps_per_iter":5,
"num_classes_per_set":4,
"num_samples_per_class":2,
"num_target_samples": 2,
"second_order": false
"first_order_to_second_order_epoch":50 # epoch at which to start using second order gradients
Please cite our paper if you use it in your own work.
@inproceedings{van2021multilingual,
title={Multilingual and cross-lingual document classification: A meta-learning approach},
author={van der Heijden, Niels and Yannakoudakis, Helen and Mishra, Pushkar and Shutova, Ekaterina},
booktitle={Proceedings of the 2021 Conference of the European Chapter of the Association for Computational Linguistics},
year={2021}
}