Bug fix update

I updated the codebase with code from my follow-up research, in which the bug is solved. Although I did not finish that project, I decided to include all my progress, as it contains interesting things that might be useful for others. At a high level, the following was added:

  • Support for Adapter-Transformers
  • Support for several auxiliary losses: triplet loss, a consistency loss in line with this paper (which requires augmented data), and a loss that forces the final feature space to be convex (i.e. interpolating in feature space should result in the same interpolation in output space)
  • Multi-task learning loss components which can compensate for imbalance in the number of labels per task and for the uncertainty inherent to a task
  • A spin-off model from ProtoMAML which can use user-defined class descriptions as prototypes instead of the center of the embedded support set. This model also supports splitting the support set such that one half is used to initialize the prototypes and the other half to perform inner-loop optimization.
  • Further extras to push performance, such as Stochastic Weight Averaging and ensemble techniques such as majority voting (based on the new model described in the previous point)
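As a rough illustration of the convexity loss mentioned in the list above, the following minimal sketch penalises the gap between the output at an interpolated feature vector and the same interpolation of the two individual outputs. It is not taken from this repository: the names `interpolate` and `convexity_loss`, the squared-error penalty, the toy heads, and the use of plain Python lists instead of tensors are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Return lam * a + (1 - lam) * b, element-wise."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def convexity_loss(head: Callable[[Sequence[float]], Vector],
                   feat_a: Sequence[float],
                   feat_b: Sequence[float],
                   lam: float) -> float:
    """Mean squared gap between head(mixed features) and the mixed head outputs."""
    out_of_mix = head(interpolate(feat_a, feat_b, lam))
    mix_of_out = interpolate(head(feat_a), head(feat_b), lam)
    return sum((p - q) ** 2 for p, q in zip(out_of_mix, mix_of_out)) / len(mix_of_out)


# Hypothetical output heads, used only to exercise the function.
def linear_head(z: Sequence[float]) -> Vector:
    return [2.0 * z[0] + z[1], -z[0]]


def squared_head(z: Sequence[float]) -> Vector:
    return [z[0] ** 2]


linear_gap = convexity_loss(linear_head, [1.0, 2.0], [3.0, 4.0], lam=0.3)
squared_gap = convexity_loss(squared_head, [0.0], [2.0], lam=0.5)
```

A linear head satisfies the property exactly, so its loss is zero up to rounding, while the non-linear head is penalised.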

For a partial write-up of the above implementations, feel free to contact me.

Multilingual and cross-lingual document classification: A meta-learning approach

This repository is based on the work of Antreas Antoniou, How To Train Your MAML.


Section                             Description
Setup                               How to set up a working environment
Data and Preprocessing              How to prepare and utilize a (custom) dataset
Supported meta-learning algorithms  Learning methods
Supported base-learners             Base-learners
Running an experiment               How to configure and run an experiment
Citation                            Citing our work

Setup

[1] Install anaconda: Instructions here:

[2] Create virtual environment:

conda create --name meta python=3.8
conda activate meta

[3] Install PyTorch (>1.5). Please refer to the PyTorch installation page for the specifics for your platform.

[4] Clone the repository:

git clone
cd meta_learning_multilingual_doc_classification

[5] Install the Ranger optimizer. Instructions can be found in the original repo on GitHub.

[6] Install the requirements:

pip install -r requirements.txt

Data pre-processing

Each document/sample is stored as a separate *.json file, formatted as follows. The teacher_encoding field holds one-hot or continuous labels that serve as the learning signal.

{
  "source_sentence": "dummy",
  "target_sentence": "This is a test document",
  "source": "MLDoc",
  "teacher_encoding": [0, 0, 1, 0],
  "teacher_name": "ground_truth",
  "target_language": "en"
}

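The sample format above can be produced with a few lines of Python. The sketch below is only an illustration under assumptions: the helper names `make_sample` and `write_sample` are hypothetical and not part of this repository, and the field values simply mirror the example shown above.

```python
import json
import tempfile
from pathlib import Path


def make_sample(text: str, label_index: int, num_classes: int, language: str,
                source: str = "MLDoc") -> dict:
    """Build one sample dict with a one-hot `teacher_encoding`."""
    one_hot = [1 if i == label_index else 0 for i in range(num_classes)]
    return {
        "source_sentence": "dummy",
        "target_sentence": text,
        "source": source,
        "teacher_encoding": one_hot,
        "teacher_name": "ground_truth",
        "target_language": language,
    }


def write_sample(sample: dict, path: Path) -> None:
    """Write the sample as UTF-8 JSON, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sample, ensure_ascii=False, indent=2), encoding="utf-8")


# Write one sample into a temporary folder so the sketch has no side effects.
sample = make_sample("This is a test document", label_index=2, num_classes=4, language="en")
out_dir = Path(tempfile.mkdtemp())
write_sample(sample, out_dir / "sample1.json")
```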
The whole dataset has to be stored in the datasets folder in the same directory as with the following folder structure:

Dataset
|-- train
|   |-- Dataset_1
|   |   |-- lang_1
|   |   |   |-- class_0
|   |   |   |   `-- samples for class_0
|   |   |   |-- class_1
|   |   |   |   `-- samples for class_1
|   |   |   |-- ...
|   |   |   `-- class_N
|   |   |-- lang_2
|   |   |-- ...
|   |   `-- lang_L
|   |-- ...
|   `-- Dataset_D
|-- val
`-- test

For instance, the first sample from the MLDoc dataset, corresponding to class ECAT in French, would be located at datasets/Dataset/train/MLDoc/fr/ECAT/sample1.json.
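To make the folder convention concrete, here is a minimal sketch that builds a sample path and indexes the files of one split by dataset, language and class. The helpers `sample_path` and `index_split` are hypothetical names introduced for illustration; the repository's own data loader may traverse the folders differently.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def sample_path(root: Path, split: str, dataset: str, lang: str, cls: str, name: str) -> Path:
    """Compose <root>/<split>/<dataset>/<language>/<class>/<file>."""
    return root / split / dataset / lang / cls / name


def index_split(root: Path, split: str) -> Dict[Tuple[str, str, str], List[Path]]:
    """Map (dataset, language, class) to the sorted sample files found under a split."""
    index: Dict[Tuple[str, str, str], List[Path]] = {}
    for path in sorted((root / split).glob("*/*/*/*.json")):
        dataset, lang, cls = path.parts[-4:-1]
        index.setdefault((dataset, lang, cls), []).append(path)
    return index


# Build a tiny throwaway tree to exercise the helpers.
root = Path(tempfile.mkdtemp()) / "datasets" / "Dataset"
for cls, name in [("ECAT", "sample1.json"), ("ECAT", "sample2.json"), ("CCAT", "sample1.json")]:
    target = sample_path(root, "train", "MLDoc", "fr", cls, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("{}", encoding="utf-8")

index = index_split(root, "train")
```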

Supported meta-learning algorithms

  • MAML++
  • Reptile
  • Prototypical Network
  • ProtoMAML
  • ProtoMAMLn
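For readers unfamiliar with the prototype-based methods in this list, the sketch below shows the general idea as described in the literature: a Prototypical Network averages the support embeddings per class, and ProtoMAML uses those prototypes to initialise a linear classification head (weights 2c and bias -|c|^2 for prototype c), which is then fine-tuned in the inner loop. This is not the repository's code: it uses plain Python lists rather than tensors, the function names are hypothetical, and the ProtoMAMLn variant is not covered.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Head = Dict[int, Tuple[Vector, float]]


def prototypes(embeddings: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, Vector]:
    """Average the support embeddings of each class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def protomaml_head(protos: Dict[int, Vector]) -> Head:
    """Linear head whose logits rank classes like negative squared Euclidean distance."""
    return {k: ([2.0 * v for v in c], -sum(v * v for v in c)) for k, c in protos.items()}


def predict(head: Head, query: Sequence[float]) -> int:
    """Return the class with the highest logit w.q + b."""
    def logit(k: int) -> float:
        w, b = head[k]
        return sum(wi * qi for wi, qi in zip(w, query)) + b
    return max(head, key=logit)


# Toy support set: two embeddings per class.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = [0, 0, 1, 1]
protos = prototypes(support, support_labels)
head = protomaml_head(protos)
```

Before any inner-loop update, this head predicts the class of the nearest prototype, which is why it serves as a sensible starting point.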

Supported base-learners

All base-learners are based on the HuggingFace Transformers library. However, in order to support learnable learning rates, the forward() method of the base-learner has to be implemented in a functional way. Hence, base-learner support is limited to:

  • BERT
  • XLM-Roberta
  • DistilBert

For instance, in order to use the base multilingual version of BERT as base-learner, the pretrained_weights option has to be set to bert-base-multilingual-cased.
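What a "functional" forward pass means can be sketched with a toy model: the weights are passed in as an argument, so the inner loop can produce updated copies without mutating the initial parameters, and each step and parameter can have its own learning rate. The example below is an assumption-laden illustration only: it uses a one-dimensional linear model, hand-derived gradients in place of automatic differentiation, and hypothetical names (`forward`, `mse_grads`, `inner_loop`) that are not taken from this repository.

```python
from typing import Dict, List, Sequence

Params = Dict[str, float]


def forward(params: Params, x: float) -> float:
    """Functional forward pass: the weights are an argument, not hidden module state."""
    return params["weight"] * x + params["bias"]


def mse_grads(params: Params, xs: Sequence[float], ys: Sequence[float]) -> Params:
    """Hand-derived gradients of the mean squared error for the model above."""
    n = len(xs)
    errors = [forward(params, x) - y for x, y in zip(xs, ys)]
    return {
        "weight": sum(2.0 * e * x for e, x in zip(errors, xs)) / n,
        "bias": sum(2.0 * e for e in errors) / n,
    }


def inner_loop(params: Params, lrs: List[Params],
               xs: Sequence[float], ys: Sequence[float]) -> Params:
    """Take one gradient step per entry of `lrs`, each with its own per-parameter rate."""
    fast = dict(params)  # work on a copy so the initial parameters stay untouched
    for step_lrs in lrs:
        grads = mse_grads(fast, xs, ys)
        fast = {name: fast[name] - step_lrs[name] * grads[name] for name in fast}
    return fast


init = {"weight": 0.0, "bias": 0.0}
# One learning rate per step and per parameter; in a meta-learning setup these could be trained.
lrs = [{"weight": 0.1, "bias": 0.05}, {"weight": 0.05, "bias": 0.05}]
xs, ys = [1.0, 2.0], [2.0, 4.0]
fast = inner_loop(init, lrs, xs, ys)
```

Because the adapted weights are an explicit function of the initial weights and the learning rates, an autodiff framework can differentiate the post-adaptation loss with respect to both.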

Running an experiment

In order to run an experiment, set up the hyperparameters as desired and run:

export DATASET_DIR="path/to/dataset/"
python --name_of_args_json_file path/to/config.json

The following options are configurable:

  "batch_size":4, # number of tasks for one update
  "gpu_to_use":0, # set to -1 to not use GPU if available
  "dataset_name":"eng_text_class", # Name of dataset as per Data and Preprocessing section
  "pretrained_weights":"distilbert-base-multilingual-cased", #pretrained weights of base-learner from HuggingFace Transformers
  "teacher_dir": "teachers",
  "meta_loss":"ce", # Loss to update base-learner with, KL divergence with continuous labels is also available (kl)
  "num_freeze_epochs": 0, # number of epochs to only train inner-loop optimizer
  "patience":3, # Number of epochs of no improvement before applying early stopping

  "train_seed": 42, 
  "val_seed": 0,
  "evaluate_on_test_set_only": false,
  "eval_using_full_task_set": true,
  "num_evaluation_seeds": 5,
  "meta_update_method":"protomaml", # Options: maml, reptile, protomaml, protonet
  "protomaml_do_centralize": true, # whether to use ProtoMAMLn instead of regular ProtoMAML
  "total_epochs": 50,
  "total_iter_per_epoch":100, # number of update steps per epoch
  "total_epochs_before_pause": 100,
  "per_step_layer_norm_weights":true,  # separate layer norm weights per inner-loop step
  "evalute_on_test_set_only": false,
  "learnable_per_layer_per_step_inner_loop_learning_rate": true, # whether to train or freeze inner lr
  "init_inner_loop_learning_rate": 1e-5,
  "init_class_head_lr_multiplier": 10, # factor with which to increase the initial lr of the classification head of the model
  "split_support_and_query": true,
  "sample_task_to_size_ratio": false,

  "meta_learning_rate":3e-5, # learning rate applied to the base-learner
  "meta_inner_optimizer_learning_rate":6e-5, # learning rate applied to the inner-loop optimizer
  "num_target_samples": 2,

  "second_order": false,
  "first_order_to_second_order_epoch":50 # epoch at which to start using second order gradients

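The option listing above carries `#` comments, which strict JSON does not allow. Assuming the config file is parsed as plain JSON, one way to avoid hand-editing mistakes is to generate it from Python. The sketch below writes a subset of the listed options with the values shown above; the `write_config` helper and its validation of `meta_update_method` are hypothetical and not part of this repository.

```python
import json
import tempfile
from pathlib import Path

# A subset of the options listed above, with the values from the listing.
config = {
    "batch_size": 4,
    "gpu_to_use": 0,
    "dataset_name": "eng_text_class",
    "pretrained_weights": "distilbert-base-multilingual-cased",
    "meta_loss": "ce",
    "patience": 3,
    "meta_update_method": "protomaml",
    "protomaml_do_centralize": True,
    "total_epochs": 50,
    "total_iter_per_epoch": 100,
    "init_inner_loop_learning_rate": 1e-5,
    "meta_learning_rate": 3e-5,
    "meta_inner_optimizer_learning_rate": 6e-5,
    "second_order": False,
    "first_order_to_second_order_epoch": 50,
}

# Values taken from the comment next to "meta_update_method" in the listing.
ALLOWED_METHODS = {"maml", "reptile", "protomaml", "protonet"}


def write_config(options: dict, path: Path) -> None:
    """Validate the meta-learning method and write the options as comment-free JSON."""
    method = options["meta_update_method"]
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown meta_update_method: {method}")
    path.write_text(json.dumps(options, indent=2), encoding="utf-8")


config_path = Path(tempfile.mkdtemp()) / "config.json"
write_config(config, config_path)
```

The resulting file path is what would be passed to the --name_of_args_json_file argument.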
Citation

Please cite our paper if you use it in your own work.

  title={Multilingual and cross-lingual document classification: A meta-learning approach},
  author={van der Heijden, Niels and Yannakoudakis, Helen and Mishra, Pushkar and Shutova, Ekaterina},
  booktitle={Proceedings of the 2021 Conference of the European Chapter of the Association for Computational Linguistics},

