🎉 Update: We were accepted at NeurIPS 2024, see you all in Vancouver in December!
A repository for training unsupervised environment design (UED) methods on 2D navigation tasks. We support three environments:
- JaxNav: our JAX-based simulator for single- and multi-robot 2D geometric navigation, imported from JaxMARL.
- MiniGrid: a single-agent maze navigation domain, imported from JaxUED.
- XLand-MiniGrid: a goal-oriented, grid-based meta-RL task inspired by XLand and MiniGrid, imported from XLand-MiniGrid and often referred to simply as XLand in this repo.
We include several UED methods:
- Sampling For Learnability (SFL), our proposed UED method for binary-outcome deterministic settings.
- PLR
- Robust PLR
- ACCEL
- Domain Randomisation
Our PLR and ACCEL implementations are built on top of JaxUED.
We introduce Sampling For Learnability (SFL), a new UED method for binary-outcome deterministic settings which outperforms current state-of-the-art approaches on MiniGrid and our robotics simulator JaxNav. SFL samples a large batch of levels uniformly at random, scores each by its learnability, computed as p(1 - p) given the agent's success rate p on that level, and trains on the highest-scoring levels.
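The learnability score described above can be sketched as follows (a minimal NumPy sketch for illustration; `success_rates`, `select_training_levels`, and the batch handling are illustrative names, not the repo's actual API):

```python
import numpy as np

def learnability(success_rates: np.ndarray) -> np.ndarray:
    """Learnability of each level: p * (1 - p), maximised at p = 0.5.

    Levels the agent always solves (p = 1) or always fails (p = 0)
    score zero; levels of intermediate difficulty score highest.
    """
    return success_rates * (1.0 - success_rates)

def select_training_levels(success_rates: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k levels with the highest learnability."""
    scores = learnability(success_rates)
    return np.argsort(scores)[::-1][:k]

# Example: success rates for five randomly sampled levels.
p = np.array([0.0, 0.25, 0.5, 0.9, 1.0])
print(select_training_levels(p, 2))  # -> [2 1], the levels with p closest to 0.5
```

In practice the success rates are estimated from rollouts, but the scoring and selection step reduces to this simple computation.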
Rather than just comparing performance on a set of hand-designed test maps, we introduce a new evaluation protocol for curriculum methods, designed to explicitly test robustness. Our protocol computes a risk metric on the performance of the method, by evaluating its performance on the levels where it performs worst, drawn from a large set of randomly sampled levels.
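A rough sketch of such a risk metric (a hedged illustration, not the repo's exact protocol; the function name and the `alpha` fraction are assumptions):

```python
import numpy as np

def worst_case_score(level_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Mean performance over the worst alpha-fraction of evaluation levels.

    A CVaR-style risk metric: a method only scores well if it is robust
    on the levels where it performs worst, not just on average.
    """
    scores = np.sort(level_scores)             # ascending: worst levels first
    k = max(1, int(np.ceil(alpha * len(scores))))
    return float(scores[:k].mean())

# Example: per-level success scores for one method on five sampled levels.
scores = np.array([1.0, 1.0, 0.9, 0.2, 0.0])
print(worst_case_score(scores, alpha=0.4))     # mean of the worst 2 levels -> 0.1
```

Averaging only over the worst-performing tail penalises methods that do well on easy levels but fail badly on a small subset.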
Environment previews: JaxNav Single-Agent | JaxNav Multi-Agent | MiniGrid Maze | XLand-MiniGrid
We recommend using our Dockerfile. With Docker and the Nvidia Container Toolkit installed, the image can be built with `make build` and run with `make run`.
To install from source, first ensure you have the correct JAX version for your system installed, then install our dependencies with `pip install -e .`
XLand-MiniGrid has a different JAX requirement to JaxMARL and JaxUED. As such, the code for XLand is held separately within `xland/`, with a separate Dockerfile and Makefile located within.
All training scripts can be found within `sfl/train`, and we include a set of configuration files, contained within `sweep_configs`, to launch experiments across a number of seeds using wandb sweeps. We also include a helpful script for easily starting sweeps, `start_wandb_sweep.py`. Using this script, SFL on single-agent JaxNav can be run across 4 GPUs with 1 agent per GPU as:
$ python start_wandb_sweep.py sweep_configs/jaxnav-sa_sfl_10seeds.yaml 0:4 1
We use wandb for logging; your API key and entity can be set within the Dockerfile.
You can either use your own trained policies (downloaded from wandb) or our saved checkpoints (located within `checkpoints/`). For all settings (JaxNav single-agent, JaxNav multi-agent, MiniGrid and XLand), evaluation is a three-step process, using scripts located within `sfl/deploy` for the first three settings and within `xland/eval` for XLand.
1. A set number of levels are generated using `*_0_generate_levels.py`. These levels are saved to `sfl/eval/ENV_NAME`, with `ENV_NAME` being either `jaxnav` or `minigrid`.
2. Rollouts for the methods under consideration on these levels are collected with `*_1_rollout.py`; run this twice for two seeds (we use 0 and 1). Results from these rollouts are saved as CSVs to `sfl/data/eval/results`.
3. The performance of all methods is analysed by `*_2_analyse.py`, with results plotted and saved to `results/`.
If you instead wish to analyse and visualise performance on the hand-designed test sets, you can use `sfl/deploy/deploy_on_singletons.py` for JaxNav and `sfl/deploy/deploy_minigrid_on_singeltons.py` for MiniGrid. For the sampled test sets used with JaxNav, use `sfl/deploy/deploy_on_sampled_set.py`.
To reproduce our graph illustrating how current UED scoring metrics correlate not with learnability but with success rate, use the `sfl/deploy/deploy_on_sampled_and_calc_regret.ipynb` notebook.
This JAX-based environment for 2D geometric navigation is introduced with this work, but its code and documentation are held within JaxMARL.
If you use our SFL method or JaxNav in your work, please cite us as:
@misc{rutherford2024noregrets,
title={No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery},
author={Alexander Rutherford and Michael Beukman and Timon Willi and Bruno Lacerda and Nick Hawes and Jakob Foerster},
year={2024},
eprint={2408.15099},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2408.15099},
}