# Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning

This is a PyTorch implementation of the REDQ+AdaptiveBC method proposed in the paper "Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning" by Yi Zhao, Rinu Boney, Alexander Ilin, Juho Kannala, and Joni Pajarinen.
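At a high level, the method augments a TD3+BC-style actor update with a behavior-cloning term whose weight is adapted during online fine-tuning based on the agent's returns. The sketch below is a minimal illustration of this pattern in PyTorch; the function names, the additive adaptation rule, and the hyperparameters are assumptions for illustration, not the authors' exact implementation (see the paper and the source code for the actual rule).

```python
import torch
import torch.nn.functional as F

def actor_loss(actor, critic, states, dataset_actions, bc_weight):
    """Q-maximization plus a weighted behavior-cloning term (TD3+BC-style)."""
    pi_actions = actor(states)
    q = critic(states, pi_actions)
    # Normalize the Q term so the BC weight is scale-invariant, as in TD3+BC.
    lmbda = 1.0 / q.abs().mean().detach()
    bc_loss = F.mse_loss(pi_actions, dataset_actions)
    return -lmbda * q.mean() + bc_weight * bc_loss

def adapt_bc_weight(bc_weight, episode_return, return_target,
                    step_size=0.05, min_w=0.0, max_w=1.0):
    """Hypothetical adaptation rule: loosen the BC constraint when online
    returns beat the target, tighten it when performance drops."""
    if episode_return >= return_target:
        bc_weight -= step_size  # trust the learned policy more
    else:
        bc_weight += step_size  # fall back toward the dataset behavior
    return float(min(max(bc_weight, min_w), max_w))
```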

## Setup

```
conda env create -f environment.yaml
conda activate adaptive
```

Note: d4rl still requires a MuJoCo license. You can get a free license from http://roboti.us/ and copy it into your MuJoCo installation folder (typically `~/.mujoco/mjkey.txt`). The mujoco_py installation instructions may also be useful.

## Run the code

Training consists of two stages: pretraining on a d4rl dataset, followed by online fine-tuning on the corresponding task. To run an experiment:

```
python3 main.py --env=<TASK_NAME> --seed=<SEED>
```
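For example, to fine-tune on a d4rl locomotion task (the task name below is illustrative; see the d4rl documentation for valid names): `python3 main.py --env=halfcheetah-medium-v2 --seed=0`.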

We use wandb (Weights & Biases) for logging. Please check the wandb documentation for details.
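For reference, a minimal logging call looks like the following; the project name, config fields, and metric keys are illustrative, not the repository's defaults.

```python
import wandb

# Project name, config fields, and metric keys are illustrative only.
wandb.init(project="adaptive-bc", config={"env": "halfcheetah-medium-v2", "seed": 0})
wandb.log({"eval/episode_return": 123.4, "train/bc_weight": 0.5})
wandb.finish()
```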