First, clone the repo
git clone https://github.com/Marco5de/DeepRL.git
cd DeepRL
I recommend installing all of the required modules in a fresh virtualenv.
virtualenv venv
source venv/bin/activate
then install most of the required modules directly using pip
pip install -r requirements.txt
install all of the external dependencies beginning with openAI gym
cd src
mkdir extern && cd extern
git clone https://github.com/openai/gym.git
cd gym
pip install -e .
cd ..
and pybullet-gym
git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .
cd ../../../
with that the installation of all required modules is complete.
Now to run a training run the train.py
script which in turn is using the small lib implementing the PPO algorithm.
python3 src/train.py
Most of the settings for the training are contained in the train.py
script as global variables.
The following contains a brief description for each of the available settings
SAVE_MODEL_FREQ
: model is saved everySAVE_MODEL_FREQ
LOG_FREQ
: log to stdout everyLOG_FREQ
iterations, note that no logging lib is used but simple print outputTRAIN_STEPS
: total number of training iterations, note that this does not directly correspond to the number of training steps often found in literature as each train_step corresponds to a total ofN * T
timesteps but also may depend on the environment sequence lengthENV_IDX
: Specifies which environment is used, seeENV_NAMES
for an enumeration of the respective environmentsRENDER_ENV
: Specifies if the environment is rendered during training, note that the pybulletgym rendering works different from the openai-gym rendering as theenv.render()
function must only be called once!
The hyperparameters can theoretically be read from a yaml in the same format as shown in link.
Note that not all options are currently implemented and the read_yaml()
is currently unused!
The simplest way is to manually adjust the default initialization in the ctor of the class.
The following lists the hyperparameters for the environments that were considered in the report.
Hyperparameter | Pendulum-v0 | AntPyBulletEnv-v0 |
---|---|---|
epsilon_clip |
0.2 | 0.2 |
gamma |
0.99 | 0.99 |
beta |
1.0 | 1.0 |
d_target_ratio |
1.5 | 1.5 |
d_target |
0.25 | 0.25 |
var |
0.5 | 0.1 |
N |
2048 | 2048 |
T |
200 | 32 |
K |
10 | 10 |
numeric_stable |
1e-10 | 1e-10 |
base_lr |
3e-4 | 2.5e-4 |
The implementation is using tensorboard to log the loss function values of the policy and value function network.
Additionally, the processing time, average episode length and average episodic reward are all logged.
The default output location is src/res/log_dir
.
To visualize the data in tensorboard simply start a server with the log directory
tensorboard --port 6006 --logdir srd/res/log_dir
The default path for model is src/res/model
each model consists of the two MLP checkpoints policy_net.pth
and value_net.pth
respectively which are simply saved and loaded using the PyTorch library.
Additionally, the normalization of the environment must be preserved which is why a vec_normalize.pkl
is also present.
To save and load a model simply use the functionality provided in the PPO.py
class: load_model()
, save_model()
.
The report including all resources can be found in the report
directory.
The LaTeX code is compiled using latexmk
, thus it is assumed that it is installed alongside all required LaTeX packages.
All of them should be contained in a full texlive
installation.
The easiest way to build the report is using the provided Makefile
.
To clean up the generated build files use make clean
.