Reproducing RLLib experiments in Hoplite on AWS.

If you already have access to the cluster and AMI, see README-with-ami instead.

Setup Local Environment (About 5 min)

First, on your local machine:

  1. Make sure Python 3 is installed on the local machine. Then install Ray version 0.8.0 and boto3 with:
    pip install ray==0.8.0 boto3
  2. Configure your AWS credentials (aws_access_key_id and aws_secret_access_key) in ~/.aws/credentials as described here. Your ~/.aws/credentials should look like the following:
    [default]
    aws_access_key_id=XXXXXXXX
    aws_secret_access_key=YYYYYYYY
    
    Change the permission of this file:
    chmod 600 ~/.aws/credentials
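
As a quick sanity check before launching anything, the credentials file can be inspected with a few lines of Python. This is a minimal sketch, not part of the Hoplite scripts: the helper name `check_credentials` is hypothetical, and it only verifies what the step above asks for (a `[default]` section with both keys, and no group/other permission bits).

```python
import configparser
import os
import stat


def check_credentials(path):
    """Return a list of problems found in an AWS credentials file (empty if none).

    Hypothetical helper for illustration; it does not contact AWS.
    """
    parser = configparser.ConfigParser()
    if not parser.read(path):
        return ["cannot read " + path]

    problems = []
    if not parser.has_section("default"):
        problems.append("missing [default] section")
    else:
        for key in ("aws_access_key_id", "aws_secret_access_key"):
            if not parser.has_option("default", key):
                problems.append("missing " + key)

    # chmod 600 leaves no permission bits set for group or others.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append("file is accessible by group/others; run chmod 600")
    return problems
```

For example, `check_credentials(os.path.expanduser("~/.aws/credentials"))` should return an empty list once both steps above are done.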

Setup AMI (About 20 min)

Please contact Zhuohan Li ([email protected]) for a preconfigured AMI. Otherwise, follow the instructions below to set it up from scratch:

Start an AWS node with initial.yaml and connect to the node:

ray up initial.yaml
ray attach initial.yaml

You should now be connected to an AWS instance over SSH. Run the following commands on that instance:

  1. Clone Hoplite, install dependencies, and then compile Hoplite:
    cd ~
    git clone https://github.com/suquark/hoplite.git
    cd hoplite
    ./install_dependencies.sh
    mkdir build
    cd build
    cmake -DCMAKE_BUILD_TYPE=Release ..
    make -j
    Note that Hoplite should be compiled before activating the conda environment; otherwise, the Protobuf library in the conda environment will cause compilation errors.
  2. Activate conda environment:
    conda activate
    echo "conda activate" >> ~/.bashrc
  3. Install ray and tensorflow:
    pip install ray[all]==0.8.0 tensorflow==2.0.0
  4. Install Hoplite Python library:
    cd ~/hoplite
    pip install -e python
    cp build/notification python/hoplite/
    ./python/setup.sh
  5. Install the modified rllib library:
    cd ~
    git clone https://github.com/zhuohan123/hoplite-rllib.git
    cd hoplite-rllib
    git checkout artifact
    python python/ray/setup-dev.py # Set up the symlink for RLLib only: reply Y for the first option and n for all others.
  6. Set up an SSH key:
    ssh-keygen
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  7. Create an AMI on the AWS console. See EC2 -> Instances -> Actions -> Image and templates -> Create image. Set the image name (e.g. hoplite-artifact-rllib-ami) and then create the image.
  8. Go to the AMIs tab on the AWS console. When the AMI is ready, turn off the instance via:
    ray down initial.yaml

Start the Cluster and Evaluate (About 30 min)

  1. Create a placement group on the AWS Management Console. See EC2 -> Placement Groups. Choose the Cluster placement strategy. This ensures high interconnect bandwidth among the nodes in the cluster.
  2. Replace {image-id} in cluster.yaml with the ID of the AMI you just created, and {group-name} with the name of the placement group you just created. Start the cluster and connect to the head node via:
    ray up cluster.yaml
    ray attach cluster.yaml
  3. Move to the running scripts directory:
    cd ~/hoplite-rllib/hoplite-scripts
  4. Generate the cluster configuration:
    python a3c_generate_config.py
    python impala_generate_config.py
  5. Test all configurations:
    ./test_all_generated.sh
  6. After all experiments have finished, get the results via:
    python a3c_parse_log.py
    python impala_parse_log.py
    The results will be in the format of:
    #nodes / - / Hoplite or Ray / Throughput (mean) / Throughput (std)
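
The placeholder substitution in step 2 can also be scripted rather than edited by hand. The following is a minimal sketch, assuming cluster.yaml contains the literal strings {image-id} and {group-name}; the function names and the example values are hypothetical and not part of the repository.

```python
from pathlib import Path


def fill_placeholders(template_text, image_id, group_name):
    """Substitute the two placeholders from step 2 and return the new text.

    Raises ValueError if a placeholder is absent, so a cluster.yaml that
    was already filled in is not silently passed through.
    """
    replacements = {"{image-id}": image_id, "{group-name}": group_name}
    for placeholder, value in replacements.items():
        if placeholder not in template_text:
            raise ValueError("placeholder not found: " + placeholder)
        template_text = template_text.replace(placeholder, value)
    return template_text


def fill_file(path, image_id, group_name):
    """Rewrite the file at `path` in place with both placeholders filled."""
    target = Path(path)
    target.write_text(fill_placeholders(target.read_text(), image_id, group_name))
```

A call such as `fill_file("cluster.yaml", "ami-XXXXXXXX", "my-placement-group")` (both values are placeholders for your own AMI ID and group name) would then be followed by `ray up cluster.yaml` as in step 2.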