Reproducing RLlib experiments in Hoplite on AWS.

If you already have access to the cluster and AMI, see README-with-ami.md instead.

Setup Local Environment (About 5 min)

First, on your local machine:

Make sure Python 3 is installed on the local machine. Then install Ray version 0.8.0 and boto3 with:
```
pip install ray==0.8.0 boto3
```
Configure your AWS credentials (aws_access_key_id and aws_secret_access_key) in ~/.aws/credentials. The file should look like the following:
```
[default]
aws_access_key_id=XXXXXXXX
aws_secret_access_key=YYYYYYYY
```
Restrict the permissions of this file so that only your user can read and write it:
```
chmod 600 ~/.aws/credentials
```
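
As an optional sanity check before launching anything, the credentials file can be verified from the shell. The following is a minimal sketch, not part of Ray or the AWS tooling: check_aws_credentials is a hypothetical helper name, and it only checks that the file exists, that both keys are present, and that the mode is 600.

```shell
# Sketch: sanity-check an AWS credentials file.
# check_aws_credentials is a hypothetical helper, not part of Ray or the AWS CLI.
check_aws_credentials() {
  local file="${1:-$HOME/.aws/credentials}"
  local key perms

  if [ ! -f "$file" ]; then
    echo "missing credentials file: $file" >&2
    return 1
  fi

  # Both keys must appear as "name=value" lines.
  for key in aws_access_key_id aws_secret_access_key; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "missing ${key} in $file" >&2
      return 1
    fi
  done

  # GNU stat takes -c, BSD/macOS stat takes -f; try one, then the other.
  perms="$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file")"
  if [ "$perms" != "600" ]; then
    echo "expected mode 600 on $file, found $perms" >&2
    return 1
  fi

  echo "credentials file looks OK: $file"
}
```

Calling check_aws_credentials with no argument inspects ~/.aws/credentials; a non-zero return value means one of the checks failed.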

Setup AMI (About 20 min)

Please contact Zhuohan Li ([email protected]) for a configured AMI. To set one up from scratch, follow the instructions below:

Start an AWS node with initial.yaml and connect to the node:
```
ray up initial.yaml
ray attach initial.yaml
```

You should now be connected to an AWS instance over SSH. Run the following commands on that instance.

Clone Hoplite, install dependencies, and then compile Hoplite:
```
cd ~
git clone https://github.com/suquark/hoplite.git
cd hoplite
./install_dependencies.sh
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
```
Note that Hoplite should be compiled before activating the conda environment; otherwise, the Protobuf library in the conda environment will cause compilation errors.
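
One way to guard against building with conda active is a small check before the cmake step. This is a sketch under one assumption: conda exports the CONDA_DEFAULT_ENV variable while an environment is active. ensure_no_conda_env is a hypothetical helper name and is not part of Hoplite.

```shell
# Sketch: refuse to build while a conda environment is active.
# ensure_no_conda_env is a hypothetical helper, not part of Hoplite.
# Assumption: conda sets CONDA_DEFAULT_ENV while an environment is active.
ensure_no_conda_env() {
  if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    echo "conda environment '${CONDA_DEFAULT_ENV}' is active; run 'conda deactivate' before building" >&2
    return 1
  fi
  return 0
}

# Usage, from ~/hoplite/build on the AWS instance:
#   ensure_no_conda_env && cmake -DCMAKE_BUILD_TYPE=Release .. && make -j
```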

Activate the conda environment:
```
conda activate
echo "conda activate" >> ~/.bashrc
```

Install Ray and TensorFlow:
```
pip install ray[all]==0.8.0 tensorflow==2.0.0
```

Install the Hoplite Python library:
```
cd ~/hoplite
pip install -e python
cp build/notification python/hoplite/
./python/setup.sh
```

Install the modified RLlib library:
```
cd ~
git clone https://github.com/zhuohan123/hoplite-rllib.git
cd hoplite-rllib
git checkout artifact
python python/ray/setup-dev.py
```
The setup-dev.py script sets up symlinks. Create the symlink for RLlib only: reply Y to the first prompt and n to all the others.

Set up an SSH key:
```
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```

Create an AMI on the AWS console: go to EC2 -> Instances -> Actions -> Image and templates -> Create image, set the image name (e.g. hoplite-artifact-rllib-ami), and create the image.
Go to the AMIs tab on the AWS console. When the AMI is ready, shut down the instance via:
```
ray down initial.yaml
```

Start the Cluster and Evaluate (About 30 min)

Create a placement group on the AWS Management Console (see EC2 -> Placement Groups) and choose the Cluster placement strategy. This helps ensure high interconnect bandwidth among the nodes in the cluster.
Replace {image-id} in cluster.yaml with the ID of the AMI you just created, and {group-name} with the name of the placement group you just created. Start the cluster and connect to the head node via:
```
ray up cluster.yaml
ray attach cluster.yaml
```
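
The placeholder replacement described above can also be scripted on the local machine before running ray up. The following is a minimal sketch: fill_cluster_config is a hypothetical helper name, the AMI ID and group name in the usage line are made-up values, and the sketch assumes neither value contains a pipe character, an ampersand, or a backslash.

```shell
# Sketch: fill in the {image-id} and {group-name} placeholders in cluster.yaml.
# fill_cluster_config is a hypothetical helper, not part of Ray or Hoplite.
fill_cluster_config() {
  local image_id="${1:-}"
  local group_name="${2:-}"
  local config="${3:-cluster.yaml}"

  if [ -z "$image_id" ] || [ -z "$group_name" ]; then
    echo "usage: fill_cluster_config IMAGE_ID GROUP_NAME [CONFIG]" >&2
    return 1
  fi

  # Write to a temporary file first so a failed sed leaves the config untouched.
  # Assumes the values contain no '|', '&', or backslash characters.
  if ! sed -e "s|{image-id}|${image_id}|g" \
           -e "s|{group-name}|${group_name}|g" \
           "$config" > "${config}.tmp"; then
    rm -f "${config}.tmp"
    return 1
  fi
  mv "${config}.tmp" "$config"

  # Fail if either placeholder is still present.
  if grep -Fq -e '{image-id}' -e '{group-name}' "$config"; then
    echo "unreplaced placeholder left in $config" >&2
    return 1
  fi
}

# Usage (made-up values):
#   fill_cluster_config ami-0123456789abcdef0 hoplite-group cluster.yaml
```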
Move to the directory containing the experiment scripts:
```
cd ~/hoplite-rllib/hoplite-scripts
```

Generate the cluster configuration:
```
python a3c_generate_config.py
python impala_generate_config.py
```

Test all configurations:
```
./test_all_generated.sh
```

After all experiments have finished, collect the results via:
```
python a3c_parse_log.py
python impala_parse_log.py
```

The results will be in the following format:
```
#nodes / - / Hoplite or Ray / Throughput (mean) / Throughput (std)
```
