If you already have access to the cluster and AMI, see README-with-ami instead.
First, on your local machine:
- Make sure Python 3 is installed on the local machine. Then install Ray version 0.8.0 and boto3 with:
pip install ray==0.8.0 boto3
- Configure your AWS credentials (aws_access_key_id and aws_secret_access_key) in ~/.aws/credentials as described in the AWS documentation. Your ~/.aws/credentials should look like the following:
[default]
aws_access_key_id=XXXXXXXX
aws_secret_access_key=YYYYYYYY
Change the permission of this file:
chmod 600 ~/.aws/credentials
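Before launching anything, you may want to double-check the credentials setup above. The following is a minimal sketch, not part of Hoplite (the helper name `check_aws_credentials` is hypothetical): it parses ~/.aws/credentials as an INI file with the standard `configparser` module, checks that both keys are present in the `[default]` section, and checks that the file mode is exactly 600.

```python
import configparser
import os
import stat


def check_aws_credentials(path: str = "~/.aws/credentials", profile: str = "default") -> list:
    """Return a list of problems found in the credentials file (empty if none)."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return [f"{path} does not exist"]

    problems = []
    # The credentials file is INI-style; "[default]" (lowercase) is an
    # ordinary section for configparser, unlike its special "DEFAULT".
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_section(profile):
        problems.append(f"missing [{profile}] section")
    else:
        for key in ("aws_access_key_id", "aws_secret_access_key"):
            if not parser.get(profile, key, fallback="").strip():
                problems.append(f"missing {key} in [{profile}]")

    # After `chmod 600`, only the owner can read/write the file.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")
    return problems


if __name__ == "__main__":
    issues = check_aws_credentials()
    print("OK" if not issues else "\n".join(issues))
```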
Please contact Zhuohan Li ([email protected]) for the configured AMI. The following instructions set up the cluster from scratch:
- Start an AWS node with initial.yaml and connect to the node:
ray up initial.yaml
ray attach initial.yaml
You should now be logged in to the AWS instance over SSH. Run the following commands on the AWS instance:
- Clone Hoplite, install its dependencies, and then compile Hoplite:
Note that Hoplite should be compiled before activating the conda environment; otherwise, the Protobuf library in the conda environment will cause compilation errors.
cd ~
git clone https://github.com/suquark/hoplite.git
cd hoplite
./install_dependencies.sh
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
- Activate conda environment:
conda activate
echo "conda activate" >> ~/.bashrc
- Install Ray and TensorFlow:
pip install ray[all]==0.8.0 tensorflow==2.0.0
- Install Hoplite Python library:
cd ~/hoplite
pip install -e python
cp build/notification python/hoplite/
./python/setup.sh
- Install the modified rllib library:
cd ~
git clone https://github.com/zhuohan123/hoplite-rllib.git
cd hoplite-rllib
git checkout artifact
python python/ray/setup-dev.py  # Set up a symlink for RLlib only: reply Y to the first prompt and n to all the others.
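To confirm that only RLlib was linked, you can inspect the installed ray package. This sketch assumes that Ray 0.8.0's setup-dev.py works by replacing each accepted subpackage directory of the installed ray package with a symlink into the checkout (our understanding of that script, not something this README states). The helper `linked_subpackages` is hypothetical; after the step above, the only linked entry you would expect is `rllib`.

```python
import importlib.util
import os


def linked_subpackages(package_dir: str) -> dict:
    """Map each subdirectory of package_dir to its symlink target, or None if it is not a symlink."""
    result = {}
    for name in sorted(os.listdir(package_dir)):
        full = os.path.join(package_dir, name)
        if os.path.isdir(full):  # follows symlinks, so linked dirs are included
            result[name] = os.path.realpath(full) if os.path.islink(full) else None
    return result


if __name__ == "__main__":
    # Locate the installed ray package without importing it.
    spec = importlib.util.find_spec("ray")
    if spec is None or not spec.submodule_search_locations:
        print("ray is not installed in this environment")
    else:
        ray_dir = list(spec.submodule_search_locations)[0]
        links = {k: v for k, v in linked_subpackages(ray_dir).items() if v}
        print(links)  # expected: only 'rllib' pointing into ~/hoplite-rllib
```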
- Set up an SSH key:
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
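Since the cluster nodes are later launched from the AMI built from this instance, this key presumably lets the nodes SSH into one another without a password (our reading; the README does not say so explicitly). Running the `cat >>` line twice appends a duplicate entry, so you may want to check first. The helper below is a hypothetical sketch: it matches the key body from id_rsa.pub against the whitespace-separated fields of each authorized_keys line, which also tolerates lines that carry options before the key type.

```python
import os


def key_is_authorized(pub_path: str = "~/.ssh/id_rsa.pub",
                      auth_path: str = "~/.ssh/authorized_keys") -> bool:
    """Return True if the public key in pub_path already appears in auth_path."""
    pub_path = os.path.expanduser(pub_path)
    auth_path = os.path.expanduser(auth_path)
    if not (os.path.isfile(pub_path) and os.path.isfile(auth_path)):
        return False

    with open(pub_path) as f:
        parts = f.read().split()  # "<type> <base64-body> [comment]"
    if len(parts) < 2:
        return False
    key_body = parts[1]

    with open(auth_path) as f:
        return any(key_body in line.split() for line in f)


if __name__ == "__main__":
    print("already authorized" if key_is_authorized() else "not yet authorized")
```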
- Create an AMI on the AWS console (see EC2 -> Instances -> Actions -> Image and templates -> Create image). Set the image name (e.g. hoplite-artifact-rllib-ami) and then create the image.
- Go to the AMIs tab on the AWS console. When the AMI is ready, shut down the instance via:
ray down initial.yaml
- Create a placement group on the AWS Management Console (see EC2 -> Placement Groups). Choose the Cluster placement strategy, which ensures high interconnection bandwidth among the nodes in the cluster.
- Replace {image-id} in cluster.yaml with the ID of the AMI you just created, and {group-name} with the name of the placement group you just created. Start the cluster and connect to the head node via:
ray up cluster.yaml
ray attach cluster.yaml
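The placeholder substitution can also be scripted. This is a minimal sketch assuming `{image-id}` and `{group-name}` appear literally in cluster.yaml; the function name is hypothetical, and it raises an error if a placeholder is missing (for example, because it was already replaced). Run it from the directory containing cluster.yaml, e.g. `python fill_cluster_yaml.py ami-0123456789abcdef0 hoplite-group`, where both values are made-up examples to replace with your own AMI ID and placement group name.

```python
import sys
from pathlib import Path


def fill_cluster_yaml(text: str, image_id: str, group_name: str) -> str:
    """Substitute the literal {image-id} and {group-name} placeholders."""
    for placeholder, value in (("{image-id}", image_id), ("{group-name}", group_name)):
        if placeholder not in text:
            raise ValueError(f"placeholder {placeholder} not found in cluster.yaml")
        text = text.replace(placeholder, value)
    return text


if __name__ == "__main__":
    config = Path("cluster.yaml")
    if len(sys.argv) == 3 and config.exists():
        image_id, group_name = sys.argv[1], sys.argv[2]
        config.write_text(fill_cluster_yaml(config.read_text(), image_id, group_name))
        print(f"Updated {config}")
    else:
        print("usage: python fill_cluster_yaml.py <ami-id> <placement-group-name>")
```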
- Move to the running scripts directory:
cd ~/hoplite-rllib/hoplite-scripts
- Generate the cluster configuration:
python a3c_generate_config.py
python impala_generate_config.py
- Test all configurations:
./test_all_generated.sh
- After all experiments have finished, get the results via:
python a3c_parse_log.py
python impala_parse_log.py
The results will be in the format of:
#nodes / - / Hoplite or Ray / Throughput (mean) / Throughput (std)
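To compare the two systems at each cluster size, you could post-process these rows. This sketch rests on assumptions the README does not confirm: that the parse scripts print one row per line to standard output (so you can save them with e.g. `python a3c_parse_log.py > a3c_results.txt`), and that the five columns above are whitespace-separated, with the second column ignored. The helper names are hypothetical; lines that do not match the assumed layout are skipped.

```python
import sys
from collections import defaultdict


def parse_results(lines):
    """Map #nodes -> {"hoplite"|"ray": (mean, std)} under the assumed 5-column layout."""
    table = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # header or line not matching the assumed layout
        nodes, _unused, system, mean, std = fields
        try:
            table[int(nodes)][system.lower()] = (float(mean), float(std))
        except ValueError:
            continue  # non-numeric fields
    return dict(table)


def hoplite_speedup(table):
    """Ratio of Hoplite to Ray mean throughput for node counts that have both."""
    return {n: row["hoplite"][0] / row["ray"][0]
            for n, row in sorted(table.items())
            if "hoplite" in row and "ray" in row and row["ray"][0] > 0}


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            for nodes, ratio in hoplite_speedup(parse_results(f)).items():
                print(f"{nodes} nodes: Hoplite / Ray mean throughput = {ratio:.2f}x")
```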