Chore: Reorg dockerfiles and fix docs (#51)
jaywonchung authored Apr 28, 2024
1 parent 1f28222 commit d42cb2c
Showing 12 changed files with 16 additions and 16 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/push_docker.yaml
@@ -14,7 +14,7 @@ on:
- 'zeus/**'
- 'zeus_monitor/**'
- '.dockerignore'
- 'Dockerfile'
- 'docker/Dockerfile'
- 'LICENSE'
- 'setup.py'
- 'pyproject.toml'
@@ -52,7 +52,7 @@ jobs:
uses: docker/build-push-action@v3
with:
context: .
file: Dockerfile
file: docker/Dockerfile
builder: ${{ steps.buildx.outputs.name }}
push: true
tags: ${{ steps.meta.outputs.tags }}
File renamed without changes.
@@ -5,8 +5,8 @@ services:
server:
image: bso-server
build:
context: ../
dockerfile: ./docker/bso_server.Dockerfile
context: ../../
dockerfile: ./docker/batch_size_optimizer/server.Dockerfile
container_name: bso
restart: always
environment:
@@ -55,8 +55,8 @@ services:
migration:
image: bso-migration
build:
context: ../
dockerfile: ./docker/bso_migration.Dockerfile
context: ../../
dockerfile: ./docker/batch_size_optimizer/migration.Dockerfile
deploy:
restart_policy:
condition: on-failure
File renamed without changes.
6 changes: 3 additions & 3 deletions docs/batch_size_optimizer/index.md
@@ -74,8 +74,8 @@ sequenceDiagram;

```Shell
# From the root directory
docker build -f ./docker/bso_server.Dockerfile -t bso-server .
docker build -f ./docker/bso_migration.Dockerfile -t bso-migration .
docker build -f ./docker/batch_size_optimizer/server.Dockerfile -t bso-server .
docker build -f ./docker/batch_size_optimizer/migration.Dockerfile -t bso-migration .
```

2. Create Kubernetes yaml files using Kompose. Kompose is a tool that converts docker-compose files into Kubernetes files. For more information, visit [Kompose Reference](#kompose-references)
@@ -146,7 +146,7 @@ sequenceDiagram;
### Remark about the server

Zeus Batch Size Optimizer server is using Sqlalchemy to support various types of databases. However, you need to download the corresponding async connection driver.
As a default, we are using Mysql. You can add installation code to `bso_migration.Dockerfile` and `bso_server.Dockerfile`. Refer to those files for reference.
As a default, we are using Mysql. You can add installation code to `docker/batch_size_optimizer/migration.Dockerfile` and `docker/batch_size_optimizer/server.Dockerfile`. Refer to those files for reference.

## Use BSO in your training script (Client)

@@ -5,7 +5,7 @@ Batch size optimzer is composed of two parts: server and client. Client will be
## Data parallel training with Zeus

In the case of data parallel training, Batch size optimizer should be able to give the consistent batch size to all gpus. Since there is no way for batch size to tell the differences between concurrent job submissions and multiple GPU training, we ask users to send a request from a single GPU and broadcast the result(batch size, trial number) to other GPUs. In the case of reporting the result to the batch size optimizer server and receiving the corresponding result (train fail or succeeded) can be dealt by the server since it has the `trial_number`. Thus, report doesn't require any broadcast or communications with other GPUs.
Refer to the `examples/bso_server/mnist_dp.py` for the use case.
Refer to the `examples/batch_size_optimizer/mnist_dp.py` for the use case.

## Kubeflow

@@ -19,7 +19,7 @@ Kubeflow is a tool to easily deploy your ML workflows to kubernetes. We provides

```Shell
# From project root directory
docker build -f ./examples/bso_server/mnist.Dockerfile -t mnist-example .
docker build -f ./examples/batch_size_optimizer/mnist.Dockerfile -t mnist-example .
```

3. Deploy training script.
File renamed without changes.
File renamed without changes.
@@ -16,7 +16,7 @@ spec:
imagePullPolicy: Never
command:
- "python3"
- "/workspace/examples/bso_server/mnist_dp.py"
- "/workspace/examples/batch_size_optimizer/mnist_dp.py"
- "--epochs=5"
- "--backend=nccl"
env:
@@ -38,7 +38,7 @@ spec:
imagePullPolicy: Never
command:
- "python3"
- "/workspace/examples/bso_server/mnist_dp.py"
- "/workspace/examples/batch_size_optimizer/mnist_dp.py"
- "--epochs=5"
- "--backend=nccl"
env:
@@ -48,4 +48,4 @@ spec:
value: "mnist-dev-dp-2"
securityContext:
capabilities:
add: ["SYS_ADMIN"]
add: ["SYS_ADMIN"]
File renamed without changes.
@@ -16,7 +16,7 @@ spec:
imagePullPolicy: Never
command:
- "python3"
- "/workspace/examples/bso_server/mnist_single_gpu.py"
- "/workspace/examples/batch_size_optimizer/mnist_single_gpu.py"
- "--epochs=5"
env:
- name: ZEUS_SERVER_URL
@@ -27,4 +27,4 @@ spec:
value: "INFO"
securityContext:
capabilities:
add: ["SYS_ADMIN"]
add: ["SYS_ADMIN"]
File renamed without changes.
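
Note for downstream scripts and docs that still use the old locations: the path changes visible in the hunks above can be applied mechanically. The following is only a minimal sketch, not part of this commit. `RENAMES` and `update_paths` are hypothetical names introduced here for illustration, and the mapping covers only the paths that appear in this diff. The root `Dockerfile` to `docker/Dockerfile` move is left out on purpose, because a bare `Dockerfile` substring is too ambiguous to replace blindly.

```python
# Old -> new path fragments, read off the hunks in this commit.
# Hypothetical helper for illustration; not shipped with the repository.
RENAMES = [
    ("docker/bso_server.Dockerfile",
     "docker/batch_size_optimizer/server.Dockerfile"),
    ("docker/bso_migration.Dockerfile",
     "docker/batch_size_optimizer/migration.Dockerfile"),
    ("examples/bso_server/", "examples/batch_size_optimizer/"),
]


def update_paths(text: str) -> str:
    """Rewrite references to the pre-commit paths in `text`."""
    for old, new in RENAMES:
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    old_cmd = "docker build -f ./docker/bso_server.Dockerfile -t bso-server ."
    print(update_paths(old_cmd))
```

None of the new paths contains an old fragment, so running the helper a second time on already-updated text leaves it unchanged.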