Merge pull request #845 from lyd911/master: adding mindspore example

Showing 4 changed files with 154 additions and 0 deletions.

Example README (55 additions, 0 deletions)

# MindSpore Volcano Example

#### These examples show how to run MindSpore via Volcano. Since MindSpore itself is relatively new, the examples may be oversimplified, but they will evolve with both communities.

## Introduction to MindSpore

MindSpore is a new open-source deep learning training/inference framework that
can be used in mobile, edge, and cloud scenarios. MindSpore is designed to give
data scientists and algorithm engineers a friendly development experience and
efficient execution, with native support for the Ascend AI processor and
software-hardware co-optimization.

MindSpore is open sourced on both [GitHub](https://github.com/mindspore-ai/mindspore) and [Gitee](https://gitee.com/mindspore/mindspore).

## Prerequisites

These two examples were tested in the following environment:

- Ubuntu: `16.04.6 LTS`
- Docker: `v18.06.1-ce`
- Kubernetes: `v1.16.6`
- NVIDIA Docker: `2.3.0`
- NVIDIA/k8s-device-plugin: `1.0.0-beta6`
- NVIDIA drivers: `418.39`
- CUDA: `10.1`

## MindSpore CPU example

This example uses a modified MindSpore CPU image as the container image, which
trains LeNet on the MNIST dataset.

- Pull the image: `docker pull lyd911/mindspore-cpu-example:0.2.0`
- Run the job: `kubectl apply -f mindspore-cpu.yaml`
- Check the result: `kubectl logs mindspore-cpu-pod-0`
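
The training code itself is baked into the image (the job spec below runs `python /tmp/lenet.py`). As a rough, hypothetical sketch of what such a script's network definition typically looks like, the snippet below follows the standard MindSpore LeNet-5 tutorial; the actual script inside `lyd911/mindspore-cpu-example:0.2.0` may differ.

```python
# Hypothetical sketch of a LeNet-5 definition in the style of the MindSpore
# tutorials; the real /tmp/lenet.py in the example image may differ.
import numpy as np
import mindspore.nn as nn
import mindspore.context as context
from mindspore import Tensor
from mindspore.common.initializer import Normal

context.set_context(mode=context.GRAPH_MODE, device_target="CPU")

class LeNet5(nn.Cell):
    """Classic LeNet-5: two conv/pool stages followed by three dense layers.
    Expects 1x32x32 inputs (MNIST images resized to 32x32)."""
    def __init__(self, num_class=10, num_channel=1):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')
        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')
        self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))
        self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))
        self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    def construct(self, x):
        x = self.max_pool2d(self.relu(self.conv1(x)))
        x = self.max_pool2d(self.relu(self.conv2(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)

if __name__ == "__main__":
    # Smoke test: a single forward pass over a dummy batch.
    net = LeNet5()
    dummy = Tensor(np.zeros([1, 1, 32, 32]).astype(np.float32))
    print(net(dummy))  # logits tensor of shape (1, 10)
```

A complete training script would additionally build an MNIST input pipeline with `mindspore.dataset` and drive training through MindSpore's `Model` API.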

## MindSpore GPU example

This example uses an image built from the official MindSpore GPU image with
openssh-server installed. To check that MindSpore GPU processes are able to
communicate with one another, we leverage the mpimaster and mpiworker task
specs of Volcano. In this example we launch one mpimaster and two mpiworkers;
the Python script is taken from the [MindSpore Gitee README](https://gitee.com/mindspore/mindspore/blob/master/README.md) and modified to run in parallel.

- Pull the image: `docker pull lyd911/mindspore-gpu-example:0.2.0`
- Run the job: `kubectl apply -f mindspore-gpu.yaml`
- Check the result: `kubectl logs mindspore-gpu-mpimaster-0`

The expected output is a multi-dimensional array of shape (1, 3, 3, 4) filled
with 2s (the element-wise sum of two all-ones tensors), printed by each
mpiworker process.
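
Concretely, the test script (included later in this commit) just adds two all-ones tensors, so the NumPy equivalent below shows the values each worker prints.

```python
import numpy as np

# NumPy equivalent of the MindSpore test: ones + ones over shape [1, 3, 3, 4].
x = np.ones([1, 3, 3, 4], dtype=np.float32)
y = np.ones([1, 3, 3, 4], dtype=np.float32)
print(x + y)  # a [1, 3, 3, 4] array in which every element is 2.0
```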

## Future

An end-to-end example of training a network with MindSpore on distributed GPUs
via Volcano is expected in the future.

example/MindSpore-example/mindspore_cpu/mindspore-cpu.yaml (32 additions, 0 deletions)

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mindspore-cpu
spec:
  minAvailable: 1
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  maxRetry: 5
  queue: default
  tasks:
    - replicas: 8
      name: "pod"
      template:
        spec:
          containers:
            - command: ["/bin/bash", "-c", "python /tmp/lenet.py"]
              image: lyd911/mindspore-cpu-example:0.2.0
              imagePullPolicy: IfNotPresent
              name: mindspore-cpu-job
              resources:
                limits:
                  cpu: "1"
                requests:
                  cpu: "1"
          restartPolicy: OnFailure
```

MindSpore GPU test script (13 additions, 0 deletions; run as `/tmp/gpu-test.py` by the job below)

```python
import numpy as np
import mindspore.context as context
from mindspore import Tensor
from mindspore.ops import functional as F
from mindspore.communication.management import init, get_rank, get_group_size

# Initialize NCCL-based collective communication and target the GPU backend.
init('nccl')
context.set_context(device_target="GPU")
context.set_auto_parallel_context(parallel_mode="data_parallel", mirror_mean=True, device_num=get_group_size())

# Each MPI process adds two all-ones tensors and prints the result (all 2s).
x = Tensor(np.ones([1,3,3,4]).astype(np.float32))
y = Tensor(np.ones([1,3,3,4]).astype(np.float32))
print(F.tensor_add(x, y))
```

example/MindSpore-example/mindspore_gpu/mindspore-gpu.yaml (54 additions, 0 deletions)

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: mindspore-gpu
spec:
  minAvailable: 3   # gang-schedule the mpimaster and both mpiworkers together
  schedulerName: volcano
  plugins:
    ssh: []   # distribute SSH keys so mpimaster can reach the workers without a password
    svc: []   # expose each task's pod hostnames under /etc/volcano/<task>.host
  tasks:
    - replicas: 1
      name: mpimaster
      template:
        spec:
          containers:
            - command:
                - /bin/bash
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`;
                  sleep 10;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 --prefix /usr/local/openmpi-3.1.5 python /tmp/gpu-test.py;
                  sleep 3600;
              image: lyd911/mindspore-gpu-example:0.2.0
              name: mpimaster
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          containers:
            - command:
                - /bin/bash
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: lyd911/mindspore-gpu-example:0.2.0
              name: mpiworker
              resources:
                limits:
                  nvidia.com/gpu: "1"
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure

---
```