Update README with GKE parallelstore related example blueprint details #3409

Merged
24 changes: 24 additions & 0 deletions examples/README.md
@@ -1518,6 +1518,30 @@ cleaned up when the job is deleted.

[storage-gke.yaml]: ../examples/storage-gke.yaml

### [gke-storage-managed-parallelstore.yaml] ![core-badge] ![experimental-badge]

This blueprint shows how to use managed Parallelstore storage with GKE in the toolkit.

The blueprint contains the following:

* A Kubernetes Job that uses a managed Parallelstore storage volume.
* A Kubernetes Job that demonstrates an ML training workload performing disk I/O against managed Parallelstore storage.

> **Warning**: In this example blueprint, the storage type `Parallelstore` is
> specified in the `gke-storage` module, so the lifecycle of the Parallelstore
> instance is managed by the blueprint: a `gcluster destroy` operation will
> also destroy the Parallelstore storage that was created.
>
> [!NOTE]
> The Kubernetes API server will only allow requests from authorized networks.
> The `gke-cluster` module needs access to the Kubernetes API server
> to create a Persistent Volume and a Persistent Volume Claim. **You must use
> the `authorized_cidr` variable to supply an authorized network which contains
> the IP address of the machine deploying the blueprint, for example
> `--vars authorized_cidr=<your-ip-address>/32`.** You can use a service like
> [whatismyip.com](https://whatismyip.com) to determine your IP address.
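The authorized-network step can be sketched as a small shell snippet. This is illustrative only: the example IP address and the exact `gcluster` invocation are assumptions, not taken from this change.

```shell
# Example public IP of the deploying machine; in practice obtain it from
# a service such as whatismyip.com (or: curl -s ifconfig.me).
MY_IP="203.0.113.7"

# The blueprint expects a /32 CIDR covering exactly that address.
AUTHORIZED_CIDR="${MY_IP}/32"
echo "${AUTHORIZED_CIDR}"

# Deployment then looks roughly like this (assumes the gcluster binary
# was built at the repository root with `make`):
#   ./gcluster deploy examples/gke-storage-managed-parallelstore.yaml \
#     --vars project_id=<your-project> \
#     --vars authorized_cidr="${AUTHORIZED_CIDR}"
```

Using a `/32` restricts Kubernetes API server access to the single machine running the deployment, which is the narrowest CIDR that still lets the toolkit create the Persistent Volume and Persistent Volume Claim.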

[gke-storage-managed-parallelstore.yaml]: ../examples/gke-storage-managed-parallelstore.yaml

### [gke-a3-megagpu.yaml] ![core-badge] ![experimental-badge]

This blueprint shows how to provision a GKE cluster with A3 Mega machines in the toolkit.
@@ -12,10 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
---
blueprint_name: gke-storage-parallelstore
blueprint_name: gke-storage-managed-parallelstore
vars:
project_id: ## Set GCP Project ID Here ##
deployment_name: gke-storage-ps
deployment_name: gke-storage-managed-ps
region: us-central1
zone: us-central1-c
# Cidr block containing the IP of the machine calling terraform.
2 changes: 1 addition & 1 deletion modules/file-system/gke-storage/README.md
@@ -39,7 +39,7 @@ then use them in a `gke-job-template` to dynamically provision the resource.
```

See example
[gke-storage-parallelstore.yaml](../../../examples/README.md#gke-storage-parallelstoreyaml--) blueprint
[gke-storage-managed-parallelstore.yaml](../../../examples/README.md#gke-storage-managed-parallelstoreyaml--) blueprint
for a complete example.
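For orientation, a `gke-storage` module stanza along these lines selects Parallelstore in a blueprint. This is a sketch from memory of the module's interface: the module id, `use` target, and `access_mode` value are illustrative assumptions; only `Parallelstore` as a storage type is confirmed by this change, so check the module's inputs table before copying.

```
  - id: gke-storage-parallelstore
    source: modules/file-system/gke-storage
    use: [gke_cluster]
    settings:
      storage_type: Parallelstore   # lifecycle managed by the blueprint
      access_mode: ReadWriteMany    # illustrative; verify against the module README
```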

### Authorized Network
@@ -27,7 +27,7 @@ timeout: 14400s # 4hr

steps:
## Test GKE
- id: gke-storage-parallelstore
- id: gke-storage-managed-parallelstore
name: us-central1-docker.pkg.dev/$PROJECT_ID/hpc-toolkit-repo/test-runner
entrypoint: /bin/bash
env:
@@ -40,7 +40,7 @@ steps:
cd /workspace && make
BUILD_ID_FULL=$BUILD_ID
BUILD_ID_SHORT=$${BUILD_ID_FULL:0:6}
SG_EXAMPLE=examples/gke-storage-parallelstore.yaml
SG_EXAMPLE=examples/gke-storage-managed-parallelstore.yaml

# adding vm to act as remote node
echo ' - id: remote-node' >> $${SG_EXAMPLE}
@@ -58,4 +58,4 @@

ansible-playbook tools/cloud-build/daily-tests/ansible_playbooks/base-integration-test.yml \
--user=sa_106486320838376751393 --extra-vars="project=${PROJECT_ID} build=$${BUILD_ID_SHORT}" \
--extra-vars="@tools/cloud-build/daily-tests/tests/gke-storage-parallelstore.yml"
--extra-vars="@tools/cloud-build/daily-tests/tests/gke-storage-managed-parallelstore.yml"
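The build step above injects a remote-node module into the blueprint under test by echoing YAML list entries onto the file. A minimal standalone sketch of that append technique follows; the scratch blueprint skeleton and the `vm-instance` source line are hypothetical, since the actual appended lines are elided in this diff.

```shell
# Create a scratch blueprint fragment to append to, standing in for
# $SG_EXAMPLE in the real build step.
SG_EXAMPLE="$(mktemp)"
printf 'deployment_groups:\n- group: primary\n  modules:\n' > "${SG_EXAMPLE}"

# Single-quoted strings keep the YAML indentation verbatim; each echo
# adds one line of a new list entry to the modules list.
echo '  - id: remote-node' >> "${SG_EXAMPLE}"
echo '    source: modules/compute/vm-instance' >> "${SG_EXAMPLE}"
```

Appending whole list entries this way avoids a YAML-aware editor in the build container, at the cost of depending on the indentation of the target file.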
@@ -12,16 +12,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
---
test_name: gke-storage-parallelstore
deployment_name: gke-storage-parallelstore-{{ build }}
test_name: gke-storage-managed-parallelstore
deployment_name: gke-storage-managed-parallelstore-{{ build }}
zone: us-central1-a # for remote node
region: us-central1
workspace: /workspace
blueprint_yaml: "{{ workspace }}/examples/gke-storage-parallelstore.yaml"
blueprint_yaml: "{{ workspace }}/examples/gke-storage-managed-parallelstore.yaml"
network: "{{ deployment_name }}-net"
remote_node: "{{ deployment_name }}-0"
post_deploy_tests:
- test-validation/test-gke-storage-parallelstore.yml
- test-validation/test-gke-storage-managed-parallelstore.yml
custom_vars:
project: "{{ project }}"
cli_deployment_vars: