Commit d864ffa · Update python-api-walkthrough.md (#398)

Co-authored-by: meredithslota <[email protected]>
Co-authored-by: Lo Ferris <[email protected]>

3 people authored Apr 14, 2022 · 1 parent cca6c6a
Changed file: dataproc/snippets/python-api-walkthrough.md (74 additions, 92 deletions)
As you follow this walkthrough, you run Python code that calls
[Dataproc gRPC APIs](https://cloud.google.com/dataproc/docs/reference/rpc/)
to:

* Create a Dataproc cluster
* Submit a PySpark word sort job to the cluster
* Delete the cluster after job completion

## Using the walkthrough

an explanation of how the code works.

cloudshell launch-tutorial python-api-walkthrough.md

**To copy and run commands**: Click the "Paste in Cloud Shell" button
**To copy and run commands**: Click the "Copy to Cloud Shell" button
(<walkthrough-cloud-shell-icon></walkthrough-cloud-shell-icon>)
on the side of a code box, then press `Enter` to run the command.

## Prerequisites (1)

<walkthrough-watcher-constant key="project_id" value="<project_id>"></walkthrough-watcher-constant>

1. Create or select a Google Cloud project to use for this
   tutorial.
    * <walkthrough-project-setup billing="true"></walkthrough-project-setup>

1. Enable the Dataproc, Compute Engine, and Cloud Storage APIs in your
   project.

    ```bash
    gcloud services enable dataproc.googleapis.com \
      compute.googleapis.com \
      storage-component.googleapis.com \
      --project={{project_id}}
    ```

## Prerequisites (2)

1. This walkthrough uploads a PySpark file (`pyspark_sort.py`) to a
   [Cloud Storage bucket](https://cloud.google.com/storage/docs/key-terms#buckets) in
   your project.
    * You can use the [Cloud Storage browser page](https://console.cloud.google.com/storage/browser)
      in Google Cloud Console to view existing buckets in your project.

      **OR**

    * To create a new bucket, run the following command (a Python
      alternative is sketched after this list). Your bucket name must be unique.

      ```bash
      gsutil mb -p {{project_id}} gs://your-bucket-name
      ```

2. Set environment variables.
    * Set the name of your bucket.

      ```bash
      BUCKET=your-bucket-name
      ```
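If you prefer to stay in Python, here is a minimal sketch of creating the
bucket with the `google-cloud-storage` client library instead of `gsutil`.
The project ID and bucket name are placeholders, and this snippet is not
part of the walkthrough's own code.

```python
# Sketch (not part of the walkthrough's code): create the bucket with
# the google-cloud-storage client instead of gsutil.
from google.cloud import storage

project_id = "your-project-id"    # placeholder: your Google Cloud project
bucket_name = "your-bucket-name"  # placeholder: must be globally unique

client = storage.Client(project=project_id)

# create_bucket raises a Conflict error if the name is already taken.
bucket = client.create_bucket(bucket_name)
print(f"Created bucket {bucket.name}")
```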

## Prerequisites (3)

1. Set up a Python
   [virtual environment](https://virtualenv.readthedocs.org/en/latest/).

    * Create the virtual environment.

      ```bash
      virtualenv ENV
      ```

    * Activate the virtual environment.

      ```bash
      source ENV/bin/activate
      ```

1. Install library dependencies.

    ```bash
    pip install -r requirements.txt
    ```

## Create a cluster and submit a job

1. Set a name for your new cluster.

    ```bash
    CLUSTER=new-cluster-name
    ```

1. Set a [region](https://cloud.google.com/compute/docs/regions-zones/#available)
   where your new cluster will be located. You can change the pre-set
   "us-central1" region before you copy and run the following command.

    ```bash
    REGION=us-central1
    ```

1. Run `submit_job_to_cluster.py` to create a new cluster and run the
   `pyspark_sort.py` job on the cluster. A sketch of the gRPC calls such a
   script makes follows this list.

    ```bash
    python submit_job_to_cluster.py \
        --project_id={{project_id}} \
        --cluster_name=$CLUSTER \
        --region=$REGION \
        --gcs_bucket=$BUCKET
    ```
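The following is a rough sketch, using the `google-cloud-dataproc` client
library, of the gRPC calls a script like `submit_job_to_cluster.py` makes.
The cluster configuration values are illustrative assumptions, not the
script's actual settings.

```python
# Sketch of the Dataproc gRPC calls a script like submit_job_to_cluster.py
# makes. Cluster config values are illustrative assumptions.
from google.cloud import dataproc_v1

project_id = "your-project-id"  # placeholder
region = "us-central1"
cluster_name = "new-cluster-name"
bucket = "your-bucket-name"

endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}

# 1. Create the cluster; create_cluster returns a long-running operation.
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)
cluster = {
    "project_id": project_id,
    "cluster_name": cluster_name,
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
    },
}
cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
).result()

# 2. Submit the PySpark job and block until it finishes.
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)
job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": f"gs://{bucket}/pyspark_sort.py"},
}
job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
).result()

# 3. Delete the cluster after the job completes.
cluster_client.delete_cluster(
    request={"project_id": project_id, "region": region, "cluster_name": cluster_name}
).result()
```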

## Job Output

Job output displayed in the Cloud Shell terminal shows cluster creation,
job completion, sorted job output, and then deletion of the cluster.

```
Cluster created successfully: cluster-name.
...
Job finished successfully.
...
['Hello,', 'dog', 'elephant', 'panther', 'world!']
...
Cluster cluster-name successfully deleted.
```
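The sorted list in the output comes from the uploaded `pyspark_sort.py`. A
minimal word-sort job along the following lines would produce it; this is a
sketch consistent with the output above, not necessarily the repository
file verbatim.

```python
# Sketch of a minimal PySpark word-sort job consistent with the output
# above (not necessarily the repository file verbatim).
import pyspark

sc = pyspark.SparkContext()
rdd = sc.parallelize(["Hello,", "world!", "dog", "elephant", "panther"])

# collect() pulls the words back to the driver; sorted() orders them
# lexicographically, so 'Hello,' (capital H) sorts before the lowercase words.
print(sorted(rdd.collect()))
```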

## Congratulations on completing the walkthrough!
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>

---

### Next Steps:

* **View job details in the Cloud console.** View job details by selecting the
  PySpark job name on the Dataproc
  [Jobs page](https://console.cloud.google.com/dataproc/jobs)
  in the Cloud console.

* **Delete resources used in the walkthrough.**
  The `submit_job_to_cluster.py` code deletes the cluster that it created for this
  walkthrough.

  If you created a Cloud Storage bucket to use for this walkthrough,
  you can run the following command to delete the bucket (the bucket must be empty).

  ```bash
  gsutil rb gs://$BUCKET
  ```

  Alternatively, you can run the following command to **delete the bucket and all
  objects within it. Note: the deleted objects cannot be recovered.**

  ```bash
  gsutil rm -r gs://$BUCKET
  ```

  A Python alternative to these `gsutil` commands is sketched after this list.


* **For more information.** See the [Dataproc documentation](https://cloud.google.com/dataproc/docs/)
for API reference and product feature information.
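As with bucket creation earlier, cleanup can also be done from Python. Here
is a hedged sketch using the `google-cloud-storage` client; note that the
`force` flag only works for buckets containing 256 or fewer objects.

```python
# Sketch (not part of the walkthrough's code): delete the bucket and its
# objects with the google-cloud-storage client instead of gsutil.
from google.cloud import storage

client = storage.Client(project="your-project-id")  # placeholder project
bucket = client.bucket("your-bucket-name")          # placeholder bucket

# force=True deletes the bucket's objects first; it raises an error for
# buckets with more than 256 objects. Deleted objects cannot be recovered.
bucket.delete(force=True)
```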
