Squashed commit of the following:
commit f10c384
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 17:11:18 2018 -0800

    Need to set authenticator param.

commit 705067f
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 15:53:08 2018 -0800

    Fix some bugs and start a user guide.

commit cbf67a3
Merge: c60c245 fed20a1
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 14:31:46 2018 -0800

    Sync to head.

commit c60c245
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 14:20:47 2018 -0800

    * Use Envoy as a reverse proxy and for JWT verification.
    * Don't run Envoy as a side car in the JupyterHub pod. This is cleaner
      and allows us to have a single reverse proxy for all services.

commit fed20a1
Author: Elson Rodriguez <[email protected]>
Date:   Thu Jan 25 06:28:22 2018 -0800

    Changing Jupyter service to ClusterIP by default. (kubeflow#139)

    * Changing default Service type for Jupyterhub to Cluster IP
       * Exposing services publicly is a security risk, so we want to avoid recommending it, since people may not understand the implications
    * Updated documentation to reflect ClusterIP change for Jupyter.

commit 8aaa393
Author: lluunn <[email protected]>
Date:   Thu Jan 25 06:21:55 2018 -0800

    fix guide typo (kubeflow#144)

commit 7bb642c
Author: Robert Wilkins III <[email protected]>
Date:   Tue Jan 23 23:05:10 2018 -0600

    Update README.md (kubeflow#142)

    Revised grammar and punctuation in the document, as detailed below:

    Added (ML) after "Machine Learning" on line 3 to avoid ambiguity over the "ML" reference on line 18.

    Changed "best of breed" to "best-of-breed" on line 3 to match language used in Kubernetes/Kubernetes.

    Removed the comma after GPUs on line 6 to avoid treating a regular sentence as a run-on sentence.

    Changed "it is" to "it's" on line 23.  There's no need to get formal here if the rest of the document has a light conversational feel.

    Removed an unnecessary comma from Line 23 after (within reason).

    Added dashes in "easy to use" on line 25 to match formatting from line 3.

    Added a colon after "using Kubeflow if" on line 30 to match the format previously established in the previous section called "The Kubeflow Mission".

    Capitalized Kubeflow on line 34 to match casing from line 30.

    Restructured the sentence on line 36 and added a period so it reads cleanly rather than as a strange run-on sentence.

    Removed unnecessary newline between lines 36 and 37 to match format established in line 34.

    Added a period after (see below) on line 39.

    Added a comma after "GPUs" to avoid a run-on sentence on line 47.

    Changed the ending comma on line 58 to a colon to follow the established format used previously in this document.

    Added a comma after Kubeflow on line 83 and added a period to finish the sentence.

    Moved the comma after "using" to after "GKE" in line 109 for the sentence to make sense.

    Added a dash in "in depth" to follow the document's established format.

commit 80cb162
Author: Neeraj Kashyap <[email protected]>
Date:   Mon Jan 22 05:31:32 2018 -0800

    Client script for inception model server (kubeflow#92)

    Added a script that allows users to run the hosted inception model on
    images on their local filesystems or on Google Cloud Storage.

    This is, with only very slight modifications for readability, the same
    as the client provided by TensorFlow Serving -
    https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_client.py

    As such, I am completely okay with us just linking to their script.

    My initial intention was to make this as a notebook, but the problem is
    that the tensorflow-serving-api Python package is only available for
    Python 2 and the kubeflow-core environment only offers a Python 3
    backend for the Jupyter notebook.

    This is therefore a stopgap until I can introduce an appropriate image
    in place of the one used by kubeflow-core.

    * Changed model serving service type to ClusterIP from LoadBalancer

    * Added instructions for exposing service IP

commit bde2ddc
Author: Putra Manggala <[email protected]>
Date:   Fri Jan 19 17:34:59 2018 -0500

    Fix datascientists typo (kubeflow#137)

commit 698cc67
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 18 20:45:49 2018 -0800

    Make Argo UI available publicly at testing-argo.kubeflow.ui (kubeflow#132)

    * We use Argo to run our E2E tests, so the UI is very useful for debugging tests.

    * Add an ingress with a static IP to expose it publicly.

    * Fix kubeflow#131

commit 55c220d
Merge: 11d989c ca95a0d
Author: Jeremy Lewi <[email protected]>
Date:   Wed Jan 17 21:24:58 2018 -0800

    Merge remote-tracking branch 'github/iap' into iap

commit 11d989c
Merge: 8e6fb87 4c9217d
Author: Jeremy Lewi <[email protected]>
Date:   Wed Jan 17 21:24:34 2018 -0800

    Resolve conflicts.

commit 4c9217d
Author: Jeremy Lewi <[email protected]>
Date:   Wed Jan 17 20:50:24 2018 -0800

    Fix TfJob operator roles and TfCNN prototype (kubeflow#130)

    * Fix the TFCNN prototype; the termination policy wasn't being properly set

    * Create service accounts and role bindings for the TfJob operator and UI

    * Fix kubeflow#129 TfCnn template doesn't set termination policy correctly

    * Fix kubeflow#125 Missing roles for tf-job operator

    * Fix kubeflow#95; presubmits/postsubmits need to use the code at the commit we checked out

    * We do this by replacing the directory in vendor with a symbolic link to where we checked out the source.
       * It looks like using "--as" with ksonnet leads to strange errors about the server not being able to create the config map
       * If we don't use "--as", we need to fetch credentials a second time or else we get RBAC issues creating the cluster
jlewi committed Jan 26, 2018
1 parent 82f53fc commit 8fdcfcf
Showing 20 changed files with 1,004 additions and 355 deletions.
31 changes: 16 additions & 15 deletions README.md
@@ -1,49 +1,50 @@
# Kubeflow

[Prow test dashboard](https://k8s-testgrid.appspot.com/sig-big-data)
[Prow jobs dashboard](https://prow.k8s.io/?repo=google%2Fkubeflow)

The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable and scalable. Our goal is **not** to recreate other services, but to provide a straightforward way for spinning up best of breed OSS solutions. Contained in this repository are manifests for creating:
The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable and scalable. Our goal is **not** to recreate other services, but to provide a straightforward way for spinning up best-of-breed OSS solutions. Contained in this repository are manifests for creating:

* A JupyterHub to create & manage interactive Jupyter notebooks
* A Tensorflow Training Controller that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting
* A Tensorflow Training Controller that can be configured to use CPUs or GPUs and adjusted to the size of a cluster with a single setting
* A TF Serving container

This document details the steps needed to run the Kubeflow project in any environment in which Kubernetes runs.

## Quick Links
* [Prow test dashboard](https://k8s-testgrid.appspot.com/sig-big-data)
* [Prow jobs dashboard](https://prow.k8s.io/?repo=google%2Fkubeflow)
* [Argo UI for E2E tests](http://testing-argo.kubeflow.io)

## The Kubeflow Mission

Our goal is to help folks use ML more easily, by letting Kubernetes do what it's great at:
- Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster <-> production cluster)
- Deploying and managing loosely-coupled microservices
- Scaling based on demand

Because ML practitioners use so many different types of tools, it is a key goal that you can customize the stack to whatever your requirements (within reason), and let the system take care of the "boring stuff." While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.
Because ML practitioners use so many different types of tools, it's a key goal that you can customize the stack to whatever your requirements (within reason) and let the system take care of the "boring stuff." While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.

Ultimately, we want to have a set of simple manifests that give you an easy-to-use ML stack _anywhere_ Kubernetes is already running and can self-configure based on the cluster it deploys into.


## Who should consider using Kubeflow?

Based on the current functionality you should consider using Kubeflow if
Based on the current functionality you should consider using Kubeflow if:

* You want to train/serve TensorFlow models in different environments (e.g. local, on prem, and cloud)
* You want to use Jupyter notebooks to manage TensorFlow training jobs
   * Kubeflow is particularly helpful if you want to launch training jobs that use more resources (more nodes or more GPUs) than your notebook.
* You want to combine TensorFlow with other processes
* For example if you want to use [tensorflow/agents](https://github.com/tensorflow/agents) to run simulations to generate data for training
reinforcement learning models
* For example, you may want to use [tensorflow/agents](https://github.com/tensorflow/agents) to run simulations to generate data for training reinforcement learning models.

This list is based ONLY on current capabilities. We are investing significant resources to expand the
functionality and actively soliciting help from companies and individuals interested in contributing (see [below](README.md#who-should-consider-contributing-to-kubeflow))
functionality and actively soliciting help from companies and individuals interested in contributing (see [below](README.md#who-should-consider-contributing-to-kubeflow)).

## Setup

This documentation assumes you have a Kubernetes cluster already available.

If you need help setting up a Kubernetes cluster please refer to [Kubernetes Setup](https://kubernetes.io/docs/setup/).

If you want to use GPUs be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
If you want to use GPUs, be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).

## Quick Start

@@ -54,7 +55,7 @@ If you want to use GPUs be sure to follow the Kubernetes [instructions for enabl

### Steps

In order to quickly set up all components, execute the following commands,
In order to quickly set up all components, execute the following commands:

```commandline
# Initialize a ksonnet APP
@@ -79,7 +80,7 @@ provide prototypes that can be used to configure TensorFlow jobs and deploy Tens
Used together, these make it easy for a user to go from training to serving using Tensorflow with minimal
effort in a portable fashion between different environments.

For more detailed instructions about how to use Kubeflow please refer to the [user guide](user_guide.md)
For more detailed instructions about how to use Kubeflow, please refer to the [user guide](user_guide.md).

## Troubleshooting

@@ -105,12 +106,12 @@ kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --us

* Replace `[email protected]` with the user listed in the error message.

If you're using, GKE you may want to refer to [GKE's RBAC docs](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control) to understand
If you're using GKE, you may want to refer to [GKE's RBAC docs](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control) to understand
how RBAC interacts with IAM on GCP.

## Resources

* [user guide](user_guide.md) provides in depth instructions for using Kubeflow
* [user guide](user_guide.md) provides in-depth instructions for using Kubeflow
* Katacoda has produced a [self-paced scenario](https://www.katacoda.com/kubeflow) for learning and trying out Kubeflow


147 changes: 146 additions & 1 deletion components/k8s-model-server/README.md
@@ -88,7 +88,7 @@ storage bucket you created above.
gsutil cp -r inception gs://<bucket-name>
```

Use [gsutil_ls](https://cloud.google.com/storage/docs/gsutil/commands/ls) to view the contents of your bucket. You
Use [gsutil ls](https://cloud.google.com/storage/docs/gsutil/commands/ls) to view the contents of your bucket. You
will see that the contents of the model are stored in the `gs://<bucket-name>/inception/1` directory. This is the
first version of the model that we will serve.
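The directory number matters: TensorFlow Serving treats each numeric subdirectory under the model's base path as a model version and, by default, serves the highest-numbered one. A minimal sketch of that selection rule (the directory listing below is invented for illustration):

```python
def latest_version(subdirs):
    # TF Serving ignores non-numeric entries and loads the largest
    # purely numeric directory name as the newest model version.
    versions = [d for d in subdirs if d.isdigit()]
    if not versions:
        raise ValueError("no numeric version directories found")
    return max(versions, key=int)

# e.g. a bucket containing gs://<bucket-name>/inception/{1,2,10}
print(latest_version(["1", "2", "10"]))  # -> 10
```

So pushing a retrained model to `gs://<bucket-name>/inception/2` would make it the served version without redeploying.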

@@ -163,3 +163,148 @@ You can learn more about [updating a Deployment](https://kubernetes.io/docs/conc
[scaling a Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment), and
[Pod Resources](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/) in the
Kubernetes documentation.



### Use the served model

The [inception-client](./inception-client) directory contains a Python script you can use to make a call against the deployed model.

This script is intended to be run externally to the Kubernetes cluster as a demonstration that the inception model is correctly being served.
You can run the script either directly from a Python 2 environment or in a Docker container.

#### Setup

You will need the external IP for the inception service as well as the port it is being served on. The inception service should be
listed under the value you used for the `MODEL_NAME` parameter in the ksonnet component. You can find this information using:
```commandline
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
$MODEL_NAME LoadBalancer <INTERNAL IP> <SERVICE IP> <SERVICE PORT>:<NODE PORT> <TIME SINCE DEPLOYMENT>
```
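If you are scripting against the cluster, the same two values can be pulled out of that tabular output. A rough sketch, assuming kubectl's default column layout shown above (in practice `kubectl get service -o jsonpath` is the more robust route):

```python
def parse_service(kubectl_output, name):
    # Extract (external_ip, service_port) for a named service from the
    # default `kubectl get services` table output.
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0] == name:
            external_ip = fields[3]
            # PORT(S) looks like "<port>:<nodeport>/TCP" for LoadBalancer
            service_port = fields[4].split(":")[0]
            return external_ip, int(service_port)
    raise LookupError("service %s not found" % name)

# Sample output in the shape shown above (values are made up).
sample = """NAME       TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)          AGE
inception  LoadBalancer   10.3.245.1   35.188.64.23   9000:30876/TCP   5m"""
print(parse_service(sample, "inception"))  # -> ('35.188.64.23', 9000)
```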

We will feed the `<SERVICE IP>` and `<SERVICE PORT>` to the labelling script. We will use it to label the following image of a
cat sleeping on a comforter atop a sofa:

![Cat on comforter on sofa](./inception-client/images/sleeping-pepper.jpg)

You can also use it to label your own images.

#### Running the script directly

You can run the script directly in your local environment if Python 2 is available to you. You will not be able to use the script with Python 3,
as the [`tensorflow-serving-api` package](https://pypi.python.org/pypi/tensorflow-serving-api)
is not yet Python 3-capable ([Issue #117](https://github.com/google/kubeflow/issues/117)).
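Because of that constraint, a small interpreter check near the top of such a script fails fast with a clear message instead of an obscure import error. This guard is a hypothetical addition for illustration, not part of the repository's `label.py`:

```python
import sys

def require_python2():
    # Abort early when run under Python 3, where the
    # tensorflow-serving-api package cannot be installed.
    if sys.version_info[0] != 2:
        raise SystemExit(
            "label.py requires Python 2: tensorflow-serving-api is not "
            "available for Python %d" % sys.version_info[0])
```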

If you would like to use a virtual environment, begin by activating your desired environment with your favorite environment manager. Then,
```commandline
pip install -r requirements.txt
```

Run the script as follows:

```commandline
python label.py -s <SERVICE IP> -p <SERVICE PORT> images/sleeping-pepper.jpg
```

#### Run in Docker container with publicly exposed service

The [inception-client](./inception-client) directory also contains a [Dockerfile](./inception-client/Dockerfile) that will allow you to
call out to the inception service from a container. You can run this container on your local machine if you publicly exposed your
`inception` service. If you would like to do this on GKE, simply run

```commandline
kubectl edit service inception
```

and change the service type to `NodePort` or `LoadBalancer`.

From that directory, start by building the image:

```commandline
docker build -t inception-client .
```

You can optionally specify a directory containing the JPEG files you would like to label by adding a build argument:
```commandline
--build-arg IMAGES_DIR=<path-to-image-directory>
```

By default, this build uses [inception-client/images](./inception-client/images).

Then run the container with the appropriate cluster information:

```commandline
docker run -v $(pwd):/data inception-client <SERVICE IP> <SERVICE PORT>
```

#### Run container on your Kubernetes cluster

If your inception service is not publicly exposed, you can also run the client container directly on the Kubernetes cluster on which the
inception model is being served. To do this:

1. Build the Docker image as specified above. From the [inception-client](./inception-client) directory:
```commandline
docker build -t inception-client .
```

1. Prefix the tag with your GCR registry:
```commandline
GCR_TAG=gcr.io/$(gcloud config get-value project)/inception-client:latest
docker image tag inception-client:latest $GCR_TAG
```

1. Push the image to your project's container registry:
```commandline
gcloud docker -- push $GCR_TAG
```

1. Run a container built from that image on your GKE cluster:
```commandline
kubectl run -it inception-client --image $GCR_TAG --restart=OnFailure
```

#### Output

No matter how you run the script, you should see the following output:

```
outputs {
key: "classes"
value {
dtype: DT_STRING
tensor_shape {
dim {
size: 1
}
dim {
size: 5
}
}
string_val: "sleeping bag"
string_val: "Border terrier"
string_val: "tabby, tabby cat"
string_val: "quilt, comforter, comfort, puff"
string_val: "studio couch, day bed"
}
}
outputs {
key: "scores"
value {
dtype: DT_FLOAT
tensor_shape {
dim {
size: 1
}
dim {
size: 5
}
}
float_val: 8.5159368515
float_val: 7.85043668747
float_val: 5.88767671585
float_val: 5.706138134
float_val: 5.55422878265
}
}
```
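The `classes` and `scores` outputs above line up index by index, so the top prediction is the first pair. A quick sketch of making that pairing explicit, using the values from the sample response (note the raw scores are unnormalized model outputs, not probabilities):

```python
classes = ["sleeping bag", "Border terrier", "tabby, tabby cat",
           "quilt, comforter, comfort, puff", "studio couch, day bed"]
scores = [8.5159368515, 7.85043668747, 5.88767671585, 5.706138134, 5.55422878265]

# Pair each label with its score and sort best-first (the server already
# returns them sorted, but this makes the ordering explicit).
ranked = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print("%-35s %.3f" % (label, score))
```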
31 changes: 31 additions & 0 deletions components/k8s-model-server/inception-client/Dockerfile
@@ -0,0 +1,31 @@
# Copyright 2018 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM python:2.7.14

RUN pip install --no-cache-dir grpcio tensorflow tensorflow-serving-api

RUN mkdir -p /opt/label /data

WORKDIR /opt/label

COPY label.py ./
COPY run.sh ./

ARG IMAGES_DIR=images/

ADD $IMAGES_DIR /data/

ENTRYPOINT ["bash", "run.sh"]
CMD []
82 changes: 82 additions & 0 deletions components/k8s-model-server/inception-client/label.py
@@ -0,0 +1,82 @@
# Copyright 2018 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

#!/usr/bin/env python2.7

"""
Runs the Inception model being served on the kubeflow model server on an image
that you specify.
Note: This file is a modification of the inception client available on the
TensorFlow Serving GitHub repository:
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_client.py
"""

from __future__ import print_function

# This is a placeholder for a Google-internal import.

import argparse

from grpc.beta import implementations
import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2


def main(image_paths, server, port):
channel = implementations.insecure_channel(server, port)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

raw_images = []
for path in image_paths:
with tf.gfile.Open(path) as img:
raw_images.append(img.read())

# Send request
# See prediction_service.proto for gRPC request/response details.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'inception'
request.model_spec.signature_name = 'predict_images'
request.inputs['images'].CopyFrom(
tf.make_tensor_proto(raw_images, shape=[len(raw_images)]))
result = stub.Predict(request, 10.0) # 10 secs timeout
print(result)


if __name__ == '__main__':
parser = argparse.ArgumentParser('Label an image using Inception')
parser.add_argument(
'-s',
'--server',
help='URL of host serving the Inception model'
)
parser.add_argument(
'-p',
'--port',
type=int,
default=9000,
help='Port at which Inception model is being served'
)
parser.add_argument(
'images',
nargs='+',
help='Paths (local or GCS) to images you would like to label'
)

args = parser.parse_args()

main(args.images, args.server, args.port)
17 changes: 17 additions & 0 deletions components/k8s-model-server/inception-client/requirements.txt
@@ -0,0 +1,17 @@
backports.weakref==1.0.post1
bleach==1.5.0
enum34==1.1.6
funcsigs==1.0.2
futures==3.2.0
grpcio==1.8.3
html5lib==0.9999999
Markdown==2.6.11
mock==2.0.0
numpy==1.13.3
pbr==3.1.1
protobuf==3.5.1
six==1.11.0
tensorflow==1.4.1
tensorflow-serving-api==1.4.0
tensorflow-tensorboard==0.4.0rc3
Werkzeug==0.14.1