Squashed commit of the following:
commit f10c384
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 17:11:18 2018 -0800

    Need to set authenticator param.

commit 705067f
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 15:53:08 2018 -0800

    Fix some bugs and start a user guide.

commit cbf67a3
Merge: c60c245 fed20a1
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 14:31:46 2018 -0800

    Sync to head.

commit c60c245
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 25 14:20:47 2018 -0800

    * Use Envoy as a reverse proxy and for JWT verification.
    * Don't run Envoy as a side car in the JupyterHub pod. This is cleaner
      and allows us to have a single reverse proxy for all services.

commit fed20a1
Author: Elson Rodriguez <[email protected]>
Date:   Thu Jan 25 06:28:22 2018 -0800

    Changing Jupyter service to ClusterIP by default. (kubeflow#139)

    * Changing default Service type for Jupyterhub to Cluster IP
       * Exposing services publicly is a security risk, so we want to avoid recommending it, since people may not understand the implications
    * Updated documentation to reflect ClusterIP change for Jupyter.

commit 8aaa393
Author: lluunn <[email protected]>
Date:   Thu Jan 25 06:21:55 2018 -0800

    fix guide typo (kubeflow#144)

commit 7bb642c
Author: Robert Wilkins III <[email protected]>
Date:   Tue Jan 23 23:05:10 2018 -0600

    Update README.md (kubeflow#142)

    Revised grammar and punctuation in the document, as detailed below:

    Added (ML) after "Machine Learning" on line 3 to avoid ambiguity over the "ML" reference on line 18.

    Changed "best of breed" to "best-of-breed" on line 3 to match language used in Kubernetes/Kubernetes.

    Removed the comma after GPUs on line 6 to avoid treating a regular sentence as a run-on sentence.

    Changed "it is" to "it's" on line 23.  There's no need to get formal here if the rest of the document has a light conversational feel.

    Removed an unnecessary comma from Line 23 after (within reason).

    Added dashes in "easy to use" on line 25 to match formatting from line 3.

    Added a colon after "using Kubeflow if" on line 30 to match the format previously established in the previous section called "The Kubeflow Mission".

    Capitalized Kubeflow on line 34 to match casing from line 30.

    Restructured the sentence on line 36 and added a period so it reads cleanly rather than as a strange run-on sentence.

    Removed unnecessary newline between lines 36 and 37 to match format established in line 34.

    Added a period after (see below) on line 39.

    Added a comma after "GPUs" to avoid a run-on sentence on line 47.

    Changed the ending comma on line 58 to a colon to follow the established format used previously in this document.

    Added a comma after Kubeflow on line 83 and added a period to finish the sentence.

    Moved the comma after "using" to after "GKE" in line 109 for the sentence to make sense.

    Added a dash in "in depth" to follow the document's established format.

commit 80cb162
Author: Neeraj Kashyap <[email protected]>
Date:   Mon Jan 22 05:31:32 2018 -0800

    Client script for inception model server (kubeflow#92)

    Added a script that allows users to run the hosted inception model on
    images on their local filesystems or on Google Cloud Storage.

    This is, with only very slight modifications for readability, the same
    as the client provided by TensorFlow Serving -
    https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_client.py

    As such, I am completely okay with us just linking to their script.

    My initial intention was to make this as a notebook, but the problem is
    that the tensorflow-serving-api Python package is only available for
    Python 2 and the kubeflow-core environment only offers a Python 3
    backend for the Jupyter notebook.

    This is therefore a stopgap until I can introduce an appropriate image
    in place of the one used by kubeflow-core.

    * Changed model serving service type to ClusterIP from LoadBalancer

    * Added instructions for exposing service IP

commit bde2ddc
Author: Putra Manggala <[email protected]>
Date:   Fri Jan 19 17:34:59 2018 -0500

    Fix datascientists typo (kubeflow#137)

commit 698cc67
Author: Jeremy Lewi <[email protected]>
Date:   Thu Jan 18 20:45:49 2018 -0800

    Make Argo UI available publicly at testing-argo.kubeflow.ui (kubeflow#132)

    * We use Argo to run our E2E tests, so the UI is very useful for debugging tests.

    * Add an ingress with a static IP to expose it publicly.

    * Fix kubeflow#131

commit 55c220d
Merge: 11d989c ca95a0d
Author: Jeremy Lewi <[email protected]>
Date:   Wed Jan 17 21:24:58 2018 -0800

    Merge remote-tracking branch 'github/iap' into iap

commit 11d989c
Merge: 8e6fb87 4c9217d
Author: Jeremy Lewi <[email protected]>
Date:   Wed Jan 17 21:24:34 2018 -0800

    Resolve conflicts.

commit 4c9217d
Author: Jeremy Lewi <[email protected]>
Date:   Wed Jan 17 20:50:24 2018 -0800

    Fix TfJob operator roles and TfCNN prototype (kubeflow#130)

    * Fix the TFCNN prototype; the termination policy wasn't being properly set

    * Create service accounts and role bindings for the TfJob operator and UI

    * Fix kubeflow#129 TfCnn template doesn't set termination policy correctly

    * Fix kubeflow#125 Missing roles for tf-job operator

    * Fix kubeflow#95; presubmits/postsubmits need to use the code at the commit we checked out

    * We do this by replacing the directory in vendor with a symbolic link to where we checked out the source.
       * It looks like using "--as" with ksonnet leads to strange errors about the server not being able to create the config map
       * If we don't use "--as", we need to fetch credentials a second time or else we get RBAC issues creating the cluster
jlewi committed Jan 26, 2018
1 parent 82f53fc commit 8fdcfcf
Showing 20 changed files with 1,004 additions and 355 deletions.
31 changes: 16 additions & 15 deletions README.md
@@ -1,49 +1,50 @@
# Kubeflow

[Prow test dashboard](https://k8s-testgrid.appspot.com/sig-big-data)
[Prow jobs dashboard](https://prow.k8s.io/?repo=google%2Fkubeflow)

The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable and scalable. Our goal is **not** to recreate other services, but to provide a straightforward way for spinning up best of breed OSS solutions. Contained in this repository are manifests for creating:
The Kubeflow project is dedicated to making Machine Learning on Kubernetes easy, portable and scalable. Our goal is **not** to recreate other services, but to provide a straightforward way for spinning up best-of-breed OSS solutions. Contained in this repository are manifests for creating:

* A JupyterHub to create & manage interactive Jupyter notebooks
* A Tensorflow Training Controller that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting
* A Tensorflow Training Controller that can be configured to use CPUs or GPUs and adjusted to the size of a cluster with a single setting
* A TF Serving container

This document details the steps needed to run the Kubeflow project in any environment in which Kubernetes runs.

## Quick Links
* [Prow test dashboard](https://k8s-testgrid.appspot.com/sig-big-data)
* [Prow jobs dashboard](https://prow.k8s.io/?repo=google%2Fkubeflow)
* [Argo UI for E2E tests](http://testing-argo.kubeflow.io)

## The Kubeflow Mission

Our goal is to help folks use ML more easily, by letting Kubernetes do what it's great at:
- Easy, repeatable, portable deployments on a diverse infrastructure (laptop <-> ML rig <-> training cluster <-> production cluster)
- Deploying and managing loosely-coupled microservices
- Scaling based on demand

Because ML practitioners use so many different types of tools, it is a key goal that you can customize the stack to whatever your requirements (within reason), and let the system take care of the "boring stuff." While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.
Because ML practitioners use so many different types of tools, it's a key goal that you can customize the stack to whatever your requirements (within reason) and let the system take care of the "boring stuff." While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.

Ultimately, we want to have a set of simple manifests that give you an easy-to-use ML stack _anywhere_ Kubernetes is already running and can self-configure based on the cluster it deploys into.


## Who should consider using Kubeflow?

Based on the current functionality you should consider using Kubeflow if
Based on the current functionality you should consider using Kubeflow if:

* You want to train/serve TensorFlow models in different environments (e.g. local, on prem, and cloud)
* You want to use Jupyter notebooks to manage TensorFlow training jobs
   * Kubeflow is particularly helpful if you want to launch training jobs that use more resources (more nodes or more GPUs) than your notebook.
* You want to combine TensorFlow with other processes
* For example if you want to use [tensorflow/agents](https://github.com/tensorflow/agents) to run simulations to generate data for training
reinforcement learning models
* For example, you may want to use [tensorflow/agents](https://github.com/tensorflow/agents) to run simulations to generate data for training reinforcement learning models.

This list is based ONLY on current capabilities. We are investing significant resources to expand the
functionality and actively soliciting help from companies and individuals interested in contributing (see [below](README.md#who-should-consider-contributing-to-kubeflow))
functionality and actively soliciting help from companies and individuals interested in contributing (see [below](README.md#who-should-consider-contributing-to-kubeflow)).

## Setup

This documentation assumes you have a Kubernetes cluster already available.

If you need help setting up a Kubernetes cluster please refer to [Kubernetes Setup](https://kubernetes.io/docs/setup/).

If you want to use GPUs be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).
If you want to use GPUs, be sure to follow the Kubernetes [instructions for enabling GPUs](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/).

## Quick Start

@@ -54,7 +55,7 @@ If you want to use GPUs be sure to follow the Kubernetes [instructions for enabl

### Steps

In order to quickly set up all components, execute the following commands,
In order to quickly set up all components, execute the following commands:

```commandline
# Initialize a ksonnet APP
@@ -79,7 +80,7 @@ provide prototypes that can be used to configure TensorFlow jobs and deploy Tens
Used together, these make it easy for a user to go from training to serving using Tensorflow with minimal
effort in a portable fashion between different environments.

For more detailed instructions about how to use Kubeflow please refer to the [user guide](user_guide.md)
For more detailed instructions about how to use Kubeflow, please refer to the [user guide](user_guide.md).

## Troubleshooting

@@ -105,12 +106,12 @@ kubectl create clusterrolebinding default-admin --clusterrole=cluster-admin --us

* Replace `[email protected]` with the user listed in the error message.

If you're using, GKE you may want to refer to [GKE's RBAC docs](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control) to understand
If you're using GKE, you may want to refer to [GKE's RBAC docs](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control) to understand
how RBAC interacts with IAM on GCP.

## Resources

* [user guide](user_guide.md) provides in depth instructions for using Kubeflow
* [user guide](user_guide.md) provides in-depth instructions for using Kubeflow
* Katacoda has produced a [self-paced scenario](https://www.katacoda.com/kubeflow) for learning and trying out Kubeflow


147 changes: 146 additions & 1 deletion components/k8s-model-server/README.md
@@ -88,7 +88,7 @@ storage bucket you created above.
gsutil cp -r inception gs://<bucket-name>
```

Use [gsutil_ls](https://cloud.google.com/storage/docs/gsutil/commands/ls) to view the contents of your bucket. You
Use [gsutil ls](https://cloud.google.com/storage/docs/gsutil/commands/ls) to view the contents of your bucket. You
will see that the contents of the model are stored in the `gs://<bucket-name>/inception/1` directory. This is the
first version of the model that we will serve.
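The directory number matters: TensorFlow Serving treats each numeric subdirectory under the model's base path as a model version and, by default, serves the highest-numbered one. A minimal sketch of that selection rule (the directory listing below is invented for illustration):

```python
def latest_version(subdirs):
    # TF Serving ignores non-numeric entries and loads the largest
    # purely numeric directory name as the newest model version.
    versions = [d for d in subdirs if d.isdigit()]
    if not versions:
        raise ValueError("no numeric version directories found")
    return max(versions, key=int)

# e.g. a bucket containing gs://<bucket-name>/inception/{1,2,10}
print(latest_version(["1", "2", "10"]))  # -> 10
```

So pushing a retrained model to `gs://<bucket-name>/inception/2` would make it the served version without redeploying.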

@@ -163,3 +163,148 @@ You can learn more about [updating a Deployment](https://kubernetes.io/docs/conc
[scaling a Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment), and
[Pod Resources](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/) in the
Kubernetes documentation.



### Use the served model

The [inception-client](./inception-client) directory contains a Python script you can use to make a call against the deployed model.

This script is intended to be run externally to the Kubernetes cluster as a demonstration that the inception model is correctly being served.
You can run the script either directly from a Python 2 environment or in a Docker container.

#### Setup

You will need the external IP for the inception service as well as the port it is being served on. The inception service should be
listed under the value you used for the `MODEL_NAME` parameter in the ksonnet component. You can find this information using:
```commandline
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
$MODEL_NAME LoadBalancer <INTERNAL IP> <SERVICE IP> <SERVICE PORT>:<NODE PORT> <TIME SINCE DEPLOYMENT>
```
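If you are scripting against the cluster, the same two values can be pulled out of that tabular output. A rough sketch, assuming kubectl's default column layout shown above (in practice `kubectl get service -o jsonpath` is the more robust route):

```python
def parse_service(kubectl_output, name):
    # Extract (external_ip, service_port) for a named service from the
    # default `kubectl get services` table output.
    for line in kubectl_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0] == name:
            external_ip = fields[3]
            # PORT(S) looks like "<port>:<nodeport>/TCP" for LoadBalancer
            service_port = fields[4].split(":")[0]
            return external_ip, int(service_port)
    raise LookupError("service %s not found" % name)

# Sample output in the shape shown above (values are made up).
sample = """NAME       TYPE           CLUSTER-IP   EXTERNAL-IP    PORT(S)          AGE
inception  LoadBalancer   10.3.245.1   35.188.64.23   9000:30876/TCP   5m"""
print(parse_service(sample, "inception"))  # -> ('35.188.64.23', 9000)
```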

We will feed the `<SERVICE IP>` and `<SERVICE PORT>` to the labelling script. We will use it to label the following image of a
cat sleeping on a comforter atop a sofa:

![Cat on comforter on sofa](./inception-client/images/sleeping-pepper.jpg)

You can also use it to label your own images.

#### Running the script directly

You can run the script directly in your local environment if Python 2 is available to you. You will not be able to use the script with Python 3,
as the [`tensorflow-serving-api` package](https://pypi.python.org/pypi/tensorflow-serving-api)
is not yet Python 3-capable ([Issue #117](https://github.com/google/kubeflow/issues/117)).
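Because of that constraint, a small interpreter check near the top of such a script fails fast with a clear message instead of an obscure import error. This guard is a hypothetical addition for illustration, not part of the repository's `label.py`:

```python
import sys

def require_python2():
    # Abort early when run under Python 3, where the
    # tensorflow-serving-api package cannot be installed.
    if sys.version_info[0] != 2:
        raise SystemExit(
            "label.py requires Python 2: tensorflow-serving-api is not "
            "available for Python %d" % sys.version_info[0])
```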

If you would like to use a virtual environment, begin by activating your desired environment with your favorite environment manager. Then,
```commandline
pip install -r requirements.txt
```

Run the script as follows:

```commandline
python label.py -s <SERVICE IP> -p <SERVICE PORT> images/sleeping-pepper.jpg
```

#### Run in Docker container with publicly exposed service

The [inception-client](./inception-client) directory also contains a [Dockerfile](./inception-client/Dockerfile) that will allow you to
call out to the inception service from a container. You can run this container on your local machine if you publicly exposed your
`inception` service. If you would like to do this on GKE, simply run

```commandline
kubectl edit service inception
```

and change the service type to `NodePort` or `LoadBalancer`.

From that directory, start by building the image:

```commandline
docker build -t inception-client .
```

You can optionally specify a directory containing the JPEG files you would like to label by adding a build argument:
```commandline
--build-arg IMAGES_DIR=<path-to-image-directory>
```

By default, this build uses [inception-client/images](./inception-client/images).

Then run the container with the appropriate cluster information:

```commandline
docker run -v $(pwd):/data inception-client <SERVICE IP> <SERVICE PORT>
```

#### Run container on your Kubernetes cluster

If your inception service is not publicly exposed, you can also run the client container directly on the Kubernetes cluster on which the
inception model is being served. To do this:

1. Build the Docker image as specified above. From the [inception-client](./inception-client) directory:
```commandline
docker build -t inception-client .
```

1. Prefix the tag with your GCR registry:
```commandline
GCR_TAG=gcr.io/$(gcloud config get-value project)/inception-client:latest
docker image tag inception-client:latest $GCR_TAG
```

1. Push the image to your project's container registry:
```commandline
gcloud docker -- push $GCR_TAG
```

1. Run a container built from that image on your GKE cluster:
```commandline
kubectl run -it inception-client --image $GCR_TAG --restart=OnFailure
```

#### Output

No matter how you run the script, you should see the following output:

```
outputs {
key: "classes"
value {
dtype: DT_STRING
tensor_shape {
dim {
size: 1
}
dim {
size: 5
}
}
string_val: "sleeping bag"
string_val: "Border terrier"
string_val: "tabby, tabby cat"
string_val: "quilt, comforter, comfort, puff"
string_val: "studio couch, day bed"
}
}
outputs {
key: "scores"
value {
dtype: DT_FLOAT
tensor_shape {
dim {
size: 1
}
dim {
size: 5
}
}
float_val: 8.5159368515
float_val: 7.85043668747
float_val: 5.88767671585
float_val: 5.706138134
float_val: 5.55422878265
}
}
```
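The `classes` and `scores` outputs above line up index by index, so the top prediction is the first pair. A quick sketch of making that pairing explicit, using the values from the sample response (note the raw scores are unnormalized model outputs, not probabilities):

```python
classes = ["sleeping bag", "Border terrier", "tabby, tabby cat",
           "quilt, comforter, comfort, puff", "studio couch, day bed"]
scores = [8.5159368515, 7.85043668747, 5.88767671585, 5.706138134, 5.55422878265]

# Pair each label with its score and sort best-first (the server already
# returns them sorted, but this makes the ordering explicit).
ranked = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
for label, score in ranked:
    print("%-35s %.3f" % (label, score))
```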
31 changes: 31 additions & 0 deletions components/k8s-model-server/inception-client/Dockerfile
@@ -0,0 +1,31 @@
# Copyright 2018 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM python:2.7.14

RUN pip install --no-cache-dir grpcio tensorflow tensorflow-serving-api

RUN mkdir -p /opt/label /data

WORKDIR /opt/label

COPY label.py ./
COPY run.sh ./

ARG IMAGES_DIR=images/

ADD $IMAGES_DIR /data/

ENTRYPOINT ["bash", "run.sh"]
CMD []
82 changes: 82 additions & 0 deletions components/k8s-model-server/inception-client/label.py
@@ -0,0 +1,82 @@
# Copyright 2018 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

#!/usr/bin/env python2.7

"""
Runs the Inception model being served on the kubeflow model server on an image
that you specify.
Note: This file is a modification of the inception client available on the
TensorFlow Serving GitHub repository:
https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/inception_client.py
"""

from __future__ import print_function

# This is a placeholder for a Google-internal import.

import argparse

from grpc.beta import implementations
import tensorflow as tf

from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2


def main(image_paths, server, port):
channel = implementations.insecure_channel(server, port)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

raw_images = []
for path in image_paths:
with tf.gfile.Open(path) as img:
raw_images.append(img.read())

# Send request
# See prediction_service.proto for gRPC request/response details.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'inception'
request.model_spec.signature_name = 'predict_images'
request.inputs['images'].CopyFrom(
tf.make_tensor_proto(raw_images, shape=[len(raw_images)]))
result = stub.Predict(request, 10.0) # 10 secs timeout
print(result)


if __name__ == '__main__':
parser = argparse.ArgumentParser('Label an image using Inception')
parser.add_argument(
'-s',
'--server',
help='URL of host serving the Inception model'
)
parser.add_argument(
'-p',
'--port',
type=int,
default=9000,
help='Port at which Inception model is being served'
)
parser.add_argument(
'images',
nargs='+',
help='Paths (local or GCS) to images you would like to label'
)

args = parser.parse_args()

main(args.images, args.server, args.port)
17 changes: 17 additions & 0 deletions components/k8s-model-server/inception-client/requirements.txt
@@ -0,0 +1,17 @@
backports.weakref==1.0.post1
bleach==1.5.0
enum34==1.1.6
funcsigs==1.0.2
futures==3.2.0
grpcio==1.8.3
html5lib==0.9999999
Markdown==2.6.11
mock==2.0.0
numpy==1.13.3
pbr==3.1.1
protobuf==3.5.1
six==1.11.0
tensorflow==1.4.1
tensorflow-serving-api==1.4.0
tensorflow-tensorboard==0.4.0rc3
Werkzeug==0.14.1