Training a model

The code for our TensorFlow project can be found in the containers/training folder. It contains a Python file with the TensorFlow code and a Dockerfile to build it into a container image. The program is fairly straightforward. First, it defines a simple feed-forward neural network with two hidden layers. Next, it defines tensor ops to train the model's weights and evaluate the model. Finally, after a number of training cycles, it saves the trained model to an object storage bucket. Of course, before we can use the storage bucket, we need to create it.

The code was taken from the Kubeflow MNIST example.

Setting up file storage

Our next step is to create the storage bucket that will hold our trained model.

You can download the MinIO client (mc) for your operating system of choice to access the MinIO S3-compatible object storage. The following command registers the storage endpoint under the alias minio:

mc config host add minio $S3_ENDPOINT $ACCESS_KEY $ACCESS_SECRET_KEY
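The alias command above relies on three environment variables. As a minimal sketch, assuming a POSIX shell, a small guard can report any that are missing before mc is called. The helper name require_env is hypothetical and is not part of mc or of this repository.

```shell
# Hypothetical helper (not part of mc): fail early if any of the
# variables named in the arguments is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup via eval keeps this compatible with POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (commented out so the sketch has no side effects):
# require_env S3_ENDPOINT ACCESS_KEY ACCESS_SECRET_KEY &&
#   mc config host add minio "$S3_ENDPOINT" "$ACCESS_KEY" "$ACCESS_SECRET_KEY"
```

Checking all names before returning means a single run lists every missing variable rather than stopping at the first one.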

Creating a bucket.


Make sure you replace userN with your username. In the commands below, $BUCKET_NAME is expected to hold that name.


mc mb minio/$BUCKET_NAME

mc stat minio/$BUCKET_NAME

Name      : userN/
Date      : 2019-01-22 11:27:45 CET
Size      : 0B
Type      : folder

Alternatively, you can use the MinIO web UI. The login credentials are the same as above.
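The bucket creation step can be made safe to re-run. The sketch below assumes the minio alias configured earlier and wraps the same mc stat and mc mb commands in a hypothetical helper named ensure_bucket, which only creates the bucket when mc stat cannot find it.

```shell
# Hypothetical helper: create the bucket only if `mc stat` cannot find it.
# Assumes the "minio" alias from the configuration step above.
ensure_bucket() {
  bucket="minio/$1"
  if mc stat "$bucket" >/dev/null 2>&1; then
    echo "bucket $bucket already exists"
  else
    mc mb "$bucket"
  fi
}

# Usage (commented out so the sketch has no side effects):
# ensure_bucket "$BUCKET_NAME"
```

Any failure of mc stat is treated as "bucket missing" here, so a network or credential problem will surface as an error from mc mb instead.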

Building the Container

While we could build the container locally and push it to the OpenShift registry, we'll leverage the OpenShift build pipeline and image stream capabilities. This will allow us to build container images right in OpenShift, push them to an ImageStream and use them in our job and pod definitions.

Let's create the BuildConfig and ImageStream:

oc process -f openshift/build_config_training-template.yaml --param APPLICATION_NAME=training-userN | oc apply -f -
created
created
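Because the application name has to carry your username, it can help to pass the username as an argument instead of editing the command by hand. This is a minimal sketch of that idea; the function name create_training_build is hypothetical, while the template path and the APPLICATION_NAME parameter are the ones used above.

```shell
# Hypothetical helper: render the template for one user and apply the result.
create_training_build() {
  user="$1"
  oc process -f openshift/build_config_training-template.yaml \
    --param APPLICATION_NAME="training-$user" | oc apply -f -
}

# Usage (commented out so the sketch has no side effects):
# create_training_build user8
```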

Have a look at the BuildConfig you've created:

oc describe bc/training-userN
Name:           training
Namespace:      user8
Created:        26 seconds ago
Labels:         name=training

Latest Version: Never built

Strategy:       Docker
ContextDir:     containers/training
Output to:      ImageStreamTag training:latest

Build Run Policy:       Serial
Triggered by:           <none>
Builds History Limit:
        Successful:     5
        Failed:         5

Events: <none>


If you forked the repo previously, you might have to adjust the repository URL in the BuildConfig YAML.

Let's trigger a build:

oc start-build training-userN --wait --follow
started
Cloning "" ...
        Commit: 28e5a8a797eda167444499bf2c52cb73c9034d92 (asdf)
        Author: Marcel Hild <[email protected]>
        Date:   Mon Jan 21 19:19:45 2019 +0100
Step 1/10 : FROM tensorflow/tensorflow:1.12.0
 ---> 2054925f3b43
Step 2/10 : MAINTAINER "Marcel Hild <[email protected]>"
 ---> Using cache
 ---> 82d3bc88f2f3

Later, if you make modifications to the code, you can also start a build by uploading the contents of your directory. We call this a binary build:

oc start-build training-userN --from-dir=. --wait --follow

Now we have the training image in our ImageStream:

oc describe is/training-userN
Name:                   training
Namespace:              user8
Created:                20 minutes ago
Labels:                 <none>
Annotations:  {"apiVersion":"","kind":"ImageStream","metadata":{"annotations":{},"name":"training","namespace":"user8"},"spec":{"dockerImageRepository":"training","lookupPolicy":{"local":true},"tags":[{"name":"latest"}]}}

Docker Pull Spec:       docker-registry.default.svc:5000/user8/training
Image Lookup:           local=true
Unique Images:          0
Tags:                   1

  tag without source image
  * docker-registry.default.svc:5000/user8/training@sha256:f671516ab10cb21c10567f57d18f01b69e769053efd1d24d2cd980906d9f6a7c
      About a minute ago
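Before referencing the image in a job or pod definition, a script can confirm that the latest tag exists. The sketch below assumes that querying the ImageStreamTag with oc get istag is an acceptable readiness check for your setup; the function name image_ready is hypothetical.

```shell
# Hypothetical helper: succeed only if the image stream has a "latest" tag.
image_ready() {
  if oc get istag "$1:latest" >/dev/null 2>&1; then
    echo "image $1:latest is available"
  else
    echo "image $1:latest not found; has the build finished?" >&2
    return 1
  fi
}

# Usage (commented out so the sketch has no side effects):
# image_ready training-userN
```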