tapanalyticstoolkit/tensorsets

TensorSets

TensorSets are a Kubernetes ThirdPartyResource for managing TensorFlow training clusters that run in Kubernetes.

What's new

This is the initial release of the tensorsets repo.

Known issues

This is a proof of concept (POC). It is not intended for production use, and running it there may result in errors.

Walkthrough

First, we define our ThirdPartyResource, which declares a new Kubernetes object type called TensorSet.

kubectl create -f kubernetes/tensorset-tpr-v0.yaml
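
Before moving on, it can help to confirm that the new type was registered. The following is a minimal sketch, not part of the repository: it assumes a kubectl version that still supports `kubectl get thirdpartyresources`, and the helper name `tpr_exists` and the `KUBECTL` override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: check whether a ThirdPartyResource whose name contains a given
# fragment is registered. KUBECTL is a hypothetical override (handy for stubs).
KUBECTL="${KUBECTL:-kubectl}"

# tpr_exists FRAGMENT: succeed if any registered ThirdPartyResource name
# contains FRAGMENT.
tpr_exists() {
  "$KUBECTL" get thirdpartyresources \
    --output=jsonpath='{.items[*].metadata.name}' \
    | tr ' ' '\n' \
    | grep -q -- "$1"
}
```

For example, `tpr_exists tensor-set && echo registered` would report success if the resource name contains `tensor-set`. That fragment is a guess based on the usual naming convention for a kind called TensorSet; the actual name is whatever `kubernetes/tensorset-tpr-v0.yaml` declares.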

Next, we deploy our TensorSet controller. The controller is a small application that acts on TensorSet objects, for example by creating the training cluster that a TensorSet describes.

kubectl create -f kubernetes/tensorset-controller-v0.yaml

Now we create our first TensorSet:

kubectl create -f kubernetes/cluster1-ts-v0.yaml

The TensorSet controller will then create your training cluster, and its pods will gradually appear in your current namespace.
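
One way to tell when the cluster is ready is to poll the Ready condition of its pods. The sketch below assumes the pods carry the `ts-cluster-name` label used later in this walkthrough; the function names, the `KUBECTL` override, and the retry defaults are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Sketch only: wait until every pod labeled ts-cluster-name=<cluster> is Ready.
# KUBECTL is a hypothetical override (handy for testing with a stub).
KUBECTL="${KUBECTL:-kubectl}"

# ready_statuses CLUSTER: print the Ready condition status ("True" or "False")
# of each matching pod, separated by spaces.
ready_statuses() {
  "$KUBECTL" get pods --selector="ts-cluster-name=$1" \
    --output=jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
}

# all_ready CLUSTER: succeed only if at least one pod exists and all are Ready.
all_ready() {
  statuses=$(ready_statuses "$1") || return 1
  if [ -z "$statuses" ]; then
    return 1   # no pods have been created yet
  fi
  for s in $statuses; do
    if [ "$s" != "True" ]; then
      return 1
    fi
  done
  return 0
}

# wait_for_cluster CLUSTER [TRIES] [DELAY_SECONDS]: poll until all_ready
# succeeds or the attempts run out.
wait_for_cluster() {
  cluster=$1
  tries=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if all_ready "$cluster"; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out"
  return 1
}
```

Calling `wait_for_cluster cluster1` would then block for up to about five minutes with these defaults. Polling is used here only because it is simple to follow.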

Once they are all ready, start a training job:

kubectl create -f kubernetes/cluster1-job-v0.yaml

To see the progress of your job:

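# Collect the names of the pods labeled with this cluster's name
pods=$(kubectl get pods --selector=ts-cluster-name=cluster1 --output=jsonpath='{.items..metadata.name}')
# kubectl logs takes one pod at a time, so print each pod's logs in turn
for pod in $pods; do kubectl logs "$pod"; done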

Once done with your training cluster, delete it:

kubectl delete tensorset cluster1

And your cluster will be gone!
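
To confirm that the cluster really is gone, you can check that no pods with its label remain. This is a minimal sketch that assumes the cluster's pods carry the `ts-cluster-name` label used above; `remaining_pods`, `cluster_gone`, and the `KUBECTL` override are hypothetical names, not part of the repository.

```shell
#!/bin/sh
# Sketch only: check that no pods labeled ts-cluster-name=<cluster> are left.
# KUBECTL is a hypothetical override (handy for testing with a stub).
KUBECTL="${KUBECTL:-kubectl}"

# remaining_pods CLUSTER: print the names of pods still carrying the label.
remaining_pods() {
  "$KUBECTL" get pods --selector="ts-cluster-name=$1" \
    --output=jsonpath='{.items..metadata.name}'
}

# cluster_gone CLUSTER: succeed once no matching pods are listed.
cluster_gone() {
  left=$(remaining_pods "$1") || return 1
  [ -z "$left" ]
}
```

Pods that are still terminating are listed until they exit, so `cluster_gone cluster1` may fail for a short while after the delete command returns.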

Roadmap
