Job Controller #1624

thockin · 2014-10-07T16:46:39Z

We need something like ReplicationController that runs RestartOnFailure and RestartNever pods "to completion", collects results, etc.

bgrant0607 · 2014-10-07T18:01:44Z

Also discussed in #503.

bgrant0607 · 2014-10-11T00:07:54Z

See also OpenShift's proposed job concept: https://github.com/openshift/openshift-pep/blob/master/openshift-pep-013-openshift-3.md

Potential features include:

deadline (Restart policy should be able to specify maximum running time (aka deadline) #829)
queuing
cron-like scheduling
replication
success/failure aggregation
gang scheduling and/or admission control
inter-job ordering dependencies and/or activation control for externalized dependency management

smarterclayton · 2014-10-11T15:26:31Z

We have not yet implemented the generic concept and are instead using pods directly. Initially it seemed like the individual places where run-once pods were used would be related, and the actual flows were easy to control. The advantage of a unified job resource would be that you could easily extend it for your own use; the downside is that you still have to control / ensure the job resource gets created (so you have two sync loops instead of one). I was hoping to see how others might use jobs before we proceeded further. The pattern was definitely common, but our uses had subtle differences that may not abstract well.

bgrant0607 · 2014-10-11T16:34:44Z

That's interesting. Another option is that we could just make individual pods easier to use for these workflow sorts of scenarios (which we should do regardless).

Some features I could imagine would be useful for that:

wait for completion via watch on events
get success/failure information (Figure out how to best convey more detailed pod status info #1370)
deadlines (Restart policy should be able to specify maximum running time (aka deadline) #829)
pod templates and bulk creation (Separate the pod template from replicationController #170)
bulk deletion / stop
graceful and immediate termination (Consistently support graceful and immediate termination for all objects #1535)
input/output (Ensure there is an easy way to provide container input (and to get output) #1503, Capture application termination messages/output #139) and/or some means of pod/image parameterization

Anything else?

smarterclayton · 2014-10-20T01:10:39Z

Define a standard annotation key(s) for certain job conventions, allow annotations to be atomically updated on PUT (standard if-match resource version is enough).

Logs, and the ability to read logs and get pod info long after a pod is deleted.
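
The "if-match resource version" idea for atomic annotation updates can be sketched in Go as an optimistic read-modify-write loop. The `Store` below is an in-memory stand-in invented for this example, and the annotation key `example.com/job-state` is a hypothetical placeholder, not a proposed standard key.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrConflict = errors.New("resource version conflict")

// Object is a hypothetical stored resource with a version and annotations.
type Object struct {
	ResourceVersion int
	Annotations     map[string]string
}

// Store is an in-memory stand-in for an API server holding one object.
type Store struct {
	mu  sync.Mutex
	obj Object
}

// Get returns a copy of the object so callers cannot mutate stored state.
func (s *Store) Get() Object {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := Object{ResourceVersion: s.obj.ResourceVersion, Annotations: map[string]string{}}
	for k, v := range s.obj.Annotations {
		out.Annotations[k] = v
	}
	return out
}

// Put replaces the annotations only if ifMatch equals the current version.
func (s *Store) Put(ifMatch int, annotations map[string]string) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if ifMatch != s.obj.ResourceVersion {
		return s.obj.ResourceVersion, ErrConflict
	}
	stored := make(map[string]string, len(annotations))
	for k, v := range annotations {
		stored[k] = v
	}
	s.obj.Annotations = stored
	s.obj.ResourceVersion++
	return s.obj.ResourceVersion, nil
}

// SetAnnotation retries the read-modify-write cycle until its Put succeeds.
func SetAnnotation(s *Store, key, value string) int {
	for {
		cur := s.Get()
		cur.Annotations[key] = value
		if v, err := s.Put(cur.ResourceVersion, cur.Annotations); err == nil {
			return v
		}
	}
}

func main() {
	s := &Store{}
	version := SetAnnotation(s, "example.com/job-state", "running")
	_, err := s.Put(0, map[string]string{}) // stale version, so this is rejected
	fmt.Println(version, err)
}
```

A writer holding a stale version gets a conflict instead of silently overwriting someone else's annotation, which is the property the comment above asks for.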

bgrant0607 · 2014-10-30T16:22:19Z

Some application frameworks, such as mapreduce/hadoop-style workloads, may take on the controller responsibilities themselves rather than relying upon a shared service. An example might be the Application Master in YARN.

tmrts · 2015-03-18T03:23:09Z

@thockin @smarterclayton @bgrant0607 After a long journey through the issues about job management in kubernetes, I have submitted a GSOC proposal for this topic. Your feedback would be most appreciated.

bgrant0607 · 2015-03-18T06:55:16Z

We'll take a look @TamerTas. Thanks!

bgrant0607 · 2015-03-26T22:01:05Z

To document later:

Some batch workloads are data processing/analysis workloads of independent utility.

Others support serving workloads, such as:

data cleanup / GC
serving data generation / aggregation / indexing / import
data snapshots / copies / backups / export
logs processing / billing / audit / report generation
maintenance (e.g., schema changes / data conversion)
integrity checking / validation
defense analysis (spam, abuse, dos, etc.)
online/offline feedback / adaptation / machine learning
continuous/periodic build/push

soltysh · 2015-07-21T12:55:37Z

/sub

gmarek · 2015-07-21T13:28:14Z

cc @mwielgus

smarterclayton · 2015-07-21T13:57:22Z

@soltysh please link your ongoing job proposal here so Kube folks can get a chance to look. The proposal will be coming here soon, while Maciej prototypes.

soltysh · 2015-07-21T13:59:37Z

It's already linked here, but here you go: openshift/origin#3693. It covers both the job controller part and the cron scheduler.

bgrant0607 · 2015-07-21T19:06:10Z

Could we discuss this at an upcoming community hangout?

smarterclayton · 2015-07-21T19:35:16Z

Absolutely

bprashanth · 2015-08-21T22:33:40Z

I'm not working on the job controller; a better owner might be Mike or @soltysh.

davidopp · 2015-08-22T00:50:05Z

@erictune has been shepherding this, so assigning to him.

resouer · 2015-09-17T07:30:52Z

/sub

bgrant0607 · 2015-10-02T05:24:51Z

@aronchick Continuing from #14186 (comment)

Your primary concern is to clarify that this is a batch job (e.g., LSF, Load Leveler, Printer Job) as opposed to an indefinitely running "Job" (e.g., Borg, Aurora, Nomad)?

soltysh · 2015-10-02T14:13:27Z

And we do distinguish those, but as:

job - run-once pods
replicationcontroller (to be renamed someday, see Create ReplicaSet #3024) - pods that run indefinitely

That's the terminology we have in k8s. @aronchick what's your opinion on that?

erictune · 2015-10-27T19:50:31Z

We now have this as a beta feature, in head and to appear in 1.1. Closing.

soltysh · 2015-10-29T15:10:37Z

@erictune are you going to close it?

mikedanese mentioned this issue Jul 23, 2015: Job controller proposal #11746 (Merged)

soltysh mentioned this issue Sep 21, 2015: Possible future improvements for Job object #14186 (Open)

mikedanese closed this as completed Oct 29, 2015
