Job Controller #1624

Closed
thockin opened this issue Oct 7, 2014 · 63 comments

Labels
area/api, area/batch, kind/design, priority/important-soon

Comments

@thockin
Member

thockin commented Oct 7, 2014

We need something like ReplicationController that runs RestartOnFailure and RestartNever pods "to completion", collects results, etc.
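As an illustration only, one reconciliation pass of such a controller might be sketched as below. This is not the design that was eventually implemented; `sync_job`, `completions`, `parallelism`, and the phase strings are hypothetical names chosen for the sketch. It assumes that failed RestartNever pods are replaced by new pods rather than restarted in place.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    phase: str  # assumed values: "Pending", "Running", "Succeeded", "Failed"


@dataclass
class JobStatus:
    active: int
    succeeded: int
    failed: int
    to_create: int
    complete: bool


def sync_job(completions: int, parallelism: int, pods: List[Pod]) -> JobStatus:
    """One reconciliation pass: count pods by phase and decide how many to start."""
    active = sum(p.phase in ("Pending", "Running") for p in pods)
    succeeded = sum(p.phase == "Succeeded" for p in pods)
    failed = sum(p.phase == "Failed" for p in pods)

    # Successful runs still needed before the job counts as done.
    remaining = max(completions - succeeded, 0)
    # Never run more pods at once than the parallelism limit or the work left.
    wanted_active = min(parallelism, remaining)
    to_create = max(wanted_active - active, 0)

    return JobStatus(active, succeeded, failed, to_create, remaining == 0)
```

Unlike a replication controller, which keeps a fixed number of pods alive indefinitely, this loop stops creating pods once enough of them have succeeded.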

@thockin thockin added kind/design Categorizes issue or PR as related to design. kind/enhancement priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/api Indicates an issue on api area. labels Oct 7, 2014
@bgrant0607
Member

Also discussed in #503.

@bgrant0607
Member

See also OpenShift's proposed job concept: https://github.com/openshift/openshift-pep/blob/master/openshift-pep-013-openshift-3.md

Potential features include:

@smarterclayton
Contributor

We have not yet implemented the generic concept, instead using pods directly. Initially it seems like each of the individual places run-once pods were used would be related, the actual flows were easy to control. The advantage of a unified job resource would be that you could easily extend it for your own use; the downside is that you still have to control / ensure the job resource gets created (so you have two sync loops instead of one). I was hoping to see how others might use jobs before we proceeded further. The pattern was definitely common, but our uses had subtle differences that may not abstract well.

@bgrant0607
Member

That's interesting. Another option is that we could just make individual pods easier to use for these workflow sorts of scenarios (which we should do regardless).

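Some features I could imagine would be useful for that:

  • wait for completion via watch on events
  • get success/failure information
  • deadlines
  • pod templates and bulk creation (#170)
  • input/output (#1503) and/or some means of pod/image parameterization

Anything else?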

@smarterclayton
Contributor

On Oct 11, 2014, at 12:34 PM, bgrant0607 wrote:

That's interesting. Another option is that we could just make individual pods easier to use for these workflow sorts of scenarios (which we should do regardless).

Some features I could imagine would be useful for that:

  • wait for completion via watch on events
  • get success/failure information
  • deadlines
  • pod templates and bulk creation (#170)
  • input/output (#1503) and/or some means of pod/image parameterization

Anything else?

Define a standard annotation key(s) for certain job conventions, allow annotations to be atomically updated on PUT (standard if-match resource version is enough).

Logs and the ability to read logs and get pod info long after a pod is deleted.
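To make the "atomic update on PUT with if-match resource version" idea concrete, here is a minimal in-memory sketch of optimistic concurrency. It is not the Kubernetes API: `Store`, `Conflict`, `set_annotation`, and the annotation key used in the usage note are all hypothetical and stand in for a real API server and client.

```python
class Conflict(Exception):
    """Raised when a PUT carries a stale resource version."""


class Store:
    """Toy object store: name -> (resource_version, annotations)."""

    def __init__(self):
        self._objects = {}

    def create(self, name, annotations):
        self._objects[name] = (1, dict(annotations))
        return 1

    def get(self, name):
        version, annotations = self._objects[name]
        return version, dict(annotations)  # copy so callers cannot mutate the store

    def put(self, name, annotations, if_match):
        version, _ = self._objects[name]
        if version != if_match:
            # Someone else wrote in between: reject instead of overwriting.
            raise Conflict(f"{name}: expected version {if_match}, found {version}")
        self._objects[name] = (version + 1, dict(annotations))
        return version + 1


def set_annotation(store, name, key, value, max_retries=5):
    """Read-modify-write one annotation, retrying when the version is stale."""
    for _ in range(max_retries):
        version, annotations = store.get(name)
        annotations[key] = value
        try:
            return store.put(name, annotations, if_match=version)
        except Conflict:
            continue  # re-read the latest version and try again
    raise Conflict(f"{name}: gave up after {max_retries} attempts")
```

For example, after `store.create("pod-a", {})`, a call such as `set_annotation(store, "pod-a", "example.com/job-state", "done")` bumps the version, and any writer still holding the old version is rejected rather than silently clobbering the change.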

@bgrant0607
Member

Some application frameworks, such as mapreduce/hadoop-style workloads, may take on the controller responsibilities themselves rather than relying upon a shared service. An example might be the Application Master in YARN.

@bgrant0607 bgrant0607 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 16, 2014
@bgrant0607 bgrant0607 added status/help-wanted sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Feb 28, 2015
@tmrts
Contributor

tmrts commented Mar 18, 2015

@thockin @smarterclayton @bgrant0607 After a long journey through the issues about job management in Kubernetes, I have submitted a GSoC proposal for this topic. Your feedback would be most appreciated.

@bgrant0607
Member

We'll take a look @TamerTas. Thanks!

@bgrant0607
Member

To document later:

Some batch workloads are data processing/analysis workloads of independent utility.

Others support serving workloads, such as:

  • data cleanup / GC
  • serving data generation / aggregation / indexing / import
  • data snapshots / copies / backups / export
  • logs processing / billing / audit / report generation
  • maintenance (e.g., schema changes / data conversion)
  • integrity checking / validation
  • defense analysis (spam, abuse, dos, etc.)
  • online/offline feedback / adaptation / machine learning
  • continuous/periodic build/push

@soltysh
Contributor

soltysh commented Jul 21, 2015

/sub

@gmarek
Contributor

gmarek commented Jul 21, 2015

cc @mwielgus

@smarterclayton
Contributor

@soltysh please link your ongoing job proposal here so Kube folks can get a chance to look. The proposal will be coming soon here while Maciej prototypes.

@soltysh
Contributor

soltysh commented Jul 21, 2015

It's already linked here, but here you go: openshift/origin#3693. It covers both the job controller part and the cron scheduler.

@bgrant0607
Member

Could we discuss this at an upcoming community hangout?

@smarterclayton
Contributor

Absolutely

On Jul 21, 2015, at 3:06 PM, Brian Grant wrote:

Could we discuss this at an upcoming community hangout?



@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@mikedanese mikedanese added team/control-plane and removed team/master sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Aug 20, 2015
@bprashanth
Contributor

I'm not working on the job controller; a better owner might be Mike or @soltysh.

@bprashanth bprashanth assigned davidopp and unassigned bprashanth Aug 21, 2015
@davidopp davidopp assigned erictune and unassigned davidopp Aug 22, 2015
@davidopp
Member

@erictune has been shepherding this, so assigning to him.

@resouer
Contributor

resouer commented Sep 17, 2015

/sub

@bgrant0607
Member

@aronchick Continuing from #14186 (comment)

Your primary concern is to clarify that this is a batch job (e.g., LSF, Load Leveler, Printer Job) as opposed to an indefinitely running "Job" (e.g., Borg, Aurora, Nomad)?

@soltysh
Contributor

soltysh commented Oct 2, 2015

And we do distinguish those, but as:

That's the terminology we have in k8s. @aronchick what's your opinion on that?

@erictune
Member

We now have this as a beta feature, in head and to appear in 1.1. Closing.

@soltysh
Contributor

soltysh commented Oct 29, 2015

@erictune are you going to close it?

bertinatto pushed a commit to bertinatto/kubernetes that referenced this issue Jul 10, 2023
UPSTREAM: 119107: Stop using deprecated API