Garbage collection of orphan pods #1543
* Provide a new configuration section to garbage collect orphan pods.
* All Kubernetes agents are regularly annotated with the current timestamp.
* The garbage collector then looks at the agents managed by the current instance and picks out pods whose timestamp has not been refreshed within the timeout defined in the configuration section (2 minutes minimum).
* By default, the default namespace of each Kubernetes Cloud is monitored. Pods running in other namespaces are only picked up by the garbage collection logic if those namespaces are configured explicitly.
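The staleness check described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the class `OrphanPodSketch`, the methods `effectiveTimeout` and `findOrphans`, and the map of pod names to last-refresh timestamps are all hypothetical, and clamping to the 2-minute minimum is an assumption (the plugin may instead reject smaller values at configuration time).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from the plugin itself.
class OrphanPodSketch {
    // The description mentions a 2-minute minimum timeout.
    static final Duration MINIMUM_TIMEOUT = Duration.ofMinutes(2);

    // Assumption: values below the minimum are clamped rather than rejected.
    static Duration effectiveTimeout(Duration configured) {
        return configured.compareTo(MINIMUM_TIMEOUT) < 0 ? MINIMUM_TIMEOUT : configured;
    }

    // Returns the names of pods whose last refresh is older than the timeout.
    static List<String> findOrphans(Map<String, Long> lastRefreshMillisByPod, Duration configured, Instant now) {
        Duration timeout = effectiveTimeout(configured);
        return lastRefreshMillisByPod.entrySet().stream()
                .filter(e -> Duration.between(Instant.ofEpochMilli(e.getValue()), now).compareTo(timeout) > 0)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Map<String, Long> pods = new LinkedHashMap<>();
        pods.put("agent-fresh", now.minusSeconds(30).toEpochMilli());
        pods.put("agent-stale", now.minusSeconds(600).toEpochMilli());
        // With a 5-minute timeout only the pod refreshed 10 minutes ago is reported.
        System.out.println(findOrphans(pods, Duration.ofMinutes(5), now));
    }
}
```

Passing `now` explicitly keeps the check deterministic, which makes the timeout logic easy to unit test without waiting for real time to elapse.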
...esources/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud/help-garbageCollection.html
src/test/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/KubernetesPipelineTest.java
r.jenkins.removeNode(node);
break;
}
}
// Build is marked as failed because the agent has vanished
r.assertBuildStatus(Result.FAILURE, r.waitForCompletion(b));
OK I guess but not very realistic. Maybe you can `doKill` the build to simulate a loss of regular cleanup from `podTemplate`?
`doKill` cancels the build, but that leaves the node running.
Which is also something that ought to be fixed I think. Rather than refreshing the timestamp when a node is online, can we do so only when it is busy? Or is there some use case for a long-lived idle K8s agent?
src/main/java/org/csanchez/jenkins/plugins/kubernetes/GarbageCollection.java
src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud.java
src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesComputer.java
@@ -538,6 +543,51 @@ public static Builder builder() {
return new Builder();
}

public void annotateTtl(TaskListener listener) {
var kubernetesCloud = getKubernetesCloud();
Can this be null?
It raises an `IllegalStateException` if so, which is caught in order to avoid breaking the whole loop.
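The pattern mentioned in this reply, catching the exception per item so that one failing agent does not abort the whole pass, might look like the sketch below. The class `ResilientLoopSketch` and the method `forEachIsolated` are hypothetical names for illustration only; the actual loop in the PR is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: illustrates per-item exception isolation, not the plugin's real loop.
class ResilientLoopSketch {
    // Applies the action to every item; items whose action throws
    // IllegalStateException are collected and returned instead of stopping the loop.
    static <T> List<T> forEachIsolated(List<T> items, Consumer<T> action) {
        List<T> failed = new ArrayList<>();
        for (T item : items) {
            try {
                action.accept(item);
            } catch (IllegalStateException e) {
                // Record the failure and carry on with the remaining items.
                failed.add(item);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> agents = Arrays.asList("agent-1", "agent-without-cloud", "agent-3");
        List<String> failed = forEachIsolated(agents, name -> {
            if (name.contains("without-cloud")) {
                throw new IllegalStateException("No cloud found for " + name);
            }
            System.out.println("annotated " + name);
        });
        System.out.println("skipped: " + failed);
    }
}
```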
src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesSlave.java
src/main/java/org/csanchez/jenkins/plugins/kubernetes/PodTemplateBuilder.java
return Duration.between(
Instant.ofEpochMilli(refreshTime),
Instant.now())
Level.FINE,
BTW you can use `Logger.fine` with a `Supplier`, which is a bit more concise. (Not, alas, with a `Throwable`.)
> Not, alas, with a `Throwable`

Indeed, this is why I use this form for consistency.
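For readers unfamiliar with the two forms discussed here, the sketch below contrasts `Logger.fine(Supplier<String>)`, which only builds the message when `FINE` is enabled, with `Logger.log(Level, String, Throwable)`, which accepts an exception. The class `LazyLoggingSketch`, the method `countSupplierCalls`, and the logger names are hypothetical and only serve to show the lazy evaluation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: demonstrates java.util.logging behaviour, not code from the PR.
class LazyLoggingSketch {
    // Returns how many times the message supplier was invoked at the given logger level.
    static int countSupplierCalls(Level loggerLevel) {
        Logger logger = Logger.getLogger("sketch.lazy." + loggerLevel.getName());
        logger.setLevel(loggerLevel);
        AtomicInteger calls = new AtomicInteger();
        // The supplier runs only if FINE is loggable for this logger.
        logger.fine(() -> {
            calls.incrementAndGet();
            return "pod refreshed";
        });
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("supplier calls at INFO: " + countSupplierCalls(Level.INFO));
        System.out.println("supplier calls at FINE: " + countSupplierCalls(Level.FINE));

        // The form that carries a Throwable takes an eagerly built message.
        Logger logger = Logger.getLogger("sketch.lazy.throwable");
        logger.log(Level.FINE, "Failed to annotate pod", new IllegalStateException("example"));
    }
}
```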
private static Long RECURRENCE_PERIOD = SystemProperties.getLong(
GarbageCollection.class.getName() + ".recurrencePeriod", TimeUnit.MINUTES.toSeconds(1));
Would be nice to have `SystemProperties.getDuration` that would parse `250ms`, `5s`, `10m`, etc.
Yes, too bad `Duration.parse` takes the ISO-8601 format, which is not practical.
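To make the contrast concrete, here is a rough sketch of the kind of parser wished for above, next to what `Duration.parse` accepts. The class `DurationParsingSketch` and the method `parseShortDuration` are hypothetical, and the supported suffixes (`ms`, `s`, `m`, `h`, `d`) are an assumption based on the examples in the comment; this is not an existing `SystemProperties` API.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a short-form duration parser, not part of Jenkins or the JDK.
class DurationParsingSketch {
    // A whole number followed by one unit suffix, e.g. "250ms", "5s", "10m".
    private static final Pattern SHORT_FORM = Pattern.compile("(\\d+)(ms|s|m|h|d)");

    static Duration parseShortDuration(String text) {
        Matcher m = SHORT_FORM.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported duration: " + text);
        }
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "ms": return Duration.ofMillis(amount);
            case "s":  return Duration.ofSeconds(amount);
            case "m":  return Duration.ofMinutes(amount);
            case "h":  return Duration.ofHours(amount);
            default:   return Duration.ofDays(amount); // only "d" can reach this branch
        }
    }

    public static void main(String[] args) {
        // Short form handled by the sketch.
        System.out.println(parseShortDuration("10m"));
        // ISO-8601 form required by the JDK's Duration.parse.
        System.out.println(Duration.parse("PT10M"));
    }
}
```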