
stable/jenkins - Feature: Add cache section under agent for cache vol… #16

Closed
wants to merge 9 commits

Conversation

krishnakvall
Contributor

What this PR does / why we need it:

Note: this is PR #22691, moved from the combined helm-charts repo to the jenkinsci repo.
REF: [stable/jenkins]: add cache section under persistence to define volumes #22691

Allows the chart to define a set of volumes that can be mounted by agents for caching libraries and other build artifacts.
Volumes of type PVC are created by the chart and a corresponding CronJob is created to clear the cache on a periodic basis.

Which issue this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

  • fixes #

Special notes for your reviewer:

Added a cache section under agent that is set to false by default.

| Parameter | Description | Default |
| ----------- | ---------------------- | --------------- |
| agent.cache | Defining cache volumes | `enabled=false` |

These caches can be mounted under the agent using the existing approach to mount PVC volumes.

Defining Caches for your agent pods

It's possible to define a set of cache volumes that can be mounted by agent pods for storing libraries and other build artifacts. The volume, defined under persistence, will be created as a new PVC by the chart. The volume has to be mounted under the volumes section. Along with the cache, a CronJob can be defined under the clear section to perform cache activities, including clearing old libraries, on a periodic basis. The CronJob is required both to prevent unused libraries from accumulating over time and to avoid security issues.

Tip: Ensure a PVC and CronJob with a given componentName are defined only once

The cache is defined in the format below:

    cache:
      # create PVC
      persistence:
        enabled: true
        componentName: "{{ .Release.Name }}-maven-cache"
        storageClass:
        size: "8Gi"
      # create CronJob (which mounts all agent volumes)
      clear:
        enabled: true
        componentName: "{{ .Release.Name }}-clear-maven-cache"
        claimName: "{{ .Release.Name }}-maven-cache"
        mountPath: "/home/jenkins/agent/.m2/repository"
        schedule: "0 2 * * WED"
        image: "maven"
        tag: "3.6.3-openjdk-11"
        command: "mvn dependency:purge-local-repository -DsnapshotOnly=true"

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

  • DCO signed
  • Chart Version bumped
  • Variables are documented in the README.md
  • Title of the PR starts with chart name (e.g. stable/jenkins)

Signed-off-by: Krishnakumar Venkataraman [email protected]

@krishnakvall
Contributor Author

krishnakvall commented Aug 30, 2020

Hi @torstenwalter @wmcdona89, I have moved [stable/jenkins]: add cache section under persistence to define volumes #22691 from the main repo with some modifications. Please review.

Thank you.

@@ -415,6 +416,35 @@ $ helm install my-release -f values.yaml stable/jenkins

> **Tip**: You can use the default [values.yaml](values.yaml)

#### Defining Caches for your agent pods

Its possible to define a set of cache volumes that can be mounted by agent pods for storing libraries and other artifacts for build. The volume defined, under `persistence`, will be created as a new `PVC` by the chart. The volume has to be mounted under the `volumes` section. Along with the cache, a `CronJob` can be defined under the `clear` section to perform cache activies, including clearing old libraries, on a periodic basis. The `CronJob` is required both to prevent unused libraries from accumulating over time and to avoid security issues.
Member


Suggested change
Its possible to define a set of cache volumes that can be mounted by agent pods for storing libraries and other artifacts for build. The volume defined, under `persistence`, will be created as a new `PVC` by the chart. The volume has to be mounted under the `volumes` section. Along with the cache, a `CronJob` can be defined under the `clear` section to perform cache activies, including clearing old libraries, on a periodic basis. The `CronJob` is required both to prevent unused libraries from accumulating over time and to avoid security issues.
Its possible to define a set of cache volumes that can be mounted by agent pods for storing libraries and other artifacts for build.
The volume defined, under `persistence`, will be created as a new `PVC` by the chart.
The volume has to be mounted under the `volumes` section.
Along with the cache, a `CronJob` can be defined under the `clear` section to perform cache activities, including clearing old libraries, on a periodic basis.
The `CronJob` is required both to prevent unused libraries from accumulating over time and to avoid security issues.

Contributor Author


I have modified this and it looks cleaner. I have reworded it to follow the changed naming: pvc.create and cronJob.create.

@torstenwalter
Member

For jobs which are running frequently during the day it may be easier just to configure idleMinutes:

idleMinutes Allows the Pod to remain active for reuse until the configured number of minutes has passed since the last step was executed on it.

If a job is executed often then the chance is high that a previous build pod is still available and can be re-used. In that case it will have the cache in place. If there is no previous build pod available, or if more jobs should be run in parallel than agent pods are available, it will spin up a new one (which does not have a cache).

Not sure if we should mention this in the docs.
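
For reference, a minimal values.yaml sketch of that approach, assuming the chart's existing agent.idleMinutes value (the number below is just an example):

agent:
  # keep finished agent pods around for reuse so a previous pod's local cache is still present
  idleMinutes: 30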


Its possible to define a set of cache volumes that can be mounted by agent pods for storing libraries and other artifacts for build. The volume defined, under `persistence`, will be created as a new `PVC` by the chart. The volume has to be mounted under the `volumes` section. Along with the cache, a `CronJob` can be defined under the `clear` section to perform cache activies, including clearing old libraries, on a periodic basis. The `CronJob` is required both to prevent unused libraries from accumulating over time and to avoid security issues.

> **Tip**: Ensure a PVC and CronJob with a given `componentName` is defined only once
Member


Do we have a chance to check this? If so then we could fail template rendering with an error message.
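
A hedged sketch of how such a check could fail rendering, assuming the chart ranges over additionalAgents (the variable names and value paths below are illustrative, not the chart's actual helpers):

{{- /* illustrative sketch: stop rendering when two agents declare the same cache componentName */}}
{{- $seen := dict }}
{{- range $name, $val := .Values.additionalAgents }}
  {{- $agent := default dict $val }}
  {{- $pvc := default dict (default dict $agent.cache).pvc }}
  {{- if $pvc.create }}
    {{- if hasKey $seen ($pvc.componentName | toString) }}
      {{- fail (printf "cache componentName %q is defined more than once" $pvc.componentName) }}
    {{- end }}
    {{- $_ := set $seen ($pvc.componentName | toString) $name }}
  {{- end }}
{{- end }}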

Contributor Author


I have added the idleMinutes note. However, the cache is ReadWriteMany, so it is mounted on multiple agents as required. It makes sense to include the note.

I have modified the cache-cronjob file to fail with an error message; however, the error message coming from Helm is probably clearer. I can modify cache-pvc too if this looks OK, else we can go with the default error thrown.

claimName: {{ tpl ($agent.cache.clear.claimName | required "cache clear.claimName is required") $ }}
containers:
- name: {{ tpl ($agent.cache.clear.componentName | required "cache clear componentName is required") $ }}
image: "docker-registry.default.svc:5000/{{ $agent.cache.clear.image }}:{{ $agent.cache.clear.tag }}"
Member


docker-registry.default.svc:5000/

Why are we using a local registry here? I think it's better to have the registry as part of the cache.clear.image

Contributor Author


Thank you, have fixed this!

@wmcdona89
Contributor

@krishnakvall I'm working on sending you a PR with some proposed changes.

  1. Use of cache.volume to define the volume to mount
  2. Use of cache.pvc.create instead of cache.persistence.enabled as cache.persistence.enabled=false would appear to disable caching when it really just disables the creation of a PVC
  3. .Release.Name no longer needs to be included in componentName
  4. Use of mergeOverwrite instead of merge as there's a bug in merge which causes false to be overwritten by true

Here are my current values.

agent:
  cache:
    pvc:
      # create PVC
      create: true
      componentName: "agent-cache"
      storageClass:
      size: "4Gi"
    volume:
      componentName: "agent-cache"
      mountPath: "/var/cache"
    clear:
      # create CronJob
      enabled: true
      componentName: "clear-cache"
      schedule: "0 0 * * SUN"
      path: "/var/cache"

additionalAgents:
  python:
    image: jenkins/jnlp-agent-python
    tag: latest
    cache:
      # inherited from agent
      # volume:
      #   componentName: "agent-cache"
      #   mountPath: "/var/cache"
      pvc:
        create: false
      clear:
        enabled: false

@krishnakvall
Contributor Author

@krishnakvall I'm working on sending you a PR with some proposed changes.

Thank you. I have made the naming changes, much clearer now. However, I may need some pointers on (1) and (4).

Referring back to the earlier comment numbers:

  1. Mounting the volume too was done in an earlier iteration but felt a bit like a corner case that could be better catered for using agent.volumes.
  2. The naming changes make things much clearer!
  3. Depends on (1): if we can mount the volumes too, then Release.Name is not required; otherwise it might be better to include it in the name just to avoid naming clashes.
  4. I am trying to reproduce this issue but am not able to. Will continue to test.

There are still changes required on the PR for (1) and (4) above, and for the deduplication error message if required. Working on these.

@wmcdona89
Contributor

wmcdona89 commented Sep 6, 2020

@krishnakvall @torstenwalter consider the scenario where the user simply enables caching features for agent (agent.cache.pvc.create=true and agent.cache.clear.enabled=true)... Should additional agents inherit these values and also create PVC and CronJob resources? Will users think they're just creating a cache to be shared by all agents, or will they expect the chart to create agent-specific caches?

If NO to inheritance of these values, then an additional agent will only create a PVC or CronJob when it explicitly sets cache.pvc.create=true or agent.cache.clear.enabled=true.

If YES to inheritance of these values, then we'll need a fixed naming format for resources such as [release name]-[agent key]-[component name] (e.g. my-release-maven-cache) in order to avoid naming conflicts.

...and if we have a fixed naming format, then we'll need to account for the scenario where one agent wants to use a PVC created by another agent. A field like cache.volume.pvcCreator would reference an agent key so the chart could generate the name of the PVC to mount:

additionalAgents:
  maven:
    cache:
      pvc:
        create: true
        componentName: "cache"
      volume:
        pvcCreator: "maven"
        mountPath: "/var/cache"
  maven-xl:
    cache:
      volume:
        pvcCreator: "maven"
        mountPath: "/var/cache"

@wmcdona89
Contributor

wmcdona89 commented Sep 8, 2020

@krishnakvall I sent feedback in a PR with a series of commits. Let's consider the NFS cache scenario to be out of scope.

additionalAgents:
  python:
    cache:
      clearJob:
        # create CronJob (which mounts all agent volumes)
        create: true
        componentName: "clear-pip-cache"
        workingDir: /var/myapp/mynfs/pip
        ...
    volumes:
      - type: Nfs
        mountPath: /var/myapp/mynfs/pip
        readOnly: false
        serverAddress: "192.0.2.0"
        serverPath: /var/lib/containers

@wmcdona89
Contributor

FYI: in my feedback PR, I went with NO inheritance of agent.cache.pvc.create or agent.cache.clearJob.create, as I think this results in a configuration that's more intuitive. Let me know if you think otherwise.

- The volume has to be mounted under the `volumes` section.
- Along with the cache, a `CronJob` can be defined under the `clearJob` section to perform cache activities, including clearing old libraries, on a periodic basis.
- The `CronJob` is required both to prevent unused libraries from accumulating over time and to avoid security issues.

Member


Should we add one working example here?

  • defining a cache
  • a job which is using that cache (might be something as simple as echo "writing files which should be cached" > /mycache)
  • a CRON job which cleans things up

Contributor Author


Thank you Aaron, I have merged the changes. I can see that, apart from no inheritance, there is also auto-mounting of the created PVC volumes.

@torstenwalter, I will test with these changes and create a commit with a simple example this week.

@torstenwalter
Member

I was wondering whether cache volumes should be directly linked to agents at all.
An alternative would be to just define a number of cache volumes, which could then be used by multiple agents. In some cases it may even make sense to have a single cache volume. If it is ReadWriteMany then one can simply mount different folders of that volume to different agents if needed.

In cloud environments the number of IO operations is typically related to the size of the volume. Therefore it might help to have one larger volume instead of many smaller ones.

I like the idea of the clean-up job. I am just not sure if we should use a Kubernetes CronJob for that. I see a risk that the CronJob runs while a Jenkins agent is using that volume. If you know your setup you can schedule it at a time when no jobs are running, but people really need to understand that.

Instead one could use a Jenkins job for that. The benefit is that you could use https://plugins.jenkins.io/lockable-resources/ to lock and unlock the volume.

But at some point this becomes very specific to a given setup, and I am not sure if we want to deal in a Helm chart with problems like people not thinking about locking resources and wondering why their jobs are failing.

[jenkins] feedback on agent cache volumes
{{- if $agent.cache.pvc.create }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: {{ tpl ($agent.cache.pvc.componentName | required "cache pvc componentName is required") $ }}
name: {{ include "jenkins.fullname" $ }}-{{ $agent.cache.pvc.componentName }}
namespace: {{ template "jenkins.master.slaveKubernetesNamespace" $ }}
Member


Suggested change
namespace: {{ template "jenkins.master.slaveKubernetesNamespace" $ }}
namespace: {{ template "jenkins.master.slaveKubernetesNamespace" . }}

Contributor Author


The namespace statement is failing with `.` when I test with minikube, maybe because of the restricted scope in the range loop?
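
For what it's worth, that is expected: inside a range loop the dot is rebound to the current element, so named templates that need the chart root have to be passed $ explicitly. A minimal sketch, assuming the chart's existing named template and an additionalAgents loop:

{{- range $name, $agent := .Values.additionalAgents }}
  # inside the range "." refers to the current agent entry, not the chart root,
  # so the root context "$" must be passed to templates that expect .Values / .Release
  namespace: {{ template "jenkins.master.slaveKubernetesNamespace" $ }}
{{- end }}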

@wmcdona89
Contributor

wmcdona89 commented Sep 11, 2020

the current configuration is intended to support these 5 scenarios

  1. one cache shared by all agents as cache.volume is inherited
  2. one cache shared by some agents via cache.volume.pvcComponentName
  3. one or more caches with each being for a specific agent type
  4. one or more caches with each being for a specific agent type and some shared via cache.volume.pvcComponentName
  5. clear a cache on a volume mounted via agent.volumes

Examples (not all values are listed)

  1. one cache shared by all agents as cache.volume is inherited
agent:
  cache:
    pvc:
      create: true
      componentName: "agent-cache"
    volume:
      pvcComponentName: "agent-cache"
      mountPath: "/var/cache"
    clearJob:
      create: true
      workingDir: "/var/cache"
additionalAgents:
  maven:
  maven-xlarge:
  nodejs:
  python:
  2. one cache shared by some agents via cache.volume.pvcComponentName
agent:
  cache:
    pvc:
      create: true
      componentName: "agent-cache"
    clearJob:
      create: true
      workingDir: "/var/cache"
additionalAgents:
  maven:
    cache:
      volume:
        pvcComponentName: "agent-cache"
        mountPath: "/var/cache"
  maven-xlarge:
    cache:
      volume:
        pvcComponentName: "agent-cache"
        mountPath: "/var/cache"
  nodejs:
    cache:
      volume:
        pvcComponentName: "agent-cache"
        mountPath: "/var/cache"
  python:
  3. one or more caches with each being for a specific agent type
agent:
  cache:
    pvc:
      create: false
    clearJob:
      create: false
additionalAgents:
  maven:
  maven-xlarge:
  nodejs:
    cache:
      pvc:
        create: true
        componentName: "nodejs-cache"
      volume:
        pvcComponentName: "nodejs-cache"
        mountPath: "/var/cache"
      clearJob:
        create: true
        workingDir: "/var/cache"
  python:
    cache:
      pvc:
        create: true
        componentName: "python-cache"
      volume:
        pvcComponentName: "python-cache"
        mountPath: "/var/cache"
      clearJob:
        create: true
        workingDir: "/var/cache"
  4. one or more caches with each being for a specific agent type and some shared via cache.volume.pvcComponentName
agent:
  cache:
    pvc:
      create: false
    clearJob:
      create: false
additionalAgents:
  maven:
    cache:
      pvc:
        create: true
        componentName: "maven-cache"
      volume:
        pvcComponentName: "maven-cache"
        mountPath: "/var/cache"
      clearJob:
        create: true
        workingDir: "/var/cache"
  maven-xlarge:
    cache:
      volume:
        pvcComponentName: "maven-cache"
        mountPath: "/var/cache"
  nodejs:
  python:
  5. clear a cache on a volume mounted via agent.volumes
agent:
  cache:
    pvc:
      create: false
    clearJob:
      create: false
additionalAgents:
  maven:
  maven-xlarge:
  nodejs:
  python:
    cache:
      clearJob:
        create: true
        componentName: "clear-pip-cache"
        workingDir: /var/myapp/mynfs/pip
    volumes:
      - type: Nfs
        mountPath: /var/myapp/mynfs/pip
        serverAddress: "192.0.2.0"
        serverPath: /var/lib/containers

@wmcdona89
Contributor

wmcdona89 commented Sep 11, 2020

we could potentially limit this PR to the simplest configuration required to support scenario #1

  1. one cache shared by all agents

this would simplify...

  • the PVC and CronJob templates as they wouldn't need to process additionalAgents because additionalAgents wouldn't be able to create PVC or CronJob resources
  • the CronJob template as it wouldn't need to mount agent.volumes
  • the values as cache.volumes could be eliminated since agents would no longer be able to reference a specific PVC created by another agent

Example (not all values are listed)

agent:
  cache:
    pvc:
      create: true
      componentName: "agent-cache"
      mountPath: "/var/cache"
    clearJob:
      create: true
      workingDir: "/var/cache"
additionalAgents:
  maven:
  maven-xlarge:
  nodejs:
  python:

@krishnakvall
Contributor Author

krishnakvall commented Sep 14, 2020

we could potentially limit this PR to the simplest configuration required to support scenario #1

  1. one cache shared by all agents

I have been exploring the comments on a single cache, and that certainly simplifies things for the users: just create a single large cache that can be used by all agents, potentially for all package managers.

The Kubernetes plugin already allows you to achieve this by setting the workspace volume; the YAML will look very similar to what Aaron shared above. Under kubernetes-plugin, see the section workspaceVolume. By default the plugin creates and mounts an emptyDir at this path, but we can define a PVC and mount it to be shared between all agents.

However, it does look like any files (settings, credentials, etc.) that are baked into the home directory will be overwritten. It's a best practice to mount these through separate ConfigMaps anyway. We could add a note for this.

JENKINS-45635
JENKINS-48067

NOTE: I also checked out the alternative method of creating a single large PVC and mounting different cache directories (.m2, .npm, etc.) on the same PVC. However, subPath is not yet supported by the kubernetes plugin when mounting agent volumes, so that's a no-go.

P.S. It might be a good idea to create the PVC using the chart instead of dynamicPVC. While we want the Helm chart to have the permissions to create PVCs, I'm not sure we want the Jenkins service account to have the same permissions.
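
For reference, a hedged sketch of what the shared workspace volume could look like in the Kubernetes plugin's JCasC configuration, assuming a pre-created PVC (the claim name below is illustrative):

jenkins:
  clouds:
    - kubernetes:
        templates:
          - name: "default"
            # mount a shared, pre-created PVC as the workspace volume instead of the default emptyDir
            workspaceVolume:
              persistentVolumeClaimWorkspaceVolume:
                claimName: "shared-agent-cache"
                readOnly: false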

@krishnakvall
Contributor Author

krishnakvall commented Sep 25, 2020

we could potentially limit this PR to the simplest configuration required to support scenario #1

  1. one cache shared by all agents

I have been exploring this, but it looks like a non-starter with the current setup; details below. Given this situation, it might be best to stick with the current pattern of agent-specific caches. This is a necessity, I believe, as the current agent architecture seems to be writing to the container filesystem, which is very slow.

  1. The workspace volume I mentioned in the previous comment was moved from /home/jenkins to /home/jenkins/agent so that settings that teams have baked into the agent image are not overwritten
  2. All libraries are pulled into sub-directories under /home/jenkins and so these are being written to the container filesystem at this time
  3. The single large cache has to be mounted under /home/jenkins so that all the cached libraries etc. are written into the PVC. However, I realized when developing the PR that nested volume mounts are not supported in Kubernetes (Kubernetes volumes: "Volumes can not mount onto other volumes or have hard links to other volumes").

Funny thing: I did not get an error when trying this out with a simple deployment, but one of the mounts just doesn't get reflected. At best, the behavior is undefined. In the case of the PR, the agent just doesn't start up (issue with nested mounts).
  4. There is no option in the Kubernetes Jenkins plugin (K8s plugin Jenkins) to NOT set the workspace volume; the workspaceVolume tag does not have an option for none.

workspaceVolume The type of volume to use for the workspace. Can be emptyDirWorkspaceVolume (default), dynamicPVC(), hostPathWorkspaceVolume(), nfsWorkspaceVolume(), or persistentVolumeClaimWorkspaceVolume().

So effectively, we are left with the agent/technology-specific cache option. Let me know your thoughts @torstenwalter @wmcdona89. I can clean up this PR with the remaining comments and fix merge conflicts if we are good with the approach.

@stale

stale bot commented Oct 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Oct 25, 2020
@stale

stale bot commented Nov 9, 2020

This issue is being automatically closed due to inactivity.

@stale stale bot closed this Nov 9, 2020