Runners continuously get terminated immediately with my custom runner image and custom cert #1834

Closed
5 tasks done
rpall08 opened this issue Sep 22, 2022 · 8 comments

@rpall08

rpall08 commented Sep 22, 2022

Checks

Controller Version

summerwind/actions-runner-controller:latest

Helm Chart Version

v3.9.0

CertManager Version

No response

Deployment Method

Helm

cert-manager installation

Used a self-signed cert for the actions-runner-controller deployment.

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is so critical that you need priority support)
  • I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
  • My actions-runner-controller version (v0.x.y) does support the feature
  • I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue

Resource Definitions

# runner.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
#kind: Runner
metadata:
  name: organization-runner
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: runner
        securityContext:
          privileged: true
      # image: CUSTOM IMAGE HERE
      imagePullPolicy: IfNotPresent
      organization: organization
      group: organization-org-runner-group
      labels:
        - k8s-runner
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: gha-runner-deployment-autoscaler
spec:
  # Runners in the targeted RunnerDeployment won't be scaled down
  # for 5 minutes instead of the default 10 minutes now
  scaleDownDelaySecondsAfterScaleOut: 300
  scaleTargetRef:
    name: gha-runner
    # Uncomment the below in case the target is not RunnerDeployment but RunnerSet
    #kind: RunnerSet
  minReplicas: 3
  maxReplicas: 6
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'    # The percentage of busy runners at which the number of desired runners is re-evaluated to scale up
    scaleDownThreshold: '0.3'   # The percentage of busy runners at which the number of desired runners is re-evaluated to scale down
    scaleUpFactor: '1.4'        # The scale up multiplier factor applied to desired count
    scaleDownFactor: '0.7'      # The scale down multiplier factor applied to desired count

To Reproduce

Build a custom actions-runner image following the instructions in https://github.com/actions-runner-controller/actions-runner-controller/blob/master/runner/actions-runner.dockerfile.

Set the custom image repository in https://github.com/actions-runner-controller/actions-runner-controller/blob/master/charts/actions-runner-controller/values.yaml:

image:
  repository: "***********"
  actionsRunnerRepositoryAndTag: "*******/******/summerwind/actions-runner:customVersion"
  dindSidecarRepositoryAndTag: "******/*****/docker:dind"
  pullPolicy: IfNotPresent
  # The default image-pull secrets name for self-hosted runner container.
  # It's added to spec.ImagePullSecrets of self-hosted runner pods.
  actionsRunnerImagePullSecrets: []

Deploy using the following helm command:
helm upgrade --install --namespace **** --wait actions-runner-controller ./actions-runner-controller --values ./actions-runner-controller/values.yaml --debug --cleanup-on-fail

Describe the bug

Runner pods are terminated immediately after creation and never register with GitHub Enterprise.

Describe the expected behavior

Runner pods should start and register with GitHub Enterprise.

Controller Logs

2022-09-22T17:45:18Z	INFO	actions-runner-controller.runnerpod	Runner pod is marked as already unregistered.	{"runnerpod": "runnerNS/organization-runner-4wxk9-zq68r"}
2022-09-22T17:45:20Z	DEBUG	actions-runner-controller.runner	Runner appears to have been registered and running.	{"runner": "runnerNS/organization-runner-4wxk9-mx57q", "podCreationTimestamp": "2022-09-22 17:45:18 +0000 UTC"}
2022-09-22T17:45:20Z	DEBUG	actions-runner-controller.runnerreplicaset	Created replica(s)	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "lastSyncTime": null, "effectiveTime": "<nil>", "templateHashDesired": "596756d896", "replicasDesired": 1, "replicasPending": 0, "replicasRunning": 0, "replicasMaybeRunning": 0, "templateHashObserved": [], "created": 1}
2022-09-22T17:45:20Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-9jl95", "pods": null}
2022-09-22T17:45:20Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-9jl95", "pods": null}
2022-09-22T17:45:20Z	INFO	actions-runner-controller.runner	Removed finalizer	{"runner": "runnerNS/organization-runner-4wxk9-mx57q"}
2022-09-22T17:45:20Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-9jl95", "pods": null}
2022-09-22T17:45:20Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-9jl95", "pods": null}
2022-09-22T17:45:20Z	INFO	actions-runner-controller.runner	Updated registration token	{"runner": "organization-runner-4wxk9-9jl95", "repository": ""}
2022-09-22T17:45:20Z	DEBUG	events	Normal	{"object": {"kind":"Runner","namespace":"runnerNS","name":"organization-runner-4wxk9-9jl95","uid":"415acf3c-d396-476a-b4d3-e14bcce2efe4","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"343474358"}, "reason": "RegistrationTokenUpdated", "message": "Successfully update registration token"}
2022-09-22T17:45:20Z	INFO	actions-runner-controller.runnerpod	Unregistration started before runner ID is assigned. Perhaps the runner pod was terminated by anyone other than ARC? Was it OOM killed? Marking unregistration as completed anyway because there's nothing ARC can do.	{"runnerpod": "runnerNS/organization-runner-4wxk9-mx57q"}
2022-09-22T17:45:20Z	INFO	actions-runner-controller.runner	Created runner pod	{"runner": "runnerNS/organization-runner-4wxk9-9jl95", "repository": ""}
2022-09-22T17:45:20Z	DEBUG	events	Normal	{"object": {"kind":"Runner","namespace":"runnerNS","name":"organization-runner-4wxk9-9jl95","uid":"415acf3c-d396-476a-b4d3-e14bcce2efe4","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"343474361"}, "reason": "PodCreated", "message": "Created pod 'organization-runner-4wxk9-9jl95'"}
2022-09-22T17:45:22Z	DEBUG	actions-runner-controller.runner	Runner appears to have been registered and running.	{"runner": "runnerNS/organization-runner-4wxk9-9jl95", "podCreationTimestamp": "2022-09-22 17:45:20 +0000 UTC"}
2022-09-22T17:45:22Z	DEBUG	actions-runner-controller.runnerreplicaset	Created replica(s)	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "lastSyncTime": null, "effectiveTime": "<nil>", "templateHashDesired": "596756d896", "replicasDesired": 1, "replicasPending": 0, "replicasRunning": 0, "replicasMaybeRunning": 0, "templateHashObserved": [], "created": 1}
2022-09-22T17:45:22Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-pxljh", "pods": null}
2022-09-22T17:45:22Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-pxljh", "pods": null}
2022-09-22T17:45:22Z	INFO	actions-runner-controller.runner	Removed finalizer	{"runner": "runnerNS/organization-runner-4wxk9-9jl95"}
2022-09-22T17:45:22Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-pxljh", "pods": null}
2022-09-22T17:45:22Z	INFO	actions-runner-controller.runner	Updated registration token	{"runner": "organization-runner-4wxk9-pxljh", "repository": ""}
2022-09-22T17:45:22Z	DEBUG	events	Normal	{"object": {"kind":"Runner","namespace":"runnerNS","name":"organization-runner-4wxk9-pxljh","uid":"cfc0c217-7029-463b-ad97-c9bfd750d401","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"343474398"}, "reason": "RegistrationTokenUpdated", "message": "Successfully update registration token"}
2022-09-22T17:45:22Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-pxljh", "pods": null}
2022-09-22T17:45:22Z	INFO	actions-runner-controller.runnerpod	Unregistration started before runner ID is assigned. Perhaps the runner pod was terminated by anyone other than ARC? Was it OOM killed? Marking unregistration as completed anyway because there's nothing ARC can do.	{"runnerpod": "runnerNS/organization-runner-4wxk9-9jl95"}
2022-09-22T17:45:22Z	INFO	actions-runner-controller.runner	Created runner pod	{"runner": "runnerNS/organization-runner-4wxk9-pxljh", "repository": ""}
2022-09-22T17:45:22Z	DEBUG	events	Normal	{"object": {"kind":"Runner","namespace":"runnerNS","name":"organization-runner-4wxk9-pxljh","uid":"cfc0c217-7029-463b-ad97-c9bfd750d401","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"343474400"}, "reason": "PodCreated", "message": "Created pod 'organization-runner-4wxk9-pxljh'"}
2022-09-22T17:45:24Z	DEBUG	actions-runner-controller.runner	Runner appears to have been registered and running.	{"runner": "runnerNS/organization-runner-4wxk9-pxljh", "podCreationTimestamp": "2022-09-22 17:45:22 +0000 UTC"}
2022-09-22T17:45:24Z	DEBUG	actions-runner-controller.runnerreplicaset	Created replica(s)	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "lastSyncTime": null, "effectiveTime": "<nil>", "templateHashDesired": "596756d896", "replicasDesired": 1, "replicasPending": 0, "replicasRunning": 0, "replicasMaybeRunning": 0, "templateHashObserved": [], "created": 1}
2022-09-22T17:45:24Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-49t4n", "pods": null}
2022-09-22T17:45:24Z	INFO	actions-runner-controller.runner	Removed finalizer	{"runner": "runnerNS/organization-runner-4wxk9-pxljh"}
2022-09-22T17:45:24Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-49t4n", "pods": null}
2022-09-22T17:45:24Z	DEBUG	actions-runner-controller.runnerreplicaset	Skipped reconcilation because owner is not synced yet	{"runnerreplicaset": "runnerNS/organization-runner-4wxk9", "owner": "runnerNS/organization-runner-4wxk9-49t4n", "pods": null}
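The cycle visible in the logs above (a pod is created, and about two seconds later unregistration starts before a runner ID was ever assigned) can be counted with a small shell sketch. It assumes the controller log has been saved to a file; the function name `summarize_runner_churn` is hypothetical and not part of ARC. The matched strings are copied from the log lines in this issue.

```shell
#!/usr/bin/env bash
# Sketch: count create/terminate churn in a saved controller log.
# summarize_runner_churn is a hypothetical helper, not an ARC command.
summarize_runner_churn() {
  local logfile="$1"
  local created terminated
  # "Created runner pod" is logged each time the controller starts a pod.
  # grep -c exits non-zero on zero matches, hence the "|| true".
  created="$(grep -c 'Created runner pod' "$logfile" || true)"
  # This message means the pod went away before it ever registered.
  terminated="$(grep -c 'Unregistration started before runner ID is assigned' "$logfile" || true)"
  echo "created=${created} terminated_before_registration=${terminated}"
}

# Usage (controller.log is an assumed file name):
#   summarize_runner_churn controller.log
```

If the two counts track each other closely, the pods are probably dying during startup rather than being scaled down later, which points at the image or its environment rather than at autoscaling.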

Runner Pod Logs

Runner pods are terminated immediately, so I am not able to get any logs from them.

Additional Context

No response

@rpall08 rpall08 added the bug Something isn't working label Sep 22, 2022
@toast-gear
Collaborator

toast-gear commented Sep 22, 2022

https://github.com/actions-runner-controller/actions-runner-controller#autoscaling

Important!!! If you opt to configure autoscaling, ensure you remove the replicas: attribute in the RunnerDeployment / RunnerSet kinds that are configured for autoscaling #206 (comment)

Having replicas set isn't helping.
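A related thing that may be worth checking: in the resource definitions posted in this issue, the HorizontalRunnerAutoscaler's `scaleTargetRef.name` is `gha-runner`, which differs from the RunnerDeployment's name. That may only be an artifact of redaction, but the autoscaler has to reference an existing RunnerDeployment by name. A rough sketch for comparing the two on a live cluster follows; the function names and the `KUBECTL` variable are hypothetical, and the namespace is whatever the runners are deployed into.

```shell
#!/usr/bin/env bash
# Sketch: list what each autoscaler targets next to the RunnerDeployments that exist.
# KUBECTL is a hypothetical override so the binary can be swapped out.
KUBECTL="${KUBECTL:-kubectl}"

list_autoscaler_targets() {
  local ns="$1"
  # Prints "<autoscaler name> -> <scaleTargetRef.name>" for each autoscaler.
  "$KUBECTL" -n "$ns" get horizontalrunnerautoscalers \
    -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.spec.scaleTargetRef.name}{"\n"}{end}'
}

list_runner_deployments() {
  local ns="$1"
  # Prints the names of the RunnerDeployments in the namespace.
  "$KUBECTL" -n "$ns" get runnerdeployments -o name
}

# Usage (namespace taken from the controller logs in this issue):
#   list_autoscaler_targets runnerNS
#   list_runner_deployments runnerNS
```

Every target printed by the first function should appear in the output of the second.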

@rpall08
Author

rpall08 commented Sep 22, 2022

I removed the replicas attribute, but the issue persists: the runner pod is still created and then terminated.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
#kind: Runner
metadata:
  name: organization-runner
spec:
  #replicas: 1
  template:
    spec:
      containers:
      - name: runner
        securityContext:
          privileged: true
      # image: CUSTOM IMAGE HERE
      imagePullPolicy: IfNotPresent
      organization: *************
      group: *******-org-runner-group
      labels:
        - k8s-runner

Logs from Controller:

2022-09-22T22:32:28Z DEBUG events Normal {"object": {"kind":"Runner","namespace":"runnerNS","name":"organization-runner-6wp5z-b997h","uid":"7d47f1c0-7b66-411d-87b9-1067f34bcc32","apiVersion":"actions.summerwind.dev/v1alpha1","resourceVersion":"343787784"}, "reason": "PodCreated", "message": "Created pod 'organization-runner-6wp5z-b997h'"}
2022-09-22T22:32:28Z INFO organization-runner-controller.runnerpod Unregistration started before runner ID is assigned. Perhaps the runner pod was terminated by anyone other than ARC? Was it OOM killed? Marking unregistration as completed anyway because there's nothing ARC can do. {"runnerpod": "runnerNS/organization-runner-6wp5z-

@mumoshu
Collaborator

mumoshu commented Sep 22, 2022

@rpall08 As the log message "Marking unregistration as completed anyway because there's nothing ARC can do" implies, this might be caused by a problem in your custom runner image, perhaps around the entrypoint. Try running `kubectl logs --previous` to see the logs from before the container restarted, and `kubectl describe po` to see the pod events.

Also, just to be extra sure, how did you conclude that this is a bug? 🤔
You seem to be using custom certs and a custom runner image. Either can result in issues that can't be addressed at all by ARC.
Can we move this to Discussions as Q&A until it becomes crystal clear this is a bug?
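The diagnostic steps suggested above could be scripted roughly as follows. This is only a sketch: the namespace `runnerNS` comes from the controller logs and the container name `runner` from the RunnerDeployment in this issue, while the function name `capture_runner_diagnostics` and the `KUBECTL` and `NAMESPACE` variables are hypothetical. Note that `--previous` only returns output when the container restarted inside the same pod; if the pod itself has been deleted, the namespace events may be all that is left, so the function may need to be run right after a pod is created.

```shell
#!/usr/bin/env bash
# Sketch: grab events and logs from a runner pod that keeps getting terminated.
# KUBECTL and NAMESPACE are hypothetical knobs; adjust them to your cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-runnerNS}"

capture_runner_diagnostics() {
  local pod="$1"
  # Pod events often reveal image pull errors, failed mounts, or OOM kills.
  "$KUBECTL" -n "$NAMESPACE" describe pod "$pod"
  # Logs of the previous container instance if the container restarted;
  # otherwise fall back to the current instance.
  "$KUBECTL" -n "$NAMESPACE" logs "$pod" -c runner --previous \
    || "$KUBECTL" -n "$NAMESPACE" logs "$pod" -c runner
  # Namespace events, sorted by time, in case the pod is already gone.
  "$KUBECTL" -n "$NAMESPACE" get events --sort-by=.lastTimestamp
}

# Usage (pod name taken from the controller logs in this issue):
#   capture_runner_diagnostics organization-runner-4wxk9-9jl95
```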

@mumoshu mumoshu changed the title Bug Runners continuously get terminated immediately with my custom runner image and custom cert Sep 22, 2022
@mumoshu mumoshu removed the bug Something isn't working label Sep 22, 2022
@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale label Oct 23, 2022
@github-actions github-actions bot closed this as not planned Nov 7, 2022
@chetanyanamandra

We are facing the same issue. Did you find a resolution?

@jamesloosli

Also having this same issue with a custom runner container. Unsure what the workaround is.

@VetonShalaRB

Hello,
has anyone been able to resolve this issue? I'm now facing the same problem with my runner container.

@gopikris83

Hi, I am also facing the same issue, and it would be really helpful if someone could suggest a way forward. We need this custom image running for our developers.

7 participants