Bug: pod setup fails internally #1621

bigwheel · 2022-07-11T06:01:25Z

Controller Version

0.23.0

Helm Chart Version

0.18.0

CertManager Version

1.8.0

Deployment Method

ArgoCD

cert-manager installation

I followed the README installation guide and installed cert-manager with Argo CD.
Argo CD shows it as healthy (green), and it has not been changed since.

Checks

This isn't a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with any of the contributors and maintainers if your business is so critical that you need priority support)
I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
My actions-runner-controller version (v0.x.y) does support the feature
I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  labels:
    argocd.argoproj.io/instance: some-instance-one
  name: custom-action-runner
  namespace: actions-runner-system
spec:
  replicas: 4
  template:
    spec:
      dockerdWithinRunnerContainer: true
      image: summerwind/actions-runner-dind
      imagePullSecrets:
        - name: dockerconfig
      labels:
        - custom-action-runner-staging
      organization: speee
      resources:
        limits:
          cpu: '1'
          memory: 3Gi
        requests:
          cpu: 500m
          memory: 1Gi
      volumeMounts:
        - mountPath: /mount/staging
          name: efs-billing-report-staging
      volumes:
        - name: efs-billing-report-staging
          persistentVolumeClaim:
            claimName: efs-billing-report-staging-fix

To Reproduce

1. Install Argo CD.
2. Run a job from GitHub.
3. The job fails (see below).

Describe the bug

Today, GitHub jobs on our custom actions runner suddenly started failing, even though we haven't changed the workflow definitions or ARC in a few weeks.
I started investigating this problem.

hint 1

This is the job log up to yesterday.

This is today's.

There is an unknown new step, and it is what causes the failure:

Set up runner
A job started hook has been configured by the self-hosted runner administrator
Error: File doesn't exist

hint 2

In our production environment, the job had already failed several times,
but in our staging environment it succeeded.
Then I re-created the pods,
and after that the job started failing in staging as well.

From the above, I guessed that the Docker image had changed recently and might be causing the problem: staging runs fewer GitHub jobs than production, so its pods sometimes keep running older images.

hint 3

See the runner pod's log.
I suspect update-status.

Describe the expected behavior

Run jobs successfully.

Controller Logs

https://gist.github.com/bigwheel/6be187b0e0cd0d73ef001b20d62be7f1

Runner Pod Logs

https://gist.github.com/bigwheel/79b27df9947420bd771f6dcb4d8baf7e

Additional Context

We have been using ARC for over a year. Thank you for the great operator!


bigwheel · 2022-07-11T06:17:19Z

I just changed the image tag from latest to v2.294.0-ubuntu-20.04-e3deb0d, and the job succeeded.

-      image: 'summerwind/actions-runner-dind'
+      image: 'summerwind/actions-runner-dind:v2.294.0-ubuntu-20.04-e3deb0d'

Judging from this history https://github.com/actions-runner-controller/actions-runner-controller/commits/master/runner/actions-runner.dockerfile and the runner pod log, 11cb9b7 likely introduced a problem around update-status.

mumoshu · 2022-07-11T07:56:40Z

@bigwheel Hey! Thanks for reporting and the detailed analysis of the problem.

This is probably because actions-runner-dind.dockerfile lacks the COPY statements for update-status and the hooks that we added to actions-runner.dockerfile in #1268. We have two Dockerfiles, one for non-dind and one for dind, and only the dind one seems to be affected.

11cb9b7#diff-45ecac98435a59bccdffc03779dd18d22ed154a507be7b20cc0828ed8d1a31b6

I'd greatly appreciate it if you could submit a pull request to fix it 🙏

Fixes #1621

mumoshu · 2022-07-12T01:17:18Z

This has been fixed via #1623 and #1624, and the new image is available as actions-runner-dind:v2.294.0-ubuntu-20.04-98b17dc.
Thanks a lot for reporting and fixing it @bigwheel and @gi0baro!
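As a minimal sketch of applying the fix, the RunnerDeployment from the report can pin the fixed tag named above instead of relying on the implicit latest tag, mirroring the workaround bigwheel used earlier. This is trimmed to the relevant fields; the omitted ones (imagePullSecrets, resources, volumes, volumeMounts, the Argo CD label) are assumed to stay as in the original definition.

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-action-runner
  namespace: actions-runner-system
spec:
  replicas: 4
  template:
    spec:
      dockerdWithinRunnerContainer: true
      # Pinned to the fixed dind tag from this thread instead of the implicit :latest,
      # so pods re-created later do not silently pick up a different image
      image: summerwind/actions-runner-dind:v2.294.0-ubuntu-20.04-98b17dc
      labels:
        - custom-action-runner-staging
      organization: speee
```

Pinning trades automatic image updates for reproducibility: upgrades then become an explicit tag bump in the manifest that Argo CD syncs.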

bigwheel · 2022-07-12T01:51:11Z

@mumoshu I checked the new image (actions-runner-dind:v2.294.0-ubuntu-20.04-98b17dc) and it works as well as it did last week. Thank you for the kind update 😁

mumoshu · 2022-07-12T02:00:53Z

Thanks for confirming! Enjoy ☺️

bigwheel added the bug Something isn't working label Jul 11, 2022

gi0baro added a commit to casavo/actions-runner-controller that referenced this issue Jul 11, 2022

fix actions#1621: add missing COPY statements to dind docker image

01ddd81

gi0baro mentioned this issue Jul 11, 2022

fix #1621: add missing COPY statements to dind docker image #1623

Merged

bigwheel mentioned this issue Jul 11, 2022

fix 1621:discover runner statuses feature to dind image #1624

Merged

mumoshu closed this as completed in #1623 Jul 11, 2022

mumoshu pushed a commit that referenced this issue Jul 11, 2022

fix #1621: add missing COPY statements to dind docker image

c658dcf

mumoshu pushed a commit that referenced this issue Jul 12, 2022

Fix the dind image to work with the latest entrypoint.sh (#1624)

98b17dc

Fixes #1621
