Bug: pod setup fails internally #1621

Closed
4 tasks done
bigwheel opened this issue Jul 11, 2022 · 5 comments · Fixed by #1623
Labels
bug Something isn't working

Comments

@bigwheel
Contributor

bigwheel commented Jul 11, 2022

Controller Version

0.23.0

Helm Chart Version

0.18.0

CertManager Version

1.8.0

Deployment Method

ArgoCD

cert-manager installation

I followed the README installation guide and installed cert-manager with Argo CD.
Argo CD shows it as healthy (green), and it has not changed.

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions. It might also be a good idea to contract with any of the contributors and maintainers if your business is so critical that you need priority support.)
  • I've read releasenotes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
  • My actions-runner-controller version (v0.x.y) does support the feature
  • I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest and it didn't fix the issue

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  labels:
    argocd.argoproj.io/instance: some-instance-one
  name: custom-action-runner
  namespace: actions-runner-system
spec:
  replicas: 4
  template:
    spec:
      dockerdWithinRunnerContainer: true
      image: summerwind/actions-runner-dind
      imagePullSecrets:
        - name: dockerconfig
      labels:
        - custom-action-runner-staging
      organization: speee
      resources:
        limits:
          cpu: '1'
          memory: 3Gi
        requests:
          cpu: 500m
          memory: 1Gi
      volumeMounts:
        - mountPath: /mount/staging
          name: efs-billing-report-staging
      volumes:
        - name: efs-billing-report-staging
          persistentVolumeClaim:
            claimName: efs-billing-report-staging-fix

To Reproduce

1. Install Argo CD
2. Run a job from GitHub
3. The job fails (see below)

Describe the bug

Today, GitHub jobs on our custom Actions runners suddenly started failing, even though we haven't changed the workflow definitions or ARC in the past few weeks.
I started investigating the problem.

Hint 1

This is the job log up to yesterday:
Screenshot 2022-07-11 14 44 56

This is today's log:
Screenshot 2022-07-11 14 45 25

There is an unknown step, and it is what causes the failure:

Set up runner
A job started hook has been configured by the self-hosted runner administrator
Error: File doesn't exist

Hint 2

In our production environment, the job has already failed several times.
But in our staging environment, the job succeeded.
Then I re-created the pods.
After that, the job started failing in staging too.

From the above, I guessed that the Docker image had been changed recently and that this might be causing the problem, because staging runs fewer GitHub jobs than production, so its pods sometimes keep old images.

hint 3

See runner pod's log.
I would be suspicious of update-status.

Describe the expected behavior

Run jobs successfully.

Controller Logs

https://gist.github.com/bigwheel/6be187b0e0cd0d73ef001b20d62be7f1

Runner Pod Logs

https://gist.github.com/bigwheel/79b27df9947420bd771f6dcb4d8baf7e

Additional Context

We have been using ARC for over a year. Thank you for the great operator!

@bigwheel bigwheel added the bug Something isn't working label Jul 11, 2022
@bigwheel
Contributor Author

bigwheel commented Jul 11, 2022

I just changed the image tag from latest to v2.294.0-ubuntu-20.04-e3deb0d, and the job succeeded.

-      image: 'summerwind/actions-runner-dind'
+      image: 'summerwind/actions-runner-dind:v2.294.0-ubuntu-20.04-e3deb0d'

Based on this history https://github.com/actions-runner-controller/actions-runner-controller/commits/master/runner/actions-runner.dockerfile and the runner pod log, 11cb9b7 must have introduced a problem around update-status.

@mumoshu
Collaborator

mumoshu commented Jul 11, 2022

@bigwheel Hey! Thanks for reporting and the detailed analysis of the problem.

It's probably because actions-runner-dind.dockerfile lacks the COPYs of update-status and the hooks that we added to actions-runner.dockerfile in #1268. We have two Dockerfiles, one for non-dind and one for dind, and only the dind one seems to be affected.

11cb9b7#diff-45ecac98435a59bccdffc03779dd18d22ed154a507be7b20cc0828ed8d1a31b6

I'd greatly appreciate it if you could submit a pull request to fix it 🙏

@mumoshu
Collaborator

mumoshu commented Jul 12, 2022

This has been fixed via #1623 and #1624 and the new image is available via actions-runner-dind:v2.294.0-ubuntu-20.04-98b17dc.
Thanks a lot for reporting and fixing it @bigwheel and @gi0baro!
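For anyone hitting the same failure, here is a minimal sketch of the user-side change, assuming the RunnerDeployment from the original report. The only edit is pinning `image` to the fixed tag named in this comment instead of the implicit `latest`. The other fields are copied from the report and trimmed for brevity (the resources, volumes, and imagePullSecrets would stay as they were).

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: custom-action-runner
  namespace: actions-runner-system
spec:
  replicas: 4
  template:
    spec:
      dockerdWithinRunnerContainer: true
      # Pinned to the fixed dind image instead of the implicit :latest tag
      image: summerwind/actions-runner-dind:v2.294.0-ubuntu-20.04-98b17dc
      organization: speee
      labels:
        - custom-action-runner-staging
```

Pinning an explicit tag also means recreated pods will not silently pick up a newer `latest` build, which is how the failure first appeared in staging.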

@bigwheel
Copy link
Contributor Author

@mumoshu I checked the new image ( actions-runner-dind:v2.294.0-ubuntu-20.04-98b17dc ) and it works as well as it did last week. Thank you for the kind update 😁

@mumoshu
Collaborator

mumoshu commented Jul 12, 2022

Thanks for confirming! Enjoy ☺️
