Apply Velero CRDs manually during apps v0.36 upgrade #2008

Merged: crssnd merged 1 commit into main from crssnd/velero-apply-crds-manually on Feb 27, 2024

Conversation

@crssnd (Contributor) commented Feb 23, 2024

Warning

This is a public repository; ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request
  • business confidential information, such as customer names

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

What does this PR do / why do we need this PR?

During the upgrade to apps v0.36, the kubectl container used by Velero to apply the CRDs was failing with:

/tmp/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/sh)
/tmp/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/sh)

As this error seems to appear only intermittently, I decided to apply the Velero CRDs using a migration script instead.
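
For context, the mismatch can be inspected by checking the glibc version shipped in the kubectl image that the upgrade job uses. A minimal sketch, assuming the chart's default bitnami kubectl image (the image tag is illustrative, not taken from this PR):

# Print the glibc version inside the kubectl image used by the
# upgrade-CRDs job (image and tag are assumptions for illustration).
docker run --rm --entrypoint ldd docker.io/bitnami/kubectl:1.28 --version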

Additional information to reviewers

Screenshots

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Bug checks:
    • The bug fix is covered by regression tests

@simonklb (Contributor) left a comment:


The change is fine, but it's a bit worrying that we aren't doing anything about the error messages. Even if it fails once and works the second time, I think it would be good to at least know why.

# Apply the Velero CRDs on the workload cluster (wc) with server-side apply,
# taking over field ownership from any other manager on conflict. Server-side
# apply also sidesteps the size limit on the client-side last-applied
# annotation that large CRDs can hit.
log_info " - applying the Velero CRDs on wc"
kubectl_do wc apply --server-side --force-conflicts -f "${ROOT}"/helmfile.d/upstream/vmware-tanzu/velero/crds

# Upgrade the Velero release itself, not only the CRDs.
helmfile_upgrade wc app=velero
Contributor:

The file seems to indicate that it upgrades only the CRDs, but this will also upgrade Velero itself, right?

@crssnd (Author):

Yes, it upgrades both.

Contributor:

As a nit-pick, this file could be renamed to "50-velero-upgrade.sh" to make it clearer that it upgrades all of Velero. But it's OK if you skip that.

@crssnd (Author) commented Feb 23, 2024

> The change is fine, but it's a bit worrying that we aren't doing anything about the error messages. Even if it fails once and works the second time, I think it would be good to at least know why.

I do not see any other action items that would allow us to finish and fix the apps v0.36 release in a good way; I am open to suggestions.
Once I manage to investigate this further, I will see whether any other action can be taken from our side.

@crssnd crssnd requested a review from simonklb February 23, 2024 10:29
@crssnd crssnd requested a review from davidumea February 26, 2024 08:51
@crssnd (Author) commented Feb 26, 2024

@simonklb, although I haven't been able to fully test this myself yet, here are some upstream issues for the same problem we had:
vmware-tanzu/velero#7462
vmware-tanzu/helm-charts#550

I think, for us, applying the new CRDs via a script is a good solution for this release. Let me know if you think there is a better solution.

@Pavan-Gunda, @viktor-f and @raviranjanelastisys, please also check this workaround and let me know if you see any issues with it.

@simonklb (Contributor):

> @simonklb, although I haven't been able to fully test this myself yet, here are some upstream issues for the same problem we had: vmware-tanzu/velero#7462 vmware-tanzu/helm-charts#550
>
> I think, for us, applying the new CRDs via a script is a good solution for this release. Let me know if you think there is a better solution.

What about overriding the kubectl image?
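
For reference, a minimal sketch of such an override, assuming the kubectl image values of the upstream vmware-tanzu Velero chart and that the chart repo is added as vmware-tanzu (the repository and tag shown are illustrative assumptions, not tested values):

# Sketch only: pin the kubectl image used by Velero's upgrade-CRDs job.
# Value names follow the upstream vmware-tanzu Velero chart; repository
# and tag are assumptions, not values from this PR.
helm upgrade velero vmware-tanzu/velero \
  --namespace velero \
  --reuse-values \
  --set kubectl.image.repository=docker.io/bitnami/kubectl \
  --set kubectl.image.tag=1.28.7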

@crssnd (Author) commented Feb 26, 2024

>> @simonklb, although I haven't been able to fully test this myself yet, here are some upstream issues for the same problem we had: vmware-tanzu/velero#7462 vmware-tanzu/helm-charts#550
>>
>> I think, for us, applying the new CRDs via a script is a good solution for this release. Let me know if you think there is a better solution.
>
> What about overriding the kubectl image?

I was thinking about that, but at some point in the future we would need to remember to update it (or revert to the default). That is assuming the new image does not create other issues, now or in the future.

@simonklb (Contributor):

>>> @simonklb, although I haven't been able to fully test this myself yet, here are some upstream issues for the same problem we had: vmware-tanzu/velero#7462 vmware-tanzu/helm-charts#550
>>>
>>> I think, for us, applying the new CRDs via a script is a good solution for this release. Let me know if you think there is a better solution.
>>
>> What about overriding the kubectl image?
>
> I was thinking about that, but at some point in the future we would need to remember to update it (or revert to the default). That is assuming the new image does not create other issues, now or in the future.

Yeah, if you decide to override the image, create an issue to revert it once a fix has been released.

@viktor-f (Contributor) left a comment:


Yes, I'm OK with this fix.

@crssnd merged commit 8f46ff2 into main on Feb 27, 2024 (9 checks passed)
@crssnd deleted the crssnd/velero-apply-crds-manually branch on February 27, 2024
@Ajarmar mentioned this pull request on Mar 8, 2024