[patch] Extend custom Application healthcheck to detect Helm chart rendering failures #1201

Merged
merged 1 commit into from
Aug 23, 2024

Conversation

tomklapiscak
Contributor

@tomklapiscak tomklapiscak commented Aug 23, 2024

By default, an Application's health status will not be affected if ArgoCD fails to render its Helm template (e.g. due to a bad secret reference causing AVP to error).

For example, in the following screenshot the application in syncwave 1 is showing as synced and healthy even though its Helm chart rendering failed. Wave 2 was allowed to proceed even though the resources in wave 1 were not deployed. This would cause problems if the application in wave 2 depended on a resource being configured in wave 1 first:
[screenshot: the wave 1 application shown as synced and healthy, with wave 2 proceeding]

If we look inside the application in wave 1, we can see that it is in an error state and no resources were deployed:
[screenshot: the wave 1 application in an error state with no resources deployed]

ArgoCD is working as intended here. The Application in wave 1 is showing as synced because that is the sync status of the Application CR itself (which synced just fine). It is showing as healthy because the health of a resource in ArgoCD is determined solely by the health of its direct children, and the application in wave 1 has no children since it failed to deploy any resources. The problem is that, with this behaviour, we cannot rely on syncwaves to control deployment ordering when Helm template rendering fails.

There are a number of issues already open against ArgoCD relating to this, but none have come to any sort of conclusion about what should be changed. The most relevant one I have found is this: argoproj/argo-cd#10088.
That issue includes the idea of adding logic to the custom Application healthcheck to check for the "ComparisonError" condition that appears when the Helm chart fails to render. This at least stops ArgoCD from letting sibling applications in later waves sync.

This PR extends the custom Application healthcheck established by gitops-bootstrap to set Application health to degraded when this ComparisonError condition is present. I've verified that this works as expected and (crucially) blocks sibling applications in later syncwaves from progressing:
[screenshot: the wave 1 application shown as degraded, with later syncwaves blocked]
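For readers who want the decision logic spelled out, here is a minimal sketch of the idea in Python. It is not the code from this PR: ArgoCD custom healthchecks are written in Lua and configured in ArgoCD's settings, and this model only mirrors the rule described above. The function names `application_health` and `wave_may_proceed` are hypothetical, and the input is assumed to be an Application object as a plain dictionary with the usual `status.health` and `status.conditions` fields.

```python
from typing import Any, Dict, List


def application_health(app: Dict[str, Any]) -> Dict[str, str]:
    """Model of the extended healthcheck (hypothetical helper, not the PR's Lua).

    Starts from the health the Application already reports, then overrides
    it to Degraded if a ComparisonError condition is present.
    """
    status = app.get("status") or {}

    # Fallback when the Application has not reported any health yet.
    health = {"status": "Progressing", "message": ""}

    reported = status.get("health") or {}
    if reported.get("status"):
        health = {
            "status": reported["status"],
            "message": reported.get("message") or "",
        }

    # A ComparisonError condition is what appears when chart rendering fails.
    for condition in status.get("conditions") or []:
        if condition.get("type") == "ComparisonError":
            return {"status": "Degraded", "message": condition.get("message") or ""}

    return health


def wave_may_proceed(apps_in_wave: List[Dict[str, Any]]) -> bool:
    """Simplified model of syncwave gating: every app in the wave must be Healthy."""
    return all(application_health(a)["status"] == "Healthy" for a in apps_in_wave)
```

With this rule, an Application that reports `Healthy` but carries a `ComparisonError` condition evaluates as `Degraded`, so `wave_may_proceed` returns `False` for its wave and later waves stay blocked.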

https://jsw.ibm.com/browse/MASCORE-3669

@tomklapiscak tomklapiscak marked this pull request as draft August 23, 2024 11:31
@tomklapiscak tomklapiscak marked this pull request as ready for review August 23, 2024 12:05
@whitfiea whitfiea merged commit b76fdf7 into master Aug 23, 2024
8 checks passed
@whitfiea whitfiea deleted the mascore3669 branch August 23, 2024 12:08