Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

App health status is Healthy even if repoURL is wrong and Application healthcheck is enabled #10088

Open
3 tasks done
yellowmegaman opened this issue Jul 24, 2022 · 12 comments
Open
3 tasks done
Labels
bug Something isn't working

Comments

@yellowmegaman
Copy link

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

App health status is Healthy even if repoURL is wrong and resource.customizations.health.argoproj.io_Application is in place.
To Reproduce

  1. Install ArgoCD on kubernetes
  2. Add healthcheck to helm chart values:
      resource.customizations.health.argoproj.io_Application: |
        hs = {}
        if obj.status ~= nil then
            if obj.status.conditions ~= nil then
                for i, condition in ipairs(obj.status.conditions) do
                    if condition.type == "ComparisonError" and condition.status ~= "False" then
                        hs.status = "Degraded"
                        hs.message = "can't deploy"
                        return hs
                    elseif obj.status.phase == nil then
                        hs.status = "Degraded"
                        hs.message = "phase nil"
                        return hs
                    elseif ( obj.status.phase == "Succeeded" or obj.status.phase == "Initialized" ) then
                      hs.status = "Healthy"
                    end
                end
            end
        elseif obj.status == nil then
          hs.status = "Degraded"
          hs.message = "unknown status"
          return hs
        end
        hs.status = "Healthy"
        hs.message = "Seems to be fine"
        return hs
  1. Apply Application with wrong definition:
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: wrong
  namespace: argocd
spec:
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - Validate=true
      - CreateNamespace=true
  source:
    repoURL: https://github.com/wrongurl/wrongrepo.git
    path: absolutelywrong
    targetRevision: v3.0.4
  destination:
    name: 'in-cluster'
    namespace: wrong

Expected behavior

Application health status should not be Healthy
Screenshots

image

Version

argocd: v2.4.7+81630e6
  BuildDate: 2022-07-18T21:17:35Z
  GitCommit: 81630e6d5075ac53ac60457b51343c2a09a666f4
  GitTreeState: clean
  GoVersion: go1.18.4
  Compiler: gc
  Platform: linux/amd64

Logs

$ kubectl -n argocd get app wrong -o yaml
...
status:
  conditions:
  - lastTransitionTime: "2022-07-24T20:20:26Z"
    message: 'rpc error: code = Unknown desc = authentication required'
    type: ComparisonError
  health:
    status: Healthy
  reconciledAt: "2022-07-24T20:43:58Z"
  sync:
    comparedTo:
      destination:
        name: in-cluster
        namespace: wrong
      source:
        path: absolutelywrong
        repoURL: https://github.com/wrongurl/wrongrepo.git
        targetRevision: v3.0.4
    status: Unknown

Question: is there a way to modify health.lua to set proper health status? It is working for the app-of-apps Application (Status is Degraded, if wrong application is part of it), but not for the actual wrong app.

@yellowmegaman yellowmegaman added the bug Something isn't working label Jul 24, 2022
@aagu
Copy link

aagu commented Jul 25, 2022

same problem here, be closed without solution

#9745

@yellowmegaman
Copy link
Author

same problem here, be closed without solution

#9745

Thanks for the reply!
The difference is that I do address ComparisonError in health.lua for kind: Application

@yellowmegaman
Copy link
Author

If relevant, if wrong app is part of app-of-apps, everything is almost great. Almost because app-of-apps is degraded, wrong app is degraded too, but the UI is showing Health OK if I go to the wrong app.

App-of-apps:
image
Wrong app:
image

API status for app-of-apps:

...
  - group: argoproj.io
    health:
      message: can't deploy
      status: Degraded
    kind: Application
    name: wrong-sonic
    namespace: argocd
    status: Synced
    version: v1alpha1
...

API status for worng app as part of app-of-apps:

...
status:
  conditions:
  - lastTransitionTime: "2022-07-25T19:12:30Z"
    message: 'rpc error: code = Unknown desc = `helm pull --destination /tmp/8be01600-0064-48f0-be17-1d8b813041b6
      --version v3.0.4 --repo https://chartmuseum.wrong.url chart-wrong-sonic --pass-credentials`
      failed exit status 1: Error: looks like "https://chartmuseum.wrong.url" is not
      a valid chart repository or cannot be reached: Get "https://chartmuseum.wrong.url/index.yaml":
      dial tcp: lookup chartmuseum.wrong.url on 10.43.0.10:53: no such host'
    type: ComparisonError
  health:
    status: Healthy
...

@venkatamutyala
Copy link
Contributor

venkatamutyala commented Jan 21, 2023

Another impact of this issue, is that the false "Healthy" status appears to fool the sync-wave and causes later syncs to run earlier than expected

@GaryClarkElekta
Copy link

I see this issue all the time. At the top level it is all HEALTHY but looking at an indevidual app we see:-
image
but the unknown status is not shown at the top level where all say healthy
image
This makes it very difficult to know what is wrong as I need to know which app is not healthy in order to look at the reason!

@crenshaw-dev
Copy link
Member

I believe the definition of an app health is "unhealthy if and only if one of its immediate child resources is unhealthy." Since an Application is not a child of itself, it is not considered unhealthy, even if its health check would fail. That's why it works in an app of apps - the unhealthy app is the immediate child of an app.

I think it would probably be okay to surface, at least in the UI, the health check result of the root node (the app itself).

@GaryClarkElekta
Copy link

I believe the definition of an app health is "unhealthy if and only if one of its immediate child resources is unhealthy." Since an Application is not a child of itself, it is not considered unhealthy, even if its health check would fail. That's why it works in an app of apps - the unhealthy app is the immediate child of an app.

I think it would probably be okay to surface, at least in the UI, the health check result of the root node (the app itself).

I really have no idea what you mean here. If a child app is in an unknown state the parent app should show this. Its common sense that we see the error at the top level.

@crenshaw-dev
Copy link
Member

crenshaw-dev commented Aug 16, 2023

I don't disagree. I'm just describing the current behavior and why it is the way it is.

The following describes the current behavior. None of these is correct, it's just the way it is today.

  1. The Application resource, by default, has no health check.
  2. The health of the root node of an Application is determined by the health of its immediate child resources, and nothing else.
  3. Adding a custom health check for Application will not change #2 - the root node's health is still not determined by the root node's own health check, only those of its child resources.
  4. An app of apps will take advantage of any custom health check added for the Application resource.

I think that, to get the behavior requested in this issue, we'd need to change at least #2 - we'd need the root node (Application) to reflect the results of its own health check, not just that of its child resources.

For the requested behavior to be the default, we'd need to change #1 and include an Application health check by default.

@GaryClarkElekta
Copy link

The health of a parent should be the health of all children and chidren of children. If a single child is unhealthy the parent, parent of the parent etc is not healthy.

@crenshaw-dev
Copy link
Member

crenshaw-dev commented Aug 16, 2023

The health of a parent should be the health of all children and children of children.

This is intentionally not the case.

For example, imagine this:

app
  -> Rollout (healthy)
    -> ReplicaSet A (healthy)
      -> Pod 1 (healthy)
    -> ReplicaSet B [canary] (degraded)
      -> Pod 2 (degraded)

Argo CD would be wrong to assume that the degraded ReplicaSet/Pod mean that the app is degraded. In fact, the Argo Rollout is handling the situation fine. It's aware of the current state of the canary ReplicaSet and is routing traffic appropriately.

Argo CD trusts the health check of a parent resource to describe the aggregate health of its child resources. Argo CD does not and should not make assumptions about what the health of a child resource means for the health of a parent resource. Only the parent resource knows its current aggregate health.

@GaryClarkElekta
Copy link

All I am asking for is some kind of notice at the top level that there is an error! I do not care how this is done but if there is an error somewhere I should be informed without needing to know which application to look for

@crenshaw-dev
Copy link
Member

crenshaw-dev commented Aug 18, 2023

Understood. I care how it is done, because someone will need to design, write, and test the code. Hopefully someone who is reading this thread. :-)

I also care how it is done, because "some kind of notice at the top level" is ambiguous. What should the notice look like? Will it affect Application filters in the Applications list view? If it means that we need to ship an Application health check by default, how do we make that opt-in to avoid changing the behavior of existing apps-of-apps?

Maintaining software requires collaboration between customers and coders in order to establish complete, clear, and accurate requirements. In this case, both the customers and the coders are on this thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants