Bug report: Alarm is not fired for certain detected bugs #224

Closed
Spedoske opened this issue Jun 12, 2023 · 6 comments

@Spedoske
Collaborator

I was trying to reproduce two of the three detected bugs in the RabbitMQ operator that are documented in bugs.md:
rabbitmq/cluster-operator#992
rabbitmq/cluster-operator#928

Bug #992

Acto changed storageClassName, but there was no change in the PVC, so there should be an alarm for inconsistency.
(trial-00-0006)

---------- INPUT DELTA  ----------
{
      "values_changed": {
            "root['spec']['persistence']['storageClassName']": {
                  "prev": "standard",
                  "curr": "ACTOKEY",
                  "path": [
                        "spec",
                        "persistence",
                        "storageClassName"
                  ]
            }
      }
}
---------- SYSTEM DELTA ----------
{
      "pod": {},
      "deployment_pods": {},
      "stateful_set": {},
      "deployment": {},
      "config_map": {},
      "service": {},
      "service_account": {},
      "pvc": {},
      "cronjob": {},
      "ingress": {},
      "network_policy": {},
      "pod_disruption_budget": {},
      "secret": {},
      "endpoints": {},
      "job": {},
      "role": {},
      "role_binding": {},
      "custom_resource_spec": {
            "values_changed": {
                  "root['persistence']['storageClassName']": {
                        "prev": "standard",
                        "curr": "ACTOKEY",
                        "path": [
                              "persistence",
                              "storageClassName"
                        ]
                  }
            }
      },
      "custom_resource_status": {}
}
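
The expectation above can be sketched in a few lines of Python. This is only an illustration of the check I would expect to fire, not Acto's actual oracle: the function names (`input_changed`, `system_changed`, `should_alarm`) are hypothetical, and the only thing taken from the trial is the shape of the two deltas.

```python
from typing import Any, Dict

# Keys of the system delta that describe the custom resource itself rather
# than the resources managed by the operator (taken from the delta above).
CR_KEYS = {"custom_resource_spec", "custom_resource_status"}


def input_changed(input_delta: Dict[str, Any]) -> bool:
    """Return True if the input delta records at least one change."""
    return any(bool(changes) for changes in input_delta.values())


def system_changed(system_delta: Dict[str, Any]) -> bool:
    """Return True if any operator-managed resource changed."""
    return any(
        bool(changes)
        for kind, changes in system_delta.items()
        if kind not in CR_KEYS
    )


def should_alarm(input_delta: Dict[str, Any], system_delta: Dict[str, Any]) -> bool:
    """Hypothetical check: the input changed but no managed resource did."""
    return input_changed(input_delta) and not system_changed(system_delta)


# Abbreviated copy of the deltas from trial-00-0006.
input_delta = {
    "values_changed": {
        "root['spec']['persistence']['storageClassName']": {
            "prev": "standard",
            "curr": "ACTOKEY",
        }
    }
}
system_delta = {
    "stateful_set": {},
    "pvc": {},
    "custom_resource_spec": {
        "values_changed": {
            "root['persistence']['storageClassName']": {
                "prev": "standard",
                "curr": "ACTOKEY",
            }
        }
    },
    "custom_resource_status": {},
}

print(should_alarm(input_delta, system_delta))
```

With the deltas from this trial the check returns True, because only `custom_resource_spec` is non-empty on the system side.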

Bug #928

State 1 is shown in the following YAML.
(trial-02-0023)

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  override:
    statefulSet:
      spec:
        template:
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                    - key: app.kubernetes.io/name
                      operator: In
                      values:
                      - test-cluster
                  topologyKey: kubernetes.io/hostname
            containers: []
  ... omitted ...

State 2 is shown in the following YAML.

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: test-cluster
  namespace: rabbitmq-system
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: null
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - test-cluster
        topologyKey: kubernetes.io/hostname
  image: null
  imagePullSecrets: null
  override:
    statefulSet:
      spec:
        template:
          spec:
            affinity: null
            containers: []

We found that override.statefulSet.spec.template.spec.affinity became null, but the system delta shows no change outside the custom resource spec (for example, stateful_set is empty).

---------- SYSTEM DELTA ----------
{
      "pod": {},
      "deployment_pods": {},
      "stateful_set": {},
      "deployment": {},
      "config_map": {},
      "service": {},
      "service_account": {},
      "pvc": {},
      "cronjob": {},
      "ingress": {},
      "network_policy": {},
      "pod_disruption_budget": {},
      "secret": {},
      "endpoints": {},
      "job": {},
      "role": {},
      "role_binding": {},
      "custom_resource_spec": {
            "dictionary_item_removed": {
                  "root['override']['statefulSet']['spec']['template']['spec']['affinity'][podAntiAffinity][requiredDuringSchedulingIgnoredDuringExecution][0][labelSelector][matchExpressions][0][key]": {
                        "prev": "app.kubernetes.io/name",
                        "curr": "NotPresent",
                        "path": [
                              "override",
                              "statefulSet",
                              "spec",
                              "template",
                              "spec",
                              "affinity",
                              "podAntiAffinity",
                              "requiredDuringSchedulingIgnoredDuringExecution",
                              0,
                              "labelSelector",
                              "matchExpressions",
                              0,
                              "key"
                        ]
                  },
                  "root['override']['statefulSet']['spec']['template']['spec']['affinity'][podAntiAffinity][requiredDuringSchedulingIgnoredDuringExecution][0][labelSelector][matchExpressions][0][operator]": {
                        "prev": "In",
                        "curr": "NotPresent",
                        "path": [
                              "override",
                              "statefulSet",
                              "spec",
                              "template",
                              "spec",
                              "affinity",
                              "podAntiAffinity",
                              "requiredDuringSchedulingIgnoredDuringExecution",
                              0,
                              "labelSelector",
                              "matchExpressions",
                              0,
                              "operator"
                        ]
                  },
                  "root['override']['statefulSet']['spec']['template']['spec']['affinity'][podAntiAffinity][requiredDuringSchedulingIgnoredDuringExecution][0][labelSelector][matchExpressions][0][values][0]": {
                        "prev": "test-cluster",
                        "curr": "NotPresent",
                        "path": [
                              "override",
                              "statefulSet",
                              "spec",
                              "template",
                              "spec",
                              "affinity",
                              "podAntiAffinity",
                              "requiredDuringSchedulingIgnoredDuringExecution",
                              0,
                              "labelSelector",
                              "matchExpressions",
                              0,
                              "values",
                              0
                        ]
                  },
                  "root['override']['statefulSet']['spec']['template']['spec']['affinity'][podAntiAffinity][requiredDuringSchedulingIgnoredDuringExecution][0][topologyKey]": {
                        "prev": "kubernetes.io/hostname",
                        "curr": "NotPresent",
                        "path": [
                              "override",
                              "statefulSet",
                              "spec",
                              "template",
                              "spec",
                              "affinity",
                              "podAntiAffinity",
                              "requiredDuringSchedulingIgnoredDuringExecution",
                              0,
                              "topologyKey"
                        ]
                  }
            }
      },
      "custom_resource_status": {}
}

So there should be an alarm for inconsistency, but Acto did not raise one.
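
For reference, the four removed entries in the delta above are what you get by comparing the leaf paths of state 1 and state 2. The snippet below is a plain-Python sketch of that comparison, assuming nothing about how Acto computes its deltas internally; `flatten` and `removed_leaves` are hypothetical helpers, and the two states are cut down to the override section.

```python
from typing import Any, Dict, Iterator, Tuple

Path = Tuple[Any, ...]


def flatten(obj: Any, prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, leaf) pairs for every leaf in a nested dict/list."""
    if isinstance(obj, dict) and obj:
        for key, value in obj.items():
            yield from flatten(value, prefix + (key,))
    elif isinstance(obj, list) and obj:
        for index, value in enumerate(obj):
            yield from flatten(value, prefix + (index,))
    else:
        # Scalars, None, and empty containers are treated as leaves.
        yield prefix, obj


def removed_leaves(prev: Any, curr: Any) -> Dict[Path, Any]:
    """Return the leaves of prev whose path no longer exists in curr."""
    curr_paths = {path for path, _ in flatten(curr)}
    return {path: leaf for path, leaf in flatten(prev) if path not in curr_paths}


# override.statefulSet.spec.template.spec from state 1 and state 2.
term = {
    "labelSelector": {
        "matchExpressions": [
            {"key": "app.kubernetes.io/name", "operator": "In", "values": ["test-cluster"]}
        ]
    },
    "topologyKey": "kubernetes.io/hostname",
}
state_1 = {
    "affinity": {
        "podAntiAffinity": {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    },
    "containers": [],
}
state_2 = {"affinity": None, "containers": []}

removed = removed_leaves(state_1, state_2)
for path, leaf in removed.items():
    print(path, "->", leaf)
```

This yields four removed paths (key, operator, values[0], and topologyKey), matching the four dictionary_item_removed entries in the custom_resource_spec delta.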

@tylergu
Member

tylergu commented Jun 13, 2023

Will sync with @Spedoske on this

@tianyin
Member

tianyin commented Jun 13, 2023

This looks like a very serious problem.

If the alarm is not fired for silent, non-crashing issues, then the tool is broken at this point.

@tylergu
Member

tylergu commented Jun 14, 2023

Discussed with @Spedoske. Bug 992 is not supposed to be found by the consistency oracle; it is supposed to be found by the differential oracle. However, the differential oracle is not run automatically inside acto_main.py; it is run through a separate entry point. I fixed this a while ago by including the entry point inside acto_main.py so that it will run automatically next time: https://github.com/xlab-uiuc/acto/tree/run_difftest_in_acto.
Bug 928 was missed because I changed the behavior during refactoring but forgot to apply the corresponding change to the blackbox version. I have a unit test (test_rbop_928) to make sure the bug can be reproduced, but it was only running the whitebox version. It will be an easy fix too.

Will close this after https://github.com/xlab-uiuc/acto/tree/run_difftest_in_acto gets merged. @Spedoske will help test whether the change works.

I think having routine tests is becoming important to help us make sure everything works and to merge things quickly. I started writing tests for each bug in the /test directory, but they are still very incomplete. We should also have e2e tests.
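
One way to keep a per-bug test from covering only one mode is to loop over the modes inside the test. Below is a minimal sketch using unittest's subTest. Everything in it is an assumption for illustration: `oracle_raises_alarm` is a hypothetical stand-in for the real oracle call, the mode names simply mirror the whitebox/blackbox wording above, and the delta is a shortened placeholder rather than real trial output.

```python
import unittest

# Mode names mirror the wording in this thread; the real flags may differ.
MODES = ("whitebox", "blackbox")

# Placeholder delta: only the custom resource spec changed.
SYSTEM_DELTA_928 = {
    "stateful_set": {},
    "pvc": {},
    "custom_resource_spec": {"dictionary_item_removed": {"affinity": "elided"}},
    "custom_resource_status": {},
}


def oracle_raises_alarm(system_delta, mode):
    """Hypothetical stand-in for the oracle under test.

    Alarms when no operator-managed resource changed. A real test would
    call the project's oracle with the given mode instead.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    managed = {
        kind: changes
        for kind, changes in system_delta.items()
        if not kind.startswith("custom_resource_")
    }
    return not any(managed.values())


class TestBugReproduction(unittest.TestCase):
    def test_rbop_928_all_modes(self):
        # subTest reports each mode separately, so a blackbox-only
        # regression is visible even when whitebox still passes.
        for mode in MODES:
            with self.subTest(mode=mode):
                self.assertTrue(oracle_raises_alarm(SYSTEM_DELTA_928, mode))
```

Saved as a test module, this can be run with `python -m unittest`; each failing mode shows up as its own sub-test failure.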

Todos before closing this:

@tianyin
Member

tianyin commented Jun 14, 2023

@tylergu do you have a handful of technical tasks that @Spedoske can help with?

As @Spedoske becomes more familiar with the codebase, I hope he can take on more engineering work for the project (e.g., resolving technical debt).

@tylergu
Member

tylergu commented Jun 14, 2023

I opened an issue to keep track of the cleanup tasks (#222), and we should keep adding to and picking up tasks from that list as Kashun uses Acto.

@KashunCheng
Collaborator

Resolved.
