
Decoding file /tmp/sonobuoy/..../error.json cannot unmarshal object into Go value of type string #994

Closed
Joseph94m opened this issue Nov 13, 2019 · 4 comments · Fixed by #995
Labels
kind/bug Behavior isn't as expected or intended

@Joseph94m

What steps did you take and what happened:
We've set up a k8s infra with the following output for kubectl get pods --all-namespaces -o wide:

NAMESPACE      NAME                               READY   STATUS      RESTARTS   AGE    IP             NODE       NOMINATED NODE   READINESS GATES
kube-system    calico-node-kvcnk                  1/1     Running     0          17h    192.168.0.21   worker-1   <none>           <none>
kube-system    calico-node-n5ddv                  1/1     Running     0          17h    192.168.0.10   master-0   <none>           <none>
kube-system    calico-node-xr99b                  1/1     Running     1          17h    192.168.0.20   worker-0   <none>           <none>
kube-system    calico-typha-8fcbdb475-wngwv       1/1     Running     1          17h    192.168.0.20   worker-0   <none>           <none>
kube-system    coredns-bd856fb48-4rtb4            1/1     Running     1          17h    10.4.1.2       worker-0   <none>           <none>
kube-system    coredns-bd856fb48-92xjp            1/1     Running     1          17h    10.4.1.3       worker-0   <none>           <none>
kube-system    kube-apiserver-master-0            1/1     Running     0          17h    192.168.0.10   master-0   <none>           <none>
kube-system    kube-controller-manager-master-0   1/1     Running     0          17h    192.168.0.10   master-0   <none>           <none>
kube-system    kube-proxy-c8cxz                   1/1     Running     0          17h    192.168.0.21   worker-1   <none>           <none>
kube-system    kube-proxy-q79hs                   1/1     Running     1          17h    192.168.0.20   worker-0   <none>           <none>
kube-system    kube-proxy-qqjh8                   1/1     Running     0          17h    192.168.0.10   master-0   <none>           <none>
kube-system    kube-scheduler-master-0            1/1     Running     0          17h    192.168.0.10   master-0   <none>           <none>

Then we've run:
sonobuoy run --wait --mode quick
results=$(sonobuoy retrieve)
sonobuoy status returns:

PLUGIN   STATUS    RESULT   COUNT
e2e      failed    unknown  1

sonobuoy e2e $results returns:
ERRO[0000] could not get tests from archive: failed to find results file "plugins/e2e/results/global/junit_01.xml" in archive

I decided to view the logs on the concerned pod using: sonobuoy logs -p e2e -f

The truncated result is the following:

time="2019-11-13T09:18:18Z" level=info msg="Creating host results for master-0 under /tmp/sonobuoy/d5b45f6d-c3bb-40d2-bf42-93e700855c86/hosts/master-0\n"
time="2019-11-13T09:18:18Z" level=info msg="Creating host results for worker-0 under /tmp/sonobuoy/d5b45f6d-c3bb-40d2-bf42-93e700855c86/hosts/worker-0\n"
time="2019-11-13T09:18:18Z" level=info msg="Creating host results for worker-1 under /tmp/sonobuoy/d5b45f6d-c3bb-40d2-bf42-93e700855c86/hosts/worker-1\n"
time="2019-11-13T09:18:18Z" level=info msg="Running cluster queries"
time="2019-11-13T09:18:18Z" level=info msg="Running ns query (default)"
time="2019-11-13T09:18:18Z" level=info msg="Running ns query (kube-node-lease)"
time="2019-11-13T09:18:18Z" level=info msg="Running ns query (kube-public)"
time="2019-11-13T09:18:19Z" level=info msg="Running ns query (kube-system)"
time="2019-11-13T09:18:20Z" level=info msg="Running ns query (sonobuoy)"
time="2019-11-13T09:18:20Z" level=info msg="Running ns query (subpath-4492)"
time="2019-11-13T09:18:21Z" level=info msg="Namespace default Matched=false"
time="2019-11-13T09:18:21Z" level=info msg="Namespace kube-node-lease Matched=false"
time="2019-11-13T09:18:21Z" level=info msg="Namespace kube-public Matched=false"
time="2019-11-13T09:18:21Z" level=info msg="Namespace kube-system Matched=false"
time="2019-11-13T09:18:21Z" level=info msg="Namespace sonobuoy Matched=true"
time="2019-11-13T09:18:21Z" level=info msg="Namespace subpath-4492 Matched=false"
time="2019-11-13T09:18:21Z" level=info msg="Collecting Pod Logs by namespace (sonobuoy)"
time="2019-11-13T09:18:21Z" level=info msg="Collecting Pod Logs by FieldSelectors []"
time="2019-11-13T09:18:21Z" level=info msg="Log lines after this point will not appear in the downloaded tarball."
time="2019-11-13T09:18:21Z" level=error msg="Error processing file /tmp/sonobuoy/d5b45f6d-c3bb-40d2-bf42-93e700855c86/plugins/e2e/errors/global/error.json: decoding file /tmp/sonobuoy/d5b45f6d-c3bb-40d2-bf42-93e700855c86/plugins/e2e/errors/global/error.json: json: cannot unmarshal object into Go value of type string"
time="2019-11-13T09:18:21Z" level=info msg="Results available at /tmp/sonobuoy/201911130912_sonobuoy_d5b45f6d-c3bb-40d2-bf42-93e700855c86.tar.gz"
time="2019-11-13T09:18:21Z" level=info msg="no-exit was specified, sonobuoy is now blocking"

What did you expect to happen:
I would expect the Unmarshal error not to occur.

Environment:

  • Sonobuoy version: 0.16.3
  • Kubernetes version (use kubectl version): 1.15.3
  • Cloud provider or hardware configuration: physical machines
  • OS (e.g. from /etc/os-release): CentOS 7 (el7), Linux ns3150165 3.10.0-1062.4.1.el7.x86_64 #1 SMP Fri Oct 18 17:15:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Sonobuoy tarball (which contains * below)
    201911130912_sonobuoy_d5b45f6d-c3bb-40d2-bf42-93e700855c86.tar.gz
@johnSchnake
Contributor

Thanks for the tarball; it helps immensely when debugging this.

The Error processing file message comes from the post-processing logic, and post-processing of the errors directory is a relatively new code path. This is definitely a bug in that logic.

Also, the sonobuoy results command is a newer replacement for the legacy sonobuoy e2e command in most cases. Using it, we can output the general results (which here report just a single error, due to the error.json), and dumping the contents of that error shows:

$ sonobuoy results $tarball --mode=detailed
e2e|error.json {"error":"Container e2e is terminated state (exit code 0) due to reason: Completed: ","pod": ...
...

So, from what I can tell, what happened is that the aggregator pod saw that the e2e container had completed, and after waiting a while it still never received results from the plugin, so it recorded the error and unblocked itself.

So I opened up the tarball and looked at the sonobuoy worker logs to see if it tried to report results. It did, but kept getting an error:

time="2019-11-13T09:17:31Z" level=error msg="error entry for attempt: 2, verb: PUT, time: 2019-11-13 09:17:31.979034291 +0000 UTC m=+272.505963855, URL: https://[10.88.0.16]:8080/api/v1/results/global/e2e: Put https://[10.88.0.16]:8080/api/v1/results/global/e2e: dial tcp 10.88.0.16:8080: connect: connection timed out"

So it kept trying to send results to the aggregator but seemed to keep failing due to a timeout.

Unfortunately, there don't seem to be any logs pointing to a specific problem during that window, but if the pod isn't able to PUT the results to the aggregator, there isn't much we can do on the Sonobuoy side to resolve that.

The first and main thing to resolve in this situation is the post-processing error you saw: improperly trying to parse the error.json file.

@johnSchnake johnSchnake added the kind/bug Behavior isn't as expected or intended label Nov 13, 2019
@johnSchnake johnSchnake added this to the v0.16.4 milestone Nov 13, 2019
@johnSchnake johnSchnake self-assigned this Nov 13, 2019
@Joseph94m
Author

Thanks. I will await further advancements on this issue. 👍

@johnSchnake
Contributor

@Joseph94m Also, I wanted to make sure you knew that, according to the pod logs for the e2e run, the one test you ran did succeed (in case you were unsure).

The PR to resolve the error parsing is up as well if you want to take a look at it.

@Joseph94m
Author

@johnSchnake Thanks. I saw the PR too, and the changes seem logical.

johnSchnake added a commit that referenced this issue Nov 13, 2019
Previously we handled only the simple case where the errors.json
file contains a map of strings. However, there are situations where
the JSON file may contain more structured data, which needs to be handled.

Fixes #994

Signed-off-by: John Schnake <[email protected]>
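The idea behind the fix can be sketched as follows. This is a simplified illustration of the approach described in the commit message, not the actual Sonobuoy code, and the function name is hypothetical: try the simple map-of-strings shape first, then fall back to a generic decode so structured values survive.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeErrorsFile (hypothetical) accepts both the simple map-of-strings
// shape and arbitrarily structured JSON objects.
func decodeErrorsFile(data []byte) (map[string]interface{}, error) {
	// Fast path: the historical shape, a flat map of strings.
	var simple map[string]string
	if err := json.Unmarshal(data, &simple); err == nil {
		out := make(map[string]interface{}, len(simple))
		for k, v := range simple {
			out[k] = v
		}
		return out, nil
	}

	// Fallback: structured values (objects, arrays, numbers), as seen in
	// this issue's error.json.
	var generic map[string]interface{}
	if err := json.Unmarshal(data, &generic); err != nil {
		return nil, err
	}
	return generic, nil
}

func main() {
	out, err := decodeErrorsFile([]byte(`{"error":"timed out","pod":{"name":"e2e"}}`))
	fmt.Println(out, err)
}
```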