kubernetes reporter is broken on katacoda #2049

Closed
errordeveloper opened this issue Nov 30, 2016 · 14 comments
Labels
bug Broken end user or developer functionality; not working as the developers intended it k8s Pertains to integration with Kubernetes

Comments

@errordeveloper
Contributor

You can reproduce it here: https://www.weave.works/guides/cloud-testdrive-part-1-setup-troubleshooting/.

The symptoms:

  • no Kubernetes view in the app
  • probes log error generating report: open /sys/hypervisor/uuid: no such file or directory

Details:

Katacoda doesn't have /sys/class/dmi/id/product_uuid, and we incorrectly fall back to /sys/hypervisor/uuid, which is also missing.

The code is in probe/kubernetes/reporter.go#L319 and it is nearly identical to what we have in vendor/github.com/weaveworks/go-checkpoint/checkpoint.go#L342, except that we ignore the error in the checkpointing code.

What this code is trying to do is find this node in the Kubernetes API, which is something I've looked into from the kubeadm perspective, and it's not as simple as you'd wish...

@errordeveloper
Contributor Author

I think we should try using the kubelet's API (by default it's read-only/non-TLS on port 10255 and read-write/TLS on port 10250), so, for example, node info can be obtained from http://localhost:10255/spec/. Perhaps we could keep the current logic for the worst-case scenario, e.g. when the kubelet API has been disabled or is on a port where we cannot find it (although I think the kubelet registers its port with the API server, so we should be able to tell what it is if it's not the default 10255).
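A sketch of the proposal above, assuming the read-only kubelet endpoint behaves as described in this comment: fetch `/spec/`, read `system_uuid`, and fall back to the existing local lookup on any failure. `parseSpec`, `uuidFromKubelet`, `nodeUUID` and the 2-second timeout are hypothetical choices, not Scope or Kubernetes APIs.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// machineSpec keeps only the field of the kubelet /spec/ response used here.
type machineSpec struct {
	SystemUUID string `json:"system_uuid"`
}

// parseSpec extracts system_uuid from a /spec/ response body.
func parseSpec(body []byte) (string, error) {
	var spec machineSpec
	if err := json.Unmarshal(body, &spec); err != nil {
		return "", err
	}
	if spec.SystemUUID == "" {
		return "", errors.New("spec has no system_uuid")
	}
	return spec.SystemUUID, nil
}

// uuidFromKubelet fetches <baseURL>/spec/ from the kubelet's read-only API.
func uuidFromKubelet(baseURL string) (string, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/spec/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("kubelet returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseSpec(body)
}

// nodeUUID prefers the kubelet API and keeps the existing local lookup as
// the worst-case fallback (e.g. the read-only port is disabled).
func nodeUUID(baseURL string, fallback func() (string, error)) (string, error) {
	if id, err := uuidFromKubelet(baseURL); err == nil {
		return id, nil
	}
	return fallback()
}

func main() {
	// 10255 is the default read-only kubelet port mentioned in the comment.
	id, err := nodeUUID("http://localhost:10255", func() (string, error) {
		return "", errors.New("local lookup not implemented in this sketch")
	})
	fmt.Println(id, err)
}
```

Passing the fallback in as a function keeps the current file-based logic usable unchanged when the kubelet cannot be reached.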

@errordeveloper errordeveloper added bug Broken end user or developer functionality; not working as the developers intended it k8s Pertains to integration with Kubernetes labels Nov 30, 2016
@errordeveloper errordeveloper self-assigned this Nov 30, 2016
@errordeveloper
Contributor Author

Another option is to try and match any non-loopback IP addresses.
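The first half of this idea, collecting the host's non-loopback addresses, could look like the sketch below. It assumes the probe runs in the host network namespace (as the Scope probe pods do, per a later comment), so the interfaces it sees are the node's own; `nonLoopbackIPs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// nonLoopbackIPs lists this host's interface addresses, skipping loopback
// ones. In a host-network pod these are the node's own addresses.
func nonLoopbackIPs() ([]net.IP, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var ips []net.IP
	for _, a := range addrs {
		ipnet, ok := a.(*net.IPNet)
		if !ok || ipnet.IP.IsLoopback() {
			continue
		}
		ips = append(ips, ipnet.IP)
	}
	return ips, nil
}

func main() {
	ips, err := nonLoopbackIPs()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ip := range ips {
		fmt.Println(ip)
	}
}
```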

@errordeveloper errordeveloper changed the title "kubernetes reports is broken on katacoda" to "kubernetes reporter is broken on katacoda" Dec 1, 2016
@2opremio 2opremio added this to the December2016 milestone Dec 5, 2016
@errordeveloper errordeveloper removed their assignment Dec 6, 2016
@errordeveloper
Contributor Author

@2opremio I've un-assigned this from myself, as I don't think I'm gonna find time to work on this soon, we have also pinned Katacoda to 1.0 for the time being.

@errordeveloper
Contributor Author

errordeveloper commented Dec 7, 2016

I am afraid this was broken in 1.0 already, not sure why I was expecting it to work...

@2opremio 2opremio modified the milestones: December2016, EOY 2016 Dec 13, 2016
@2opremio 2opremio self-assigned this Jan 5, 2017
@2opremio
Contributor

> Katacoda doesn't have /sys/class/dmi/id/product_uuid, and we incorrectly fall back to /sys/hypervisor/uuid, which is also missing.

Question is, where does Kubernetes obtain the UUID (key system_uuid) served by http://localhost:10255/spec/? Because we could surely do the same.

@2opremio
Contributor

> except that we ignore the error in the checkpointing code.

We use the node-name to avoid reporting the pods from other nodes.

We could consider it an optimization and report all pods when the UUID isn't found (printing an appropriate warning).
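A sketch of treating the node filter as an optimization, as suggested above: filter pods by node name when it is known, otherwise warn and report everything. `Pod` and `podsToReport` are hypothetical stand-ins, not Scope's or client-go's types.

```go
package main

import (
	"fmt"
	"log"
)

// Pod is a hypothetical minimal pod record: a name and the node it runs on.
type Pod struct {
	Name     string
	NodeName string
}

// podsToReport keeps only this node's pods when the node name is known.
// When it is not, it logs a warning and reports every pod instead of
// failing the whole report.
func podsToReport(all []Pod, nodeName string) []Pod {
	if nodeName == "" {
		log.Println("warning: could not identify this node; reporting pods from all nodes")
		return all
	}
	var mine []Pod
	for _, p := range all {
		if p.NodeName == nodeName {
			mine = append(mine, p)
		}
	}
	return mine
}

func main() {
	pods := []Pod{{"web-1", "node-a"}, {"web-2", "node-b"}}
	fmt.Println(len(podsToReport(pods, "node-a"))) // 1
	fmt.Println(len(podsToReport(pods, "")))       // 2
}
```

The trade-off is duplicate pod reports from every probe when the node is unknown, in exchange for the Kubernetes view not disappearing entirely.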

@2opremio
Contributor

2opremio commented Jan 10, 2017

> Another option is to try and match any non-loopback IP addresses.

Of the pods? How?

@2opremio
Contributor

> Question is, where does Kubernetes obtain the UUID (key system_uuid) served by http://localhost:10255/spec/? Because we could surely do the same.

This is how they do it.

But it's very similar to how we do it. Looking forward to the output of http://localhost:10255/spec/ @errordeveloper

@errordeveloper
Contributor Author

errordeveloper commented Jan 11, 2017

// http://localhost:10255/spec/
{
  "num_cores": 2,
  "cpu_frequency_khz": 2599996,
  "memory_capacity": 2097504256,
  "machine_id": "ccd70b39268d46cd810ca7e170c55b50",
  "system_uuid": "ccd70b39268d46cd810ca7e170c55b50",
  "boot_id": "875f6622-4e08-4413-b983-a413ed7ffa1c",
  "filesystems": [
   {
    "device": "/dev/vda1",
    "capacity": 6382899200,
    "type": "vfs",
    "inodes": 800000,
    "has_inodes": true
   }
  ],
  "disk_map": {
   "253:0": {
    "name": "vda",
    "major": 253,
    "minor": 0,
    "size": 6656360448,
    "scheduler": "none"
   },
   "253:16": {
    "name": "vdb",
    "major": 253,
    "minor": 16,
    "size": 385024,
    "scheduler": "none"
   },
   "2:0": {
    "name": "fd0",
    "major": 2,
    "minor": 0,
    "size": 4096,
    "scheduler": "deadline"
   }
  },
  "network_devices": [
   {
    "name": "ens3",
    "mac_address": "02:42:ac:11:00:22",
    "speed": -1,
    "mtu": 1500
   }
  ],
  "topology": [
   {
    "node_id": 0,
    "memory": 2097504256,
    "cores": [
     {
      "core_id": 0,
      "thread_ids": [
       0
      ],
      "caches": [
       {
        "size": 32768,
        "type": "Data",
        "level": 1
       },
       {
        "size": 32768,
        "type": "Instruction",
        "level": 1
       },
       {
        "size": 4194304,
        "type": "Unified",
        "level": 2
       }
      ]
     }
    ],
    "caches": null
   },
   {
    "node_id": 1,
    "memory": 0,
    "cores": [
     {
      "core_id": 0,
      "thread_ids": [
       1
      ],
      "caches": [
       {
        "size": 32768,
        "type": "Data",
        "level": 1
       },
       {
        "size": 32768,
        "type": "Instruction",
        "level": 1
       },
       {
        "size": 4194304,
        "type": "Unified",
        "level": 2
       }
      ]
     }
    ],
    "caches": null
   }
  ],
  "cloud_provider": "Unknown",
  "instance_type": "Unknown",
  "instance_id": "None"
 }

@errordeveloper
Contributor Author

> Another option is to try and match any non-loopback IP addresses.

> Of the pods?

Yeah, the Scope probe pods, which run in the host network, but anyway...

> How?

I'm not sure why you are asking... Anyhow, I'd obtain a list of all IPs, which we probably already have, then call clientset.Core().Nodes().List(v1.ListOptions{}) and look for a node that has exactly the same set of addresses.
You might even be able to pass v1.ListOptions{Fields: "<selectorExpr>"}, but I'm not 100% sure it will work, since the addresses are an array and I don't know whether field selectors work with arrays.
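The matching step described above ("a node that has exactly the same set of addresses") can be sketched independently of client-go. `Node` here is a hypothetical stand-in for an item from the node list, reduced to a name and address strings, and the addresses in `main` are made up.

```go
package main

import "fmt"

// Node is a hypothetical, trimmed-down stand-in for an item returned by a
// node List call: just a name and its addresses.
type Node struct {
	Name      string
	Addresses []string
}

// toSet turns a slice into a set, dropping duplicates.
func toSet(xs []string) map[string]struct{} {
	s := make(map[string]struct{}, len(xs))
	for _, x := range xs {
		s[x] = struct{}{}
	}
	return s
}

// sameSet reports whether a and b hold exactly the same addresses,
// ignoring order and duplicates.
func sameSet(a, b []string) bool {
	sa, sb := toSet(a), toSet(b)
	if len(sa) != len(sb) {
		return false
	}
	for x := range sa {
		if _, ok := sb[x]; !ok {
			return false
		}
	}
	return true
}

// findNode returns the single node whose address set equals the local one.
// No match, or more than one, is reported as "not found".
func findNode(nodes []Node, local []string) (string, bool) {
	match := ""
	for _, n := range nodes {
		if !sameSet(n.Addresses, local) {
			continue
		}
		if match != "" {
			return "", false // ambiguous
		}
		match = n.Name
	}
	return match, match != ""
}

func main() {
	nodes := []Node{
		{Name: "node-a", Addresses: []string{"172.17.0.34", "10.32.0.1"}},
		{Name: "node-b", Addresses: []string{"172.17.0.35"}},
	}
	name, ok := findNode(nodes, []string{"10.32.0.1", "172.17.0.34"})
	fmt.Println(name, ok) // node-a true
}
```

An exact-set match may be too strict in practice: Kubernetes node status addresses carry a type (including non-IP entries such as a hostname), and the host may have interfaces, such as a Docker bridge, that never appear in node status. A real implementation would likely need to filter both sides to IP-typed addresses first.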

@2opremio
Contributor

2opremio commented Jan 11, 2017

Alright, after opening a terminal in the machine I realized that the uuid is gathered from /etc/machine-id (last branch in the cadvisor function). What I am going to do is vendor that cadvisor package and use that function instead.

@lukemarsden
Contributor

lukemarsden commented Jan 18, 2017

@2opremio could you summarize progress towards fixing this issue please? I've read the referenced issues and it seems somewhat complex, I'm not 100% clear on where you ended up. I'd really like to be able to start demoing the kubernetes integration with Katacoda sooner rather than later :)

@rade
Member

rade commented Jan 18, 2017

Should be fixed once #2132 gets merged.

@2opremio
Contributor

@lukemarsden Yes, it depends on #2132 which in turn depends on #2136 (working on this last bit right now)

4 participants