kubernetes reporter is broken on katacoda #2049

errordeveloper · 2016-11-30T12:55:34Z

You can reproduce it here: https://www.weave.works/guides/cloud-testdrive-part-1-setup-troubleshooting/.

The symptoms:

no Kubernetes view in the app
probes log: error generating report: open /sys/hypervisor/uuid: no such file or directory

Details:

Katacoda doesn't have /sys/class/dmi/id/product_uuid, and we incorrectly fall back to /sys/hypervisor/uuid, which is also missing.

The code is in probe/kubernetes/reporter.go#L319 and it looks nearly identical to what we have in vendor/github.com/weaveworks/go-checkpoint/checkpoint.go#L342, except that we ignore the error in the checkpointing code.

What this code is trying to do is find this node in the Kubernetes API, which is something I've looked into from the kubeadm perspective, and it's not as simple as you'd wish...

errordeveloper · 2016-11-30T13:01:50Z

I think we should try using kubelet's API (by default it's read-only/non-TLS on port 10255 and read-write/TLS on port 10250), so for example node info can be obtained from http://localhost:10255/spec/. Perhaps we could keep the current logic for the worst-case scenario, e.g. when the kubelet API has been disabled, or is on a port where we cannot find it (although I think kubelet registers its port with the API server, so we should be able to tell what it is, if it's not the default 10255).
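
A minimal sketch of that idea, assuming the read-only kubelet endpoint is reachable at http://localhost:10255 and returns a JSON document with a system_uuid key. The names machineSpec, parseSystemUUID and fetchSystemUUID are hypothetical and are not taken from the Scope code base.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// machineSpec holds only the fields of the /spec/ response used here.
// The struct name is hypothetical; the JSON keys are assumed from the endpoint's output.
type machineSpec struct {
	MachineID  string `json:"machine_id"`
	SystemUUID string `json:"system_uuid"`
}

// parseSystemUUID extracts system_uuid from a /spec/ JSON body.
func parseSystemUUID(body []byte) (string, error) {
	var spec machineSpec
	if err := json.Unmarshal(body, &spec); err != nil {
		return "", err
	}
	if spec.SystemUUID == "" {
		return "", errors.New("kubelet spec has no system_uuid")
	}
	return spec.SystemUUID, nil
}

// fetchSystemUUID queries the kubelet read-only API at baseURL.
func fetchSystemUUID(baseURL string) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/spec/")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseSystemUUID(body)
}

func main() {
	uuid, err := fetchSystemUUID("http://localhost:10255")
	if err != nil {
		// Worst case: the kubelet API is disabled or on another port.
		fmt.Println("kubelet API unavailable, falling back to local lookup:", err)
		return
	}
	fmt.Println("system_uuid:", uuid)
}
```

Keeping the parsing separate from the HTTP call means the current file-based logic can still be used as the fallback when the request fails.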

errordeveloper · 2016-11-30T16:19:18Z

Another option is to try and match any non-loopback IP addresses.

errordeveloper · 2016-12-06T17:30:36Z

@2opremio I've un-assigned this from myself, as I don't think I'm gonna find time to work on this soon, we have also pinned Katacoda to 1.0 for the time being.

errordeveloper · 2016-12-07T12:03:04Z

I am afraid this was broken in 1.0 already, not sure why I was expecting it to work...

2opremio · 2017-01-10T21:08:26Z

> Katacoda doesn't have /sys/class/dmi/id/product_uuid, and we incorrectly fall back to /sys/hypervisor/uuid, which is also missing.

The question is, where does Kubernetes obtain the uuid (key system_uuid) served by http://localhost:10255/spec/? Because we could surely do the same.

2opremio · 2017-01-10T21:11:45Z

> except that we ignore the error in the checkpointing code.

We use the node-name to avoid reporting the pods from other nodes.

We could consider it an optimization and report all pods when the uuid isn't found (printing an appropriate warning).
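
Roughly, that fallback could look like the sketch below. The pod type and the localPods function are hypothetical simplifications for illustration, not the reporter's actual types.

```go
package main

import (
	"fmt"
	"log"
)

// pod is a hypothetical, simplified stand-in for the pod data the reporter handles.
type pod struct {
	Name     string
	NodeName string
}

// localPods returns only the pods scheduled on nodeName.
// If nodeName is empty (the node could not be identified), it logs a
// warning and returns every pod instead of failing the whole report.
func localPods(pods []pod, nodeName string) []pod {
	if nodeName == "" {
		log.Println("warning: node name unknown, reporting pods from all nodes")
		return pods
	}
	var local []pod
	for _, p := range pods {
		if p.NodeName == nodeName {
			local = append(local, p)
		}
	}
	return local
}

func main() {
	pods := []pod{
		{Name: "web-1", NodeName: "node-a"},
		{Name: "web-2", NodeName: "node-b"},
	}
	fmt.Println(len(localPods(pods, "node-a"))) // filtered to this node
	fmt.Println(len(localPods(pods, "")))       // node unknown: all pods
}
```

The cost of the fallback is duplicated pod reports when several probes cannot identify their node, which is why it is treated as a degraded mode with a warning.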

2opremio · 2017-01-10T21:12:41Z

> Another option is to try and match any non-loopback IP addresses.

Of the pods? How?

2opremio · 2017-01-10T21:21:50Z

> The question is, where does Kubernetes obtain the uuid (key system_uuid) served by http://localhost:10255/spec/? Because we could surely do the same.

This is how they do it.

But it's very similar to how we do it. Looking forward to the output of http://localhost:10255/spec/ @errordeveloper

errordeveloper · 2017-01-11T14:28:51Z

// http://localhost:10255/spec/
{
  "num_cores": 2,
  "cpu_frequency_khz": 2599996,
  "memory_capacity": 2097504256,
  "machine_id": "ccd70b39268d46cd810ca7e170c55b50",
  "system_uuid": "ccd70b39268d46cd810ca7e170c55b50",
  "boot_id": "875f6622-4e08-4413-b983-a413ed7ffa1c",
  "filesystems": [
   {
    "device": "/dev/vda1",
    "capacity": 6382899200,
    "type": "vfs",
    "inodes": 800000,
    "has_inodes": true
   }
  ],
  "disk_map": {
   "253:0": {
    "name": "vda",
    "major": 253,
    "minor": 0,
    "size": 6656360448,
    "scheduler": "none"
   },
   "253:16": {
    "name": "vdb",
    "major": 253,
    "minor": 16,
    "size": 385024,
    "scheduler": "none"
   },
   "2:0": {
    "name": "fd0",
    "major": 2,
    "minor": 0,
    "size": 4096,
    "scheduler": "deadline"
   }
  },
  "network_devices": [
   {
    "name": "ens3",
    "mac_address": "02:42:ac:11:00:22",
    "speed": -1,
    "mtu": 1500
   }
  ],
  "topology": [
   {
    "node_id": 0,
    "memory": 2097504256,
    "cores": [
     {
      "core_id": 0,
      "thread_ids": [
       0
      ],
      "caches": [
       {
        "size": 32768,
        "type": "Data",
        "level": 1
       },
       {
        "size": 32768,
        "type": "Instruction",
        "level": 1
       },
       {
        "size": 4194304,
        "type": "Unified",
        "level": 2
       }
      ]
     }
    ],
    "caches": null
   },
   {
    "node_id": 1,
    "memory": 0,
    "cores": [
     {
      "core_id": 0,
      "thread_ids": [
       1
      ],
      "caches": [
       {
        "size": 32768,
        "type": "Data",
        "level": 1
       },
       {
        "size": 32768,
        "type": "Instruction",
        "level": 1
       },
       {
        "size": 4194304,
        "type": "Unified",
        "level": 2
       }
      ]
     }
    ],
    "caches": null
   }
  ],
  "cloud_provider": "Unknown",
  "instance_type": "Unknown",
  "instance_id": "None"
 }

errordeveloper · 2017-01-11T14:40:45Z

> > Another option is to try and match any non-loopback IP addresses.

> Of the pods?

Yeah, Scope probe pods, which are in the host network, but anyway...

> How?

I'm not sure why you are asking... Anyhow, I'd obtain a list of all IPs, which we probably already have, then call clientset.Core().Nodes().List(v1.ListOptions{}) and look for a node that has exactly the same set of addresses.
You might even be able to pass v1.ListOptions{Fields: "<selectorExpr>"}, but I'm not 100% sure it will work, since the addresses are an array and I don't know if field selectors work with arrays.
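
The matching step could be sketched as follows, using only the standard library. The node type is a hypothetical stand-in for what the Kubernetes API returns, and findNode and sameSet are illustrative names; the API call itself is left out.

```go
package main

import (
	"fmt"
	"net"
)

// node is a hypothetical stand-in for a Kubernetes node: a name plus its reported IPs.
type node struct {
	Name      string
	Addresses []string
}

// localIPs returns all non-loopback IP addresses configured on this host.
func localIPs() ([]string, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var ips []string
	for _, a := range addrs {
		ipNet, ok := a.(*net.IPNet)
		if !ok || ipNet.IP.IsLoopback() {
			continue
		}
		ips = append(ips, ipNet.IP.String())
	}
	return ips, nil
}

// sameSet reports whether a and b hold the same addresses, ignoring order and duplicates.
func sameSet(a, b []string) bool {
	inA := make(map[string]bool, len(a))
	for _, s := range a {
		inA[s] = true
	}
	inB := make(map[string]bool, len(b))
	for _, s := range b {
		if !inA[s] {
			return false
		}
		inB[s] = true
	}
	return len(inA) == len(inB)
}

// findNode returns the name of the node whose address set equals local.
func findNode(nodes []node, local []string) (string, bool) {
	for _, n := range nodes {
		if sameSet(n.Addresses, local) {
			return n.Name, true
		}
	}
	return "", false
}

func main() {
	ips, err := localIPs()
	if err != nil {
		fmt.Println("cannot list local addresses:", err)
		return
	}
	// In a real probe this list would come from the Kubernetes API.
	nodes := []node{{Name: "example-node", Addresses: ips}}
	name, ok := findNode(nodes, ips)
	fmt.Println(name, ok)
}
```

Exact set equality is deliberately strict here, as described above; a host with extra interfaces that the API server does not report would not match, so a subset check may be the more forgiving choice.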

2opremio · 2017-01-11T14:58:06Z

Alright, after opening a terminal in the machine I realized that the uuid is gathered from /etc/machine-id (last branch in the cadvisor function). What I am going to do is vendor that cadvisor package and use that function instead.

lukemarsden · 2017-01-18T14:04:52Z

@2opremio could you summarize progress towards fixing this issue please? I've read the referenced issues and it seems somewhat complex, I'm not 100% clear on where you ended up. I'd really like to be able to start demoing the kubernetes integration with Katacoda sooner rather than later :)

rade · 2017-01-18T14:08:53Z

Should be fixed once #2132 gets merged.

2opremio · 2017-01-18T15:21:04Z

@lukemarsden Yes, it depends on #2132 which in turn depends on #2136 (working on this last bit right now)

errordeveloper added the bug and k8s labels Nov 30, 2016

errordeveloper self-assigned this Nov 30, 2016

errordeveloper changed the title ~~kubernetes reports is broken on katacoda~~ kubernetes reporter is broken on katacoda Dec 1, 2016

2opremio added this to the December2016 milestone Dec 5, 2016

errordeveloper removed their assignment Dec 6, 2016

2opremio modified the milestones: December2016, EOY 2016 Dec 13, 2016

2opremio self-assigned this Jan 5, 2017

This was referenced Jan 11, 2017

k8s: Make node-name obtention more robust #2122

Closed

Obtain local pods from kubelet #2132

Merged

2opremio closed this as completed in #2132 Jan 19, 2017
