k8s: Make node-name obtention more robust #2122
@errordeveloper Mind testing if the fix works in katacoda?
396f276 to ffe7381
We know from weaveworks/weave#2427 that the product_uuid, which is the first thing
Great point
TLDR: In the case of Scope, badly. We use the uuid to obtain the current node name and, in turn, to filter out pods not scheduled on the current machine. If the uuids are not unique, then the node name we obtain will likely be wrong and/or duplicated, which may cause us to miss pods. e.g.:
I need to think about how to solve this. In the worst-case scenario we can report all pods from all probes (which is what we do on ECS) if the performance impact is not too high.
It seems the nodes have a http://stackoverflow.com/a/35022237/1914440 It seems it was called
* Base node-name obtention on the hostname instead of the system uuid (which can be hard to find and may not be unique per machine).
* Fallback to report all pods (printing a warning) if the node-name cannot be obtained.
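The two points in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code in the commit: the `Pod` type and the functions `resolveNodeName` and `localPods` are hypothetical names, and it assumes the k8s node name is identical to the machine's hostname (which, as discussed later in this thread, may not hold on every cloud provider). In a real probe the hostname would presumably come from `os.Hostname()`.

```go
package main

import (
	"fmt"
	"log"
)

// Pod is a hypothetical, minimal stand-in for the pod metadata a probe sees.
type Pod struct {
	Name     string
	NodeName string
}

// resolveNodeName returns the node name equal to the local hostname.
// ok is false when no known node matches (assumption: node name == hostname).
func resolveNodeName(hostname string, nodeNames []string) (name string, ok bool) {
	for _, n := range nodeNames {
		if n == hostname {
			return n, true
		}
	}
	return "", false
}

// localPods keeps only the pods scheduled on nodeName. If the node name
// could not be obtained (ok == false), it logs a warning and falls back
// to returning every pod.
func localPods(pods []Pod, nodeName string, ok bool) []Pod {
	if !ok {
		log.Printf("warning: node name unknown, reporting all %d pods", len(pods))
		return pods
	}
	var out []Pod
	for _, p := range pods {
		if p.NodeName == nodeName {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "web-1", NodeName: "node-a"},
		{Name: "db-1", NodeName: "node-b"},
	}
	nodes := []string{"node-a", "node-b"}

	// Hostname matches a node: only that node's pods are kept.
	name, ok := resolveNodeName("node-a", nodes)
	fmt.Println(len(localPods(pods, name, ok)))

	// Hostname matches nothing: fall back to all pods.
	name, ok = resolveNodeName("unknown-host", nodes)
	fmt.Println(len(localPods(pods, name, ok)))
}
```

The fallback trades extra reporting work for correctness: a probe that cannot identify its node reports too much rather than silently dropping pods.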
ffe7381 to de224e5
Done. @paulbellamy PTAL
I don't have any concrete concerns. Hostname just seems less unique than a uuid, but apparently some hosting services scupper that anyway.
As a side note, I thought checkpoint wrote some random data into a file on first launch to uniquely identify a host, but maybe that was a previous version.
But checkpoint has to uniquely identify a host on the whole planet, while Scope just needs ids that are unique within Scope's view, so I don't think this is an issue.
After @paulbellamy's comment I am considering reverting de224e5. @rade thoughts?
I don't understand what problem this is actually trying to solve. Obtaining a unique id for every node in a cluster? That is precisely the problem we had to tackle in Weave Net.
It seems there is more to it than just unique IDs though. You write
which suggests that the ID information is somehow correlated with info provided by k8s.
No, we are trying to identify the pods scheduled on the current node (to avoid reporting all pods from all nodes). For that we are using the pods' nodename. The problem is obtaining the nodename. For that, we obtain the metadata from all the k8s nodes and narrow it down by checking against known metadata unique to the current node. So far we have tried the
The question we are trying to answer here is "what pods are running on this node?" Correct? So the challenge here then is to construct some form of id from the information we can obtain from the host itself, that in turn can be matched to an id obtained from the metadata associated with a pod. Yes?
Yes, see above.
How does k8s come up with the nodename? Surely we should just do the same?
Let me try to figure this out.
OK, it doesn't seem easy. Apparently it depends on the cloud provider, and in order to obtain it you may need cloud-provider credentials :S See https://github.com/kubernetes/kubernetes/blob/2df5d4d980e8a6692b00f9ced2a23b8dd5f26bdf/pkg/kubelet/kubelet.go#L298-L313 and https://github.com/kubernetes/kubernetes/blob/ee49906c45171b1da354910bce4f4ed2095dd0d4/cmd/kubelet/app/server.go#L183-L206
Could we ask the kubelet on the node?
Here's an idea. Instead of assuming the uuids are unique, we verify it in the node list. If they are, we use the uuid. If not, we report all the pods.
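A rough sketch of that idea, under stated assumptions: the `Node` type, its `SystemUUID` field, and the function `nodeNameByUUID` are hypothetical stand-ins for whatever the node list actually provides, and the local uuid is taken as given. The lookup only succeeds when every uuid in the node list is distinct; otherwise the caller is expected to report all pods.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the k8s node metadata of interest.
type Node struct {
	Name       string
	SystemUUID string
}

// nodeNameByUUID returns the name of the node whose uuid equals localUUID.
// It returns ok == false if any uuid appears more than once in the node
// list (uuids cannot be trusted) or if localUUID is not found at all.
func nodeNameByUUID(localUUID string, nodes []Node) (name string, ok bool) {
	byUUID := make(map[string]string, len(nodes))
	for _, n := range nodes {
		if _, dup := byUUID[n.SystemUUID]; dup {
			return "", false // duplicate uuid: caller should report all pods
		}
		byUUID[n.SystemUUID] = n.Name
	}
	name, ok = byUUID[localUUID]
	return name, ok
}

func main() {
	unique := []Node{
		{Name: "node-a", SystemUUID: "uuid-1"},
		{Name: "node-b", SystemUUID: "uuid-2"},
	}
	fmt.Println(nodeNameByUUID("uuid-2", unique))

	// Two machines sharing a uuid, as can happen on some hosting services.
	cloned := []Node{
		{Name: "node-a", SystemUUID: "uuid-1"},
		{Name: "node-b", SystemUUID: "uuid-1"},
	}
	fmt.Println(nodeNameByUUID("uuid-1", cloned))
}
```

Rejecting the whole list on any duplicate is deliberately conservative; a narrower variant could reject only when the local uuid itself is duplicated.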
Unfortunately it doesn't provide the
So what we are trying to do here cannot be done?
But ... kubelet seems to expose
Overridden by #2132
Align with kubernetes on how the system uuid is obtained
Fixes #2049