Cleaner separation of kubeadm and machine bootstrapping #5294
/assign @killianmuldoon
One excellent suggestion from the lengthy discussion in that proposal was that we should use https://github.com/mozilla/sops as the encryption envelope for private key material.
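As an illustration only (not part of the proposal), here is a minimal sketch of how an on-node component might unwrap sops-encrypted key material, assuming the go.mozilla.org/sops/v3/decrypt package and a hypothetical file path:

```go
package main

import (
	"fmt"
	"os"

	"go.mozilla.org/sops/v3/decrypt"
)

func main() {
	// Hypothetical path: the bootstrap payload drops a sops-encrypted envelope
	// (e.g. CA keys or a join token) on disk; only a machine holding the right
	// decryption key can open it.
	const encryptedPath = "/etc/capi/bootstrap-secrets.enc.yaml"

	// decrypt.File resolves the data key via the key sources recorded in the
	// sops metadata (age, KMS, PGP, ...) and returns the cleartext document.
	plaintext, err := decrypt.File(encryptedPath, "yaml")
	if err != nil {
		fmt.Fprintf(os.Stderr, "decrypting bootstrap secrets: %v\n", err)
		os.Exit(1)
	}

	// A real agent would parse the cleartext and write the key material to
	// disk with restrictive permissions rather than printing or retaining it.
	_ = plaintext
}
```

The point of the envelope is that the user data itself never needs to carry plain-text private keys; only a machine able to unwrap the sops envelope can recover them.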
/cc @t-lo
Just wanted to clarify what we are talking about here: is the purpose of this issue to define an interface for various providers to implement (for different OSs?), or to define a tool that will be implemented? Reading through some of this, it seemed to be about building a bootstrap binary that would allow configuration of various OSs, but I had initially assumed this would define an interface.
I think it would be both: an interface with a default implementation.
/milestone v1.0
/kind proposal
I've been reviewing some PRs in CAPA, namely kubernetes-sigs/cluster-api-provider-aws#2854, and it looks like EKS has the same challenges in some areas, e.g. User Story 10. The way it's being tackled there is to add shell scripts to the EKS image-builder equivalent and then add an API in CAPA. I wonder if we should consider this as a new project and get some folk together.
It does have some of the same challenges for sure. And yes it would be good to get some people together to start discussions on this.
@randomvariable can we include a story/req here to satisfy the existing ability for users to plug in their own bootstrapping mechanism? This can be achieved today in two different ways:
Have added as U14 and R17 respectively. Have also captured the comment from #4172 around payload size in U13 / R16.
Another use case in #3782
/area bootstrap
I'm adding a bunch of comments/suggestions about the existing state of the document. Hope it's not too much 😬
Note to self and to anyone else involved: We should keep #6539 in mind in case we touch k8s object references as part of the proposed design.
Hi @johananl, we are currently actively investigating the approaches based on the original CAEP and the doc @richardcase mentioned. Can you share the current state of the design, so the document will reflect it better?
Hi @Danil-Grigorev. Glad to see more people are getting involved 🙂 My current impression is that while the original proposal touches some important aspects of this issue, the most important concern described by @randomvariable above -- the separation of bootstrappers such as kubeadm from the provisioning tools (cloud-init, Ignition etc.) -- isn't handled in it. In its current state, the proposal sounds like we're starting from the solution (machineadm) and working our way back to the requirements rather than the other way around.

In addition, in my opinion the original proposal includes a lot of user stories, some of which don't seem directly related to the issue at hand (e.g. Active Directory domain joins) and might be better handled in separate proposals. So, I'm not saying we shouldn't pursue the machineadm direction or that all of the user stories aren't important, I just think that adding a new binary which runs on the nodes without first solving the conflation we have in the API isn't going to lead us to where we're aiming, at least not on its own. @richardcase what do you think about the above? Am I missing something?

In the meantime I started working on a separate design proposal which specifically addresses the conflation of bootstrap (e.g. kubeadm) and provisioning (e.g. cloud-init) because this is arguably the main thing we need to solve and I couldn't find any work around that so far. I'm happy to join efforts if there is any existing/prior work around that. I'm still actively working on the proposal and it's by no means ready for review, but I'll share the WIP so that people can start to follow my train of thought and perhaps provide very early feedback. Here it is: https://docs.google.com/document/d/1Fz5vWwhWA-d25_QDqep0LWF6ae0DnTqd5-8k8N0vDDM/edit?usp=sharing
Thanks @johananl.
This is actually one of the motivations of the original proposal by @randomvariable. So cloud-init/Ignition are only used to transmit a "machine config" file to the machine, and then machineadm takes action based on that file (which might be running kubeadm). I would agree that it covers more than the strict separation of concerns between the commands required to bootstrap a cluster and the means to get those commands executed.

The idea of the doc was that the original proposal was the starting point and could be updated. With quite a few interested parties, it feels like it would be good to start a feature group, like what's been done around in-place upgrades and Karpenter. wdyt?
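To make that flow more concrete, here is a hypothetical, heavily simplified Go sketch of the kind of "machine config" file that cloud-init/Ignition would merely deliver and machineadm would then act on. The proposal doesn't pin down a schema, so every name below is invented for illustration:

```go
package machineadm

// MachineConfig is a hypothetical, simplified shape of the file that
// cloud-init/Ignition would write to disk. machineadm would read it and
// perform the actual bootstrap steps on the machine.
type MachineConfig struct {
	// Files to place on disk before bootstrapping (certificates, kubelet
	// configuration, etc.).
	Files []File `json:"files,omitempty"`

	// Bootstrap selects which bootstrapper to run and with what input,
	// e.g. "kubeadm" plus a kubeadm configuration document.
	Bootstrap BootstrapSpec `json:"bootstrap"`
}

// File describes a single file to be written before bootstrap runs.
type File struct {
	Path        string `json:"path"`
	Permissions string `json:"permissions,omitempty"`
	Content     string `json:"content"`
}

// BootstrapSpec carries the bootstrapper-specific payload verbatim.
type BootstrapSpec struct {
	// Kind names the bootstrapper, e.g. "kubeadm".
	Kind string `json:"kind"`
	// Config is passed through to the named bootstrapper unmodified.
	Config string `json:"config"`
}
```

The key property is that the provisioning layer only moves an opaque file onto the machine, while all bootstrap logic is concentrated in the on-node agent.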
And forgot to say, thanks for sharing the doc @johananl 🙇 I'll take a read.
@richardcase yes, I understand. This basically moves the bootstrap (e.g. kubeadm) and provisioning (e.g. cloud-init) process into the nodes, and it isn't clear to me that we want to do that. Here are a few problems I can currently see with this approach:
While I agree that we want to improve separation of concerns and protect CAPI components (e.g. infra providers) from provisioner API changes (as stated in the original proposal's motivation section), it's not clear to me that we also have to move this logic out of k8s and into the node while we're at it: AFAICT we could solve the same problem by introducing a provisioner contract and isolating provisioning data from other data (bootstrap, infrastructure etc.) in the CAPI types. We can then rely on references in the API for loosely coupling together the relevant bootstrapper and provisioner implementations.

Since we'd have contracts for both, any bootstrapper could be used with any provisioner (and also any infra provider, since the infra provider would hopefully just receive a blob of text representing the provisioning config and expose it to machines using instance metadata while remaining agnostic about the specific format). Rather than explicitly supporting every bootstrapper-provisioner combination (which would likely become a serious maintenance burden given that we already have 6 bootstrap providers and at least 2 provisioners and would likely add more in the future), we could isolate the two concerns using contracts, which would make them orthogonal -- which they technically are, though currently not in CAPI (more details in my WIP proposal).

I'm not sure that's a goal in CAPI, but my intuition tells me we might want to move to the nodes only the things which absolutely have to run there, and keep the rest on the management cluster. This way we can use things such as k8s object status fields to track the various stages of workload cluster operations. I hope that's clear. We certainly have to discuss this further since all of the above are my initial thoughts on the matter and it's very likely I am missing things.
Sure, we can do that. I'm not quite sure what a "feature group" means and how to form it though. Care to elaborate?
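As a purely hypothetical illustration of the contract idea described above (none of these types or fields exist in Cluster API today), a Machine spec could reference the bootstrapper and the provisioner independently, with each side fulfilling its own contract:

```go
package v1hypothetical

import corev1 "k8s.io/api/core/v1"

// MachineSpec sketches the separation of concerns: the bootstrap provider
// produces the commands/config needed to join the cluster, the provisioner
// renders them into a format the OS understands (cloud-init, Ignition, ...),
// and the infrastructure provider only ever sees the rendered blob.
type MachineSpec struct {
	// Bootstrap references a bootstrap config object (e.g. KubeadmConfig)
	// that fulfils a bootstrapper contract.
	Bootstrap corev1.ObjectReference `json:"bootstrap"`

	// Provisioner references a provisioner config object (e.g. a
	// hypothetical CloudInitConfig or IgnitionConfig) that fulfils a
	// provisioner contract and consumes the bootstrapper's output.
	Provisioner corev1.ObjectReference `json:"provisioner"`

	// InfrastructureRef is unchanged: the infra provider receives the
	// rendered provisioning data as an opaque secret and stays agnostic
	// about its format.
	InfrastructureRef corev1.ObjectReference `json:"infrastructureRef"`
}
```

Because the infra provider only consumes the rendered output, adding a new provisioner (or bootstrapper) would not require changes on the other side of the contract.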
Thanks for taking the time to explain @johananl, that's really helpful. I may not agree with all aspects of the problems listed, but generally I agree. You have highlighted the importance of revisiting the original proposal with fresh eyes. I should say I'm not particularly attached to the original proposal; it was just a starting point.
I agree, and the approach you are taking helps with this. Some of the things that could require running on the node would fall into the day-2 operations area, but as you say these can be separate and outside this scope. Another area that the approach you are suggesting could help with (admittedly I have only skim-read so far, so I may be off) is the re-use of control plane logic. Currently other control-plane providers (like k3s, RKE2) have very similar requirements around scale up, scale down, upgrades etc., and most providers copy what kubeadm is doing to a greater or lesser extent. This results in differences in functionality and potentially bugs; I see this approach helping with that. It's a pain I feel a lot. I'm looking forward to having a proper read.
Thanks a lot @richardcase. Happy to learn where the disagreement points are (feel free to comment on my proposal if that helps). I do see parts of the original proposal which directly overlap with my current vision of how things could work. Also, I am not opposed to having an on-node agent if we realize this is necessary. Maybe it is and I trust you and @randomvariable had good reasons for thinking in that direction. My proposal is an initial thought process, too. I expect it to change quite a lot before it could have a chance at becoming something we implement. We may also realize that we need both proposals since each touches at least some different/unique aspects of the problems at hand. Looking forward to any feedback you might have on the new proposal. I on my end will keep working on it and will advertise it more loudly when I feel it's ready for a wider round of reviews.
/priority backlog
A quick update on this task: Got some good initial feedback on the WIP proposal. Thanks!
Also merging requirements from #9631
Thanks @fabriziopandini, this seems highly relevant. To clarify, IIUC the requirement is: We shouldn't store plain-text secrets in the provisioning configuration accessible from cloud machines via user data or similar mechanisms. The separation of bootstrap and provisioning is already the main story in this WIP proposal, so the "keep secrets out" part is the new addition. Correct?
@johananl let's see if @AndiDog, the author of the other issue, chimes in. Please feel free to decide whether to keep this in scope of your proposal or not (if not, just list it as a non-goal or a future goal, so we keep track of it).
Is there an updated link to the document where the CAEP is being written (if that's the case)?
@vincepri the WIP is here: https://docs.google.com/document/d/1Fz5vWwhWA-d25_QDqep0LWF6ae0DnTqd5-8k8N0vDDM/edit?usp=sharing I've been less available to drive this lately due to internal priorities but I think there is already a lot to discuss in what I've written so far.
User Story
As a cluster operator, with development teams requiring the use of multiple operating systems, I would like a better machine bootstrapping abstraction.
Detailed Description
Cluster API Bootstrap Provider Kubeadm currently conflates two activities: configuring kubeadm to initialize or join a cluster, and provisioning the machine itself (files, users, disks, and pre/post commands) via cloud-init or Ignition.
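An abridged look at CABPK's KubeadmConfigSpec shows the two concerns living in one API type. This is a simplified sketch based on the v1beta1 field names, not the full definition; referenced types are reduced to empty stubs for brevity:

```go
// Package sketch abridges the fields of CABPK's KubeadmConfigSpec (v1beta1)
// to show how bootstrapper and provisioning concerns share one type.
package sketch

// Stubs standing in for the real referenced types.
type (
	ClusterConfiguration struct{}
	InitConfiguration    struct{}
	JoinConfiguration    struct{}
	File                 struct{}
	DiskSetup            struct{}
	NTP                  struct{}
	User                 struct{}
	Format               string
)

type KubeadmConfigSpec struct {
	// Bootstrapper concern: kubeadm-specific configuration.
	ClusterConfiguration *ClusterConfiguration `json:"clusterConfiguration,omitempty"`
	InitConfiguration    *InitConfiguration    `json:"initConfiguration,omitempty"`
	JoinConfiguration    *JoinConfiguration    `json:"joinConfiguration,omitempty"`

	// Provisioning concern: generic machine setup that gets rendered into
	// cloud-init (or Ignition, selected via Format).
	Files               []File     `json:"files,omitempty"`
	DiskSetup           *DiskSetup `json:"diskSetup,omitempty"`
	NTP                 *NTP       `json:"ntp,omitempty"`
	Users               []User     `json:"users,omitempty"`
	PreKubeadmCommands  []string   `json:"preKubeadmCommands,omitempty"`
	PostKubeadmCommands []string   `json:"postKubeadmCommands,omitempty"`
	Format              Format     `json:"format,omitempty"`
}
```

Any new bootstrapper or provisioner today ends up re-implementing one half of this mix, which is the coupling the discussion above aims to remove.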
The relationship between Cluster API and machine bootstrapping has created a number of challenges:
- How to secure kubeadm node joins
- How to secure control plane instantiation
- How to extensibly support different bootstrappers without increasing the spaghettiness
- Bootstrap reporting: it can be hard to find out what happened when bootstrapping failed. To be fair, the number of requests for this has gone down over time due to improvements in CABPK and kubeadm, but better reporting would still be valuable.
Anything else you would like to add:
For completeness, and to avoid folk having to work through an unwieldy closed PR, I'm including the user stories and requirements in their entirety from #4221:
User Stories
Windows or Linux hosts joining an Active Directory must effectively be given a set of bootstrap credentials to join the directory and must persist a Kerberos keytab for the host.
Cloud providers often impose size limits on bootstrap data (e.g. AWS, Azure and vSphere).
Requirements Specification
We define three modalities of the node bootstrapper:
/kind feature
An example of the current flow for AWS is shown below (courtesy of @PushkarJ):
![image](https://user-images.githubusercontent.com/1441070/136400384-a035be55-f80c-4795-bcfe-c512efce1176.png)