KEP kubeadm join --master workflow #1707
Conversation
👍 for moving this (at least as a first step)
/CC @krousey
* This proposal doesn't include a solution for API server load balancing.

NB. Nothing in this proposal should prevent the user in choosing its own preferred
s/the user in choosing its/users from choosing their/
new `kubeadm join --master` option.

As a consequence, kubeadm will provide a best-practice, “fast path” for creating a
minimum viable, conformant Kubernetes clusters with one or more master nodes and
s/clusters/cluster/
Support for high availability is one of the most requested features for kubeadm.

Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools, this KEP
Link to the HA docs -> https://kubernetes.io/docs/setup/independent/high-availability/
kubelet configuration, self-hosting, component config, control plane checkpointing
etc.

*> When `kubeadm join --master` should rely entirely on dependencies/features
Indentation seems off here. You seem to lack a bullet point.
* The cluster was initialized with `kubeadm init`.
* The cluster was initialized with `--feature-gates=HighAvailability=true`.
* The cluster uses and external etcd.
s/and/an
- does this mean I won't be able to use this on an existing cluster?
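For context, the flow these preconditions gate could be sketched as below. Note that both the `HighAvailability` feature gate and the `--master` flag are proposals from this KEP draft, not options shipped in current kubeadm, so this is an illustrative sketch only:

```shell
# On the first (seed) master -- the feature gate is the one proposed by
# this KEP, not a released kubeadm flag:
kubeadm init --feature-gates=HighAvailability=true \
    --config=kubeadm-config.yaml   # config is assumed to point at an external etcd

# On each additional master -- the --master flag is the new option this KEP
# introduces; the token and endpoint come from the init output, as with
# worker joins:
kubeadm join --master --token <token> <load-balancer-endpoint>:6443
```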
plane components due to the fact that Kubernetes itself will take care of deploying
corresponding pods on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
s/not know/unknown/
corresponding pods on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
to beta/GA or when this feature will became the new kubeadm default; as a consequence,
s/became/become/
`--feature-gates=SelfHosting=true,StoreCertsInSecrets=true`

"Storing cluster certificates in secrets" is a solution that we expect - *in the
long term* - will became mainstream, because it simplify certificates distribution
s/became/become/
s/simplify/simplifies/
and also certificate rotation, due to the fact that Kubernetes itself will take
care of distributing certs on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
s/not know/unknown/
care of distributing certs on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
to beta/GA or when this feautre will became the new kubeadm default; as a
s/became/become/
Force-pushed from 9435ea7 to 7c55517.
* Provide support for a dynamic bootstrap flow

At the time a user is running `kubeadm init`, he might not know what the cluster
s/he/an operator
Higher-level tools could create nodes in parallel (both masters and workers)
for reducing the overall cluster startup time.
`kubeadm join --master` should support natively this practice without requiring
the implementation of any synchronization mechanics by higher-level tools.
provided there has been an initial seeded master.
* In-place vs. Replace reconciliation strategies

Higher-level automation tools could decide for any reason to apply changes to
You want to support in-place transition from node->master? I'm kind of confused by what you're saying here.
Re-phrased. Let me know if it is clearer now.
The solution for `kubeadm join --master` will rely on a set of dependencies/other
features which are still in the process for graduating to GA like e.g. dynamic
kubelet configuration, self-hosting, component config, control plane checkpointing
For most standard HA setups, as long as dynamic kubelet configuration, self-hosting, and component config exist I think we're good. Whole-DC recovery from checkpointing is IMO a non-goal, as there are many ways to handle that scenario which are probably safer than turning on a switch.
still under graduation vs provide compatibility with older/less convenient but
more consolidated approaches?*

* *> Should we support `kubeadm join --master` for cluster with a control plane
I would prefer to only support self hosted, b/c this limits the supported configs, but I would like to know what others think and why?
At the beginning of HA discussion I was 100% with you. However I moved to a more cautious position for the following reasons:
- Demand from the community to go GA with kubeadm asap (and afaik with static pods)
- Lack of convergence on cert-in-secrets, which limits the benefits of a self-hosted solution
As a consequence, I'm worried that the transition period to self-hosting will be longer than expected...
That's fair, then we need to be certain we test the heck out of it.
* *> Should we support `kubeadm join --master` for cluster with a control plane
deployed as static pods? What about clusters with a self-hosted
control plane?*
* *> Should we support `kubeadm join --master` only for cluster storing
I think we need to gate this feature until we settle on the cert issue long term.
@Kargakis thanks for the nits
The proposal currently goes for the safe/simple path. I will be more than happy to relax this constraint if we can do it without introducing too much complexity for this first release. Any suggestions? Do you have specific use cases in mind? Another option we can consider is to add a …
Force-pushed from 7c55517 to 9242f7c.
We are extending the kubeadm distinctive `init` and `join` workflow, introducing the
capability to add more than one master node to an existing cluster by means of the
new `kubeadm join --master` option.
I understand that for historic reasons this is what has been decided, but I believe that for the purpose of the KEP, it's worth discussing an alternative option where:
- TLS assets (and potentially all config files, like static pods) are pre-provisioned, so that the bootstrap process is very deterministic
- all master nodes run exactly the same command
Which I would see being done like this:
- a set of TLS assets is created and packaged in a tarball or encoded otherwise
- user would have to have a list of static IPs and/or DNS names for master nodes
- the assets would be uploaded to some secure storage provider or copied onto each of the machines
- on each master the user runs `kubeadm init [flags] <assetsConfigPathOrURL>`
I guess what I'm suggesting is a step towards an automated implementation of what's currently documented, that doesn't involve manual creation of certificates and could be based around `MasterConfiguration` as input and a tarball of `/etc/kubernetes` as output. I believe this would be a simpler option to start with and far more useful to many folks (including our team).
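A rough sketch of the alternative flow suggested above; everything around the pre-provisioned tarball (the `<assetsConfigPathOrURL>` argument in particular) is hypothetical and does not exist in kubeadm today:

```shell
# 1. Create the full set of TLS assets once, with the static IPs/DNS names of
#    all master nodes already listed in the MasterConfiguration
#    (an existing kubeadm phase command at the time of writing):
kubeadm alpha phase certs all --config=master-config.yaml

# 2. Package the assets and copy them to each master (or upload them to some
#    secure storage provider):
tar czf k8s-assets.tar.gz -C /etc/kubernetes pki
scp k8s-assets.tar.gz <master-n>:/tmp/

# 3. Run exactly the same command on every master
#    (the <assetsConfigPathOrURL> argument is purely hypothetical):
kubeadm init [flags] <assetsConfigPathOrURL>
```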
I think that such kind of discussion should be extended to a wider audience in the next kubeadm office hours meeting.
I anticipate some comments here:
- I think that nothing in this proposal prevents pre-provisioning TLS assets beforehand; however, for simplicity, I prefer to delegate this task to the initial master node
- I like the idea of kubeadm accessing some path/URL during the join phase for retrieving certs (as an alternative to the copy), even if I expect that some kind of security should be defined around this.
- While most of the approaches documented today execute `kubeadm init` on all masters, I strongly believe we should deprecate - and possibly block - this for at least two reasons:
  - The init sequence executes a lot of steps which are unnecessary on a secondary master; those steps are mostly idempotent, so basically no harm is done by executing them two or three times (some gotchas already known, but nothing blocking so far), but I think this is not sustainable in the long term
  - the init sequence doesn't go through the TLS bootstrap process, and this exposes the cluster to serious security concerns. IMO any additional master should be forced to go through the TLS bootstrap process, ideally with a dedicated token (different than the token used for joining nodes)
- I think that embedding static IPs and/or DNS names for master nodes in TLS assets is a strategy that prevents or makes really complex any future changes to the master nodes (see the alternative approach described in this KEP)
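The dedicated-token idea in the bullet above could look roughly like this; `kubeadm token create` and the flags shown are real, but there is no separate master-join token type in kubeadm today, so the split is an assumption:

```shell
# On the seed master: mint a short-lived token reserved for joining masters
# (today this would just be an ordinary bootstrap token; a dedicated
# master-join token type is only a proposal):
kubeadm token create --ttl 15m --description "bootstrap token for joining masters"

# On the joining master: go through the normal TLS bootstrap with that token,
# using the --master flag proposed by this KEP:
kubeadm join --master --token <master-join-token> <api-server-endpoint>:6443
```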
Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools (e.g.
[this](https://goo.gl/yStTLs)), this KEP was designed with the objective to introduce
What is the purpose of shortened URL here? An explicit URL to docs would be much more obvious without having to follow it, and if docs move – it's easier to grep out old URLs.
Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools (e.g.
[this](https://goo.gl/yStTLs)), this KEP was designed with the objective to introduce
an "up-stream" simple and reliable solution for achieving the same goal.
It's usually one word – upstream, and I don't think we need to double quote it.
* Provide support for a dynamic bootstrap flow

At the time a user is running `kubeadm init`, s/he/an operator might not know what
the cluster setup will look like eventually. For instance, the user may start with
I am not sure why this has to be the case, it seems like more of a nice-to-have, in contrast to providing simple path of getting an HA cluster with reliable and deterministic bootstrap process.
IMO if we focus too much on a deterministic bootstrap process we are implicitly failing to address many other use cases and, most importantly, failing to address what can happen during the cluster lifecycle, e.g. a master node that should be replaced.
As a consequence, I consider the dynamic bootstrap flow an important part of the value that kubeadm can bring to HA in Kubernetes; I propose to discuss this as well in the next kubeadm office hours meeting.
Force-pushed from 9242f7c to db51ff9.
I'd like to aim to merge this for inclusion in 1.10 and at the same time update the existing HA document, so there is a documented "hard way" and a path towards an easier way.
* Replace reconciliation strategies

Especially in case of cloud deployments, higher-level automation tools could
decide for any reason to replace an existing nodes with new ones (instead of apply
Delete "an" --> `replace existing nodes..."
* This proposal doesn't include a solution for API server load balancing.

NB. Nothing in this proposal should prevent users from choosing their preferred
NB 4.
NB 3. Nothing in this proposal should prevent kubeadm to implement in future a
solution for provisioning an etcd cluster based on static pods/pods.

* This proposal doesn't include a solution for API server load balancing.
This line seems out of place with the "list of notes" structure here. Would it fit better up near line 124 where it's mentioned that this proposal doesn't include a solution for etcd? Or maybe as part of NB 4?
* Upgradability

* How to setup an high available cluster in order to simplify the execution
of cluster updates, both manually or with the support of `kubeadm update`?_
`cluster upgrades` (just to be consistent) and `kubeadm upgrade` (because `update` is not a command)
2. Check if the cluster is ready for joining a new master node:

a. check if the cluster was created with `--feature-gates=HighAvailability=true`.
Capitalize `Check` for consistency
for reducing the overall cluster startup time.
`kubeadm join --master` should support natively this practice without requiring
the implementation of any synchronization mechanics by higher-level tools
(provided there has been an initial seeded master).
I don't understand this caveat in parentheses. Is this stating that an initial seed master must exist before another master can join using `kubeadm join --master`? Maybe just the phrasing is throwing me off. I don't think this is necessary, as explained further down in this proposal.
Force-pushed from db51ff9 to 1eea943.
@errordeveloper I have updated the proposal to keep track of your comments (see e.g. alternative approaches). NB. In kubernetes/kubeadm#726 there is a proposal about how to overcome some limitations of the current proposal; however I would like to get this first release merged before starting further iterations ...
> We assume that all the necessary conditions where already checked
during `kubeadm init`:
> * The cluster uses and external etcd.
s/and/an/
3. Executes the TLS bootstrap process, including [No changes to this step]:

1. Start kubelet the bootstrap token as identity
incomplete sentence
## Drawbacks

This proposal provide support for a single, well defined HA scenario.
s/provide/provides/
1) Execute `kubeadm init` on many nodes

The approach based on execution of `kubeadm init` on each master was considered as well,
but not considered because this approach seems to have several draw backs:
s/but not considered because this approach/but not chosen because it/
and this can lead to unpredictable inconsistent configurations.
* The init sequence for secondary master won't go through the TLS boostrap process,
and this can be perceived security concern.
* The init sequence executes a lot of steps with are un-necessary on a secondary master;
s/with/which/
* The init sequence for secondary master won't go through the TLS boostrap process,
and this can be perceived security concern.
* The init sequence executes a lot of steps with are un-necessary on a secondary master;
now those steps are mostly indepondent, so basically now no warm is done by executing
s/indepondent/idempotent/
s/warm/harm/
this can be better explained by looking at two implementation based on this option:

* [kubernetes the hard way](https://github.com/kelseyhightower/kubernetes-the-hard-way)
uses the IP address of all master nodes for creating a new the API server
s/a new the API server/a new API server/
* [Creating HA cluster with kubeadm](https://kubernetes.io/docs/setup/independent/high-availability/)
uses a different API server serving certificate for each master, and this
could increases the complexity of the first implementation because:
* the `kubeadm join --master` flow have to generate different certificates for
s/have/has/
each master node.
* self-hosting control plane, should be adapted to mount different certificates
for each master.
* bootstrap check pointing should checkpoint be designed to checkpoin a different
s/bootstrap check pointing should checkpoint be designed to checkpoin/bootstrap check pointing should be designed to checkpoint/
for each master.
* bootstrap check pointing should checkpoint be designed to checkpoin a different
set of certificates for each master.
* upgrades should be adatped to consider master specific settings
s/adatped/adapted/
Looks good overall!
@fabriziopandini - lgtm just address some of the other comments. /cc @detiber
Force-pushed from 1eea943 to a467a79.
Looks good pending the outstanding comments.
@jbeda So what are we going to call it if not master? I don't have an easy moniker at the ready.
@timothysc @jbeda How about
/lgtm
/approve
To unblock this work; we can iterate.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: timothysc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
…-master KEP kubeadm join --master workflow
Adapted from the kubeadm HA design document:
https://goo.gl/QpD5h8
It still needs some work, but it makes sense to get the first iteration checked in to start moving it through the approval process (and to deprecate the google doc).
This KEP matches what is implemented in this PR:
kubernetes/kubernetes#58261
/CC @timothysc @roberthbailey @luxas @errordeveloper @jamiehannaford