
KEP kubeadm join --master workflow #1707

Merged

Conversation

fabriziopandini
Member

Adapted from the kubeadm HA design document:
https://goo.gl/QpD5h8

It still needs some work, but it makes sense to get the first iteration checked in to start moving it through the approval process (and to deprecate the google doc).

This KEP matches what is implemented in this PR:
kubernetes/kubernetes#58261

/CC @timothysc @roberthbailey @luxas @errordeveloper @jamiehannaford

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 30, 2018
@k8s-github-robot k8s-github-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jan 30, 2018
Contributor
@jamiehannaford left a comment

👍 for moving this (at least as a first step)

@rsdcastro

/CC @krousey

@k8s-ci-robot k8s-ci-robot requested a review from krousey January 31, 2018 17:22

* This proposal doesn't include a solution for API server load balancing.

NB. Nothing in this proposal should prevent the user in choosing its own preferred
Contributor

s/the user in choosing its/users from choosing their/

new `kubeadm join --master` option.

As a consequence, kubeadm will provide a best-practice, “fast path” for creating a
minimum viable, conformant Kubernetes clusters with one or more master nodes and
Contributor

s/clusters/cluster/

Support for high availability is one of the most requested features for kubeadm.

Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools, this KEP
Contributor

kubelet configuration, self-hosting, component config, control plane checkpointing
etc.

*> When `kubeadm join --master` should rely entirely on dependencies/features
Contributor

Indentation seems off here. You seem to lack a bullet point.


* The cluster was initialized with `kubeadm init`.
* The cluster was initialized with `--feature-gates=HighAvailability=true`.
* The cluster uses and external etcd.
Contributor

s/and/an - does this mean I won't be able to use this on an existing cluster?

plane components due to the fact that Kubernetes itself will take care of deploying
corresponding pods on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
Contributor

s/not know/unknown/

corresponding pods on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
to beta/GA or when this feature will became the new kubeadm default; as a consequence,
Contributor

s/became/become/

`--feature-gates=SelfHosting=true,StoreCertsInSecrets=true`

"Storing cluster certificates in secrets" is a solution that we expect - *in the
long term* - will became mainstream, because it simplify certificates distribution
Contributor

s/became/become/

Contributor

s/simplify/simplifies/

and also certificate rotation, due to the fact that Kubernetes itself will take
care of distributing certs on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
Contributor

s/not know/unknown/

care of distributing certs on nodes.

Unfortunately, at the time of writing it is not know when this feature will graduate
to beta/GA or when this feautre will became the new kubeadm default; as a
Contributor

s/became/become/


* Provide support for a dynamic bootstrap flow

At the time a user is running `kubeadm init`, he might not know what the cluster
Member

s/he/an operator

Higher-level tools could create nodes in parallel (both masters and workers)
for reducing the overall cluster startup time.
`kubeadm join --master` should support natively this practice without requiring
the implementation of any synchronization mechanics by higher-level tools.
Member

provided there has been an initial seeded master.


* In-place vs. Replace reconciliation strategies

Higher-level automation tools could decide for any reason to apply changes to
Member

You want to support in-place transition from node->master? I'm kind of confused by what you're saying here.

Member Author

Re-phrased. Let me know if now it is more clear.


The solution for `kubeadm join --master` will rely on a set of dependencies/other
features which are still in the process for graduating to GA like e.g. dynamic
kubelet configuration, self-hosting, component config, control plane checkpointing
Member

For most standard HA setups, so long as dynamic kubelet configuration, self-hosting, and component config exist, I think we're good. Whole-DC recovery from checkpointing is imo a non-goal, as there are many ways to handle that scenario which are probably safer than turning on a switch.

still under graduation vs provide compatibility with older/less convenient but
more consolidated approaches?*

* *> Should we support `kubeadm join --master` for cluster with a control plane
Member

I would prefer to only support self hosted, b/c this limits the supported configs, but I would like to know what others think and why?

Member Author

At the beginning of the HA discussion I was 100% with you. However, I moved to a more cautious position for the following reasons:

  • Demand from the community to go GA with kubeadm asap (and afaik with static pods)
  • Lack of convergence on cert-in-secrets, which limits the benefits of a self-hosted solution

As a consequence, I'm worried that the transition period to self-hosting will be longer than expected...

Member

That's fair, then we need to be certain we test the heck out of it.

* *> Should we support `kubeadm join --master` for cluster with a control plane
deployed as static pods? What about clusters with a self-hosted
control plane?*
* *> Should we support `kubeadm join --master` only for cluster storing
Member

I think we need to gate this feature until we settle on the cert issue long term.

@fabriziopandini
Member Author

@Kargakis thanks for the nits

does this mean I won't be able to use this on an existing cluster? (comment on line 217)

The proposal currently goes for the safe/simple path, that is, to allow kubeadm join --master only when kubeadm can rely on some assumptions about the current state of the cluster (as described in the user story).

I will be more than happy to relax this constraint if we can do so without introducing too much complexity for this first release. Any suggestions? Do you have specific use cases in mind?

Another option we can consider is to add a --force flag...
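For illustration, the assumptions discussed here (cluster created with `--feature-gates=HighAvailability=true` and an external etcd) would be established at `kubeadm init` time. A minimal sketch of such an init configuration, assuming the v1alpha1 `MasterConfiguration` format of that era; all field values are hypothetical placeholders, not text from this KEP:

```yaml
# Hypothetical kubeadm init configuration illustrating the preconditions
# that `kubeadm join --master` relies on in this proposal: the
# HighAvailability feature gate and an external etcd cluster.
# Field names follow the v1alpha1 MasterConfiguration format; all
# addresses are placeholders.
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
api:
  # A stable address (e.g. a load balancer VIP) shared by all masters;
  # API server load balancing itself is out of scope for this KEP.
  advertiseAddress: 10.0.0.100
etcd:
  # External etcd endpoints -- one of the conditions checked before a
  # new master is allowed to join.
  endpoints:
  - https://10.0.0.11:2379
  - https://10.0.0.12:2379
  - https://10.0.0.13:2379
featureGates:
  HighAvailability: true
```

An operator would then run `kubeadm init --config=master-config.yaml` on the first master, and `kubeadm join --master` on each additional one.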


We are extending the kubeadm distinctive `init` and `join` workflow, introducing the
capability to add more than one master node to an existing cluster by means of the
new `kubeadm join --master` option.
Member
@errordeveloper Feb 13, 2018

I understand that for historic reasons this is what has been decided, but I believe that for the purpose of the KEP, it's worth discussing an alternative option where:

  1. TLS assets (and potentially all config files, like static pods) are pre-provisioned, so that the bootstrap process is very deterministic
  2. all master nodes run exactly the same command

Which I would see being done like this:

  1. a set of TLS assets is created and packaged in a tarball or encoded otherwise
    • user would have to have a list of static IPs and/or DNS names for master nodes
    • the assets would be uploaded to some secure storage provider or copied onto each of the machines
  2. on each master user runs kubeadm init [flags] <assetsConfigPathOrURL>

Member

I guess what I'm suggesting is a step towards an automated implementation of what's currently documented, that doesn't involve manual creation of certificates and could be based around MasterConfiguration as input and tarball of /etc/kubernetes as output. I believe this would be a simpler option to start with and far more useful to many folks (including our team).

Member Author

I think that this kind of discussion should be extended to a wider audience at the next kubeadm office hours meeting.

I anticipate some comments here:

  • I think that nothing in this proposal prevents pre-provisioning TLS assets beforehand; however, for simplicity, I prefer to delegate this task to the initial master node.
  • I like the idea of kubeadm accessing some paths/URLs during the join phase for retrieving certs (as an alternative to copying them), even if I expect that some kind of security should be defined around this.
  • While most of the approaches documented today execute kubeadm init on all masters, I strongly believe we should deprecate - and possibly block - this, for at least two reasons:
    • The init sequence executes a lot of steps which are unnecessary on a secondary master; those steps are mostly idempotent, so basically no harm is done today by executing them two or three times (some gotchas are already known, but nothing blocking so far), but I think this is not sustainable in the long term.
    • The init sequence doesn't go through the TLS bootstrap process, and this exposes the cluster to serious security concerns. IMO any additional master should be forced to go through the TLS bootstrap process, ideally with a dedicated token (different from the token used for joining nodes).
  • I think that embedding static IPs and/or DNS names for master nodes in TLS assets is a strategy that prevents, or makes really complex, any future changes to the master nodes (see the alternative approach described in this KEP).


Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools (e.g.
[this](https://goo.gl/yStTLs)), this KEP was designed with the objective to introduce
Member
@errordeveloper Feb 13, 2018

What is the purpose of shortened URL here? An explicit URL to docs would be much more obvious without having to follow it, and if docs move – it's easier to grep out old URLs.

Even if, as of today, there is already the possibility to create an HA cluster
using kubeadm in combination with some scripts and/or automation tools (e.g.
[this](https://goo.gl/yStTLs)), this KEP was designed with the objective to introduce
an "up-stream" simple and reliable solution for achieving the same goal.
Member
@errordeveloper Feb 13, 2018

It's usually one word – upstream, and I don't think we need to double quote it.

* Provide support for a dynamic bootstrap flow

At the time a user is running `kubeadm init`, s/he/an operator might not know what
the cluster setup will look like eventually. For instance, the user may start with
Member
@errordeveloper Feb 13, 2018

I am not sure why this has to be the case, it seems like more of a nice-to-have, in contrast to providing simple path of getting an HA cluster with reliable and deterministic bootstrap process.

Member Author

IMO if we focus too much on a deterministic bootstrap process we are implicitly failing to address many other use cases and, most importantly, what can happen during the cluster lifecycle, e.g. when a master node should be replaced.
As a consequence, I consider the dynamic bootstrap flow an important part of the value that kubeadm can bring to HA in Kubernetes; I propose to discuss this as well at the next kubeadm office hours meeting.

@timothysc
Member

I'd like to aim to merge this for inclusion in 1.10 and at the same time update the existing HA document, so there is a documented "hard way" and a path forward to an easier way.

* Replace reconciliation strategies

Especially in case of cloud deployments, higher-level automation tools could
decide for any reason to replace an existing nodes with new ones (instead of apply


Delete "an" --> `replace existing nodes..."


* This proposal doesn't include a solution for API server load balancing.

NB. Nothing in this proposal should prevent users from choosing their preferred


NB 4.

NB 3. Nothing in this proposal should prevent kubeadm to implement in future a
solution for provisioning an etcd cluster based on static pods/pods.

* This proposal doesn't include a solution for API server load balancing.


This line seems out of place with the "list of notes" structure here. Would it fit better up near line 124 where it's mentioned that this proposal doesn't include a solution for etcd? Or maybe as part of NB 4?

* Upgradability

* How to setup an high available cluster in order to simplify the execution
of cluster updates, both manually or with the support of `kubeadm update`?_


cluster upgrades (just to be consistent) and kubeadm upgrade (because update is not a command)


2. Check if the cluster is ready for joining a new master node:

a. check if the cluster was created with `--feature-gates=HighAvailability=true`.


Capitalize Check for consistency

for reducing the overall cluster startup time.
`kubeadm join --master` should support natively this practice without requiring
the implementation of any synchronization mechanics by higher-level tools
(provided there has been an initial seeded master).


I don't understand this caveat in parentheses. Is this stating that an initial seed master must exist before another master can join using kubeadm join --master? Maybe just the phrasing is throwing me off. I don't think this is necessary, as explained further down in this proposal.

@fabriziopandini
Member Author

@errordeveloper I have updated the proposal to keep track of your comments (see e.g. the alternative approaches)
@mattkelly thanks!
@timothysc all the feedback received so far is addressed; IMO this KEP is now ready for final review

Nb. In kubernetes/kubeadm#726 there is a proposal about how to overcome some limitations of the current proposal; however, I would like to get this first release merged before starting further iterations ...


> We assume that all the necessary conditions where already checked
during `kubeadm init`:
> * The cluster uses and external etcd.
Contributor

s/and/an/


3. Executes the TLS bootstrap process, including [No changes to this step]:

1. Start kubelet the bootstrap token as identity
Contributor

incomplete sentence


## Drawbacks

This proposal provide support for a single, well defined HA scenario.
Contributor

s/provide/provides/

1) Execute `kubeadm init` on many nodes

The approach based on execution of `kubeadm init` on each master was considered as well,
but not considered because this approach seems to have several draw backs:
Contributor

s/but not considered because this approach/but not chosen because it/

and this can lead to unpredictable inconsistent configurations.
* The init sequence for secondary master won't go through the TLS boostrap process,
and this can be perceived security concern.
* The init sequence executes a lot of steps with are un-necessary on a secondary master;
Contributor

s/with/which/

* The init sequence for secondary master won't go through the TLS boostrap process,
and this can be perceived security concern.
* The init sequence executes a lot of steps with are un-necessary on a secondary master;
now those steps are mostly indepondent, so basically now no warm is done by executing
Contributor

s/indepondent/idempotent/

Contributor

s/warm/harm/

this can be better explained by looking at two implementation based on this option:

* [kubernetes the hard way](https://github.com/kelseyhightower/kubernetes-the-hard-way)
uses the IP address of all master nodes for creating a new the API server
Contributor

s/a new the API server/a new API server/

* [Creating HA cluster with kubeadm](https://kubernetes.io/docs/setup/independent/high-availability/)
uses a different API server serving certificate for each master, and this
could increases the complexity of the first implementation because:
* the `kubeadm join --master` flow have to generate different certificates for
Contributor

s/have/has/

each master node.
* self-hosting control plane, should be adapted to mount different certificates
for each master.
* bootstrap check pointing should checkpoint be designed to checkpoin a different
Contributor

s/bootstrap check pointing should checkpoint be designed to checkpoin/bootstrap check pointing should be designed to checkpoint/

for each master.
* bootstrap check pointing should checkpoint be designed to checkpoin a different
set of certificates for each master.
* upgrades should be adatped to consider master specific settings
Contributor

s/adatped/adapted/

@0xmichalis
Contributor

Looks good overall!

@timothysc
Member

@fabriziopandini - lgtm just address some of the other comments.

/cc @detiber

@k8s-ci-robot k8s-ci-robot requested a review from detiber March 14, 2018 15:17
Member
@detiber left a comment

Looks good pending the outstanding comments.

@timothysc
Member

@jbeda So what are we going to call it if not master? I don't have an easy moniker at the ready.

@jamiehannaford
Contributor

@timothysc @jbeda How about leader? or control-plane?

Member
@timothysc left a comment

/lgtm
/approve

To unblock this work so we can iterate.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 2, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2018
@k8s-ci-robot k8s-ci-robot merged commit fd283d1 into kubernetes:master Apr 2, 2018
@fabriziopandini fabriziopandini deleted the kubeadm-join--master branch April 14, 2018 08:54
calebamiles pushed a commit to calebamiles/community that referenced this pull request Sep 5, 2018
…-master

KEP kubeadm join --master workflow