Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

installconfig: separate control plane and compute machine pools #1157

Merged
merged 2 commits into from
Feb 15, 2019
Merged

installconfig: separate control plane and compute machine pools #1157

merged 2 commits into from
Feb 15, 2019

Conversation

staebler
Copy link
Contributor

@staebler staebler commented Jan 30, 2019

There should always be a control plane machine pool, so the declaration is moved from a generic machines field to a specific controlPlane field. The machines field contains only compute machines, so it is renamed to compute.

This is the first step in replacing references to master and worker with references to control plane and compute throughout the installer.

@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 30, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2019
@@ -55,7 +57,7 @@ func (a *InstallConfig) Generate(parents asset.Parents) error {

a.Config = &types.InstallConfig{
TypeMeta: metav1.TypeMeta{
APIVersion: "v1beta1",
APIVersion: "v1beta2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want a public constant for "the current version". For an example, see here and other changes in that PR.

@abhinavdahiya
Copy link
Contributor

regarding fabb911 I think converters must exist in pkg/types as part of the api.

}
for i, p := range c.Compute {
if p.Replicas == nil {
c.Compute[i].Replicas = &defaultReplicaCount
Copy link
Member

@wking wking Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to default to 1 (on libvirt) or 3 (elsewhere) for compute pools we don't recognize? I think the previous zero default may be more appropriate as long as at least one compute pool has a non-zero replicas. I'm also fine revisiting the "we can pick a default for you" approach from #1146 and just erroring out if folks leave replicas unset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the only compute machine pool accepted by the validator is "worker". My expectation in the future is that the installer will not care what the name of the compute machine pools are. The installer will not treat the "worker" or "compute" machine pool any different than any other compute machine pool. Do you have different expectations?

@@ -9,6 +9,8 @@ import (
// MachinePool is a pool of machines to be installed.
type MachinePool struct {
// Name is the name of the machine pool.
// For the control plane machine pool, the name will always be "master".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care what they name it? I'm fine leaving this up to the user.

Copy link
Contributor Author

@staebler staebler Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, we may not care what they name it. For now, (1) the installer does care, or (2) I don't care to rationalize about whether we really do care. As the rest of the changes from master and worker to control plane and compute play out, this restriction may be relaxed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a data point, we had this conversation in hive back when I was on that team. We could not come up with a compelling use case for the user to provide a custom name for the control plane machines.

@@ -9,6 +9,8 @@ import (
// MachinePool is a pool of machines to be installed.
type MachinePool struct {
// Name is the name of the machine pool.
// For the control plane machine pool, the name will always be "master".
// For the compute machine pools, the only valid name is "worker".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to lift this restriction. All we're doing for compute nodes is pushing MachineSets which reference an installer-created worker IAM instance profile, right? We can do that regardless of the name we set in the MachineSet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this restriction will be lifted. This is just the first step in a larger transition that will include that lifted restriction.

@@ -88,7 +88,7 @@ func (w *Worker) Generate(dependencies asset.Parents) error {
}

ic := installconfig.Config
pool := workerPool(ic.Machines)
pool := workerPool(ic.Compute)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be:

for _, pool := range ic.Compute {

couldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, yes. But baby steps. And the validator only accepts a single compute machine pool named "worker" at the moment anyway.

@wking
Copy link
Member

wking commented Jan 30, 2019

I am super-excited about this :). CC @dgoodwin, @vrutkovs about the pending install-config schema change.

@dgoodwin
Copy link
Contributor

Awesome, and looks like it matches openshift/hive#164 nicely. CC @abutcher

@staebler
Copy link
Contributor Author

regarding fabb911 I think converters must exist in pkg/types as part of the api.

@abhinavdahiya Yes, I agree. I did not want to tackle doing long-term converters as part of this PR. I only wanted to put something in place temporarily so that current users such as hive and byo have a chance to switch over to v1beta2 without breaking. Once people have had a chance to make the switch, all the v1beta1 types and converted will be removed.

@wking
Copy link
Member

wking commented Jan 31, 2019

@dgoodwin, @vrutkovs, do you need short-term compat with v1beta1, or can you bump your generators when you bump installer versions?

@dgoodwin
Copy link
Contributor

I think we'll be ok, we'll adjust as soon as we vendor this in.

@vrutkovs
Copy link
Member

@wking our GCP test was scrapped, scaleup test uses AWS cluster, so any option is fine by me

@staebler
Copy link
Contributor Author

OK. If nobody needs the v1beta1 support, then I will remove it. Makes things easier.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 31, 2019
@staebler
Copy link
Contributor Author

staebler commented Feb 1, 2019

/retest

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2019
@staebler
Copy link
Contributor Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2019
Replicas: func(x int64) *int64 { return &x }(3),
},
ControlPlane: &types.MachinePool{
Name: "master",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a desire to eliminate the term "master" generally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just asking as that is a broader change to push through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The broader change of replacing the terms master and worker with control-plane and compute are tackled in #1227.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2019
@flaper87
Copy link
Contributor

If you don't mind, I'd like to test this patch with OpenStack before it goes in. I'll wait for the next rebase to test it.

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 13, 2019
@staebler
Copy link
Contributor Author

If you don't mind, I'd like to test this patch with OpenStack before it goes in. I'll wait for the next rebase to test it.

A test on OpenStack sounds wonderful. The PR has been rebased.

// Default on AWS and OpenStack is 3 masters and 3 workers.
// Default on Libvirt is 1 master and 1 worker.
Machines []MachinePool `json:"machines,omitempty"`
ControlPlane *MachinePool `json:"controlPlane,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to bump InstallConfigVersion for this. Possibly also with a temporary hack to get through CI (e.g. like the one being removed by #1251).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure whether the version needed to be bumped with every change. Or whether the version change covered the collections of API changes between beta1 and beta2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cut 0.12.0 with v1beta2, so I think we need another bump. But I'm fine combining future changes that land between releases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I will bump up to v1beta3.

@staebler
Copy link
Contributor Author

/hold cancel

The feature exception for this has been approved.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2019
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 14, 2019

@staebler: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-openstack 05cc74b5b542c91a77f78e0f3b09715a36b6c437 link /test e2e-openstack
ci/prow/e2e-libvirt 05cc74b5b542c91a77f78e0f3b09715a36b6c437 link /test e2e-libvirt

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

There should always be a control plane machine pool, so the declaration
is moved from a generic machines field to a specific controlPlane field.
The machines field contains only compute machines, so it is renamed to
compute.
The Machines field was replaced with ControlPlane and Compute fields,
so the API version of InstallConfig needs to be bumped.
// Default on AWS and OpenStack is 3 masters and 3 workers.
// Default on Libvirt is 1 master and 1 worker.
Machines []MachinePool `json:"machines,omitempty"`
ControlPlane *MachinePool `json:"controlPlane,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this omitempty? and nilable.

from the commit message There should always be a control plane machine pool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the installer can fill in the default.

Copy link
Contributor Author

@staebler staebler Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is omitempty and nilable because the user does not have to provide it. If it is omitted, then the installer will add default values to the InstallConfig used. The validator will ensure that the field is not nil in the final InstallConfig.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😕 sure

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 15, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,staebler]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

staebler added a commit to staebler/release that referenced this pull request Feb 15, 2019
… machines.

In the cases where the replica count is not customizable, the controlPlane and compute
have been removed to use the defaults for the machine pools.

Update the apiVersion for install-config.yaml to v1beta3, which is the version that
incorporates the controlPlane and compute fields.

See openshift/installer#1157
@openshift-merge-robot openshift-merge-robot merged commit e1030ce into openshift:master Feb 15, 2019
markmc added a commit to markmc/dev-scripts that referenced this pull request Feb 20, 2019
machines is now split into compute and controlPlane

See openshift/installer#1157
wking added a commit to wking/openshift-installer that referenced this pull request Feb 26, 2019
When we pivoted to v1beta3 in ccdc32e (installconfig: separate
control plane and compute machine pools, 2019-01-29, openshift#1157), we
dropped support for the old 'machines' JSON property.  Instead of
silently ignoring that property in v1beta2 configs, error out to avoid
surprising users later when they notice us not picking up their
machines configuration.

Or internal consumers pivoted to v1beta3 in openshift/release@e1d729c6
(Modify install-config.yaml to use controlPlane and compute instead of
machines, 2019-02-05, openshift/release#2787) and
openshift/hive@3eda6d12 (Bump to installer master branch, 2019-02-21,
openshift/hive#228).
montaguethomas added a commit to montaguethomas/openshift-installer that referenced this pull request Jul 13, 2021
After openshift#1157 there is no need to restrict what name is used for the
controlPlane Machines or compute MachineSets. By removing this
restriction, multiple compute MachineSets can be defined at install
time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants