WIP: doc: Begin a document on adding a new OpenShift platform #1112
Conversation
### Enable core platform

1. **Boot** - Ensure RH CoreOS boots on the desired platform, that Ignition works, and that you have VM / machine images to test with
I'd also note here that for new cloud platforms, Ignition may need support upstream. For example here's a PR for a non-top-tier cloud: coreos/ignition#667
docs/dev/adding-new-platform.md (Outdated)
To boot RHCoS to a new platform, you must:

1. Ensure ignition supports that platform via an OEM ID
Ahh I see you cover this here. I'd point to coreos/fedora-coreos-tracker#95
and actually in an ideal world patches land in FCOS first and we later backport them.
I'd prefer to reference ignition directly for now so as to make it clear what the priority ordering is.
The link is what I was alluding to
5. **Enable Provisioning** Add a hidden installer option to this repo for the desired platform as a PR and implement the minimal features for bootstrap as well as a reliable teardown
6. **Enable Platform** Ensure all operators treat your platform as a no-op
7. **CI Job** Add a new CI job to the installer that uses the credentials above to run the installer against the platform and correctly tear down resources
8. **Publish Images** Ensure RH CoreOS images on the platform are being published to a location CI can test
Should publish images be before #7?
Not actually required to get the PR up, which is why I ordered it that way (you can publish one yourself into the CI infra).
5. Do *not* have automatic cloud provider permissions to perform infrastructure API calls
6. Have a domain name pointing to the load balancer IP(s) that is `api.<BASE_DOMAIN>`
7. Has an internal DNS CNAME pointing to each master called `etcd-N.<BASE_DOMAIN>` that
8. Has an optional internal load balancer that TCP load balances all master nodes, with a DNS name `internal-api.<BASE_DOMAIN>` pointing to the load balancer.
Is the DNS name optional too (or just the load balancer)? Would the external DNS need the internal-api name registered for use by the cluster without internal DNS?
DNS isn't optional for cert signing, but I guess you could technically sign your IP.
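For anyone wiring this up, here is a minimal sketch of how the DNS side of the requirements above could be sanity-checked before running the installer. The `api`, `etcd-N`, and `internal-api` names come from the quoted list; the base domain, master count, and helper functions are purely illustrative.

```go
// Sanity-check that the DNS records the installer expects (per the
// quoted requirements) actually resolve. The record names mirror the
// doc; the cluster/base-domain values are examples only.
package main

import (
	"fmt"
	"net"
)

func main() {
	baseDomain := "example.openshift.test" // illustrative base domain
	masters := 3

	// The API name must resolve to the load balancer IP(s).
	checkHost("api." + baseDomain)

	// Each master needs an internal CNAME named etcd-N.<BASE_DOMAIN>.
	for i := 0; i < masters; i++ {
		checkCNAME(fmt.Sprintf("etcd-%d.%s", i, baseDomain))
	}

	// Optional internal load balancer name, if one is configured.
	checkHost("internal-api." + baseDomain)
}

func checkHost(name string) {
	ips, err := net.LookupHost(name)
	fmt.Printf("%-40s ips=%v err=%v\n", name, ips, err)
}

func checkCNAME(name string) {
	target, err := net.LookupCNAME(name)
	fmt.Printf("%-40s cname=%v err=%v\n", name, target, err)
}
```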
2. **Arch** - Identify the correct opinionated configuration for a desired platform supporting the default features.
3. **CI** - Identify credentials and setup for a CI environment, ensure those credentials exist and can be used in the CI environment
4. **Name** - Identify and get approved the correct naming for adding a new platform to the core API objects (specifically the [infrastructure config](https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go) and the [installer config](https://github.com/openshift/installer/blob/master/pkg/types/aws/doc.go)) so that we are consistent
5. **Enable Provisioning** Add a hidden installer option to this repo for the desired platform as a PR and implement the minimal features for bootstrap as well as a reliable teardown
Can we chronicle those out in a separate doc or section? Below, we identify the DNS and load balancer requirements (L48-L76). We should be able to identify those, bucket and networking reqs for the current product and identify the IPI and UPI behaviors/expectations of those components.
Those should be in Enable Provisioning.
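As an illustration of the **Name** step above, here is a hedged sketch of the typed identifier and install-config stanza a new platform ends up needing. `NewCloud` and the field names are placeholders only; the real definitions belong in the linked openshift/api infrastructure config and the installer's `pkg/types`.

```go
// Sketch of the "Name" step: the new platform shows up as a typed,
// consistently spelled identifier in both the cluster config API and
// the installer's install-config types. "NewCloud" is hypothetical.
package types

// PlatformType mirrors the kind of string constant the infrastructure
// config uses to identify the cloud a cluster runs on.
type PlatformType string

const (
	// NewCloudPlatformType is the hypothetical identifier agreed on
	// up front so every component spells the platform the same way.
	NewCloudPlatformType PlatformType = "NewCloud"
)

// Platform is the minimal install-config stanza the hidden installer
// option would accept for the new platform.
type Platform struct {
	// Region is an example of an "environmental" input the user
	// supplies rather than the installer guessing it.
	Region string `json:"region"`

	// BaseImage points at the RHCOS image to boot, published where
	// CI (and users) can reach it.
	BaseImage string `json:"baseImage"`
}
```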
3. Have low latency interconnections (<5ms RTT) and persistent disks that survive reboot and are provisioned for at least 300 IOPS
4. Have cloud or infrastructure firewall rules that at minimum allow the standard ports to be opened (see AWS provider)
5. Do *not* have automatic cloud provider permissions to perform infrastructure API calls
6. Have a domain name pointing to the load balancer IP(s) that is `api.<BASE_DOMAIN>`
`<CLUSTER_NAME>-api.<BASE_DOMAIN>`
4. Have cloud or infrastructure firewall rules that at minimum allow the standard ports to be opened (see AWS provider)
5. Do *not* have automatic cloud provider permissions to perform infrastructure API calls
6. Have a domain name pointing to the load balancer IP(s) that is `api.<BASE_DOMAIN>`
7. Has an internal DNS CNAME pointing to each master called `etcd-N.<BASE_DOMAIN>` that
`<CLUSTER_NAME>-etcd-N.<BASE_DOMAIN>`
This covers the minimal steps and process to go from "nothing" to "OpenShift is fully capable of running on your platform". It is heavily a work in progress, but it should capture the why, our support levels, and our target config, as well as the mechanical steps to get down the line.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: smarterclayton.
3. **CI** - Identify credentials and setup for a CI environment, ensure those credentials exist and can be used in the CI environment
4. **Name** - Identify and get approved the correct naming for adding a new platform to the core API objects (specifically the [infrastructure config](https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go) and the [installer config](https://github.com/openshift/installer/blob/master/pkg/types/aws/doc.go)) so that we are consistent
5. **Enable Provisioning** Add a hidden installer option to this repo for the desired platform as a PR and implement the minimal features for bootstrap as well as a reliable teardown
6. **Enable Platform** Ensure all operators treat your platform as a no-op
If we have a general policy for "operators treat unrecognized platforms as if they were none", then this step would not be required when adding a new platform.
Ah, you have that policy down here. I think you can drop this list entry, and we can file bugs with any operators that are currently non-compliant.
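A minimal sketch of the policy being discussed, assuming an operator switches on the platform reported by the infrastructure config; the function name and surrounding wiring are illustrative, and only the `configv1` platform constants come from openshift/api.

```go
// Sketch of "unrecognized platforms behave like none": an operator
// only enables cloud-specific behavior for platforms it explicitly
// knows, so a brand-new platform is a no-op by default. The operator
// wiring around this function is assumed, not shown.
package operator

import (
	configv1 "github.com/openshift/api/config/v1"
)

// cloudSpecificEnabled reports whether this operator should create
// cloud resources for the given platform.
func cloudSpecificEnabled(platform configv1.PlatformType) bool {
	switch platform {
	case configv1.AWSPlatformType:
		// Known platform: run the AWS-specific sync logic.
		return true
	default:
		// NonePlatformType and any platform added after this
		// operator was written fall through to a no-op.
		return false
	}
}
```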
Once the platform can be launched and tested, system features must be implemented. The sections below are roughly independent:

* General requirements:
  * Replace the installer terraform destroy with one that doesn't rely on terraform state
nit: "terraform" -> "Terraform".
And maybe mention that this is because, once cluster components can create additional resources on the target platform, we'll still need to clean them up, and Terraform won't know about them.
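To make that motivation concrete, here is a hedged sketch of a destroy path that enumerates resources by cluster tag instead of reading Terraform state; the `CloudClient` interface, tag key, and retry loop are assumptions standing in for a real platform SDK.

```go
// Sketch of a destroy path that does not depend on Terraform state:
// everything the installer *and* the running cluster create is tagged
// with the cluster ID, and destroy enumerates by that tag, which also
// catches resources Terraform never recorded.
package destroy

import (
	"fmt"
	"time"
)

// CloudClient is an assumed, platform-specific API wrapper.
type CloudClient interface {
	// ListResourcesByTag returns IDs of every resource carrying the tag.
	ListResourcesByTag(key, value string) ([]string, error)
	// DeleteResource deletes one resource; it must be safe to retry.
	DeleteResource(id string) error
}

// Destroy loops until nothing tagged with the cluster ID remains.
func Destroy(client CloudClient, clusterID string) error {
	for {
		ids, err := client.ListResourcesByTag("openshiftClusterID", clusterID)
		if err != nil {
			return err
		}
		if len(ids) == 0 {
			return nil
		}
		for _, id := range ids {
			if err := client.DeleteResource(id); err != nil {
				fmt.Printf("retrying %s later: %v\n", id, err)
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```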
1. Runs RH CoreOS
2. Is reachable by control plane nodes over the network
3. Is part of the control plane load balancer until it is removed
4. Can reach a network endpoint that hosts the bootstrap ignition file securely, or has the bootstrap ignition injected
nit: "ignition" -> "Ignition" here and elsewhere in this doc.
The following clarifications to configurations are noted:

1. The control plane load balancer does not need to be exposed to the public internet, but the DNS entry must be visible from the location the installer is run.
2. Master nodes are not required to expose external IPs for SSH access, but can instead allow SSH from a bastion inside a protected network.
Drop "Master" and the following list entry? This applies equally to master and compute nodes; I don't see an upside to splitting over two entries.
Red Hat CoreOS uses ignition to receive initial configuration from a remote source. Ignition has platform specific behavior to read that configuration that is determined by the `oemID` embedded in the VM image.

To boot RHCoS to a new platform, you must:
nit: "RHCoS" -> "RHCOS", here and elsewhere in this doc? I think the acronym is [R]ed [H]at [C]ore [OS], not, [R]ed [H]at [Co]re O[S].
Continuous Integration
----------------------

To enable a new platform, require a core continuous integration testing loop that verifies that new changes do not regress our support for the platform. The minimum steps required are:
nit: "require" -> "we require", or similar.
To enable a new platform, require a core continuous integration testing loop that verifies that new changes do not regress our support for the platform. The minimum steps required are:

1. Have an infrastructure that can receive API calls from the OpenShift CI system to provision/destroy instances
"instances" -> "infrastructure".
1. Add a new hidden provisioner
2. Define the minimal platform parameters that the provisioner must support
3. Use Terraform or direct Go code to provision that platform via the credentials provided to the installer.
"provision" -> "provision and destroy"? One benefit of Terraform is that it makes centralized bootstrap teardown fairly straightforward, although you could certainly switch on the platform to invoke platform-specific Go bootstrap-teardown code. And we need to destroy resources for destroy cluster
to keep the account from filling with cruft, although that doesn't need to be as specific as bootstrap teardown.
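A sketch of the provision/destroy pairing this comment is asking for, with bootstrap teardown kept separate from full-cluster destroy; the interface, method names, and input struct are hypothetical, not the installer's actual API.

```go
// Hypothetical shape of a minimal platform provisioner: whatever is
// provisioned must also be destroyable, and bootstrap teardown is a
// distinct, earlier step from full-cluster destroy.
package provision

// Inputs are the "environmental" settings the user supplies.
type Inputs struct {
	Region     string
	Network    string
	BaseDomain string
}

type Provisioner interface {
	// ProvisionControlPlane launches the bootstrap node and control
	// plane machines (via Terraform or direct API calls) and wires
	// up DNS/load balancing for them.
	ProvisionControlPlane(clusterID string, in Inputs) error

	// DestroyBootstrap removes only the bootstrap node and its
	// supporting resources once the control plane is up.
	DestroyBootstrap(clusterID string) error

	// DestroyCluster removes everything tagged with the cluster ID
	// so repeated CI runs do not leave cruft behind.
	DestroyCluster(clusterID string) error
}
```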
2. Define the minimal platform parameters that the provisioner must support
3. Use Terraform or direct Go code to provision that platform via the credentials provided to the installer.

A minimal provisioner must be able to launch the control plane and bootstrap node via an API call and accept any "environmental" settings like network or region as inputs. The installer should use the Route53 DNS provisioning code to set up round robin to the bootstrap and control plane nodes if necessary.
Is this Route 53 reference intentional? For example, libvirt uses its own DNS configuration for RRDNS, and doesn't involve Route 53.
/retest
1. The control plane nodes:
    1. Run RH CoreOS, allowing in-place updates
    2. Are fronted by a load balancer that allows raw TCP connections to port 6443 and exposes port 443
To be IaaS neutral, wouldn't it be possible to use Keepalived (within Kube, since RHCOS is immutable)? It could be used either as an LB or a failover handler. Not using AWS doesn't automatically mean having a hardware LB in front of a cluster.
@smarterclayton: The following tests failed.
Closing due to this being open for a long time. Please feel free to reopen.

/close
@abhinavdahiya: Closed this PR in response to the /close above.