modules/aws/bootstrap: Pull AWS bootstrap setup into a module #217
Conversation
modules/aws/bootstrap/main.tf (outdated)

role = "${join("|", aws_iam_role.bootstrap.*.name)}"

#"${var.iam_role == "" ?
Can this be removed?
ah, no, I need to get that working again... Will hopefully have a fix up shortly ;)
> Can this be removed?
Fixed with 1efb239 -> f29ab92.
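For context on the hunk above, the usual Terraform 0.11 shape for "create the role only when the caller did not supply one" looks roughly like the sketch below. This is an illustration, not the PR's actual file; the variable names are assumptions borrowed from the hunk.

```hcl
# Sketch only: create the role when var.iam_role is empty, otherwise reuse the caller's role.
variable "iam_role" {
  default = ""
}

variable "tectonic_cluster_name" {}

resource "aws_iam_role" "bootstrap" {
  count = "${var.iam_role == "" ? 1 : 0}"
  name  = "${var.tectonic_cluster_name}-bootstrap-role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "sts:AssumeRole",
    "Principal": {"Service": "ec2.amazonaws.com"}
  }]
}
EOF
}

resource "aws_iam_instance_profile" "bootstrap" {
  name = "${var.tectonic_cluster_name}-bootstrap-profile"

  # join() over the splat list collapses to "" when count = 0, so the ternary can fall
  # back to the caller-supplied role without indexing an empty resource.
  role = "${var.iam_role == "" ? join("|", aws_iam_role.bootstrap.*.name) : var.iam_role}"
}
```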
ok to test
volume_tags = "${var.tags}"
}

resource "aws_elb_attachment" "bootstrap" {
https://github.com/openshift/installer/blob/master/modules/aws/master/main.tf#L128-L144
When we move this module to be part of the infra step, we will end up with the same problem: var.elbs will not be countable at the plan stage.
cc: @crawford
> we will end up with the same problem: var.elbs will not be countable at the plan stage.
This is a reference to hashicorp/terraform#12570, right? I expect we can work around that when we consolidate the stages, but am happy to adjust things here if there's something I can do now.
> we will end up with the same problem: var.elbs will not be countable at the plan stage.

> This is a reference to hashicorp/terraform#12570, right?
I did indeed hit this, and pushed b8eec241 to #268, which seems like an only-moderately-hideous workaround ;).
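The workaround in b8eec241 isn't reproduced here, but the usual shape of this kind of workaround is to pass the length in as its own variable, so that count never depends on a computed list. A hedged sketch with assumed names:

```hcl
# Sketch only: pass the ELB count explicitly so "count" stays computable at plan time.
variable "elbs" {
  type = "list"
}

variable "elb_count" {
  description = "Length of var.elbs, passed separately because a computed list cannot drive count at plan time."
}

variable "instance_id" {}

resource "aws_elb_attachment" "bootstrap" {
  count    = "${var.elb_count}"
  elb      = "${element(var.elbs, count.index)}"
  instance = "${var.instance_id}"
}
```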
tags = "${merge(map(
  "Name", "${var.tectonic_cluster_name}-bootstrap",
  "kubernetes.io/cluster/${var.tectonic_cluster_name}", "owned",
why drop this tag?
> why drop this tag?
Because we're eventually going to tear down the bootstrap stuff as part of installation, so we aren't leaving it around for Kubernetes to own.
Force-pushed 1efb239 to f29ab92.
#213 just landed, so I've rebased this and think it's good to go (unless more review suggestions come in ;).
The e2e-aws error was:
Probably a flake. /retest

Also run the smoke tests: retest this please
/approve
Force-pushed f29ab92 to 10f717b.
Both e2e-aws and the smoke tests died with:
I've pushed f29ab92 -> 10f717b adding explicit brackets to hopefully fix that. The underlying issue may be Terraform occasionally forgetting type information.
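The bracket change is small enough to show in isolation. A hedged sketch of the pattern (the vpc_security_group_ids line is quoted from the later commit message; the surrounding resource and other attributes are assumptions):

```hcl
variable "ami" {}
variable "instance_type" {}

variable "vpc_security_group_ids" {
  type = "list"
}

resource "aws_instance" "bootstrap" {
  ami           = "${var.ami}"
  instance_type = "${var.instance_type}"

  # Without the wrapping brackets, Terraform 0.11 sometimes loses the list type and
  # reports "vpc_security_group_ids: should be a list"; the brackets re-assert it.
  vpc_security_group_ids = ["${var.vpc_security_group_ids}"]
}
```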
/retest

retest this please
An earlier e2e-aws run failed with:
Trying to reproduce locally, I spun up a cluster on Friday. On the bootstrap node:

$ systemctl --failed | head -n2
UNIT LOAD ACTIVE SUB DESCRIPTION
● bootkube.service loaded failed failed Bootstrap a Kubernetes cluster
$ journalctl -u bootkube.service -n6
-- Logs begin at Fri 2018-09-07 20:26:48 UTC, end at Sat 2018-09-08 04:28:19 UTC. --
Sep 07 22:08:37 ip-10-0-10-162 bash[3486]: https://trking-87205-etcd-0.coreservices.team.coreos.systems:2379 is unhealthy: failed to connect: dial tcp 10.0.5.134:2379: getsockopt: connection refused
Sep 07 22:08:37 ip-10-0-10-162 bash[3486]: Error: unhealthy cluster
Sep 07 22:08:38 ip-10-0-10-162 bash[3486]: etcdctl failed too many times.
Sep 07 22:08:38 ip-10-0-10-162 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Sep 07 22:08:38 ip-10-0-10-162 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Sep 07 22:08:38 ip-10-0-10-162 systemd[1]: Failed to start Bootstrap a Kubernetes cluster.
$ docker run --rm --env ETCDCTL_API=3 --volume /opt/tectonic/tls:/opt/tectonic/tls:ro,z quay.io/coreos/etcd:v3.2.14 etcdctl --cacert=/opt/tectonic/tls/etcd-client-ca.crt --cert=/opt/tectonic/tls/etcd-client.crt --key=/opt/tectonic/tls/etcd-client.key --endpoints=https://trking-87205-etcd-0.coreservices.team.coreos.systems:2379 endpoint health
https://trking-87205-etcd-0.coreservices.team.coreos.systems:2379 is unhealthy: failed to connect: dial tcp 10.0.5.134:2379: getsockopt: connection refused
Error: unhealthy cluster
$ dig trking-87205-etcd-0.coreservices.team.coreos.systems +short
10.0.5.134
$ dig trking-87205-master-0.coreservices.team.coreos.systems +short

Uh. Checking from my dev box:

$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name' && Value == 'trking-87205-master-0']].PublicIpAddress" --output text
18.215.14.161
$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name']]" --output text | grep '^0\|Name\|IPADDRESS\|ASSOCIATION' | cut -b -80
0 x86_64 False True xen ami-00cc4337762ba4a52 i-00c28bd06d8eb77a1 t2.medium 201
PRIVATEIPADDRESSES True ip-10-0-133-231.ec2.internal 10.0.133.231
TAGS Name trking-87205-worker-0
0 x86_64 False True xen ami-00cc4337762ba4a52 i-072fdf9d1e0beaf3a t2.medium 201
PRIVATEIPADDRESSES True ip-10-0-158-83.ec2.internal 10.0.158.83
TAGS Name trking-87205-worker-1
0 x86_64 False True xen ami-00cc4337762ba4a52 i-01cd2e4e6ecaea69e t2.medium 201
ASSOCIATION amazon ec2-18-215-14-161.compute-1.amazonaws.com 18.215.14.161
PRIVATEIPADDRESSES True ip-10-0-5-134.ec2.internal 10.0.5.134
ASSOCIATION amazon ec2-18-215-14-161.compute-1.amazonaws.com 18.215.14.161
TAGS Name trking-87205-master-0
0 x86_64 False True xen ami-00cc4337762ba4a52 i-0011bb9aa98d56684 t2.medium 201
ASSOCIATION amazon ec2-34-205-252-140.compute-1.amazonaws.com 34.205.252.140
PRIVATEIPADDRESSES True ip-10-0-10-162.ec2.internal 10.0.10.162
ASSOCIATION amazon ec2-34-205-252-140.compute-1.amazonaws.com 34.205.252.140
TAGS Name trking-87205-bootstrap
0 x86_64 False True xen ami-00cc4337762ba4a52 i-01d196ea977f59b3b t2.medium 201
PRIVATEIPADDRESSES True ip-10-0-165-55.ec2.internal 10.0.165.55
TAGS Name trking-87205-worker-2

So indeed master-0's internal IP is 10.0.5.134, and its public IP is 18.215.14.161. I'm not sure why I can't resolve it via DNS from the bootstrap node. Anyhow, back to the bootstrap node:

$ ssh -v [email protected]
OpenSSH_7.6p1, OpenSSL 1.0.2n 7 Dec 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 18.215.14.161 [18.215.14.161] port 22.
debug1: connect to address 18.215.14.161 port 22: Connection refused
ssh: connect to host 18.215.14.161 port 22: Connection refused
$ ssh -v [email protected]
OpenSSH_7.6p1, OpenSSL 1.0.2n 7 Dec 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 10.0.5.134 [10.0.5.134] port 22.
debug1: connect to address 10.0.5.134 port 22: Connection refused
ssh: connect to host 10.0.5.134 port 22: Connection refused

And from my dev box:

$ ssh -v [email protected]
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips 26 Jan 2017
debug1: Reading configuration data /home/trking/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug1: Connecting to 18.215.14.161 [18.215.14.161] port 22.
debug1: connect to address 18.215.14.161 port 22: Connection refused
ssh: connect to host 18.215.14.161 port 22: Connection refused

What's up with this node?

$ aws ec2 describe-instances --instance-id i-01cd2e4e6ecaea69e --query 'Reservations[].Instances[].State.Name' --output text
running
$ aws ec2 describe-instances --instance-id i-01cd2e4e6ecaea69e --query 'Reservations[].Instances[].SecurityGroups' --output text
sg-0061338459d264b41 terraform-20180907202140389700000002

Compare that with the bootstrap node:

$ aws ec2 describe-instances --instance-id i-0011bb9aa98d56684 --query 'Reservations[].Instances[].SecurityGroups' --output text
sg-0061338459d264b41 terraform-20180907202140389700000002

Same values. Actually, let's just diff the states:

$ BOOTSTRAP="$(aws ec2 describe-instances --instance-id i-0011bb9aa98d56684 --query 'Reservations[].Instances[]' --output json)"
$ MASTER="$(aws ec2 describe-instances --instance-id i-01cd2e4e6ecaea69e --query 'Reservations[].Instances[]' --output json)"
$ diff -u <(echo "${BOOTSTRAP}") <(echo "${MASTER}")
--- /dev/fd/63 2018-09-07 21:59:42.091362945 -0700
+++ /dev/fd/62 2018-09-07 21:59:42.091362945 -0700
@@ -3,22 +3,22 @@
"Monitoring": {
"State": "disabled"
},
- "PublicDnsName": "ec2-34-205-252-140.compute-1.amazonaws.com",
+ "PublicDnsName": "ec2-18-215-14-161.compute-1.amazonaws.com",
"State": {
"Code": 16,
"Name": "running"
},
"EbsOptimized": false,
- "LaunchTime": "2018-09-07T20:25:50.000Z",
- "PublicIpAddress": "34.205.252.140",
- "PrivateIpAddress": "10.0.10.162",
+ "LaunchTime": "2018-09-07T20:22:56.000Z",
+ "PublicIpAddress": "18.215.14.161",
+ "PrivateIpAddress": "10.0.5.134",
"ProductCodes": [],
"VpcId": "vpc-0b6626eba63c20d20",
"StateTransitionReason": "",
- "InstanceId": "i-0011bb9aa98d56684",
+ "InstanceId": "i-01cd2e4e6ecaea69e",
"EnaSupport": true,
"ImageId": "ami-00cc4337762ba4a52",
- "PrivateDnsName": "ip-10-0-10-162.ec2.internal",
+ "PrivateDnsName": "ip-10-0-5-134.ec2.internal",
"SecurityGroups": [
{
"GroupName": "terraform-20180907202140389700000002",
@@ -31,30 +31,30 @@
"NetworkInterfaces": [
{
"Status": "in-use",
- "MacAddress": "02:f8:bf:18:7c:0a",
+ "MacAddress": "02:52:a7:4b:43:10",
"SourceDestCheck": true,
"VpcId": "vpc-0b6626eba63c20d20",
"Description": "",
- "NetworkInterfaceId": "eni-09d792347f4e050db",
+ "NetworkInterfaceId": "eni-0f5378c203dea3028",
"PrivateIpAddresses": [
{
- "PrivateDnsName": "ip-10-0-10-162.ec2.internal",
- "PrivateIpAddress": "10.0.10.162",
+ "PrivateDnsName": "ip-10-0-5-134.ec2.internal",
+ "PrivateIpAddress": "10.0.5.134",
"Primary": true,
"Association": {
- "PublicIp": "34.205.252.140",
- "PublicDnsName": "ec2-34-205-252-140.compute-1.amazonaws.com",
+ "PublicIp": "18.215.14.161",
+ "PublicDnsName": "ec2-18-215-14-161.compute-1.amazonaws.com",
"IpOwnerId": "amazon"
}
}
],
- "PrivateDnsName": "ip-10-0-10-162.ec2.internal",
+ "PrivateDnsName": "ip-10-0-5-134.ec2.internal",
"Attachment": {
"Status": "attached",
"DeviceIndex": 0,
"DeleteOnTermination": true,
- "AttachmentId": "eni-attach-09624bb294e015616",
- "AttachTime": "2018-09-07T20:25:50.000Z"
+ "AttachmentId": "eni-attach-00536c4cc9813fb1b",
+ "AttachTime": "2018-09-07T20:22:56.000Z"
},
"Groups": [
{
@@ -64,11 +64,11 @@
],
"Ipv6Addresses": [],
"OwnerId": "816138690521",
- "PrivateIpAddress": "10.0.10.162",
+ "PrivateIpAddress": "10.0.5.134",
"SubnetId": "subnet-018374f09ef32961c",
"Association": {
- "PublicIp": "34.205.252.140",
- "PublicDnsName": "ec2-34-205-252-140.compute-1.amazonaws.com",
+ "PublicIp": "18.215.14.161",
+ "PublicDnsName": "ec2-18-215-14-161.compute-1.amazonaws.com",
"IpOwnerId": "amazon"
}
}
@@ -86,22 +86,30 @@
"Ebs": {
"Status": "attached",
"DeleteOnTermination": true,
- "VolumeId": "vol-0514fba36b29b890c",
- "AttachTime": "2018-09-07T20:25:51.000Z"
+ "VolumeId": "vol-02edf331c988bde6f",
+ "AttachTime": "2018-09-07T20:22:57.000Z"
}
}
],
"Architecture": "x86_64",
"RootDeviceType": "ebs",
"IamInstanceProfile": {
- "Id": "AIPAJUQY64SRYUWKBSLAK",
- "Arn": "arn:aws:iam::816138690521:instance-profile/trking-87205-bootstrap-profile"
+ "Id": "AIPAIXRRU3YPPZDQJLI3A",
+ "Arn": "arn:aws:iam::816138690521:instance-profile/trking-87205-master-profile"
},
"RootDeviceName": "/dev/xvda",
"VirtualizationType": "hvm",
"Tags": [
{
- "Value": "trking-87205-bootstrap",
+ "Value": "owned",
+ "Key": "kubernetes.io/cluster/trking-87205"
+ },
+ {
+ "Value": "2018-09-08T00:21+0000",
+ "Key": "expirationDate"
+ },
+ {
+ "Value": "trking-87205-master-0",
"Key": "Name"
},
{
@@ -109,10 +117,6 @@
"Key": "tectonicClusterID"
},
{
- "Value": "2018-09-08T00:21+0000",
- "Key": "expirationDate"
- },
- {
"Value": "Resource does not meet policy: stop@2018/09/10",
"Key": "maid_status"
}

I don't see any surprising differences, and I have no idea why I can't SSH into the master node. But not being able to SSH into the master makes it hard to figure out why its etcd is broken. Or maybe there's just a networking issue that's behind my inability to connect for both SSH and etcd?
/retest

/test unit
Force-pushed 10f717b to 6d3370d.
/retest
I've spun up a cluster to debug this, and the master is dying in Ignition:

$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name' && Value == 'trking-18d26-master-0']].InstanceId" --output text
i-098c83ac601024a12
$ aws ec2 get-console-output --instance-id i-098c83ac601024a12 --output text | tail -n5
[ 170.062650] ignition[738]: INFO : GET https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0: attempt #37
[ 170.073885] ignition[738]: INFO : GET https://trking-18d26-tnc.coreserv[ 170.076770] ignition[738]: INFO : GET error: Get https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0: EOF
ices.team.coreos.systems:80/config/master?etcd_index=0: attempt #37
[ 170.087578] ignition[738]: INFO : GET error: Get https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0: EOF
2018-09-12T04:49:15.000Z

Check from the bootstrap node:

$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name' && Value == 'trking-18d26-bootstrap']].PublicIpAddress" --output text
34.205.135.98
$ ssh [email protected]
$ curl -v 'https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0'
* About to connect() to trking-18d26-tnc.coreservices.team.coreos.systems port 80 (#0)
* Trying 10.0.1.14...
* Connected to trking-18d26-tnc.coreservices.team.coreos.systems (10.0.1.14) port 80 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* NSS error -5938 (PR_END_OF_FILE_ERROR)
* Encountered end of file
* Closing connection 0
curl: (35) Encountered end of file
$ systemctl status | head -n2
● ip-10-0-6-58
State: starting
$ systemctl | grep activating
bootkube.service loaded activating start start Bootstrap a Kubernetes cluster
kubelet.service loaded activating auto-restart Kubernetes Kubelet
$ systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Wed 2018-09-12 05:05:40 UTC; 2s ago
Process: 3765 ExecStart=/usr/bin/hyperkube kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --rotate-certificates --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/var/lib/cni/bin --network-plugin=cni --lock-file=/var/run/lock/kubelet.lock --exit-on-lock-contention --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged --node-labels=node-role.kubernetes.io/bootstrap --minimum-container-ttl-duration=6m0s --cluster-dns=10.3.0.10 --cluster-domain=cluster.local --client-ca-file=/etc/kubernetes/ca.crt --cloud-provider=aws --anonymous-auth=false --cgroup-driver=systemd --register-with-taints=node-role.kubernetes.io/bootstrap=:NoSchedule (code=exited, status=255)
Process: 3760 ExecStartPre=/usr/bin/bash -c gawk '/certificate-authority-data/ {print $2}' /etc/kubernetes/kubeconfig | base64 --decode > /etc/kubernetes/ca.crt (code=exited, status=0/SUCCESS)
Process: 3758 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
Main PID: 3765 (code=exited, status=255)
Sep 12 05:05:40 ip-10-0-6-58 systemd[1]: Unit kubelet.service entered failed state.
Sep 12 05:05:40 ip-10-0-6-58 systemd[1]: kubelet.service failed.
$ systemctl status bootkube.service
● bootkube.service - Bootstrap a Kubernetes cluster
Loaded: loaded (/etc/systemd/system/bootkube.service; static; vendor preset: disabled)
Active: activating (start) since Wed 2018-09-12 04:50:50 UTC; 15min ago
Main PID: 962 (bash)
Memory: 166.8M
CGroup: /system.slice/bootkube.service
├─ 962 /usr/bin/bash /opt/tectonic/bootkube.sh
└─3066 /usr/bin/podman run --rm --network host --name etcdctl --env ETCDCTL_API=3 --volume /opt/tectonic/tls:/opt/tectonic/tls:ro,z quay.io/coreos/etcd:v3.2.14 /usr/local/bin/etcdctl --dial-timeout...
Sep 12 04:51:02 ip-10-0-6-58 bash[962]: [31B blob data]
Sep 12 04:51:02 ip-10-0-6-58 bash[962]: Copying blob sha256:c15c14574a0bc94fb65cb906baae5debd103dd02991f3449adaa639441b7dde4
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: [31B blob data]
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Writing manifest to image destination
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Storing signatures
Sep 12 05:01:04 ip-10-0-6-58 bash[962]: https://trking-18d26-etcd-0.coreservices.team.coreos.systems:2379 is unhealthy: failed to connect: dial tcp 10.0.9.212:2379: getsockopt: connection refused
Sep 12 05:01:04 ip-10-0-6-58 bash[962]: Error: unhealthy cluster
Sep 12 05:01:04 ip-10-0-6-58 bash[962]: etcdctl failed. Retrying in 5 seconds...
$ sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ab5fe498da75 quay.io/coreos/etcd:v3.2.14 /usr/local/bin/etcd... 4 minutes ago Up 4 minutes ago etcdctl
83d7fba588d5 quay.io/coreos/kube-etcd-signer-server:678cc8e6841e2121ebfdb6e2db568fce290b67d6 kube-etcd-signer-se... 15 minutes ago Up 15 minutes ago lucid_tesla
cdbffdb210ea quay.io/coreos/tectonic-node-controller-operator-dev:0a24db2288f00b10ced358d9643debd601ffd0f1 /app/operator/node-... 15 minutes ago Exited (0) Less than a second ago trusting_morse
36af8121636c quay.io/coreos/kube-core-renderer-dev:0a24db2288f00b10ced358d9643debd601ffd0f1 /app/operator/kube-... 15 minutes ago Exited (0) Less than a second ago friendly_swanson
$ journalctl -n25
-- Logs begin at Wed 2018-09-12 04:49:14 UTC, end at Wed 2018-09-12 05:08:08 UTC. --
Sep 12 05:07:57 ip-10-0-6-58 systemd[1]: kubelet.service failed.
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Starting Kubernetes Kubelet...
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Started Kubernetes Kubelet.
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --rotate-certificates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/do
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --allow-privileged has been deprecated, will be removed in a future version
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tas
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/t
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Started Kubernetes systemd probe.
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.892870 4122 server.go:418] Version: v1.11.0+d4cacc0
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.892979 4122 server.go:496] acquiring file lock on "/var/run/lock/kubelet.lock"
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.893006 4122 server.go:501] watching for inotify events for: /var/run/lock/kubelet.lock
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.893193 4122 aws.go:1032] Building AWS cloudprovider
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.893219 4122 aws.go:994] Zone not specified in configuration file; querying AWS metadata service
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Starting Kubernetes systemd probe.
Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: E0912 05:08:08.075211 4122 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: F0912 05:08:08.075258 4122 server.go:262] failed to run Kubelet: could not init cloud provider "aws": AWS cloud failed to find ClusterID
Sep 12 05:08:08 ip-10-0-6-58 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Sep 12 05:08:08 ip-10-0-6-58 systemd[1]: Unit kubelet.service entered failed state.
Sep 12 05:08:08 ip-10-0-6-58 systemd[1]: kubelet.service failed.

So I'm still not clear on what's going on, but etcd is broken, our ignition-file server seems non-responsive and is keeping master-0 from booting, and the kubelet is thrashing around without an AWS ClusterID.
This might be because we dropped a tag, #217 (comment)
Force-pushed 6d3370d to ef35007.
Ah, thanks :). I've pushed 6d3370d -> ef35007, rebasing onto master and restoring that tag to the instance (but, as I explain in the commit message, I'm still removing it from the volumes).
The smoke error was:
But I can't reproduce when I launch a cluster locally, so maybe it's just a flake.

/retest
This will make it easier to move into the existing infra step.

The module source syntax used in the README is documented in [1,2,3], and means "the modules/aws/ami subdirectory of the github.com/openshift/installer repository cloned over HTTPS", etc.

I don't think I should need the wrapping brackets in:

    vpc_security_group_ids = ["${var.vpc_security_group_ids}"]

but without it I get [4]:

    Error: module.bootstrap.aws_instance.bootstrap: vpc_security_group_ids: should be a list

The explicit brackets match our approach in the master and worker modules though, so they shouldn't break anything. It sounds like Terraform still has a few problems with remembering type information [5], and that may be what's going on here.

I've simplified the tagging a bit, keeping the extra-tags unification outside the module. I tried dropping the kubernetes.io/cluster/ tag completely, but it led to [6]:

    Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: E0912 05:08:08.075211 4122 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
    Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: F0912 05:08:08.075258 4122 server.go:262] failed to run Kubelet: could not init cloud provider "aws": AWS cloud failed to find ClusterID

The backing code for that is [7,8,9]. From [9], you can see that only the tag on the instance matters, so I've dropped kubernetes.io/cluster/... from volume_tags. Going forward, we may move to configuring this directly instead of relying on the tag-based initialization.

[1]: https://www.terraform.io/docs/configuration/modules.html#source
[2]: https://www.terraform.io/docs/modules/sources.html#github
[3]: https://www.terraform.io/docs/modules/sources.html#modules-in-package-sub-directories
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/217/pull-ci-openshift-installer-e2e-aws/47/build-log.txt
[5]: hashicorp/terraform#16916 (comment)
[6]: openshift#217
[7]: https://github.com/kubernetes/kubernetes/blob/v1.11.3/pkg/cloudprovider/providers/aws/tags.go#L30-L34
[8]: https://github.com/kubernetes/kubernetes/blob/v1.11.3/pkg/cloudprovider/providers/aws/tags.go#L100-L126
[9]: https://github.com/kubernetes/kubernetes/blob/v1.11.3/pkg/cloudprovider/providers/aws/aws.go#L1126-L1132
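A rough illustration of the tagging rule described above (a sketch, not the module's exact contents; the variable names follow the hunks earlier in this review): the cluster-ownership tag stays on the instance, while the volumes only carry the user-supplied tags.

```hcl
variable "ami" {}
variable "instance_type" {}
variable "tectonic_cluster_name" {}

variable "tags" {
  type    = "map"
  default = {}
}

resource "aws_instance" "bootstrap" {
  ami           = "${var.ami}"
  instance_type = "${var.instance_type}"

  # The kubelet's AWS cloud provider only looks for kubernetes.io/cluster/... on the
  # instance itself, so the tag stays here even though bootstrap is torn down later.
  tags = "${merge(map(
    "Name", "${var.tectonic_cluster_name}-bootstrap",
    "kubernetes.io/cluster/${var.tectonic_cluster_name}", "owned"
  ), var.tags)}"

  # Volumes don't need the cluster tag, so they only get the user-supplied tags.
  volume_tags = "${var.tags}"
}
```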
Force-pushed ef35007 to 8a37f72.
Try rebasing on master again. #244 should help with the flakes.

Actually, I guess tide is smart enough to merge this onto master before testing. /retest

/retest

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
As suggested by Stephen Cuppett, this allows registry <-> S3 transfers to bypass the (NAT) gateways. Traffic over the NAT gateways costs money, so the new endpoint should make S3 access from the cluster cheaper (and possibly more reliable). This also allows for additional security policy flexibility, although I'm not taking advantage of that in this commit.

Docs for VPC endpoints are in [1,2,3,4]. Endpoints do not currently support cross-region requests [1]. And based on discussion with Stephen, adding an endpoint may *break* access to S3 on other regions. But I can't find docs to back that up, and [3] has:

    We use the most specific route that matches the traffic to determine how to route the traffic (longest prefix match). If you have an existing route in your route table for all internet traffic (0.0.0.0/0) that points to an internet gateway, the endpoint route takes precedence for all traffic destined for the service, because the IP address range for the service is more specific than 0.0.0.0/0. All other internet traffic goes to your internet gateway, including traffic that's destined for the service in other regions.

which suggests that access to S3 on other regions may be unaffected. In any case, our registry buckets, and likely any other buckets associated with the cluster, will be living in the same region.

concat is documented in [5]. The wrapping brackets avoid [6]:

    level=error msg="Error: module.vpc.aws_vpc_endpoint.s3: route_table_ids: should be a list"

although I think that's a Terraform bug. See also 8a37f72 (modules/aws/bootstrap: Pull AWS bootstrap setup into a module, 2018-09-05, openshift#217), which talks about this same issue.

[1]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html
[2]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html
[3]: https://docs.aws.amazon.com/vpc/latest/userguide/vpce-gateway.html
[4]: https://www.terraform.io/docs/providers/aws/r/vpc_endpoint.html
[5]: https://www.terraform.io/docs/configuration/interpolation.html#concat-list1-list2-
[6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/745/pull-ci-openshift-installer-master-e2e-aws/1673/build-log.txt
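A hedged sketch of the endpoint described above. The aws_vpc_endpoint.s3 resource and the "should be a list" workaround come from the commit message; the variable names are assumptions, not the commit's actual code.

```hcl
variable "region" {}
variable "vpc_id" {}

variable "private_route_table_ids" {
  type = "list"
}

variable "public_route_table_id" {}

# Attach an S3 gateway endpoint to the VPC's route tables so registry <-> S3 traffic
# bypasses the NAT gateways.
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = "${var.vpc_id}"
  service_name = "com.amazonaws.${var.region}.s3"

  # concat() joins the private and public route-table IDs; the wrapping brackets work
  # around Terraform occasionally losing the list type ("route_table_ids: should be a list").
  route_table_ids = ["${concat(var.private_route_table_ids, list(var.public_route_table_id))}"]
}
```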
Centralize extra-tag inclusion on aws/main.tf. This reduces the number of places we need to think about what tags should be ;).

Also keep kubernetes.io/cluster/{name} localized in the aws module. See 8a37f72 (modules/aws/bootstrap: Pull AWS bootstrap setup into a module, 2018-09-05, openshift#217) for why we need to keep it on the bootstrap instance. But the bootstrap resources will be removed after the bootstrap-complete event comes through, and we don't want Kubernetes controllers trying to pick them up.

This commit updates the internal Route 53 zone from KubernetesCluster to kubernetes.io/cluster/{name}: owned, catching it up to kubernetes/kubernetes@0b5ae539 (AWS: Support shared tag, 2017-02-18, kubernetes/kubernetes#41695). That tag originally landed on the zone back in 75fb49a (platforms/aws: apply tags to internal route53 zone, 2017-05-02, coreos/tectonic-installer#465).

Only the master instances need the clusterid tag, as described in 6c7a5f0 (Tag master machines for adoption by machine controller, 2018-10-17, openshift#479).

A number of VPC resources have moved from "shared" to "owned". The shared values are from 45dfc2b (modules/aws,azure: use the new tag format for k8s 1.6, 2017-05-04, coreos/tectonic-installer#469). The commit message doesn't have much to say for motivation, but Brad Ison said [1]:

    I'm not really sure if anything in Kubernetes actually uses the owned vs. shared values at the moment, but in any case, it might make more sense to mark subnets as shared. That was actually one of the main use cases for moving to this style of tagging -- being able to share subnets between clusters.

But we aren't sharing these resources; see 6f55e67 (terraform/aws: remove option to use an existing vpc in aws, 2018-11-11, openshift#654).

[1]: coreos/tectonic-installer#469 (comment)
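A minimal sketch of the "merge once in aws/main.tf, pass down" idea. The module and variable names here are assumptions, not the repository's exact layout.

```hcl
variable "cluster_id" {}

variable "extra_tags" {
  type    = "map"
  default = {}
}

# aws/main.tf builds one merged tag map...
locals {
  tags = "${merge(map(
    "tectonicClusterID", "${var.cluster_id}"
  ), var.extra_tags)}"
}

# ...and each submodule just consumes it, so extra-tag handling lives in one place.
module "bootstrap" {
  source = "./bootstrap"
  tags   = "${local.tags}"
}

module "masters" {
  source = "./master"
  tags   = "${local.tags}"
}
```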
…-release:4.0.0-0.6 Clayton pushed 4.0.0-0.nightly-2019-02-27-213933 to quay.io/openshift-release-dev/ocp-release:4.0.0-0.6. Extracting the associated RHCOS build: $ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.6 | grep machine-os-content machine-os-content registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-02-27-213933@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b $ oc image info registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-02-26-125216@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b | grep version version=47.330 that's the same machine-os-content image referenced from 4.0.0-0.5, which we used for installer v0.13.0. Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing of the pinned release despite openshift/release@60007df2 (Use RELEASE_IMAGE_LATEST for CVO payload, 2018-10-03, openshift/release#1793). Also comment out regions which this particular RHCOS build wasn't pushed to, leaving only: $ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.330/meta.json | jq -r '.amis[] | .name' ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ca-central-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 I'd initially expected to export the pinning environment variables in release.sh, but I've put them in build.sh here because our continuous integration tests use build.sh directly and don't go through release.sh. Using the slick, new change-log generator from [1], here's everything that changed in the update payload: $ oc adm release info --changelog ~/.local/lib/go/src --changes-from quay.io/openshift-release-dev/ocp-release:4.0.0-0.5 quay.io/openshift-release-dev/ocp-release:4.0.0-0.6 # 4.0.0-0.6 Created: 2019-02-28 20:40:11 +0000 UTC Image Digest: `sha256:5ce3d05da3bfa3d0310684f5ac53d98d66a904d25f2e55c2442705b628560962` Promoted from registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-27-213933 ## Changes from 4.0.0-0.5 ### Components * Kubernetes 1.12.4 ### New images * [pod](https://github.com/openshift/images) git [2f60da39](openshift/images@2f60da3) `sha256:c0d602467dfe0299ce577ba568a9ef5fb9b0864bac6455604258e7f5986d3509` ### Rebuilt images without code change * [cloud-credential-operator](https://github.com/openshift/cloud-credential-operator) git [01bbf372](openshift/cloud-credential-operator@01bbf37) `sha256:f87be09923a5cb081722634d2e0c3d0a5633ea2c23da651398d4e915ad9f73b0` * [cluster-autoscaler](https://github.com/openshift/kubernetes-autoscaler) git [d8a4a304](openshift/kubernetes-autoscaler@d8a4a30) `sha256:955413b82cf8054ce149bc05c18297a8abe9c59f9d0034989f08086ae6c71fa6` * [cluster-autoscaler-operator](https://github.com/openshift/cluster-autoscaler-operator) git [73c46659](openshift/cluster-autoscaler-operator@73c4665) `sha256:756e813fce04841993c8060d08a5684c173cbfb61a090ae67cb1558d76a0336e` * [cluster-bootstrap](https://github.com/openshift/cluster-bootstrap) git [05a5c8e6](openshift/cluster-bootstrap@05a5c8e) `sha256:dbdd90da7d256e8d49e4e21cb0bdef618c79d83f539049f89f3e3af5dbc77e0f` * [cluster-config-operator](https://github.com/openshift/cluster-config-operator) git [aa1805e7](openshift/cluster-config-operator@aa1805e) `sha256:773d3355e6365237501d4eb70d58cd0633feb541d4b6f23d6a5f7b41fd6ad2f5` * [cluster-dns-operator](https://github.com/openshift/cluster-dns-operator) git 
[ffb04ae9](openshift/cluster-dns-operator@ffb04ae) `sha256:ca15f98cc1f61440f87950773329e1fdf58e73e591638f18c43384ad4f8f84da` * [cluster-machine-approver](https://github.com/openshift/cluster-machine-approver) git [2fbc6a6b](openshift/cluster-machine-approver@2fbc6a6) `sha256:a66af3b1f4ae98257ab600d54f8c94f3a4136f85863bbe0fa7c5dba65c5aea46` * [cluster-node-tuned](https://github.com/openshift/openshift-tuned) git [278ee72d](openshift/openshift-tuned@278ee72) `sha256:ad71743cc50a6f07eba013b496beab9ec817603b07fd3f5c022fffbf400e4f4b` * [cluster-node-tuning-operator](https://github.com/openshift/cluster-node-tuning-operator) git [b5c14deb](openshift/cluster-node-tuning-operator@b5c14de) `sha256:e61d1fdb7ad9f5fed870e917a1bc8fac9ccede6e4426d31678876bcb5896b000` * [cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator) git [3f79b51b](openshift/cluster-openshift-controller-manager-operator@3f79b51) `sha256:8f3b40b4dd29186975c900e41b1a94ce511478eeea653b89a065257a62bf3ae9` * [cluster-svcat-apiserver-operator](https://github.com/openshift/cluster-svcat-apiserver-operator) git [547648cb](openshift/cluster-svcat-apiserver-operator@547648c) `sha256:e7c9323b91dbb11e044d5a1277d1e29d106d92627a6c32bd0368616e0bcf631a` * [cluster-svcat-controller-manager-operator](https://github.com/openshift/cluster-svcat-controller-manager-operator) git [9261f420](openshift/cluster-svcat-controller-manager-operator@9261f42) `sha256:097a429eda2306fcd49e14e4f5db8ec3a09a90fa29ebdbc98cc519511ab6fb5b` * [cluster-version-operator](https://github.com/openshift/cluster-version-operator) git [70c0232e](openshift/cluster-version-operator@70c0232) `sha256:7d59edff68300e13f0b9e56d2f2bc1af7f0051a9fbc76cc208239137ac10f782` * [configmap-reloader](https://github.com/openshift/configmap-reload) git [3c2f8572](openshift/configmap-reload@3c2f857) `sha256:32360c79d8d8d54cea03675c24f9d0a69877a2f2e16b949ca1d97440b8f45220` * [console-operator](https://github.com/openshift/console-operator) git [32ed7c03](openshift/console-operator@32ed7c0) `sha256:f8c07cb72dc8aa931bbfabca9b4133f3b93bc96da59e95110ceb8c64f3efc755` * [container-networking-plugins-supported](https://github.com/openshift/ose-containernetworking-plugins) git [f6a58dce](openshift/ose-containernetworking-plugins@f6a58dc) `sha256:c6434441fa9cc96428385574578c41e9bc833b6db9557df1dd627411d9372bf4` * [container-networking-plugins-unsupported](https://github.com/openshift/ose-containernetworking-plugins) git [f6a58dce](openshift/ose-containernetworking-plugins@f6a58dc) `sha256:bb589cf71d4f41977ec329cf808cdb956d5eedfc604e36b98cfd0bacce513ffc` * [coredns](https://github.com/openshift/coredns) git [fbcb8252](openshift/coredns@fbcb825) `sha256:2f1812a95e153a40ce607de9b3ace7cae5bee67467a44a64672dac54e47f2a66` * [docker-builder](https://github.com/openshift/builder) git [1a77d837](openshift/builder@1a77d83) `sha256:27062ab2c62869e5ffeca234e97863334633241089a5d822a19350f16945fbcb` * [etcd](https://github.com/openshift/etcd) git [a0e62b48](openshift/etcd@a0e62b4) `sha256:e4e9677d004f8f93d4f084739b4502c2957c6620d633e1fdb379c33243c684fa` * [grafana](https://github.com/openshift/grafana) git [58efe0eb](openshift/grafana@58efe0e) `sha256:548abcc50ccb8bb17e6be2baf050062a60fc5ea0ca5d6c59ebcb8286fc9eb043` * [haproxy-router](https://github.com/openshift/router) git 
[2c33f47f](openshift/router@2c33f47) `sha256:c899b557e4ee2ea7fdbe5c37b5f4f6e9f9748a39119130fa930d9497464bd957` * [k8s-prometheus-adapter](https://github.com/openshift/k8s-prometheus-adapter) git [815fa76b](openshift/k8s-prometheus-adapter@815fa76) `sha256:772c1b40b21ccaa9ffcb5556a1228578526a141b230e8ac0afe19f14404fdffc` * [kube-rbac-proxy](https://github.com/openshift/kube-rbac-proxy) git [3f271e09](openshift/kube-rbac-proxy@3f271e0) `sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572` * [kube-state-metrics](https://github.com/openshift/kube-state-metrics) git [2ab51c9f](openshift/kube-state-metrics@2ab51c9) `sha256:611c800c052de692c84d89da504d9f386d3dcab59cbbcaf6a26023756bc863a0` * [libvirt-machine-controllers](https://github.com/openshift/cluster-api-provider-libvirt) git [7ff8b08f](openshift/cluster-api-provider-libvirt@7ff8b08) `sha256:6ab8749886ec26d45853c0e7ade3c1faaf6b36e09ba2b8a55f66c6cc25052832` * [multus-cni](https://github.com/openshift/ose-multus-cni) git [61f9e088](https://github.com/openshift/ose-multus-cni/commit/61f9e0886370ea5f6093ed61d4cfefc6dadef582) `sha256:e3f87811d22751e7f06863e7a1407652af781e32e614c8535f63d744e923ea5c` * [oauth-proxy](https://github.com/openshift/oauth-proxy) git [b771960b](openshift/oauth-proxy@b771960) `sha256:093a2ac687849e91671ce906054685a4c193dfbed27ebb977302f2e09ad856dc` * [openstack-machine-controllers](https://github.com/openshift/cluster-api-provider-openstack) git [c2d845ba](openshift/cluster-api-provider-openstack@c2d845b) `sha256:f9c321de068d977d5b4adf8f697c5b15f870ccf24ad3e19989b129e744a352a7` * [operator-registry](https://github.com/operator-framework/operator-registry) git [0531400c](operator-framework/operator-registry@0531400) `sha256:730f3b504cccf07e72282caf60dc12f4e7655d7aacf0374d710c3f27125f7008` * [prom-label-proxy](https://github.com/openshift/prom-label-proxy) git [46423f9d](openshift/prom-label-proxy@46423f9) `sha256:3235ad5e22b6f560d447266e0ecb2e5655fda7c0ab5c1021d8d3a4202f04d2ca` * [prometheus](https://github.com/openshift/prometheus) git [6e5fb5dc](openshift/prometheus@6e5fb5d) `sha256:013455905e4a6313f8c471ba5f99962ec097a9cecee3e22bdff3e87061efad57` * [prometheus-alertmanager](https://github.com/openshift/prometheus-alertmanager) git [4617d550](openshift/prometheus-alertmanager@4617d55) `sha256:54512a6cf25cf3baf7fed0b01a1d4786d952d93f662578398cad0d06c9e4e951` * [prometheus-config-reloader](https://github.com/openshift/prometheus-operator) git [f8a0aa17](openshift/prometheus-operator@f8a0aa1) `sha256:244fc5f1a4a0aa983067331c762a04a6939407b4396ae0e86a1dd1519e42bb5d` * [prometheus-node-exporter](https://github.com/openshift/node_exporter) git [f248b582](openshift/node_exporter@f248b58) `sha256:390e5e1b3f3c401a0fea307d6f9295c7ff7d23b4b27fa0eb8f4017bd86d7252c` * [prometheus-operator](https://github.com/openshift/prometheus-operator) git [f8a0aa17](openshift/prometheus-operator@f8a0aa1) `sha256:6e697dcaa19e03bded1edf5770fb19c0d2cd8739885e79723e898824ce3cd8f5` * [service-catalog](https://github.com/openshift/service-catalog) git [b24ffd6f](openshift/service-catalog@b24ffd6) `sha256:85ea2924810ced0a66d414adb63445a90d61ab5318808859790b1d4b7decfea6` * [service-serving-cert-signer](https://github.com/openshift/service-serving-cert-signer) git [30924216](openshift/service-serving-cert-signer@3092421) 
`sha256:7f89db559ffbd3bf609489e228f959a032d68dd78ae083be72c9048ef0c35064` * [telemeter](https://github.com/openshift/telemeter) git [e12aabe4](openshift/telemeter@e12aabe) `sha256:fd518d2c056d4ab8a89d80888e0a96445be41f747bfc5f93aa51c7177cf92b92` ### [aws-machine-controllers](https://github.com/openshift/cluster-api-provider-aws) * client: add cluster-api-provider-aws to UserAgent for AWS API calls [openshift#167](openshift/cluster-api-provider-aws#167) * Drop the yaml unmarshalling [openshift#155](openshift/cluster-api-provider-aws#155) * [Full changelog](openshift/cluster-api-provider-aws@46f4852...c0c3b9e) ### [cli, deployer, hyperkube, hypershift, node, tests](https://github.com/openshift/ose) * Build OSTree using baked SELinux policy [#22081](https://github.com/openshift/ose/pull/22081) * NodeName was being cleared for `oc debug node/X` instead of set [#22086](https://github.com/openshift/ose/pull/22086) * UPSTREAM: 73894: Print the involved object in the event table [#22039](https://github.com/openshift/ose/pull/22039) * Publish CRD openapi [#22045](https://github.com/openshift/ose/pull/22045) * UPSTREAM: 00000: wait for CRD discovery to be successful once before [#22149](https://github.com/openshift/ose/pull/22149) * `oc adm release info --changelog` should clone if necessary [#22148](https://github.com/openshift/ose/pull/22148) * [Full changelog](openshift/ose@c547bc3...0cbcfc5) ### [cluster-authentication-operator](https://github.com/openshift/cluster-authentication-operator) * Add redeploy on serving cert and operator pod template change [openshift#75](openshift/cluster-authentication-operator#75) * Create the service before waiting for serving certs [openshift#84](openshift/cluster-authentication-operator#84) * [Full changelog](openshift/cluster-authentication-operator@78dd53b...35879ec) ### [cluster-image-registry-operator](https://github.com/openshift/cluster-image-registry-operator) * Enable subresource status [openshift#209](openshift/cluster-image-registry-operator#209) * Add ReadOnly flag [openshift#210](openshift/cluster-image-registry-operator#210) * do not setup ownerrefs for clusterscoped/cross-namespace objects [openshift#215](openshift/cluster-image-registry-operator#215) * s3: include operator version in UserAgent for AWS API calls [openshift#212](openshift/cluster-image-registry-operator#212) * [Full changelog](openshift/cluster-image-registry-operator@0780074...8060048) ### [cluster-ingress-operator](https://github.com/openshift/cluster-ingress-operator) * Adds info log msg indicating ns/secret used by DNSManager [openshift#134](openshift/cluster-ingress-operator#134) * Introduce certificate controller [openshift#140](openshift/cluster-ingress-operator#140) * [Full changelog](openshift/cluster-ingress-operator@1b4fa5a...09d14db) ### [cluster-kube-apiserver-operator](https://github.com/openshift/cluster-kube-apiserver-operator) * bump(*): fix installer pod shutdown and rolebinding [openshift#307](openshift/cluster-kube-apiserver-operator#307) * bump to fix early status [openshift#309](openshift/cluster-kube-apiserver-operator#309) * [Full changelog](openshift/cluster-kube-apiserver-operator@4016927...fa75c05) ### [cluster-kube-controller-manager-operator](https://github.com/openshift/cluster-kube-controller-manager-operator) * bump(*): fix installer pod shutdown and rolebinding 
[openshift#183](openshift/cluster-kube-controller-manager-operator#183) * bump to fix empty status [openshift#184](openshift/cluster-kube-controller-manager-operator#184) * [Full changelog](openshift/cluster-kube-controller-manager-operator@95f5f32...53ff6d8) ### [cluster-kube-scheduler-operator](https://github.com/openshift/cluster-kube-scheduler-operator) * Rotate kubeconfig [openshift#62](openshift/cluster-kube-scheduler-operator#62) * Don't pass nil function pointer to NewConfigObserver [openshift#65](openshift/cluster-kube-scheduler-operator#65) * [Full changelog](openshift/cluster-kube-scheduler-operator@50848b4...7066c96) ### [cluster-monitoring-operator](https://github.com/openshift/cluster-monitoring-operator) * *: Clean test invocation and documenation [openshift#267](openshift/cluster-monitoring-operator#267) * pkg/operator: fix progressing state of cluster operator [openshift#268](openshift/cluster-monitoring-operator#268) * jsonnet/main.jsonnet: Bump Prometheus to v2.7.1 [openshift#246](openshift/cluster-monitoring-operator#246) * OWNERS: Remove ironcladlou [openshift#204](openshift/cluster-monitoring-operator#204) * test/e2e: Refactor framework setup & wait for query logic [openshift#265](openshift/cluster-monitoring-operator#265) * jsonnet: Update dependencies [openshift#269](openshift/cluster-monitoring-operator#269) * [Full changelog](openshift/cluster-monitoring-operator@94b701f...3609aea) ### [cluster-network-operator](https://github.com/openshift/cluster-network-operator) * Update to be able to track both DaemonSets and Deployments [openshift#102](openshift/cluster-network-operator#102) * openshift-sdn: more service-catalog netnamespace fixes [openshift#108](openshift/cluster-network-operator#108) * [Full changelog](openshift/cluster-network-operator@9db4d03...15204e6) ### [cluster-openshift-apiserver-operator](https://github.com/openshift/cluster-openshift-apiserver-operator) * bump to fix status reporting [openshift#157](openshift/cluster-openshift-apiserver-operator#157) * [Full changelog](openshift/cluster-openshift-apiserver-operator@1ce6ac7...0a65fe4) ### [cluster-samples-operator](https://github.com/openshift/cluster-samples-operator) * use pumped up rate limiter, shave 30 seconds from startup creates [openshift#113](openshift/cluster-samples-operator#113) * [Full changelog](openshift/cluster-samples-operator@4726068...f001324) ### [cluster-storage-operator](https://github.com/openshift/cluster-storage-operator) * WaitForFirstConsumer in AWS StorageClass [openshift#12](openshift/cluster-storage-operator#12) * [Full changelog](openshift/cluster-storage-operator@dc42489...b850242) ### [console](https://github.com/openshift/console) * Add back OAuth configuration link in kubeadmin notifier [openshift#1202](openshift/console#1202) * Normalize display of <ResourceIcon> across browsers, platforms [openshift#1210](openshift/console#1210) * Add margin spacing so event info doesn't run together before truncating [openshift#1170](openshift/console#1170) * [Full changelog](openshift/console@a0b75bc...d10fb8b) ### [docker-registry](https://github.com/openshift/image-registry) * Bump k8s and OpenShift, use new docker-distribution branch [openshift#165](openshift/image-registry#165) * [Full changelog](openshift/image-registry@75a1fbe...afcc7da) ### [installer](https://github.com/openshift/installer) * data: route53 A records with SimplePolicy should not use health check 
[openshift#1308](openshift#1308) * bootkube.sh: do not hide problems with render [openshift#1274](openshift#1274) * data/bootstrap/files/usr/local/bin/bootkube: etcdctl from release image [openshift#1315](openshift#1315) * pkg/types/validation: Drop v1beta1 backwards compat hack [openshift#1251](openshift#1251) * pkg/asset/tls: self-sign etcd-client-ca [openshift#1267](openshift#1267) * pkg/asset/tls: self-sign aggregator-ca [openshift#1275](openshift#1275) * pkg/types/validation/installconfig: Drop nominal v1beta2 support [openshift#1319](openshift#1319) * Removing unused/deprecated security groups and ports. Updated AWS doc [openshift#1306](openshift#1306) * [Full changelog](openshift/installer@0208204...563f71f) ### [jenkins, jenkins-agent-maven, jenkins-agent-nodejs](https://github.com/openshift/jenkins) * recover from jenkins deps backleveling workflow-durable-task-step fro… [openshift#806](openshift/jenkins#806) * [Full changelog](openshift/jenkins@2485f9a...e4583ca) ### [machine-api-operator](https://github.com/openshift/machine-api-operator) * Rename labels from sigs.k8s.io to machine.openshift.io [openshift#213](openshift/machine-api-operator#213) * Remove clusters.cluster.k8s.io CRD [openshift#225](openshift/machine-api-operator#225) * MAO: Stop setting statusProgressing=true when resyincing same version [openshift#217](openshift/machine-api-operator#217) * Generate clientset for machine health check API [openshift#223](openshift/machine-api-operator#223) * [Full changelog](openshift/machine-api-operator@bf95d7d...34c3424) ### [machine-config-controller, machine-config-daemon, machine-config-operator, machine-config-server, setup-etcd-environment](https://github.com/openshift/machine-config-operator) * daemon: Only print status if os == RHCOS [openshift#495](openshift/machine-config-operator#495) * Add pod image to image-references [openshift#500](openshift/machine-config-operator#500) * pkg/daemon: stash the node object [openshift#464](openshift/machine-config-operator#464) * Eliminate use of cpu limits [openshift#503](openshift/machine-config-operator#503) * MCD: add ign validation check for mc.ignconfig [openshift#481](openshift/machine-config-operator#481) * [Full changelog](openshift/machine-config-operator@875f25e...f0b87fc) ### [operator-lifecycle-manager](https://github.com/operator-framework/operator-lifecycle-manager) * fix(owners): remove cross-namespace and cluster->namespace ownerrefs [openshift#729](operator-framework/operator-lifecycle-manager#729) * [Full changelog](operator-framework/operator-lifecycle-manager@1ac9ace...9186781) ### [operator-marketplace](https://github.com/operator-framework/operator-marketplace) * [opsrc] Do not delete csc during purge [openshift#117](operator-framework/operator-marketplace#117) * Remove Dependency on Owner References [openshift#118](operator-framework/operator-marketplace#118) * [Full changelog](operator-framework/operator-marketplace@7b53305...fedd694) [1]: openshift/origin#22030
Builds on #213; review that first.
This will make it easier to move into the existing infra step.
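For orientation, this is roughly how a consumer would instantiate the new module using the double-slash source syntax described in the commit message. The inputs shown are illustrative assumptions, not the module's exact interface.

```hcl
# Sketch only: the variable names below are assumptions about the module's interface.
module "bootstrap" {
  source = "github.com/openshift/installer//modules/aws/bootstrap"

  ami                    = "${var.ami}"
  cluster_name           = "${var.tectonic_cluster_name}"
  subnet_id              = "${var.master_subnet_id}"
  vpc_security_group_ids = ["${var.vpc_security_group_ids}"]
  tags                   = "${var.extra_tags}"
}
```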