
modules/aws/bootstrap: Pull AWS bootstrap setup into a module #217

Merged: 1 commit into openshift:master on Sep 13, 2018

Conversation

wking commented Sep 6, 2018

Builds on #213; review that first.

This will make it easier to move into the existing infra step.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 6, 2018

role = "${join("|", aws_iam_role.bootstrap.*.name)}"

#"${var.iam_role == "" ?

Contributor:

Can this be removed?

wking (Member Author):

ah, no, I need to get that working again... Will hopefully have a fix up shortly ;)

wking (Member Author):

> Can this be removed?

Fixed with 1efb239 -> f29ab92.
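
For context, the commented-out interpolation above is the Terraform 0.11 idiom for optionally reusing a caller-supplied IAM role instead of creating one. A rough sketch of the shape it likely takes once restored (the assume_role_policy body and the variable description are illustrative, not quoted from the PR):

variable "iam_role" {
  type        = "string"
  default     = ""
  description = "Pre-existing IAM role to use; leave empty to create one."
}

# Only create the role when the caller did not supply one.
resource "aws_iam_role" "bootstrap" {
  count = "${var.iam_role == "" ? 1 : 0}"
  name  = "${var.tectonic_cluster_name}-bootstrap-role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF
}

resource "aws_iam_instance_profile" "bootstrap" {
  name = "${var.tectonic_cluster_name}-bootstrap-profile"

  # join() over the splat collapses to "" when count is 0, so the
  # conditional can pick between the generated role and the
  # caller-supplied one.
  role = "${var.iam_role == "" ? join("|", aws_iam_role.bootstrap.*.name) : var.iam_role}"
}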

crawford commented Sep 6, 2018

ok to test

@openshift openshift deleted a comment from wking Sep 6, 2018
volume_tags = "${var.tags}"
}

resource "aws_elb_attachment" "bootstrap" {

abhinavdahiya (Contributor) commented Sep 6, 2018:

https://github.com/openshift/installer/blob/master/modules/aws/master/main.tf#L128-L144

When we move this module to be part of the infra step, we will end up with the same problem: var.elbs will not be countable at plan stage.
cc: @crawford

wking (Member Author):

> we will end up with the same problem: var.elbs will not be countable at plan stage.

This is a reference to hashicorp/terraform#12570, right? I expect we can work around that when we consolidate the stages, but am happy to adjust things here if there's something I can do now.

wking (Member Author):

> we will end up with the same problem: var.elbs will not be countable at plan stage.

> This is a reference to hashicorp/terraform#12570, right?

I did indeed hit this, and pushed b8eec241 to #268, which seems like an only-moderately-hideous workaround ;).
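
The workaround in b8eec241 isn't quoted here, but the usual Terraform 0.11 answer to hashicorp/terraform#12570 is to pass the length alongside the list so that count never depends on a computed value. A hedged sketch with illustrative names (elb_count is not from the PR):

variable "elbs" {
  type = "list"
}

# Supplied explicitly by the caller so count is known at plan time.
variable "elb_count" {
  type = "string"
}

resource "aws_elb_attachment" "bootstrap" {
  count    = "${var.elb_count}"
  elb      = "${element(var.elbs, count.index)}"
  instance = "${aws_instance.bootstrap.id}"
}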


tags = "${merge(map(
"Name", "${var.tectonic_cluster_name}-bootstrap",
"kubernetes.io/cluster/${var.tectonic_cluster_name}", "owned",

Contributor:

why drop this tag?

wking (Member Author):

> why drop this tag?

Because we're eventually going to tear down the bootstrap stuff as part of installation, so we aren't leaving it around for Kubernetes to own.

@wking wking force-pushed the aws-bootstrap-module branch from 1efb239 to f29ab92 Compare September 6, 2018 20:09

wking commented Sep 6, 2018

#213 just landed, so I've rebased this and think it's good to go (unless more review suggestions come in ;).

wking commented Sep 6, 2018

The e2e-aws error was:

1 error(s) occurred:

* module.vpc.data.aws_route_table.worker[4]: data.aws_route_table.worker.4: Your query returned no results. Please change your search criteria and try again.

Probably a flake.

/retest

Also run the smoke tests:

retest this please

crawford commented Sep 6, 2018

/approve

@crawford crawford dismissed their stale review September 6, 2018 20:54

blah blah blah blah blah blah

@wking wking force-pushed the aws-bootstrap-module branch from f29ab92 to 10f717b Compare September 6, 2018 21:40

wking commented Sep 6, 2018

Both e2e-aws and the smoke tests died with:

Error: module.bootstrap.aws_instance.bootstrap: vpc_security_group_ids: should be a list

I've pushed f29ab92 -> 10f717b adding explicit brackets to hopefully fix that. The underlying issue may be Terraform occasionally forgetting type information.
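
For reference, the change amounts to wrapping the interpolation in brackets on the bootstrap instance; the other arguments shown here are illustrative stand-ins:

resource "aws_instance" "bootstrap" {
  ami           = "${var.ami}"
  instance_type = "${var.instance_type}"

  # Without the brackets, Terraform 0.11 sometimes loses the variable's
  # list type and reports "vpc_security_group_ids: should be a list".
  # A list inside [...] is flattened rather than nested, so the brackets
  # only pin the type, matching the master and worker modules.
  vpc_security_group_ids = ["${var.vpc_security_group_ids}"]
}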

crawford commented Sep 7, 2018

/retest

crawford commented Sep 7, 2018

retest this please

wking commented Sep 10, 2018

An earlier e2e-aws run failed with:

Waiting for API at https://ci-op-b4gygz8p-5849d-api.origin-ci-int-aws.dev.rhcloud.com:6443 to respond ...
Waiting for API at https://ci-op-b4gygz8p-5849d-api.origin-ci-int-aws.dev.rhcloud.com:6443 to respond ...
Interrupted
2018/09/06 23:45:52 Container setup in pod e2e-aws failed, exit code 1, reason Error

Trying to reproduce locally, I spun up a cluster on Friday.

On the bootstrap node:

$ systemctl --failed | head -n2
  UNIT             LOAD   ACTIVE SUB    DESCRIPTION                   
● bootkube.service loaded failed failed Bootstrap a Kubernetes cluster
$ journalctl -u bootkube.service -n6
-- Logs begin at Fri 2018-09-07 20:26:48 UTC, end at Sat 2018-09-08 04:28:19 UTC. --
Sep 07 22:08:37 ip-10-0-10-162 bash[3486]: https://trking-87205-etcd-0.coreservices.team.coreos.systems:2379 is unhealthy: failed to connect: dial tcp 10.0.5.134:2379: getsockopt: connection refused
Sep 07 22:08:37 ip-10-0-10-162 bash[3486]: Error:  unhealthy cluster
Sep 07 22:08:38 ip-10-0-10-162 bash[3486]: etcdctl failed too many times.
Sep 07 22:08:38 ip-10-0-10-162 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Sep 07 22:08:38 ip-10-0-10-162 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Sep 07 22:08:38 ip-10-0-10-162 systemd[1]: Failed to start Bootstrap a Kubernetes cluster.
$ docker run --rm --env ETCDCTL_API=3 --volume /opt/tectonic/tls:/opt/tectonic/tls:ro,z quay.io/coreos/etcd:v3.2.14 etcdctl --cacert=/opt/tectonic/tls/etcd-client-ca.crt --cert=/opt/tectonic/tls/etcd-client.crt --key=/opt/tectonic/tls/etcd-client.key --endpoints=https://trking-87205-etcd-0.coreservices.team.coreos.systems:2379 endpoint health
https://trking-87205-etcd-0.coreservices.team.coreos.systems:2379 is unhealthy: failed to connect: dial tcp 10.0.5.134:2379: getsockopt: connection refused
Error:  unhealthy cluster
$ dig trking-87205-etcd-0.coreservices.team.coreos.systems +short
10.0.5.134
$ dig trking-87205-master-0.coreservices.team.coreos.systems +short

Uh. Checking from my dev box:

$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name' && Value == 'trking-87205-master-0']].PublicIpAddress" --output text
18.215.14.161
$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name']]" --output text | grep '^0\|Name\|IPADDRESS\|ASSOCIATION' | cut -b -80
0	x86_64		False	True	xen	ami-00cc4337762ba4a52	i-00c28bd06d8eb77a1	t2.medium	201
PRIVATEIPADDRESSES	True	ip-10-0-133-231.ec2.internal	10.0.133.231
TAGS	Name	trking-87205-worker-0
0	x86_64		False	True	xen	ami-00cc4337762ba4a52	i-072fdf9d1e0beaf3a	t2.medium	201
PRIVATEIPADDRESSES	True	ip-10-0-158-83.ec2.internal	10.0.158.83
TAGS	Name	trking-87205-worker-1
0	x86_64		False	True	xen	ami-00cc4337762ba4a52	i-01cd2e4e6ecaea69e	t2.medium	201
ASSOCIATION	amazon	ec2-18-215-14-161.compute-1.amazonaws.com	18.215.14.161
PRIVATEIPADDRESSES	True	ip-10-0-5-134.ec2.internal	10.0.5.134
ASSOCIATION	amazon	ec2-18-215-14-161.compute-1.amazonaws.com	18.215.14.161
TAGS	Name	trking-87205-master-0
0	x86_64		False	True	xen	ami-00cc4337762ba4a52	i-0011bb9aa98d56684	t2.medium	201
ASSOCIATION	amazon	ec2-34-205-252-140.compute-1.amazonaws.com	34.205.252.140
PRIVATEIPADDRESSES	True	ip-10-0-10-162.ec2.internal	10.0.10.162
ASSOCIATION	amazon	ec2-34-205-252-140.compute-1.amazonaws.com	34.205.252.140
TAGS	Name	trking-87205-bootstrap
0	x86_64		False	True	xen	ami-00cc4337762ba4a52	i-01d196ea977f59b3b	t2.medium	201
PRIVATEIPADDRESSES	True	ip-10-0-165-55.ec2.internal	10.0.165.55
TAGS	Name	trking-87205-worker-2

So indeed master-0's internal IP is 10.0.5.134, and its public IP is 18.215.14.161. I'm not sure why I can't resolve it via DNS from the bootstrap node. Anyhow, back to the bootstrap node:

$ ssh -v [email protected]
OpenSSH_7.6p1, OpenSSL 1.0.2n  7 Dec 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 18.215.14.161 [18.215.14.161] port 22.
debug1: connect to address 18.215.14.161 port 22: Connection refused
ssh: connect to host 18.215.14.161 port 22: Connection refused
$ ssh -v [email protected] 
OpenSSH_7.6p1, OpenSSL 1.0.2n  7 Dec 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Connecting to 10.0.5.134 [10.0.5.134] port 22.
debug1: connect to address 10.0.5.134 port 22: Connection refused
ssh: connect to host 10.0.5.134 port 22: Connection refused

And from my dev box:

$ ssh -v [email protected]
OpenSSH_7.4p1, OpenSSL 1.0.2k-fips  26 Jan 2017
debug1: Reading configuration data /home/trking/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 58: Applying options for *
debug1: Connecting to 18.215.14.161 [18.215.14.161] port 22.
debug1: connect to address 18.215.14.161 port 22: Connection refused
ssh: connect to host 18.215.14.161 port 22: Connection refused

What's up with this node?

$ aws ec2 describe-instances --instance-id i-01cd2e4e6ecaea69e --query 'Reservations[].Instances[].State.Name' --output text
running
$ aws ec2 describe-instances --instance-id i-01cd2e4e6ecaea69e --query 'Reservations[].Instances[].SecurityGroups' --output text
sg-0061338459d264b41				 terraform-20180907202140389700000002

Compare that with the bootstrap node:

$ aws ec2 describe-instances --instance-id i-0011bb9aa98d56684 --query 'Reservations[].Instances[].SecurityGroups' --output text
sg-0061338459d264b41				 terraform-20180907202140389700000002

Same values. Actually, let's just diff the states:

$ BOOTSTRAP="$(aws ec2 describe-instances --instance-id i-0011bb9aa98d56684 --query 'Reservations[].Instances[]' --output json)"
$ MASTER="$(aws ec2 describe-instances --instance-id i-01cd2e4e6ecaea69e --query 'Reservations[].Instances[]' --output json)"
$ diff -u <(echo "${BOOTSTRAP}") <(echo "${MASTER}")
--- /dev/fd/63	2018-09-07 21:59:42.091362945 -0700
+++ /dev/fd/62	2018-09-07 21:59:42.091362945 -0700
@@ -3,22 +3,22 @@
         "Monitoring": {
             "State": "disabled"
         }, 
-        "PublicDnsName": "ec2-34-205-252-140.compute-1.amazonaws.com", 
+        "PublicDnsName": "ec2-18-215-14-161.compute-1.amazonaws.com", 
         "State": {
             "Code": 16, 
             "Name": "running"
         }, 
         "EbsOptimized": false, 
-        "LaunchTime": "2018-09-07T20:25:50.000Z", 
-        "PublicIpAddress": "34.205.252.140", 
-        "PrivateIpAddress": "10.0.10.162", 
+        "LaunchTime": "2018-09-07T20:22:56.000Z", 
+        "PublicIpAddress": "18.215.14.161", 
+        "PrivateIpAddress": "10.0.5.134", 
         "ProductCodes": [], 
         "VpcId": "vpc-0b6626eba63c20d20", 
         "StateTransitionReason": "", 
-        "InstanceId": "i-0011bb9aa98d56684", 
+        "InstanceId": "i-01cd2e4e6ecaea69e", 
         "EnaSupport": true, 
         "ImageId": "ami-00cc4337762ba4a52", 
-        "PrivateDnsName": "ip-10-0-10-162.ec2.internal", 
+        "PrivateDnsName": "ip-10-0-5-134.ec2.internal", 
         "SecurityGroups": [
             {
                 "GroupName": "terraform-20180907202140389700000002", 
@@ -31,30 +31,30 @@
         "NetworkInterfaces": [
             {
                 "Status": "in-use", 
-                "MacAddress": "02:f8:bf:18:7c:0a", 
+                "MacAddress": "02:52:a7:4b:43:10", 
                 "SourceDestCheck": true, 
                 "VpcId": "vpc-0b6626eba63c20d20", 
                 "Description": "", 
-                "NetworkInterfaceId": "eni-09d792347f4e050db", 
+                "NetworkInterfaceId": "eni-0f5378c203dea3028", 
                 "PrivateIpAddresses": [
                     {
-                        "PrivateDnsName": "ip-10-0-10-162.ec2.internal", 
-                        "PrivateIpAddress": "10.0.10.162", 
+                        "PrivateDnsName": "ip-10-0-5-134.ec2.internal", 
+                        "PrivateIpAddress": "10.0.5.134", 
                         "Primary": true, 
                         "Association": {
-                            "PublicIp": "34.205.252.140", 
-                            "PublicDnsName": "ec2-34-205-252-140.compute-1.amazonaws.com", 
+                            "PublicIp": "18.215.14.161", 
+                            "PublicDnsName": "ec2-18-215-14-161.compute-1.amazonaws.com", 
                             "IpOwnerId": "amazon"
                         }
                     }
                 ], 
-                "PrivateDnsName": "ip-10-0-10-162.ec2.internal", 
+                "PrivateDnsName": "ip-10-0-5-134.ec2.internal", 
                 "Attachment": {
                     "Status": "attached", 
                     "DeviceIndex": 0, 
                     "DeleteOnTermination": true, 
-                    "AttachmentId": "eni-attach-09624bb294e015616", 
-                    "AttachTime": "2018-09-07T20:25:50.000Z"
+                    "AttachmentId": "eni-attach-00536c4cc9813fb1b", 
+                    "AttachTime": "2018-09-07T20:22:56.000Z"
                 }, 
                 "Groups": [
                     {
@@ -64,11 +64,11 @@
                 ], 
                 "Ipv6Addresses": [], 
                 "OwnerId": "816138690521", 
-                "PrivateIpAddress": "10.0.10.162", 
+                "PrivateIpAddress": "10.0.5.134", 
                 "SubnetId": "subnet-018374f09ef32961c", 
                 "Association": {
-                    "PublicIp": "34.205.252.140", 
-                    "PublicDnsName": "ec2-34-205-252-140.compute-1.amazonaws.com", 
+                    "PublicIp": "18.215.14.161", 
+                    "PublicDnsName": "ec2-18-215-14-161.compute-1.amazonaws.com", 
                     "IpOwnerId": "amazon"
                 }
             }
@@ -86,22 +86,30 @@
                 "Ebs": {
                     "Status": "attached", 
                     "DeleteOnTermination": true, 
-                    "VolumeId": "vol-0514fba36b29b890c", 
-                    "AttachTime": "2018-09-07T20:25:51.000Z"
+                    "VolumeId": "vol-02edf331c988bde6f", 
+                    "AttachTime": "2018-09-07T20:22:57.000Z"
                 }
             }
         ], 
         "Architecture": "x86_64", 
         "RootDeviceType": "ebs", 
         "IamInstanceProfile": {
-            "Id": "AIPAJUQY64SRYUWKBSLAK", 
-            "Arn": "arn:aws:iam::816138690521:instance-profile/trking-87205-bootstrap-profile"
+            "Id": "AIPAIXRRU3YPPZDQJLI3A", 
+            "Arn": "arn:aws:iam::816138690521:instance-profile/trking-87205-master-profile"
         }, 
         "RootDeviceName": "/dev/xvda", 
         "VirtualizationType": "hvm", 
         "Tags": [
             {
-                "Value": "trking-87205-bootstrap", 
+                "Value": "owned", 
+                "Key": "kubernetes.io/cluster/trking-87205"
+            }, 
+            {
+                "Value": "2018-09-08T00:21+0000", 
+                "Key": "expirationDate"
+            }, 
+            {
+                "Value": "trking-87205-master-0", 
                 "Key": "Name"
             }, 
             {
@@ -109,10 +117,6 @@
                 "Key": "tectonicClusterID"
             }, 
             {
-                "Value": "2018-09-08T00:21+0000", 
-                "Key": "expirationDate"
-            }, 
-            {
                 "Value": "Resource does not meet policy: stop@2018/09/10", 
                 "Key": "maid_status"
             }

I don't see any surprising differences, and I have no idea why I can't SSH into the master node. But not being able to SSH into the master makes it hard to figure out why its etcd is broken. Or maybe there's just a networking issue that's behind my inability to connect for both SSH and etcd?

wking commented Sep 10, 2018

/retest

wking commented Sep 10, 2018

/test unit

@wking wking force-pushed the aws-bootstrap-module branch from 10f717b to 6d3370d Compare September 11, 2018 04:27

wking commented Sep 11, 2018

/retest

wking commented Sep 12, 2018

I've spun up a cluster to debug this, and the master is dying in Ignition:

$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name' && Value == 'trking-18d26-master-0']].InstanceId" --output text
i-098c83ac601024a12
$ aws ec2 get-console-output --instance-id i-098c83ac601024a12 --output text | tail -n5
[  170.062650] ignition[738]: INFO     : GET https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0: attempt #37
[  170.073885] ignition[738]: INFO     : GET https://trking-18d26-tnc.coreserv[  170.076770] ignition[738]: INFO     : GET error: Get https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0: EOF
ices.team.coreos.systems:80/config/master?etcd_index=0: attempt #37
[  170.087578] ignition[738]: INFO     : GET error: Get https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0: EOF
	2018-09-12T04:49:15.000Z

Check from the bootstrap node:

$ aws ec2 describe-instances --query "Reservations[].Instances[] | [?Tags[? Key == 'Name' && Value == 'trking-18d26-bootstrap']].PublicIpAddress" --output text
34.205.135.98
$ ssh [email protected]
$ curl -v 'https://trking-18d26-tnc.coreservices.team.coreos.systems:80/config/master?etcd_index=0'
* About to connect() to trking-18d26-tnc.coreservices.team.coreos.systems port 80 (#0)
*   Trying 10.0.1.14...
* Connected to trking-18d26-tnc.coreservices.team.coreos.systems (10.0.1.14) port 80 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* NSS error -5938 (PR_END_OF_FILE_ERROR)
* Encountered end of file
* Closing connection 0
curl: (35) Encountered end of file
$ systemctl status | head -n2
● ip-10-0-6-58
    State: starting
$ systemctl | grep activating
bootkube.service                                                                                                                     loaded activating start        start Bootstrap a Kubernetes cluster
kubelet.service                                                                                                                      loaded activating auto-restart       Kubernetes Kubelet
$ systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2018-09-12 05:05:40 UTC; 2s ago
  Process: 3765 ExecStart=/usr/bin/hyperkube kubelet --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --rotate-certificates --cni-conf-dir=/etc/kubernetes/cni/net.d --cni-bin-dir=/var/lib/cni/bin --network-plugin=cni --lock-file=/var/run/lock/kubelet.lock --exit-on-lock-contention --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged --node-labels=node-role.kubernetes.io/bootstrap --minimum-container-ttl-duration=6m0s --cluster-dns=10.3.0.10 --cluster-domain=cluster.local --client-ca-file=/etc/kubernetes/ca.crt --cloud-provider=aws --anonymous-auth=false --cgroup-driver=systemd --register-with-taints=node-role.kubernetes.io/bootstrap=:NoSchedule (code=exited, status=255)
  Process: 3760 ExecStartPre=/usr/bin/bash -c gawk '/certificate-authority-data/ {print $2}' /etc/kubernetes/kubeconfig | base64 --decode > /etc/kubernetes/ca.crt (code=exited, status=0/SUCCESS)
  Process: 3758 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
 Main PID: 3765 (code=exited, status=255)

Sep 12 05:05:40 ip-10-0-6-58 systemd[1]: Unit kubelet.service entered failed state.
Sep 12 05:05:40 ip-10-0-6-58 systemd[1]: kubelet.service failed.
$ systemctl status bootkube.service
● bootkube.service - Bootstrap a Kubernetes cluster
   Loaded: loaded (/etc/systemd/system/bootkube.service; static; vendor preset: disabled)
   Active: activating (start) since Wed 2018-09-12 04:50:50 UTC; 15min ago
 Main PID: 962 (bash)
   Memory: 166.8M
   CGroup: /system.slice/bootkube.service
           ├─ 962 /usr/bin/bash /opt/tectonic/bootkube.sh
           └─3066 /usr/bin/podman run --rm --network host --name etcdctl --env ETCDCTL_API=3 --volume /opt/tectonic/tls:/opt/tectonic/tls:ro,z quay.io/coreos/etcd:v3.2.14 /usr/local/bin/etcdctl --dial-timeout...

Sep 12 04:51:02 ip-10-0-6-58 bash[962]: [31B blob data]
Sep 12 04:51:02 ip-10-0-6-58 bash[962]: Copying blob sha256:c15c14574a0bc94fb65cb906baae5debd103dd02991f3449adaa639441b7dde4
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: [31B blob data]
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Skipping fetch of repeat blob sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Writing manifest to image destination
Sep 12 04:51:03 ip-10-0-6-58 bash[962]: Storing signatures
Sep 12 05:01:04 ip-10-0-6-58 bash[962]: https://trking-18d26-etcd-0.coreservices.team.coreos.systems:2379 is unhealthy: failed to connect: dial tcp 10.0.9.212:2379: getsockopt: connection refused
Sep 12 05:01:04 ip-10-0-6-58 bash[962]: Error:  unhealthy cluster
Sep 12 05:01:04 ip-10-0-6-58 bash[962]: etcdctl failed. Retrying in 5 seconds...
$ sudo podman ps -a
CONTAINER ID   IMAGE                                                                                           COMMAND                  CREATED          STATUS                              PORTS   NAMES
ab5fe498da75   quay.io/coreos/etcd:v3.2.14                                                                     /usr/local/bin/etcd...   4 minutes ago    Up 4 minutes ago                            etcdctl
83d7fba588d5   quay.io/coreos/kube-etcd-signer-server:678cc8e6841e2121ebfdb6e2db568fce290b67d6                 kube-etcd-signer-se...   15 minutes ago   Up 15 minutes ago                           lucid_tesla
cdbffdb210ea   quay.io/coreos/tectonic-node-controller-operator-dev:0a24db2288f00b10ced358d9643debd601ffd0f1   /app/operator/node-...   15 minutes ago   Exited (0) Less than a second ago           trusting_morse
36af8121636c   quay.io/coreos/kube-core-renderer-dev:0a24db2288f00b10ced358d9643debd601ffd0f1                  /app/operator/kube-...   15 minutes ago   Exited (0) Less than a second ago           friendly_swanson
$ journalctl -n25
-- Logs begin at Wed 2018-09-12 04:49:14 UTC, end at Wed 2018-09-12 05:08:08 UTC. --
Sep 12 05:07:57 ip-10-0-6-58 systemd[1]: kubelet.service failed.
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Starting Kubernetes Kubelet...
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Started Kubernetes Kubelet.
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --rotate-certificates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/do
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --allow-privileged has been deprecated, will be removed in a future version
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tas
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --client-ca-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --anonymous-auth has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/t
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Started Kubernetes systemd probe.
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.892870    4122 server.go:418] Version: v1.11.0+d4cacc0
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.892979    4122 server.go:496] acquiring file lock on "/var/run/lock/kubelet.lock"
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.893006    4122 server.go:501] watching for inotify events for: /var/run/lock/kubelet.lock
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.893193    4122 aws.go:1032] Building AWS cloudprovider
Sep 12 05:08:07 ip-10-0-6-58 hyperkube[4122]: I0912 05:08:07.893219    4122 aws.go:994] Zone not specified in configuration file; querying AWS metadata service
Sep 12 05:08:07 ip-10-0-6-58 systemd[1]: Starting Kubernetes systemd probe.
Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: E0912 05:08:08.075211    4122 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: F0912 05:08:08.075258    4122 server.go:262] failed to run Kubelet: could not init cloud provider "aws": AWS cloud failed to find ClusterID
Sep 12 05:08:08 ip-10-0-6-58 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Sep 12 05:08:08 ip-10-0-6-58 systemd[1]: Unit kubelet.service entered failed state.
Sep 12 05:08:08 ip-10-0-6-58 systemd[1]: kubelet.service failed.

So I'm still not clear on what's going on, but etcd is broken, our ignition-file server seems non-responsive and is keeping master-0 from booting, and the kubelet is thrashing around without an aws cloud provider and with a bunch of deprecated options. I still don't see how any of that is related to the changes in my PR :p.

abhinavdahiya (Contributor) commented:

Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.

This might be because we dropped a tag, #217 (comment)

@wking wking force-pushed the aws-bootstrap-module branch from 6d3370d to ef35007 Compare September 12, 2018 16:19

wking commented Sep 12, 2018

Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.

This might be because we dropped a tag, #217 (comment)

Ah, thanks :). I've pushed 6d3370d -> ef35007, rebasing onto master and restoring that tag to the instance (but, as I explain in the commit message, I'm still removing it from the volumes).
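
Concretely, the restored layout is presumably along these lines (trimmed to the tag handling; the other arguments are illustrative):

resource "aws_instance" "bootstrap" {
  ami           = "${var.ami}"
  instance_type = "${var.instance_type}"

  # The AWS cloud provider only looks for the cluster tag on the
  # instance itself...
  tags = "${merge(map(
    "Name", "${var.tectonic_cluster_name}-bootstrap",
    "kubernetes.io/cluster/${var.tectonic_cluster_name}", "owned",
  ), var.tags)}"

  # ...so the volumes carry just the caller-supplied tags.
  volume_tags = "${var.tags}"
}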

wking commented Sep 12, 2018

The smoke error was:

Waiting for API at https://ci-op-hlzw4yd1-3e1a1-api.origin-ci-int-aws.dev.rhcloud.com:6443 to respond ...
Waiting for API at https://ci-op-hlzw4yd1-3e1a1-api.origin-ci-int-aws.dev.rhcloud.com:6443 to respond ...
Interrupted
2018/09/12 18:44:23 Container setup in pod e2e-aws-smoke failed, exit code 1, reason Error

But I can't reproduce when I launch a cluster locally, so maybe it's just a flake.

/retest

The commit message for 8a37f72 (modules/aws/bootstrap: Pull AWS bootstrap setup into a module, 2018-09-05):

This will make it easier to move into the existing infra step.

The module source syntax used in the README is documented in [1,2,3],
and means "the modules/aws/ami subdirectory of the
github.com/openshift/installer repository cloned over HTTPS", etc.
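
A minimal sketch of that source form, with an illustrative input:

  module "ami" {
    # "//" separates the repository from the subdirectory: Terraform
    # clones github.com/openshift/installer over HTTPS and then uses
    # its modules/aws/ami directory as the module root.
    source = "github.com/openshift/installer//modules/aws/ami"

    region = "${var.region}"
  }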

I don't think I should need the wrapping brackets in:

  vpc_security_group_ids = ["${var.vpc_security_group_ids}"]

but without it I get [4]:

  Error: module.bootstrap.aws_instance.bootstrap: vpc_security_group_ids: should be a list

The explicit brackets match our approach in the master and worker
modules though, so they shouldn't break anything.  It sounds like
Terraform still has a few problems with remembering type information
[5], and that may be what's going on here.

I've simplified the tagging a bit, keeping the extra tags unification
outside the module.  I tried dropping the kubernetes.io/cluster/ tag
completely, but it led to [6]:

  Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: E0912 05:08:08.075211    4122 tags.go:94] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
  Sep 12 05:08:08 ip-10-0-6-58 hyperkube[4122]: F0912 05:08:08.075258    4122 server.go:262] failed to run Kubelet: could not init cloud provider "aws": AWS cloud failed to find ClusterID

The backing code for that is [7,8,9].  From [9], you can see that only
the tag on the instance matters, so I've dropped
kubernetes.io/cluster/... from volume_tags.  Going forward, we may
move to configuring this directly instead of relying on the tag-based
initialization.

[1]: https://www.terraform.io/docs/configuration/modules.html#source
[2]: https://www.terraform.io/docs/modules/sources.html#github
[3]: https://www.terraform.io/docs/modules/sources.html#modules-in-package-sub-directories
[4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/217/pull-ci-openshift-installer-e2e-aws/47/build-log.txt
[5]: hashicorp/terraform#16916 (comment)
[6]: openshift#217
[7]: https://github.com/kubernetes/kubernetes/blob/v1.11.3/pkg/cloudprovider/providers/aws/tags.go#L30-L34
[8]: https://github.com/kubernetes/kubernetes/blob/v1.11.3/pkg/cloudprovider/providers/aws/tags.go#L100-L126
[9]: https://github.com/kubernetes/kubernetes/blob/v1.11.3/pkg/cloudprovider/providers/aws/aws.go#L1126-L1132
@wking wking force-pushed the aws-bootstrap-module branch from ef35007 to 8a37f72 Compare September 12, 2018 22:32

crawford (Contributor):

Try rebasing on master again. #244 should help with the flakes.

crawford (Contributor):

Actually, I guess tide is smart enough to merge this onto master before testing.

/retest

wking commented Sep 13, 2018

/retest

crawford (Contributor):

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 13, 2018

openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

@openshift-merge-robot openshift-merge-robot merged commit dfd9ff9 into openshift:master Sep 13, 2018
@wking wking deleted the aws-bootstrap-module branch September 13, 2018 17:27
wking added a commit to wking/openshift-installer that referenced this pull request Dec 13, 2018
As suggested by Stephen Cuppett, this allows registry <-> S3 transfers
to bypass the (NAT) gateways.  Traffic over the NAT gateways costs
money, so the new endpoint should make S3 access from the cluster
cheaper (and possibly more reliable).  This also allows for additional
security policy flexibility, although I'm not taking advantage of that
in this commit.  Docs for VPC endpoints are in [1,2,3,4].

Endpoints do not currently support cross-region requests [1].  And
based on discussion with Stephen, adding an endpoint may *break*
access to S3 on other regions.  But I can't find docs to back that up,
and [3] has:

  We use the most specific route that matches the traffic to determine
  how to route the traffic (longest prefix match).  If you have an
  existing route in your route table for all internet traffic
  (0.0.0.0/0) that points to an internet gateway, the endpoint route
  takes precedence for all traffic destined for the service, because
  the IP address range for the service is more specific than
  0.0.0.0/0.  All other internet traffic goes to your internet
  gateway, including traffic that's destined for the service in other
  regions.

which suggests that access to S3 on other regions may be unaffected.
In any case, our registry buckets, and likely any other buckets
associated with the cluster, will be living in the same region.

concat is documented in [5].  The wrapping brackets avoid [6]:

  level=error msg="Error: module.vpc.aws_vpc_endpoint.s3: route_table_ids: should be a list"

although I think that's a Terraform bug.  See also 8a37f72
(modules/aws/bootstrap: Pull AWS bootstrap setup into a module,
2018-09-05, openshift#217), which talks about this same issue.
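
A rough sketch of the endpoint resource this adds (the VPC and
route-table resource names are illustrative, not necessarily the ones
in the VPC module):

  resource "aws_vpc_endpoint" "s3" {
    vpc_id       = "${aws_vpc.new_vpc.id}"
    service_name = "com.amazonaws.${var.region}.s3"

    # The brackets around concat() dodge the same "should be a list"
    # type loss described above.
    route_table_ids = ["${concat(aws_route_table.private_routes.*.id, list(aws_route_table.default.id))}"]
  }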

[1]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-s3.html
[2]: https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html
[3]: https://docs.aws.amazon.com/vpc/latest/userguide/vpce-gateway.html
[4]: https://www.terraform.io/docs/providers/aws/r/vpc_endpoint.html
[5]: https://www.terraform.io/docs/configuration/interpolation.html#concat-list1-list2-
[6]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/745/pull-ci-openshift-installer-master-e2e-aws/1673/build-log.txt
wking added a commit to wking/openshift-installer that referenced this pull request Jan 11, 2019
Centralize extra-tag inclusion on aws/main.tf.  This reduces the
number of places we need to think about what tags should be ;).

Also keep kubernetes.io/cluster/{name} localized in the aws module.
See 8a37f72 (modules/aws/bootstrap: Pull AWS bootstrap setup into a
module, 2018-09-05, openshift#217) for why we need to keep it on the bootstrap
instance.  But the bootstrap resources will be removed after the
bootstrap-complete event comes through, and we don't want Kubernetes
controllers trying to pick them up.

This commit updates the internal Route 53 zone from KubernetesCluster
to kubernetes.io/cluster/{name}: owned, catching it up to
kubernetes/kubernetes@0b5ae539 (AWS: Support shared tag, 2017-02-18,
kubernetes/kubernetes#41695).  That tag originally landed on the zone
back in 75fb49a (platforms/aws: apply tags to internal route53 zone,
2017-05-02, coreos/tectonic-installer#465).
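
So the internal zone presumably ends up shaped roughly like this
(variable and resource names are illustrative):

  resource "aws_route53_zone" "int" {
    name   = "${var.base_domain}"
    vpc_id = "${aws_vpc.new_vpc.id}"

    # Shared-tag format from kubernetes/kubernetes#41695, replacing the
    # legacy KubernetesCluster tag.
    tags = "${merge(map(
      "kubernetes.io/cluster/${var.cluster_name}", "owned",
    ), var.extra_tags)}"
  }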

Only the master instances need the clusterid tag, as described in
6c7a5f0 (Tag master machines for adoption by machine controller,
2018-10-17, openshift#479).

A number of VPC resources have moved from "shared" to "owned".  The
shared values are from 45dfc2b (modules/aws,azure: use the new tag
format for k8s 1.6, 2017-05-04, coreos/tectonic-installer#469).  The
commit message doesn't have much to say for motivation, but Brad Ison
said [1]:

  I'm not really sure if anything in Kubernetes actually uses the
  owned vs. shared values at the moment, but in any case, it might
  make more sense to mark subnets as shared.  That was actually one of
  the main use cases for moving to this style of tagging -- being able
  to share subnets between clusters.

But we aren't sharing these resources; see 6f55e67 (terraform/aws:
remove option to use an existing vpc in aws, 2018-11-11, openshift#654).

[1]: coreos/tectonic-installer#469 (comment)
wking added a commit to wking/openshift-installer that referenced this pull request Feb 28, 2019
…-release:4.0.0-0.6

Clayton pushed 4.0.0-0.nightly-2019-02-27-213933 to
quay.io/openshift-release-dev/ocp-release:4.0.0-0.6.  Extracting the
associated RHCOS build:

  $ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.6 | grep machine-os-content
    machine-os-content                            registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-02-27-213933@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b
  $ oc image info registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-02-26-125216@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b | grep version
              version=47.330

that's the same machine-os-content image referenced from 4.0.0-0.5,
which we used for installer v0.13.0.

Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing
of the pinned release despite openshift/release@60007df2 (Use
RELEASE_IMAGE_LATEST for CVO payload, 2018-10-03,
openshift/release#1793).

Also comment out regions which this particular RHCOS build wasn't
pushed to, leaving only:

  $ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.330/meta.json | jq -r '.amis[] | .name'
  ap-northeast-1
  ap-northeast-2
  ap-south-1
  ap-southeast-1
  ap-southeast-2
  ca-central-1
  eu-central-1
  eu-west-1
  eu-west-2
  eu-west-3
  sa-east-1
  us-east-1
  us-east-2
  us-west-1
  us-west-2

I'd initially expected to export the pinning environment variables in
release.sh, but I've put them in build.sh here because our continuous
integration tests use build.sh directly and don't go through
release.sh.

Using the slick, new change-log generator from [1], here's everything
that changed in the update payload:

  $ oc adm release info --changelog ~/.local/lib/go/src --changes-from quay.io/openshift-release-dev/ocp-release:4.0.0-0.5 quay.io/openshift-release-dev/ocp-release:4.0.0-0.6
  # 4.0.0-0.6

  Created: 2019-02-28 20:40:11 +0000 UTC
  Image Digest: `sha256:5ce3d05da3bfa3d0310684f5ac53d98d66a904d25f2e55c2442705b628560962`
  Promoted from registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-27-213933

  ## Changes from 4.0.0-0.5

  ### Components

  * Kubernetes 1.12.4

  ### New images

  * [pod](https://github.com/openshift/images) git [2f60da39](openshift/images@2f60da3) `sha256:c0d602467dfe0299ce577ba568a9ef5fb9b0864bac6455604258e7f5986d3509`

  ### Rebuilt images without code change

  * [cloud-credential-operator](https://github.com/openshift/cloud-credential-operator) git [01bbf372](openshift/cloud-credential-operator@01bbf37) `sha256:f87be09923a5cb081722634d2e0c3d0a5633ea2c23da651398d4e915ad9f73b0`
  * [cluster-autoscaler](https://github.com/openshift/kubernetes-autoscaler) git [d8a4a304](openshift/kubernetes-autoscaler@d8a4a30) `sha256:955413b82cf8054ce149bc05c18297a8abe9c59f9d0034989f08086ae6c71fa6`
  * [cluster-autoscaler-operator](https://github.com/openshift/cluster-autoscaler-operator) git [73c46659](openshift/cluster-autoscaler-operator@73c4665) `sha256:756e813fce04841993c8060d08a5684c173cbfb61a090ae67cb1558d76a0336e`
  * [cluster-bootstrap](https://github.com/openshift/cluster-bootstrap) git [05a5c8e6](openshift/cluster-bootstrap@05a5c8e) `sha256:dbdd90da7d256e8d49e4e21cb0bdef618c79d83f539049f89f3e3af5dbc77e0f`
  * [cluster-config-operator](https://github.com/openshift/cluster-config-operator) git [aa1805e7](openshift/cluster-config-operator@aa1805e) `sha256:773d3355e6365237501d4eb70d58cd0633feb541d4b6f23d6a5f7b41fd6ad2f5`
  * [cluster-dns-operator](https://github.com/openshift/cluster-dns-operator) git [ffb04ae9](openshift/cluster-dns-operator@ffb04ae) `sha256:ca15f98cc1f61440f87950773329e1fdf58e73e591638f18c43384ad4f8f84da`
  * [cluster-machine-approver](https://github.com/openshift/cluster-machine-approver) git [2fbc6a6b](openshift/cluster-machine-approver@2fbc6a6) `sha256:a66af3b1f4ae98257ab600d54f8c94f3a4136f85863bbe0fa7c5dba65c5aea46`
  * [cluster-node-tuned](https://github.com/openshift/openshift-tuned) git [278ee72d](openshift/openshift-tuned@278ee72) `sha256:ad71743cc50a6f07eba013b496beab9ec817603b07fd3f5c022fffbf400e4f4b`
  * [cluster-node-tuning-operator](https://github.com/openshift/cluster-node-tuning-operator) git [b5c14deb](openshift/cluster-node-tuning-operator@b5c14de) `sha256:e61d1fdb7ad9f5fed870e917a1bc8fac9ccede6e4426d31678876bcb5896b000`
  * [cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator) git [3f79b51b](openshift/cluster-openshift-controller-manager-operator@3f79b51) `sha256:8f3b40b4dd29186975c900e41b1a94ce511478eeea653b89a065257a62bf3ae9`
  * [cluster-svcat-apiserver-operator](https://github.com/openshift/cluster-svcat-apiserver-operator) git [547648cb](openshift/cluster-svcat-apiserver-operator@547648c) `sha256:e7c9323b91dbb11e044d5a1277d1e29d106d92627a6c32bd0368616e0bcf631a`
  * [cluster-svcat-controller-manager-operator](https://github.com/openshift/cluster-svcat-controller-manager-operator) git [9261f420](openshift/cluster-svcat-controller-manager-operator@9261f42) `sha256:097a429eda2306fcd49e14e4f5db8ec3a09a90fa29ebdbc98cc519511ab6fb5b`
  * [cluster-version-operator](https://github.com/openshift/cluster-version-operator) git [70c0232e](openshift/cluster-version-operator@70c0232) `sha256:7d59edff68300e13f0b9e56d2f2bc1af7f0051a9fbc76cc208239137ac10f782`
  * [configmap-reloader](https://github.com/openshift/configmap-reload) git [3c2f8572](openshift/configmap-reload@3c2f857) `sha256:32360c79d8d8d54cea03675c24f9d0a69877a2f2e16b949ca1d97440b8f45220`
  * [console-operator](https://github.com/openshift/console-operator) git [32ed7c03](openshift/console-operator@32ed7c0) `sha256:f8c07cb72dc8aa931bbfabca9b4133f3b93bc96da59e95110ceb8c64f3efc755`
  * [container-networking-plugins-supported](https://github.com/openshift/ose-containernetworking-plugins) git [f6a58dce](openshift/ose-containernetworking-plugins@f6a58dc) `sha256:c6434441fa9cc96428385574578c41e9bc833b6db9557df1dd627411d9372bf4`
  * [container-networking-plugins-unsupported](https://github.com/openshift/ose-containernetworking-plugins) git [f6a58dce](openshift/ose-containernetworking-plugins@f6a58dc) `sha256:bb589cf71d4f41977ec329cf808cdb956d5eedfc604e36b98cfd0bacce513ffc`
  * [coredns](https://github.com/openshift/coredns) git [fbcb8252](openshift/coredns@fbcb825) `sha256:2f1812a95e153a40ce607de9b3ace7cae5bee67467a44a64672dac54e47f2a66`
  * [docker-builder](https://github.com/openshift/builder) git [1a77d837](openshift/builder@1a77d83) `sha256:27062ab2c62869e5ffeca234e97863334633241089a5d822a19350f16945fbcb`
  * [etcd](https://github.com/openshift/etcd) git [a0e62b48](openshift/etcd@a0e62b4) `sha256:e4e9677d004f8f93d4f084739b4502c2957c6620d633e1fdb379c33243c684fa`
  * [grafana](https://github.com/openshift/grafana) git [58efe0eb](openshift/grafana@58efe0e) `sha256:548abcc50ccb8bb17e6be2baf050062a60fc5ea0ca5d6c59ebcb8286fc9eb043`
  * [haproxy-router](https://github.com/openshift/router) git [2c33f47f](openshift/router@2c33f47) `sha256:c899b557e4ee2ea7fdbe5c37b5f4f6e9f9748a39119130fa930d9497464bd957`
  * [k8s-prometheus-adapter](https://github.com/openshift/k8s-prometheus-adapter) git [815fa76b](openshift/k8s-prometheus-adapter@815fa76) `sha256:772c1b40b21ccaa9ffcb5556a1228578526a141b230e8ac0afe19f14404fdffc`
  * [kube-rbac-proxy](https://github.com/openshift/kube-rbac-proxy) git [3f271e09](openshift/kube-rbac-proxy@3f271e0) `sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572`
  * [kube-state-metrics](https://github.com/openshift/kube-state-metrics) git [2ab51c9f](openshift/kube-state-metrics@2ab51c9) `sha256:611c800c052de692c84d89da504d9f386d3dcab59cbbcaf6a26023756bc863a0`
  * [libvirt-machine-controllers](https://github.com/openshift/cluster-api-provider-libvirt) git [7ff8b08f](openshift/cluster-api-provider-libvirt@7ff8b08) `sha256:6ab8749886ec26d45853c0e7ade3c1faaf6b36e09ba2b8a55f66c6cc25052832`
  * [multus-cni](https://github.com/openshift/ose-multus-cni) git [61f9e088](https://github.com/openshift/ose-multus-cni/commit/61f9e0886370ea5f6093ed61d4cfefc6dadef582) `sha256:e3f87811d22751e7f06863e7a1407652af781e32e614c8535f63d744e923ea5c`
  * [oauth-proxy](https://github.com/openshift/oauth-proxy) git [b771960b](openshift/oauth-proxy@b771960) `sha256:093a2ac687849e91671ce906054685a4c193dfbed27ebb977302f2e09ad856dc`
  * [openstack-machine-controllers](https://github.com/openshift/cluster-api-provider-openstack) git [c2d845ba](openshift/cluster-api-provider-openstack@c2d845b) `sha256:f9c321de068d977d5b4adf8f697c5b15f870ccf24ad3e19989b129e744a352a7`
  * [operator-registry](https://github.com/operator-framework/operator-registry) git [0531400c](operator-framework/operator-registry@0531400) `sha256:730f3b504cccf07e72282caf60dc12f4e7655d7aacf0374d710c3f27125f7008`
  * [prom-label-proxy](https://github.com/openshift/prom-label-proxy) git [46423f9d](openshift/prom-label-proxy@46423f9) `sha256:3235ad5e22b6f560d447266e0ecb2e5655fda7c0ab5c1021d8d3a4202f04d2ca`
  * [prometheus](https://github.com/openshift/prometheus) git [6e5fb5dc](openshift/prometheus@6e5fb5d) `sha256:013455905e4a6313f8c471ba5f99962ec097a9cecee3e22bdff3e87061efad57`
  * [prometheus-alertmanager](https://github.com/openshift/prometheus-alertmanager) git [4617d550](openshift/prometheus-alertmanager@4617d55) `sha256:54512a6cf25cf3baf7fed0b01a1d4786d952d93f662578398cad0d06c9e4e951`
  * [prometheus-config-reloader](https://github.com/openshift/prometheus-operator) git [f8a0aa17](openshift/prometheus-operator@f8a0aa1) `sha256:244fc5f1a4a0aa983067331c762a04a6939407b4396ae0e86a1dd1519e42bb5d`
  * [prometheus-node-exporter](https://github.com/openshift/node_exporter) git [f248b582](openshift/node_exporter@f248b58) `sha256:390e5e1b3f3c401a0fea307d6f9295c7ff7d23b4b27fa0eb8f4017bd86d7252c`
  * [prometheus-operator](https://github.com/openshift/prometheus-operator) git [f8a0aa17](openshift/prometheus-operator@f8a0aa1) `sha256:6e697dcaa19e03bded1edf5770fb19c0d2cd8739885e79723e898824ce3cd8f5`
  * [service-catalog](https://github.com/openshift/service-catalog) git [b24ffd6f](openshift/service-catalog@b24ffd6) `sha256:85ea2924810ced0a66d414adb63445a90d61ab5318808859790b1d4b7decfea6`
  * [service-serving-cert-signer](https://github.com/openshift/service-serving-cert-signer) git [30924216](openshift/service-serving-cert-signer@3092421) `sha256:7f89db559ffbd3bf609489e228f959a032d68dd78ae083be72c9048ef0c35064`
  * [telemeter](https://github.com/openshift/telemeter) git [e12aabe4](openshift/telemeter@e12aabe) `sha256:fd518d2c056d4ab8a89d80888e0a96445be41f747bfc5f93aa51c7177cf92b92`

  ### [aws-machine-controllers](https://github.com/openshift/cluster-api-provider-aws)

  * client: add cluster-api-provider-aws to UserAgent for AWS API calls [openshift#167](openshift/cluster-api-provider-aws#167)
  * Drop the yaml unmarshalling [openshift#155](openshift/cluster-api-provider-aws#155)
  * [Full changelog](openshift/cluster-api-provider-aws@46f4852...c0c3b9e)

  ### [cli, deployer, hyperkube, hypershift, node, tests](https://github.com/openshift/ose)

  * Build OSTree using baked SELinux policy [#22081](https://github.com/openshift/ose/pull/22081)
  * NodeName was being cleared for `oc debug node/X` instead of set [#22086](https://github.com/openshift/ose/pull/22086)
  * UPSTREAM: 73894: Print the involved object in the event table [#22039](https://github.com/openshift/ose/pull/22039)
  * Publish CRD openapi [#22045](https://github.com/openshift/ose/pull/22045)
  * UPSTREAM: 00000: wait for CRD discovery to be successful once before [#22149](https://github.com/openshift/ose/pull/22149)
  * `oc adm release info --changelog` should clone if necessary [#22148](https://github.com/openshift/ose/pull/22148)
  * [Full changelog](openshift/ose@c547bc3...0cbcfc5)

  ### [cluster-authentication-operator](https://github.com/openshift/cluster-authentication-operator)

  * Add redeploy on serving cert and operator pod template change [openshift#75](openshift/cluster-authentication-operator#75)
  * Create the service before waiting for serving certs [openshift#84](openshift/cluster-authentication-operator#84)
  * [Full changelog](openshift/cluster-authentication-operator@78dd53b...35879ec)

  ### [cluster-image-registry-operator](https://github.com/openshift/cluster-image-registry-operator)

  * Enable subresource status [openshift#209](openshift/cluster-image-registry-operator#209)
  * Add ReadOnly flag [openshift#210](openshift/cluster-image-registry-operator#210)
  * do not setup ownerrefs for clusterscoped/cross-namespace objects [openshift#215](openshift/cluster-image-registry-operator#215)
  * s3: include operator version in UserAgent for AWS API calls [openshift#212](openshift/cluster-image-registry-operator#212)
  * [Full changelog](openshift/cluster-image-registry-operator@0780074...8060048)

  ### [cluster-ingress-operator](https://github.com/openshift/cluster-ingress-operator)

  * Adds info log msg indicating ns/secret used by DNSManager [openshift#134](openshift/cluster-ingress-operator#134)
  * Introduce certificate controller [openshift#140](openshift/cluster-ingress-operator#140)
  * [Full changelog](openshift/cluster-ingress-operator@1b4fa5a...09d14db)

  ### [cluster-kube-apiserver-operator](https://github.com/openshift/cluster-kube-apiserver-operator)

  * bump(*): fix installer pod shutdown and rolebinding [openshift#307](openshift/cluster-kube-apiserver-operator#307)
  * bump to fix early status [openshift#309](openshift/cluster-kube-apiserver-operator#309)
  * [Full changelog](openshift/cluster-kube-apiserver-operator@4016927...fa75c05)

  ### [cluster-kube-controller-manager-operator](https://github.com/openshift/cluster-kube-controller-manager-operator)

  * bump(*): fix installer pod shutdown and rolebinding [openshift#183](openshift/cluster-kube-controller-manager-operator#183)
  * bump to fix empty status [openshift#184](openshift/cluster-kube-controller-manager-operator#184)
  * [Full changelog](openshift/cluster-kube-controller-manager-operator@95f5f32...53ff6d8)

  ### [cluster-kube-scheduler-operator](https://github.com/openshift/cluster-kube-scheduler-operator)

  * Rotate kubeconfig [openshift#62](openshift/cluster-kube-scheduler-operator#62)
  * Don't pass nil function pointer to NewConfigObserver [openshift#65](openshift/cluster-kube-scheduler-operator#65)
  * [Full changelog](openshift/cluster-kube-scheduler-operator@50848b4...7066c96)

  ### [cluster-monitoring-operator](https://github.com/openshift/cluster-monitoring-operator)

  * *: Clean test invocation and documenation [openshift#267](openshift/cluster-monitoring-operator#267)
  * pkg/operator: fix progressing state of cluster operator [openshift#268](openshift/cluster-monitoring-operator#268)
  * jsonnet/main.jsonnet: Bump Prometheus to v2.7.1 [openshift#246](openshift/cluster-monitoring-operator#246)
  * OWNERS: Remove ironcladlou [openshift#204](openshift/cluster-monitoring-operator#204)
  * test/e2e: Refactor framework setup & wait for query logic [openshift#265](openshift/cluster-monitoring-operator#265)
  * jsonnet: Update dependencies [openshift#269](openshift/cluster-monitoring-operator#269)
  * [Full changelog](openshift/cluster-monitoring-operator@94b701f...3609aea)

  ### [cluster-network-operator](https://github.com/openshift/cluster-network-operator)

  * Update to be able to track both DaemonSets and Deployments [openshift#102](openshift/cluster-network-operator#102)
  * openshift-sdn: more service-catalog netnamespace fixes [openshift#108](openshift/cluster-network-operator#108)
  * [Full changelog](openshift/cluster-network-operator@9db4d03...15204e6)

  ### [cluster-openshift-apiserver-operator](https://github.com/openshift/cluster-openshift-apiserver-operator)

  * bump to fix status reporting [openshift#157](openshift/cluster-openshift-apiserver-operator#157)
  * [Full changelog](openshift/cluster-openshift-apiserver-operator@1ce6ac7...0a65fe4)

  ### [cluster-samples-operator](https://github.com/openshift/cluster-samples-operator)

  * use pumped up rate limiter, shave 30 seconds from startup creates [openshift#113](openshift/cluster-samples-operator#113)
  * [Full changelog](openshift/cluster-samples-operator@4726068...f001324)

  ### [cluster-storage-operator](https://github.com/openshift/cluster-storage-operator)

  * WaitForFirstConsumer in AWS StorageClass [openshift#12](openshift/cluster-storage-operator#12)
  * [Full changelog](openshift/cluster-storage-operator@dc42489...b850242)

  ### [console](https://github.com/openshift/console)

  * Add back OAuth configuration link in kubeadmin notifier [openshift#1202](openshift/console#1202)
  * Normalize display of <ResourceIcon> across browsers, platforms [openshift#1210](openshift/console#1210)
  * Add margin spacing so event info doesn't run together before truncating [openshift#1170](openshift/console#1170)
  * [Full changelog](openshift/console@a0b75bc...d10fb8b)

  ### [docker-registry](https://github.com/openshift/image-registry)

  * Bump k8s and OpenShift, use new docker-distribution branch [openshift#165](openshift/image-registry#165)
  * [Full changelog](openshift/image-registry@75a1fbe...afcc7da)

  ### [installer](https://github.com/openshift/installer)

  * data: route53 A records with SimplePolicy should not use health check [openshift#1308](openshift#1308)
  * bootkube.sh: do not hide problems with render [openshift#1274](openshift#1274)
  * data/bootstrap/files/usr/local/bin/bootkube: etcdctl from release image [openshift#1315](openshift#1315)
  * pkg/types/validation: Drop v1beta1 backwards compat hack [openshift#1251](openshift#1251)
  * pkg/asset/tls: self-sign etcd-client-ca [openshift#1267](openshift#1267)
  * pkg/asset/tls: self-sign aggregator-ca [openshift#1275](openshift#1275)
  * pkg/types/validation/installconfig: Drop nominal v1beta2 support [openshift#1319](openshift#1319)
  * Removing unused/deprecated security groups and ports. Updated AWS doc [openshift#1306](openshift#1306)
  * [Full changelog](openshift/installer@0208204...563f71f)

  ### [jenkins, jenkins-agent-maven, jenkins-agent-nodejs](https://github.com/openshift/jenkins)

  * recover from jenkins deps backleveling workflow-durable-task-step fro… [openshift#806](openshift/jenkins#806)
  * [Full changelog](openshift/jenkins@2485f9a...e4583ca)

  ### [machine-api-operator](https://github.com/openshift/machine-api-operator)

  * Rename labels from sigs.k8s.io to machine.openshift.io [openshift#213](openshift/machine-api-operator#213)
  * Remove clusters.cluster.k8s.io CRD [openshift#225](openshift/machine-api-operator#225)
  * MAO: Stop setting statusProgressing=true when resyincing same version [openshift#217](openshift/machine-api-operator#217)
  * Generate clientset for machine health check API [openshift#223](openshift/machine-api-operator#223)
  * [Full changelog](openshift/machine-api-operator@bf95d7d...34c3424)

  ### [machine-config-controller, machine-config-daemon, machine-config-operator, machine-config-server, setup-etcd-environment](https://github.com/openshift/machine-config-operator)

  * daemon: Only print status if os == RHCOS [openshift#495](openshift/machine-config-operator#495)
  * Add pod image to image-references [openshift#500](openshift/machine-config-operator#500)
  * pkg/daemon: stash the node object [openshift#464](openshift/machine-config-operator#464)
  * Eliminate use of cpu limits [openshift#503](openshift/machine-config-operator#503)
  * MCD: add ign validation check for mc.ignconfig [openshift#481](openshift/machine-config-operator#481)
  * [Full changelog](openshift/machine-config-operator@875f25e...f0b87fc)

  ### [operator-lifecycle-manager](https://github.com/operator-framework/operator-lifecycle-manager)

  * fix(owners): remove cross-namespace and cluster->namespace ownerrefs [openshift#729](operator-framework/operator-lifecycle-manager#729)
  * [Full changelog](operator-framework/operator-lifecycle-manager@1ac9ace...9186781)

  ### [operator-marketplace](https://github.com/operator-framework/operator-marketplace)

  * [opsrc] Do not delete csc during purge [openshift#117](operator-framework/operator-marketplace#117)
  * Remove Dependency on Owner References [openshift#118](operator-framework/operator-marketplace#118)
  * [Full changelog](operator-framework/operator-marketplace@7b53305...fedd694)

[1]: openshift/origin#22030