upgrade: Verify required network items are set #5256

Merged
merged 2 commits into from
Sep 11, 2017

ashcrow
Member

@ashcrow ashcrow commented Aug 29, 2017

When upgrading, osm_cluster_network_cidr, osm_host_subnet_length, and
openshift_portal_net must be set to avoid SDN initialization errors.
This was found when the default parameters were changed between OpenShift
versions: users who upgraded without specifying these variables at
install/upgrade time ended up with SDN errors after the upgrade.

When any of osm_cluster_network_cidr, osm_host_subnet_length, or
openshift_portal_net is not set, the upgrade now fails, telling the user
that the variables must be set and how to find their values in the
current install.
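The check itself is an Ansible `assert` in the upgrade playbooks (the failure output later in this thread shows it running in the "Verify upgrade can proceed on first master" play). As a minimal illustration only, the same rule is sketched in Python below. `missing_network_vars`, `check_upgrade_network_vars`, and the plain-dict inventory are hypothetical stand-ins, not openshift-ansible code, and "not set" is approximated as "absent or None".

```python
# Hypothetical sketch of the pre-upgrade rule; not openshift-ansible code.
REQUIRED_NETWORK_VARS = (
    "osm_cluster_network_cidr",
    "osm_host_subnet_length",
    "openshift_portal_net",
)


def missing_network_vars(inventory_vars):
    """Return the required network variables that are absent or None, in a fixed order."""
    return [name for name in REQUIRED_NETWORK_VARS if inventory_vars.get(name) is None]


def check_upgrade_network_vars(inventory_vars):
    """Raise ValueError naming every missing variable unless all three are set."""
    missing = missing_network_vars(inventory_vars)
    if missing:
        raise ValueError(
            ", ".join(missing)
            + " must be set in the inventory before upgrading; copy the current "
            "values from /etc/origin/master/master-config.yaml on a master."
        )


if __name__ == "__main__":
    # Example inventory (illustrative value) with only one of the three variables set.
    inventory = {"osm_cluster_network_cidr": "10.128.0.0/14"}
    print(missing_network_vars(inventory))  # ['osm_host_subnet_length', 'openshift_portal_net']
```

Reporting every missing name in a single error spares the user a fail, fix, rerun cycle for each variable.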

References: b50b4ea
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451023

@ashcrow ashcrow added the kind/bug Categorizes issue or PR as related to a bug. label Aug 29, 2017
@ashcrow ashcrow requested review from sdodson and dgoodwin August 29, 2017 15:48
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 6, 2017
@ashcrow
Member Author

ashcrow commented Sep 6, 2017

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 957856c (logs)

@openshift-bot

error: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 957856c (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 957856c (logs)

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

@sdodson Looking at the logs, it seems everything actually passed ... 😕

@ashcrow ashcrow requested a review from tbielawa September 7, 2017 14:06
tbielawa previously approved these changes Sep 7, 2017
Contributor

@tbielawa tbielawa left a comment

LGTM. The error message is helpful; I love it when software tells me more than just 'it went bad, something with a variable...'

osm_cluster_network_cidr and openshift_portal_net are required inventory variables when upgrading.
These variables should match what is currently used in the cluster. If you don't remember what
these values are you can find them in /etc/origin/master/master-config.yaml on a master with the names
clusterNetworkCIDR(osm_cluster_network_cidr) and hostSubnetLength (openshift_portal_net).
Contributor

I'll approve it because it fixes the bug. I think it could be improved, though, for extra help-a-user-out value: another task or play that fetched those values from a master, so users didn't have to hunt for them themselves. I won't block on it, since that might be excessive work; it's just something I thought of.
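The suggestion above, fetching the values from a master instead of making users look them up, could look roughly like the sketch below. It assumes the OpenShift 3.x layout in which `clusterNetworkCIDR`, `hostSubnetLength`, and `serviceNetworkCIDR` sit directly under `networkConfig` in `/etc/origin/master/master-config.yaml`, and that `openshift_portal_net` (the service network) corresponds to `serviceNetworkCIDR` rather than to `hostSubnetLength` as the message quoted above reads. To stay within the standard library it uses a deliberately naive line-based reader instead of a YAML parser; `read_network_config`, `to_inventory`, and `KEY_TO_INVENTORY_VAR` are hypothetical helpers.

```python
# Assumed mapping from networkConfig keys in master-config.yaml to the
# inventory variables this PR requires (hypothetical helper, not project code).
KEY_TO_INVENTORY_VAR = {
    "clusterNetworkCIDR": "osm_cluster_network_cidr",
    "hostSubnetLength": "osm_host_subnet_length",
    "serviceNetworkCIDR": "openshift_portal_net",
}


def read_network_config(text):
    """Collect the direct `key: value` children of the top-level networkConfig block.

    Not a YAML parser: it assumes two-space indentation and scalar values, and it
    ignores deeper entries such as a nested clusterNetworks[].hostSubnetLength.
    """
    values = {}
    in_block = False
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # Any top-level key either opens or closes the networkConfig block.
            in_block = line.rstrip() == "networkConfig:"
            continue
        if in_block and indent == 2 and ":" in line:
            key, _, value = line.strip().partition(":")
            value = value.strip().strip("'\"")
            if value:
                values[key] = value
    return values


def to_inventory(values):
    """Translate the collected keys into inventory `name=value` lines."""
    return [
        f"{var}={values[key]}"
        for key, var in KEY_TO_INVENTORY_VAR.items()
        if key in values
    ]
```

On a master, passing the contents of `/etc/origin/master/master-config.yaml` through `read_network_config` and then `to_inventory` would yield lines ready to paste into the inventory's variables section. A real play would more likely slurp the file and parse it as YAML; the line-based reader here only keeps the sketch dependency-free.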

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

[merge]

@openshift-bot

[test]ing while waiting on the merge queue

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

[INFO] Logging test suite test-upgrade started at Thu Sep  7 11:25:18 EDT 2017
No resources found.
------------------------------------------
creating index with old pattern
error: invalid resource name "pods/logging-es-data-master-wlfhx9va-1-zkshl": [may not contain '/']

Failing on Elasticsearch.

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

aos-ci-test

sdodson previously approved these changes Sep 7, 2017
@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 957856c (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 957856c (logs)

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

Hrm, hitting the same thing again...

error: invalid resource name "pods/logging-es-data-master-m2hsfj46-1-bpzsl": [may not contain '/']
....
NAME                    READY     STATUS              RESTARTS   AGE
logging-fluentd-ztrtn   0/1       ContainerCreating   0          56s
... repeated 2 times
NAME                    READY     STATUS              RESTARTS   AGE
logging-fluentd-ztrtn   0/1       ContainerCreating   0          57s
... repeated 3 times
NAME                    READY     STATUS              RESTARTS   AGE
logging-fluentd-ztrtn   0/1       ContainerCreating   0          58s
... repeated 2 times
NAME                    READY     STATUS              RESTARTS   AGE
logging-fluentd-ztrtn   0/1       ContainerCreating   0          59s

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

Rebasing just in case ...

@ashcrow
Member Author

ashcrow commented Sep 7, 2017

aos-ci-test

@ashcrow ashcrow requested a review from mtnbikenc September 7, 2017 19:28
@ashcrow
Member Author

ashcrow commented Sep 7, 2017

👍 good failure. Now to figure out how to get CI to add these variables for upgrade testing ...

  1. Hosts:    localhost
     Play:     Verify upgrade can proceed on first master
     Task:     assert
     Message:  osm_cluster_network_cidr and openshift_portal_net are required inventory variables when upgrading. These variables should match what is currently used in the cluster. If you don't remember what these values are you can find them in /etc/origin/master/master-config.yaml on a master with the names clusterNetworkCIDR(osm_cluster_network_cidr)  and hostSubnetLength (openshift_portal_net).

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for ec76d09 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for ec76d09 (logs)

@ashcrow ashcrow changed the title upgrade: Verify osm_cluster_network_cidr and openshift_portal_net are set upgrade: Verify required network items are set Sep 8, 2017
@openshift-bot

Evaluated for openshift ansible merge up to 10ecef4

@sdodson
Member

sdodson commented Sep 8, 2017

[test]

@openshift-bot

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/1003/) (Base Commit: 756ab33) (PR Branch Commit: 10ecef4)

@ashcrow
Member Author

ashcrow commented Sep 8, 2017

     Message:  osm_cluster_network_cidr, osm_host_subnet_length, and openshift_portal_net are required inventory variables when upgrading. These variables should match what is currently used in the cluster. If you don't remember what these values are you can find them in /etc/origin/master/master-config.yaml on a master with the names clusterNetworkCIDR (osm_cluster_network_cidr), osm_host_subnet_length (hostSubnetLength), and openshift_portal_net (hostSubnetLength).

I'm guessing either the change didn't make it to the CI servers yet OR the test kicked off before the change became available.

@ashcrow
Member Author

ashcrow commented Sep 8, 2017

aos-ci-test

@sdodson
Member

sdodson commented Sep 8, 2017

The test only failed on job ordering; when aos-ci-test comes back green, I'll merge.

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for 10ecef4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 10ecef4 (logs)

@ashcrow
Member Author

ashcrow commented Sep 11, 2017

Unrelated to this change:

     Play:     Verify Requirements
     Task:     openshift_health_check
     Message:  One or more checks failed
     Details:  check "docker_image_availability":
               One or more required Docker images are not available:
                   XXXX/openshift3/ose-docker-registry:v3.6
               Configured registries: XXXXXXXXXXXXXXXXXXXXXXXXXXX,
               Checked by: timeout 10 skopeo inspect --tls-verify=false

  2. Hosts:    10.8.170.225
     Play:     Configure nodes
     Task:     openshift_node_certificates : fail
     Message:  CA certificate /etc/origin/master/ca.crt doesn't exist on CA host 10.8.170.230. Apply 'openshift_ca' role to 10.8.170.230.

@ashcrow
Member Author

ashcrow commented Sep 11, 2017

aos-ci-test

@openshift-bot

error: aos-ci-jenkins/OS_3.6_NOT_containerized for 10ecef4 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 10ecef4 (logs)

@ashcrow
Member Author

ashcrow commented Sep 11, 2017

Different missing image 😦

  1. Hosts:    10.8.169.203
     Play:     Verify Requirements
     Task:     openshift_health_check
     Message:  One or more checks failed
     Details:  check "docker_image_availability":
               One or more required Docker images are not available:
                   XXXX/openshift3/ose-haproxy-router:v3.6

@sdodson
Member

sdodson commented Sep 11, 2017

Let's assume that's a temporary issue; let's see what [test] says.

@openshift-bot

Evaluated for openshift ansible test up to 10ecef4

@openshift-bot

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/636/) (Base Commit: 69f0384) (PR Branch Commit: 10ecef4)

@sdodson sdodson merged commit f95e3c4 into openshift:master Sep 11, 2017
Labels
kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

5 participants