Use openshift-master-api and openshift-master-controllers setup always #4832

Merged
merged 6 commits into from
Aug 9, 2017

Conversation

smarterclayton
Contributor

The installer now always configures two services on the master: one for the API server and one for the controllers. Native clustering is the default configuration mode, even when only one master is configured. A warning is added if no etcd groups are configured. Finally, the new client-based leader election code in the controllers is used by default, which removes the need for the controllers to have access to etcd.

This will set the stage for running controllers and apiservers as static pods.

@sdodson

@smarterclayton force-pushed the stop_using_old_mode branch 2 times, most recently from 8096eaa to 78cb4f8, July 23, 2017 01:16
@smarterclayton
Contributor Author

aos-ci-test

@smarterclayton
Contributor Author

Changes required to support this in upgrades:

  1. stop and remove the old origin-master.service and sysconfig files if they exist
  2. error or warn if the user had those files, since scripts and other automation may need to change.
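
As a rough illustration of the two upgrade steps above, the tasks below warn about a legacy configuration, stop, disable, and mask the old single-master unit, and remove its sysconfig file. This is only a sketch, not the PR's actual upgrade code: the sysconfig path, the registered variable name, and the warning text are assumptions, and the unit name reuses the `openshift.common.service_type` fact that appears elsewhere in this PR.

```yaml
# Sketch only: paths and variable names below are assumptions, not taken from the PR.
- name: Check for the legacy single-master sysconfig file
  stat:
    path: "/etc/sysconfig/{{ openshift.common.service_type }}-master"  # assumed location
  register: legacy_master_sysconfig  # hypothetical variable name

- name: Warn that automation referencing the legacy unit may need to change
  debug:
    msg: >-
      Found legacy {{ openshift.common.service_type }}-master configuration.
      Scripts and other automation that reference this unit may need to change.
  when: legacy_master_sysconfig.stat.exists

- name: Stop, disable, and mask the legacy single-master unit
  systemd:
    name: "{{ openshift.common.service_type }}-master"
    state: stopped
    enabled: no
    masked: yes
  failed_when: false  # the unit may not exist on hosts that never had it

- name: Remove the legacy sysconfig file
  file:
    path: "/etc/sysconfig/{{ openshift.common.service_type }}-master"  # assumed location
    state: absent
```

Masking, rather than only disabling, keeps other units or scripts from starting the old service by accident.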

@smarterclayton
Contributor Author

This does not include the change to move controllers to static pods, because I realized we need to deal with some fundamental ordering problems first: the node needs to be configured and ready ahead of time, instead of masters first. That requires the work @kwoodson will be doing to split out node preparation from node config. Once that is set up, we may actually want to run node prep on masters, then remove the service units for both api and controller, replace them with static pods, and use the kubelet in run-once mode as well. I don't see much value in system containers for api or controllers, but there is a lot of value in those as static pods, especially with crio present.

@openshift-bot

error: aos-ci-jenkins/OS_3.6_NOT_containerized for 78cb4f8 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for 78cb4f8 (logs)

@smarterclayton
Contributor Author

aos-ci-test

@openshift-bot

error: aos-ci-jenkins/OS_3.6_NOT_containerized for 8961e7f (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for 8961e7f (logs)

@smarterclayton
Contributor Author

Tsk tsk, jobs not setting etcd hosts

Contributor

@tbielawa left a comment

multi-line WHEN statements

      tasks:
      - fail:
          msg: No etcd hosts defined. Running an all-in-one master is deprecated and will no longer be supported in a future upgrade.
        when: groups.oo_etcd_to_config | default([]) | length == 0 and not openshift_master_unsupported_all_in_one | default(False)
Contributor

Please put ANDed when conditions on multiple lines in list format. For example, groups.oo_etcd_to_config | default([]) | length == 0 and not openshift_master_unsupported_all_in_one | default(False) should look more like this:

    when:
      - groups.oo_etcd_to_config | default([]) | length == 0
      - not openshift_master_unsupported_all_in_one | default(False)

     - name: restart master api
       systemd: name={{ openshift.common.service_type }}-master-api state=restarted
    -  when: (openshift.master.ha is defined and openshift.master.ha | bool) and (not (master_api_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
    +  when: (not (master_api_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
Contributor

multi-line WHEN
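
For illustration, the list-style `when` requested in this review might look like the following when applied to the updated handler. This is a sketch of the suggested style, not code from the PR; expanding the `systemd` arguments from the inline `key=value` form into a YAML mapping is an additional assumption about the preferred style.

```yaml
# Sketch only: the same conditions as the updated handler, one per list item.
# Ansible ANDs together all entries of a `when` list.
- name: restart master api
  systemd:
    name: "{{ openshift.common.service_type }}-master-api"
    state: restarted
  when:
    - not (master_api_service_status_changed | default(false) | bool)
    - openshift.master.cluster_method == 'native'
  notify: Verify API Server
```

Each list item is evaluated as its own expression, so the outer parentheses used to group the original `and` chain are no longer needed.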

       notify: Verify API Server

     - name: restart master controllers
       systemd: name={{ openshift.common.service_type }}-master-controllers state=restarted
    -  when: (openshift.master.ha is defined and openshift.master.ha | bool) and (not (master_controllers_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
    +  when: (not (master_controllers_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
Contributor

multi-line WHEN

     - name: restart master api
       systemd: name={{ openshift.common.service_type }}-master-api state=restarted
    -  when: (openshift.master.ha is defined and openshift.master.ha | bool) and (not (master_api_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
    +  when: (not (master_api_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
Contributor

multi-line WHEN

       notify: Verify API Server

     - name: restart master controllers
       systemd: name={{ openshift.common.service_type }}-master-controllers state=restarted
    -  when: (openshift.master.ha is defined and openshift.master.ha | bool) and (not (master_controllers_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
    +  when: (not (master_controllers_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
Contributor

multi-line WHEN

    @@ -275,14 +228,14 @@
       delay: 1
       run_once: true
       changed_when: false
    -  when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and master_api_service_status_changed | bool
    +  when: openshift.master.cluster_method == 'native' and master_api_service_status_changed | bool
Contributor

multi-line WHEN


     - name: Start and enable master controller on first master
       systemd:
         name: "{{ openshift.common.service_type }}-master-controllers"
         enabled: yes
         state: started
    -  when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and inventory_hostname == openshift_master_hosts[0]
    +  when: openshift.master.cluster_method == 'native' and inventory_hostname == openshift_master_hosts[0]
Contributor

multi-line WHEN


     - name: Start and enable master controller on all masters
       systemd:
         name: "{{ openshift.common.service_type }}-master-controllers"
         enabled: yes
         state: started
    -  when: openshift_master_ha | bool and openshift.master.cluster_method == 'native' and inventory_hostname != openshift_master_hosts[0]
    +  when: openshift.master.cluster_method == 'native' and inventory_hostname != openshift_master_hosts[0]
Contributor

multi-line WHEN

       notify: Verify API Server

     - name: restart master controllers
       systemd: name={{ openshift.common.service_type }}-master-controllers state=restarted
    -  when: (openshift.master.ha is defined and openshift.master.ha | bool) and (not (master_controllers_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
    +  when: (not (master_controllers_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
Contributor

multi-line WHEN

Contributor Author

I'd like to do the multi-line when in a follow-up. It would make this much harder to review, and there are way more references in these files than just my changes.

Member

I can do all the multi-line when in #4947 (comment)

     - name: restart master api
       systemd: name={{ openshift.common.service_type }}-master-api state=restarted
    -  when: (openshift.master.ha is defined and openshift.master.ha | bool) and (not (master_api_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
    +  when: (not (master_api_service_status_changed | default(false) | bool)) and openshift.master.cluster_method == 'native'
Contributor

multi-line WHEN

@smarterclayton
Contributor Author

@sdodson @tbielawa where are the job definitions for the failing tests? The inventories need to be updated with an etcd group.

@openshift-bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jul 27, 2017
@sdodson
Member

sdodson commented Jul 27, 2017

@smarterclayton They're in gerrit. I've updated them so that the master is an etcd host.
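
A minimal sketch of an inventory in which the single master is also the etcd host is shown below. The CI inventories themselves are in gerrit and are not reproduced in this thread, so the hostname is hypothetical, and the YAML inventory layout is an assumption (openshift-ansible inventories are often written in INI form). The group names follow the usual openshift-ansible conventions.

```yaml
# Sketch only: hypothetical hostname; not the actual CI inventory.
all:
  children:
    OSEv3:
      children:
        masters:
          hosts:
            master1.example.com:
        etcd:
          hosts:
            master1.example.com:   # the master doubles as the etcd host
        nodes:
          hosts:
            master1.example.com:
```

With a non-empty etcd group, the new "No etcd hosts defined" check added in this PR should no longer trigger.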

@openshift-bot removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jul 28, 2017
@smarterclayton
Contributor Author

aos-ci-test

@openshift-bot

error: aos-ci-jenkins/OS_3.6_NOT_containerized for 8738627 (logs)

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for 8738627 (logs)

@smarterclayton
Contributor Author

aos-ci-test

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 4be1576 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 4be1576 (logs)

    @@ -32,7 +32,7 @@
       openshift_facts:
         role: master
         local_facts:
    -      cluster_method: "{{ openshift_master_cluster_method | default(None) }}"
    +      cluster_method: "{{ openshift_master_cluster_method | default('native') }}"
Member

Thanks for that. No more "openshift_master_cluster_method variable not set" failures after 30 minutes of installation.
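
The effect of this one-line change can be seen in isolation with a small playbook. This is a sketch, not part of the PR: it only exercises Jinja2's `default` filter on localhost, with the variable deliberately left unset.

```yaml
# Sketch only: demonstrates the fallback value, nothing is installed or changed.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Show the value that would be recorded as the cluster_method fact
      debug:
        msg: "{{ openshift_master_cluster_method | default('native') }}"
```

When `openshift_master_cluster_method` is not set in the inventory, the message is `native`; setting the variable (for example with `-e` on the `ansible-playbook` command line) overrides the fallback.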

@ingvagabund
Member

Changes required to support this in upgrades:

  1. stop and remove the old origin-master.service and sysconfig files if they exist
  2. error or warn if the user had those files, since scripts and other automation may need to change.

We should not merge this PR until we have code that supports both changes. If anyone tries to upgrade an existing (e.g. 3.5 or 3.6) cluster to 3.7 with openshift-ansible-3.7.*, the master service will keep running and may cause disruptions during the upgrade.

@smarterclayton
Contributor Author

Updated with the upgrade script (which simply disables and masks origin-master). The update also removes all other places that start origin-master.

The fedora job failure may be because the origin-master service is still started. Rerunning in case it picked up the older change.

@smarterclayton
Contributor Author

aos-ci-test

@smarterclayton
Contributor Author

[test]

@openshift-bot

error: aos-ci-jenkins/OS_3.6_containerized for c71bed7 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for c71bed7 (logs)

@sdodson
Member

sdodson commented Aug 8, 2017

registry flake?

@sdodson
Member

sdodson commented Aug 8, 2017

aos-ci-test

@smarterclayton
Contributor Author

Looks like the tests passed in the one job. [test] again to see if it works this time.

@openshift-bot

Evaluated for openshift ansible test up to c71bed7

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for c71bed7 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for c71bed7 (logs)

@openshift-bot

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/424/) (Base Commit: 566731d) (PR Branch Commit: c71bed7)

@smarterclayton
Contributor Author

This went green on install_update. The fedora error seems to be happening on other PRs. The status_check failure on test was a flake in the job. Seems to be ok from a test perspective.

@sdodson
Member

sdodson commented Aug 9, 2017

[merge]

@openshift-bot

Evaluated for openshift ansible merge up to c71bed7

@openshift-bot

openshift-bot commented Aug 9, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/819/) (Base Commit: 5fe4c8c) (PR Branch Commit: c71bed7)

@openshift-bot merged commit 57db372 into openshift:master Aug 9, 2017
@ingvagabund
Member

ingvagabund commented Aug 9, 2017

\O/ congratz @smarterclayton

@smarterclayton
Contributor Author

Will do follow up for the other changes tomorrow or Friday - specifically the when changes

@mffiedler

mffiedler commented Aug 9, 2017

This breaks things like the metrics deployer that assume non-ha uses the atomic-openshift-master system unit. Will start filing BZs for other areas to adopt this change.
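
For automation that acted on the single master unit in non-HA installs, the equivalent after this change is to act on the two split units. A hedged sketch, assuming an install where `openshift.common.service_type` resolves to `atomic-openshift` (so the units are `atomic-openshift-master-api` and `atomic-openshift-master-controllers`); the task names are illustrative, and this is not the metrics deployer's actual fix:

```yaml
# Sketch only: restart both split services instead of atomic-openshift-master.
- name: Restart the master API service
  systemd:
    name: atomic-openshift-master-api
    state: restarted

- name: Restart the master controllers service
  systemd:
    name: atomic-openshift-master-controllers
    state: restarted
```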
