Protect against starting rke2-server "by accident" on rke2-agent nodes #1590

Closed
Martin-Weiss opened this issue Aug 10, 2021 · 6 comments

@Martin-Weiss

Is your feature request related to a problem? Please describe.
We have started rke2-server instead of rke2-agent "by accident" on a system. Among other things, this caused one more etcd member to be created, and it also changed the cluster CIDR settings for kube-proxy, because on agents the CIDR settings were not included in the config.yaml.

Describe the solution you'd like
We should have an "agent or server" config option in config.yaml and only one single service.
Or we should detect during server or agent start that the "other" type has already been enabled, or similar.
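For illustration only, a hypothetical config.yaml key (this option does not exist in rke2 today) could let a single unit pick its role and refuse the other:

# /etc/rancher/rke2/config.yaml (hypothetical "type" key, not a real rke2 option)
type: agent   # a single rke2.service would run as agent; starting it as server would fail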

Describe alternatives you've considered
Do not make "human" mistakes ;-).

@brandond
Member

brandond commented Aug 10, 2021

For the RPM packages at least, we only install the unit for one - whichever is selected during install:

rke2/install.sh

Lines 25 to 27 in cfa99d2

# - INSTALL_RKE2_TYPE
# Type of rke2 service. Can be either "server" or "agent".
# Default is "server".
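For example, the documented variable selects the agent unit at install time:

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -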
We could probably modify the tarball installer to not drop the unit for the other type.

The two units also conflict with each other, so you can't start them at the same time:

Conflicts=rke2-server.service

@Martin-Weiss
Author

Martin-Weiss commented Aug 10, 2021

But "systemctl stop rke2-agent; systemctl start rke2-server" does work and has caused a big problem.... maybe we could get a "type=[agent|server]" option for config.yaml?

@mstrent

mstrent commented Sep 2, 2021

Just accidentally ran "systemctl start rke2-server" on all of my agent nodes and Bad Things happened. Still working on recovering.

This is v1.21.4+rke2r2, installed via the "curl -sfL https://get.rke2.io | sh -" method.

@mstrent

mstrent commented Sep 2, 2021

I had to completely blow away and recreate the cluster. I didn't capture all the logs or issues or things I tried, but suffice it to say, doing this is very bad and should be more actively prevented.

@mstrent

mstrent commented Sep 2, 2021

In the meantime, I'm adding this to my Ansible deploy scripts for agent nodes:

# Accidentally starting rke2-server on an agent can totally b0rk the cluster.
- name: Delete rke2-server systemd unit files for safety
  file:
    path: "{{ item }}"
    state: absent
  with_items:
    - /usr/local/lib/systemd/system/rke2-server.env
    - /usr/local/lib/systemd/system/rke2-server.service
  register: server_service

- name: Reload systemd
  systemd:
    daemon_reload: yes
  when: server_service.changed
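An alternative sketch, instead of deleting the files: mask the unit so systemd refuses any start attempt (this assumes the tarball install left rke2-server.service on disk):

# Masking symlinks the unit to /dev/null, so "systemctl start rke2-server" fails outright
- name: Mask rke2-server so it cannot be started by accident
  systemd:
    name: rke2-server.service
    masked: yes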

@stale

stale bot commented Mar 2, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Mar 2, 2022
@stale stale bot closed this as completed Mar 16, 2022