37->38 upgrade after previous rollback yields loss of SSH access #1473

Closed
fifofonix opened this issue Apr 17, 2023 · 9 comments
Labels: jira (for syncing to jira), kind/bug

Comments

@fifofonix

Describe the bug

sshd fails to start because the permissions on the SSH host keys are too open.

Reproduction steps

  1. Commission an F37 host.
  2. Upgrade to F38.
  3. Roll back to F37.
  4. Upgrade again to F38.

Expected behavior

SSH access should be retained.

Actual behavior

SSH access is lost.

System details

  • vSphere
  • Next F37/F38

Butane or Ignition config

No response

Additional information

After step 3 of the reproduction steps, the /var/lib/.ssh-host-keys-migration stamp file is still present, so the key migration service does not run in step 4. Removing the stamp file before step 4 allows a clean upgrade, and SSH access remains available.
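
The workaround described above can be sketched as a small shell helper, meant to be run as root on the rolled-back F37 deployment before upgrading again. The stamp path is the one named in this report; the function name `clear_migration_stamp` and its optional path argument are hypothetical, added only so the sketch can be tried against a scratch file first.

```shell
#!/bin/sh
# Sketch: remove the SSH host key migration stamp so that the migration
# service runs again on the next boot into F38.
# The default path comes from this report; the function name and the
# optional argument are illustrative assumptions.
clear_migration_stamp() {
    stamp="${1:-/var/lib/.ssh-host-keys-migration}"
    if [ -e "$stamp" ]; then
        rm -f -- "$stamp" && echo "removed $stamp"
    else
        echo "no stamp at $stamp"
    fi
}

# Usage on the rolled-back F37 host, as root, before upgrading again:
#   clear_migration_stamp
```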

Note that although it is fairly evident from the boot console that sshd fails, it is hard to capture the reason for the failure from the journals after downgrading to F37, because of journal version incompatibilities. journalctl prints a warning to this effect, but its recommended solution did not work for me on the now-F37 node. I ended up copying the journal file it listed to an F38 machine with scp and examining it there with the command given in the warning.

@bgilbert
Contributor

Thanks for the report. While this issue does involve migration code that was implemented by the Fedora CoreOS WG, Fedora doesn't generally support downgrades between major versions, so you're trying to do something that won't necessarily work. What led you to discover this problem?

@travier
Member

travier commented Apr 18, 2023

Related to #1394

@fifofonix
Author

> Thanks for the report. While this issue does involve migration code that was implemented by the Fedora CoreOS WG, Fedora doesn't generally support downgrades between major versions, so you're trying to do something that won't necessarily work. What led you to discover this problem?

I think the CIFS-related issues with F38 were the main reason I rolled back a few nodes. The rollback feature is awesome, of course, and it is a shame that it is not officially supported across major versions, because intuitively that is where you would expect most issues to be discovered.

@dustymabe
Member

dustymabe commented Apr 18, 2023

> @bgilbert
> Thanks for the report. While this issue does involve migration code that was implemented by the Fedora CoreOS WG, Fedora doesn't generally support downgrades between major versions, so you're trying to do something that won't necessarily work. What led you to discover this problem?

Indeed. Fedora traditionally hasn't supported major downgrades (or even really any downgrades IIUC). The OSTree tech powering FCOS/Silverblue/IoT does give us the ability to roll back to the previous version and it is a feature we often highlight as a benefit of running one of these variants.

While it would be hard to support major downgrades generally (there is a lot of software we don't control), it would be nice if we could do our best to make them as reliable as possible and/or at least collect known issues. I can think of two ways for us to try to catch something like this in the future:

  • At the very least, we could test an upgrade->rollback->upgrade cycle during our Fedora N test days.
  • We could create an automated test that runs this cycle whenever we do new releases.

> @fifofonix
> I think the CIFS related issues with F38 were the main reason I rolled back a few nodes. The rollback feature is awesome of course and it is a shame that it is not officially supported for major versions because intuitively that is where you would think the most types of issues are likely to be discovered.

Rolling back for the CIFS related issue is a great example of OSTree working to solve a problem for you (the user).

I think what @bgilbert was highlighting was that generally the people who put software in Fedora don't anticipate major downgrades in packages to occur, so it can invalidate assumptions and expose problems. In this particular case the files that were stored in /etc/ were "rolled back" while there was state that was stored in /var/ that prevented the migration script from running on the re-upgrade.
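
To make that mismatch concrete, here is a minimal diagnostic sketch that reports whether a host looks like it is in this state: the stamp exists under /var, yet private host keys under /etc/ssh are still accessible to group or others. It rests on assumptions not confirmed in this thread: that host keys match `ssh_host_*_key`, and that "too public" means any group/other permission bit is set. The function name `stale_migration_state` and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) and print the first offending key if the
# migration stamp exists while a private host key still has group/other
# permission bits set. Otherwise return 1.
# Assumptions: key naming (ssh_host_*_key) and the meaning of "too public".
stale_migration_state() {
    stamp="${1:-/var/lib/.ssh-host-keys-migration}"
    keydir="${2:-/etc/ssh}"
    [ -e "$stamp" ] || return 1          # no stamp: migration will still run
    for key in "$keydir"/ssh_host_*_key; do
        [ -e "$key" ] || continue        # glob did not match anything
        mode=$(stat -c '%a' "$key") || continue
        case "$mode" in
            0|*00) ;;                    # no group/other bits set
            *)     echo "$key has mode $mode"; return 0 ;;
        esac
    done
    return 1
}

# Usage, as root:
#   stale_migration_state && echo "stamp present but keys not migrated"
```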

I'm trying to think of a way we can account for this case and maybe we'll ship a fix before F38 hits stable.

@jlebon
Member

jlebon commented Apr 18, 2023

I agree it's hard in the general case to flawlessly support rollbacks if the rest of Fedora doesn't (barring moving to btrfs and snapshotting /var on upgrades :)). I'm comfortable not trying to work against the current and having to deal with e.g. database migration issues manually. In this case though, SSH itself is concerned which might be the only way you have to get into the system, so you don't even have a fighting chance to get out of this manually even if you wanted to.

Hmm, one simple thing we could do, I think, is to have a systemd unit that runs before ssh-host-keys-migration.service and just rm -f's the stamp file? The migration unit is idempotent, so it's fine if it reruns on an already-migrated system. Then we drop the new unit at a barrier release where the likelihood that you'd roll back to F37 again is very low (could even be the very next release).
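
A rough sketch of this idea, written as a shell helper that generates such a unit. Only the stamp path and the name of the migration unit come from this thread; the unit name `remove-ssh-migration-stamp.service`, the helper name, and the choice of `multi-user.target` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write a oneshot unit ordered before
# ssh-host-keys-migration.service that deletes the stamp file, so the
# idempotent migration runs again on every boot.
# The unit name and helper name are hypothetical.
write_stamp_removal_unit() {
    unitdir="${1:-/etc/systemd/system}"
    mkdir -p "$unitdir"
    cat > "$unitdir/remove-ssh-migration-stamp.service" <<'EOF'
[Unit]
Description=Remove SSH host key migration stamp
Before=ssh-host-keys-migration.service

[Service]
Type=oneshot
ExecStart=/usr/bin/rm -f /var/lib/.ssh-host-keys-migration

[Install]
WantedBy=multi-user.target
EOF
}

# Usage, as root:
#   write_stamp_removal_unit
#   systemctl daemon-reload
#   systemctl enable remove-ssh-migration-stamp.service
```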

@dustymabe dustymabe added the meeting topics for meetings label Apr 18, 2023
@dustymabe dustymabe changed the title from "Rollback from F38 to F37 followed by another F38 upgrade can lead to loss of SSH access" to "37->38 upgrade after previous rollback yields loss of SSH access" Apr 19, 2023
@dustymabe
Member

We discussed this in the community meeting today.

12:56:35  dustymabe | #agreed we will ship a systemd dropin to remove the stamp file  
                    | ConditionPathExists= on the migration unit so the idempotent
                    | migration code will run on every boot until we remove it after a
                    | barrier release
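
The agreed approach can be sketched as a drop-in that clears the condition on the migration unit. In systemd, assigning an empty string to a Condition*= setting resets the unit's list of conditions, so the stamp file no longer stops the unit from starting. The drop-in file name `10-always-run.conf`, the helper name, and the default location under /etc are assumptions for illustration; the fix as actually shipped may be packaged differently.

```shell
#!/bin/sh
# Sketch: write a drop-in for ssh-host-keys-migration.service that resets
# ConditionPathExists=, so the migration runs on every boot regardless of
# the stamp file. Prints the path of the file it wrote.
# The drop-in file name and helper name are hypothetical.
write_migration_dropin() {
    root="${1:-/etc/systemd/system}"
    dropdir="$root/ssh-host-keys-migration.service.d"
    mkdir -p "$dropdir"
    cat > "$dropdir/10-always-run.conf" <<'EOF'
[Unit]
ConditionPathExists=
EOF
    echo "$dropdir/10-always-run.conf"
}

# Usage, as root:
#   write_migration_dropin
#   systemctl daemon-reload
#   systemctl cat ssh-host-keys-migration.service   # shows unit plus drop-in
```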

@dustymabe dustymabe self-assigned this Apr 19, 2023
@dustymabe dustymabe added the jira for syncing to jira label Apr 19, 2023
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Apr 19, 2023
…igration

In this case we'll override the ConditionPathExists from
ssh-host-keys-migration.service [1] to force it to run every boot.
We want to do this to handle the case where someone could do an
upgrade->rollback->upgrade and end up locked out of their system [2].

[1] https://src.fedoraproject.org/rpms/openssh/blob/6f7c765ed4cf0e8eef664fb93b26f4ea2a14d955/f/ssh-host-keys-migration.service
[2] coreos/fedora-coreos-tracker#1473
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Apr 20, 2023
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Apr 20, 2023
(cherry picked from commit 5e1efae)
@dustymabe dustymabe added status/pending-testing-release Fixed upstream. Waiting on a testing release. status/pending-next-release Fixed upstream. Waiting on a next release. and removed meeting topics for meetings labels Apr 20, 2023
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Apr 20, 2023
(cherry picked from commit 5e1efae)
@dustymabe
Member

The fix for this went into testing stream release 38.20230414.2.1. Please try out the new release and report issues.

@dustymabe dustymabe removed the status/pending-testing-release Fixed upstream. Waiting on a testing release. label Apr 21, 2023
@dustymabe
Member

The fix for this went into next stream release 38.20230430.1.0. Please try out the new release and report issues.

@dustymabe dustymabe removed the status/pending-next-release Fixed upstream. Waiting on a next release. label May 3, 2023
@dustymabe
Member

This issue never affected our stable stream.

c4rt0 pushed a commit to c4rt0/fedora-coreos-config that referenced this issue May 17, 2023
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023