Add the concept of "checkpoint" commits #1228
Comments
In any case we don't use ostree's HTTP server on our infrastructure, so any kind of ref swizzling would require co-ordination with some other HTTP daemon, any CDNs/proxies/etc. I can't see that being very popular. :) |
@cgwalters, I guess this is something you have already thought about for Project Atomic? One key question is how to arrange the metadata for marking commits as checkpoints so that clients can efficiently find the sequence of checkpoints they need to download to get to the latest commit. How about an So to update, the client would pull the This would be complicated slightly by the (probable) need to reboot for each checkpoint we pull, but the concept should still work. *Assuming we’re past the bootstrapping process which Cosimo mentioned. |
Another approach would be to use something like the (Although I’ve just noticed: the code in |
The idea of checkpoints is an interesting one, and I like that it provides natural static-delta targets. That said, specific to this use case, couldn't the updater just pull the latest commit as usual and perform the migration at boot time? |
We want to download the new flatpaks as part of the same pull process as for the OS, and deploy them at the same time as the new OS OSTree, i.e. the whole thing needs to be atomic. We do not want any period of time where the user has the new OS (without the bundled, say, LibreOffice) but doesn’t have the new flatpaks (which would make for a very angry LibreOffice user). |
/me curious - how are you achieving the 'bundling' ? |
?? I dearly hope no one is actually using the There's basically two parts to this - the mechanics of booting into a checkpoint, and actually implementing the transition. For the first part, I assume what you were thinking here is having the HTTP server do dynamic dispatch based on e.g. the OTOH, the way I'd have considered approaching this would be using distinct refs - the client system updates to So rather than having the libostree pull code follow the To make it truly atomic gets a little tricky as you'd likely need to e.g. make some changes in |
(I'm elaborating on/agreeing with what @jlebon said here - doing this at boot time seems to me to be the most powerful/flexible way to do it) |
Our implementation is not truly atomic because we rely on a boot task to go from a deployed flatpak to an exported one (i.e. the user can see it). So we're closing the majority of the failure cases but not eliminating them, because we'd prefer to not have the app appear twice under the old OS version if there is a failure in the OS update. However that's by-the-by because what we need is the updater (which does the downloading and deploying of the OS, and the flatpaks) to ensure that both downloads and deploys are successfully completed before the active OS is switched over, hence the checkpoint requirement. |
@cgwalters: OOI, has Project Atomic hit this problem (needing some kind of checkpoint in an update stream) before? I’d be surprised if not — if you can essentially support upgrading directly from version 0 of Project Atomic through to now. I guess the main difference I can think of between the two potential approaches here (tagging checkpoint commits using metadata; or tagging them as the last commit on a ref before changing the ref name) is how they interact with LAN sharing of updates, and caching of checkpoints to serve to other peers on the LAN. If checkpoints are tagged in the commit metadata, libostree needs to be aware of the caching policy so that it doesn’t prune the old checkpoint commits, since they might be needed to serve to peers at some point in the future. If checkpoints are tagged by renaming the ref, those checkpoint commits are only pruned if the old refs are deleted, which means that the caching policy moves up to a higher layer — it’s now controlled by the system updater ( That leads me to lean towards implementing checkpoints by renaming refs, which needs less support from libostree, and mostly just needs to be implemented in our updater. However, I would still be very interested in knowing what Project Atomic does (or might eventually do), so that we don’t end up diverging unnecessarily. The downside of implementing checkpoints by renaming refs, though, is that going back in time (pulling and deploying an old version of the OS) is a bit harder, since you have to undo the ref rename. This could probably be mitigated by putting some metadata in the first commit of the new ref which points back to the old ref; just like the last commit of the old ref will point to the new ref. |
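To make the ref-renaming variant concrete, here is a minimal sketch of how an updater might walk from an old ref to the newest one. It is an illustration only: the `RefTable` shape and the idea of storing a "successor ref" alongside each ref's head commit are assumptions made for this example, not an existing libostree or eos-updater API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: ref name -> (head commit checksum, successor ref or None).
# In a real repository the successor would live in the head commit's metadata.
RefTable = Dict[str, Tuple[str, Optional[str]]]


def follow_refs(refs: RefTable, start_ref: str) -> List[Tuple[str, str]]:
    """Return the (ref, head commit) pairs to deploy, in order.

    The head of each ref acts as a checkpoint: it must be deployed before
    the updater moves on to the ref it points at.
    """
    steps: List[Tuple[str, str]] = []
    seen = set()
    ref: Optional[str] = start_ref
    while ref is not None:
        if ref in seen:
            # Guard against a misconfigured chain that points back at itself.
            raise ValueError(f"ref redirect loop at {ref!r}")
        seen.add(ref)
        head, successor = refs[ref]
        steps.append((ref, head))
        ref = successor
    return steps
```

With a table where `stable` points at `stable-v2`, a client on `stable` would deploy the head of `stable` first and then the head of `stable-v2`. Going back in time would need the reverse pointer mentioned above, which this sketch does not model.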
Not really no (well at least if we're talking about the CentOS/RHEL Atomic Host). For fun I tried booting
Then started a basic nginx container, which stays running while I
Container is still there and still works. Remember our audience is servers with active sysadmins - so there are potential transitions like to overlayfs but they know how to handle these types of things. |
I think this is the approach to take. Having it be in the commit stream makes it feel a lot more "invisible/magical". |
So for Red Hat Enterprise Linux Atomic Host (and similarly CentOS AH) we simply haven't done a transition of this form. However, for Fedora, we do require admins to explicitly rebase - this matches the general current Fedora model. However, we are talking about more of a "single stream" experience there; see this issue for example. |
Just an update on this for completeness: for Fedora CoreOS and RHCOS, we implemented this using metadata describing a graph of permissible upgrade paths fed to higher-level software which in turn drives rpm-ostree. Relevant projects: https://github.com/coreos/fedora-coreos-cincinnati/, https://github.com/openshift/cincinnati/, https://github.com/coreos/zincati/, https://github.com/openshift/machine-config-operator. This does lead to interesting problems though. |
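As a rough illustration of the "graph of permissible upgrade paths" idea, the sketch below finds a shortest path through such a graph with a breadth-first search. The dictionary format and the function name are made up for this example; it is not the Cincinnati protocol or the Zincati implementation.

```python
from collections import deque
from typing import Dict, List, Optional


def upgrade_path(edges: Dict[str, List[str]], current: str,
                 target: str) -> Optional[List[str]]:
    """Return the shortest list of versions from current to target.

    edges maps a version to the versions it may upgrade to directly.
    Returns None when no permitted path exists.
    """
    previous: Dict[str, Optional[str]] = {current: None}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the starting version.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

In this model a mandatory checkpoint is simply a version with no edges that skip over it, so every path from older versions has to pass through it.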
Sometimes a new OS version contains an incompatible change that requires a new version of the system updater in order to be completed.
One occurrence that we found here at Endless is when we move out some files from the OSTree (e.g. we start using Flatpak to distribute an app that was previously part of the core OS); we have a new version of the updater that knows how to perform the migration, but users coming from previous versions won't have the migration code available and would be left in a new tree without the app at all.
The implementation idea I have in mind for this is to add the concept of "mandatory checkpoint" commits in OSTree: when going from commit X to commit Y, if a commit M between them is marked as mandatory, OSTree/eos-updater would make sure to first deploy M before going to Y. In our case, M would contain the new version of the updater, capable of downloading new flatpaks. A nice byproduct of this approach is that we can also easily use M as a landmark to make deltas to/from.
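The rule described above can be sketched as follows. This is only a model of the idea: the `Commit` class, its `is_checkpoint` flag, and `deployment_steps` are hypothetical names for this example, and how the flag would actually be stored in commit metadata is left open.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commit:
    checksum: str
    is_checkpoint: bool = False  # hypothetical "mandatory checkpoint" marker


def deployment_steps(history: List[Commit], current: str,
                     target: str) -> List[str]:
    """Return the commits to deploy, in order, to get from current to target.

    history is the ref's commit list, oldest first. Every checkpoint
    strictly between current and target is deployed before the target.
    """
    checksums = [c.checksum for c in history]
    start = checksums.index(current)
    end = checksums.index(target)
    if end < start:
        raise ValueError("target is older than the current commit")
    # Checkpoints between the two commits, excluding both endpoints.
    steps = [c.checksum for c in history[start + 1:end] if c.is_checkpoint]
    if end > start:
        steps.append(target)
    return steps
```

For a history X, a, M (checkpoint), b, Y, a client on X that wants Y would deploy M and then Y, while a client already on M would go straight to Y.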
Another interesting question is how to bootstrap this process, since a checkpoint-aware updater must exist to perform the first migration. Right now we are planning to solve that by adding a new ref (e.g. if `stable` was the current OS ref, this would be `stable-v2`) and making the last commit of `stable` the one that introduces the checkpoint-aware updater. A different idea would be to have the server return a different commit checksum for the same ref depending on client capabilities/version, but I know that at least in the past there was a strong desire to keep the HTTP server "dumb".

CC @ramcq @pwithnall