-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamically sized bare-metal image can clobber data partition on reprovision #586
Comments
Discussed at a community meeting. Notes: Ignition runs too late to detect the problem. coreos-installer could detect an existing *COS root partition, as a special case, and fail if the install image tries to overwrite any subsequent partition. But that only detects the problem; it doesn't allow successfully reprovisioning nodes. If we commit to a long-term cap on the root FS size, it can still be changed later, but that's a breaking change and would need an announcement and deprecation period. People with nontrivial data on separate partitions probably have large disks and wouldn't mind allocating e.g. 20G to the OS. In practice we need ~2x the OS size to support Fedora major version bumps; we don't need to factor in package layering and user data because the user can make the root filesystem larger than the minimum. Should we set the cap at 5 GiB? 10 GiB? Possible approaches:
|
Between those two I would choose 10 😄
Is there a reason we'd need to make the partition larger than the disk image? Why not just make the disk image larger too and just make it all sparse so we're not actually transfering more bits. IIUC this would remove the "danger" from this approach. Maybe there is something I'm missing.
Safest, but least ideal probably.
I worry about this one a bit. I'm not worried about coreos/ignition#924 as that seems reasonable, but I do worry about doing this by default for the root filesystem on every ignition boot (first provisioning). Seems like we'd need to be extra careful to consider all of the ways people muck with their disk image during install today and make sure this doesn't introduce issues. |
That doesn't actually help. The image is compressed and the extra space is mostly zeroes, so we're not transferring much more data either way. But we still have to write all the zeroes to disk, because we can't risk leaving some non-zeroed bits in zeroed blocks; those blocks might be significant to the filesystem (e.g. a free bitmap). And coreos-installer doesn't know which part of the partition is past the end of the filesystem.
Hmm. What sorts of failure cases are you concerned about? |
IMO, option 3 is the cleanest approach. Related: coreos/fedora-coreos-config#604 That PR will make |
So, for one, I think the defaults should result in a working system. I missed the "start_mib: 5000" warning on one of the documentation pages. I have a 16Tb Raid5 mounted on /var; one of the three members of this raid happens to be the boot disk. The only partitions specified in the fcc file were for the RAID, and all had That "something else" created a 2.7Gb root partition, which of course means that updates are failing now for lack of room. If the tooling is going to start creating partitions not specified in the ignition file, it should at least create something that meets known minimum requirements... I think I may have been reading "other" documentation instead of the simple examples...for instance:
I guess I kind of assumed that "largest block available" would have been calculated after a minimum root partition was allocated... Whoops. Rescue CD time. I've never tried to shrink a software raid before. Much less tried to move the start block of one of the members. Probably the easiest thing to do is to fail the member that rides on the root disk and re-add it after editing the partition table. |
We discussed this during the community meeting today. Notes:
|
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
First part of this done in coreos/fedora-coreos-config#850. We should probably discuss migration period before merging that so we can include it in the warning message. Let's say... 3 months? Also, I think we need an escape hatch to be safe. One idea is to require the Ignition config to explicitly specify the rootfs size if users want it less than 5G. (The issue right now is that specifying any additional partition doesn't make it obvious that it's trapping the rootfs, but requiring users to type it out will make them realize that's probably not what they want.) In that case, we assume that's what the user really wants. This also keys into |
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
3 months SGTM. We're implicitly committing to a 5 GiB minimum and a 5 GiB recommendation. Is that what we want? Should we set e.g. a 5 GiB minimum and a 10 GiB recommendation? I'm okay with the escape hatch you described. However, echoing what I said in coreos/fedora-coreos-config#850 (comment), I don't think it's reasonable to ship a small disk image and then fail the boot by default if the user doesn't expand it. One option is to only fail the boot if the user has explicitly hemmed in the rootfs partition, but not if the user has retained the default partition layout on a small disk. Another option is to just ship a larger image. 😛 |
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
Good point. It's tricky though because it really depends on what the workload and partitioning is going to be. E.g. 8 GiB should be really safe if all of |
This was discussed in the community meeting:
The rationale for a single number is to keep it simple. |
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
Wanted to circle back to option 3.
The boot disk symlink idea I think resolves this now.
We wouldn't hardcode a specific size in the drop-in though, just
Yeah, that's a tricky one. Basically, it's missing the "policy" bits from So one half-baked here is simply to put that logic in transposefs: if we detect that the rootfs is being reprovisioned (not in-place), we delete the drop-in. |
That's only true if we only allow the user to partition the boot disk via the symlink, which would be a compat break.
The goal is to allow the user to specify a config like this, right? storage:
disks:
- device: /dev/vda
partitions:
- label: data So after merging with the base config, they'd get: storage:
disks:
- device: /dev/vda
partitions:
- number: 4
label: root
resize: true
- label: data ...which doesn't actually force a minimum size. Under the current implementation, this config produces a root partition with the original size.
Any child config really needs to understand what its parent is doing (or, at least, that its actions don't conflict with its parent) to avoid unintended results. I don't think it's great to have a base config that changes depending on what the child config is doing. |
The boot symlink idea is #759. |
Ack OK, sorry for the confusion here. I was strictly focused on the "fold growpart into Ignition" bit.
We could flip it around: by default there's no base config drop-in, and we dynamically generate one if we detect that we want to grow the root partition.
Honestly, I don't think it's so bad. What we're essentially doing is codifying shell code into Ignition config drop-ins so that Ignition does the work we would've done anyway. There's lots of advantages to this, including (1) keeping all the knowledge about block devices and filesystems in Ignition, (2) sizing LUKS containers and filesystems at the right size from the start, and (3) less shell code. |
So concretely, it'd be something like:
|
Okay, so your proposal is:
(1) isn't really true, since transposefs and maybe growfs would still be doing a bunch of partition and filesystem manipulation. We're already planning to address (2) via #759. And re (3), I'm not sure the above pile of JSON manipulation in shell would be a net win. |
Thanks for the sanity-check. Wanted to think through this idea. |
Was talking to @bgilbert about this. The situation now is: we've decided to put the burden of sizing the rootfs correctly at provisioning time. With the plan above to have a specific size as a contract and additional bits to warn (and eventually error in some cases) to catch violations, this issue will be considered resolved. |
I suggest leaving this issue open to track the move to erroring out. |
The console warning from coreos/fedora-coreos-config#850 says:
It's June 2021, so I think we can start failing. |
where do we make that change? |
bump also, what happens if I'm on an SD card on a RPi4 that is 8G - hard fail from now on? |
Only a warning. We hard fail if the rootfs is smaller than 8G and partitions were added after it (trapping the rootfs). Check out the box in https://docs.fedoraproject.org/en-US/fedora-coreos/storage/. |
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)
Bug Report
Expected Behavior
Successful reprovisioning of a machine from a newer OS image, without loss of data on user-created data partitions.
Actual Behavior
Data loss or provisioning failure.
Reproduction Steps
Other Information
If existing data partitions are preserved using the functionality of coreos/coreos-installer#321, reinstall will fail because coreos-installer will prevent the new install image from clobbering preserved partitions. If instead they're preserved by rerunning the Ignition config to recreate the partitions, data loss will occur when the installer writes the larger image over the beginning of the data partition.
If the Ignition config dynamically sets the starting offset of the partition (via a zero or missing
startMiB
), data loss can also appear to occur if the new image is smaller than the previous one.Currently we dynamically set the size of the bare-metal root partition, but that size is a contract with the user. We can't avoid such a contract entirely (unless we disallow adding partitions to the boot disk) but we could separate it conceptually from the image size. That would involve 1) documenting a minimum starting offset for user data partitions, and 2) enforcing it with FCCT. That approach seems brittle, though, especially since not everyone uses FCCT.
Instead, I'd be in favor of switching back to a fixed-size bare metal image. We'd need to make that fixed size equal to what we're using now, without additional room for future expansion, unless we're prepared to clobber user data partitions in the process.
The text was updated successfully, but these errors were encountered: