Dynamically sized bare-metal image can clobber data partition on reprovision #586

bgilbert · 2020-07-29T14:56:50Z

Bug Report

Expected Behavior

Successful reprovisioning of a machine from a newer OS image, without loss of data on user-created data partitions.

Actual Behavior

Data loss or provisioning failure.

Reproduction Steps

Install FCOS or RHCOS. Use an Ignition config to create a data partition immediately following the root filesystem.
Reprovision FCOS or RHCOS from a newer image with a larger root filesystem.

Other Information

If existing data partitions are preserved using the functionality of coreos/coreos-installer#321, reinstall will fail because coreos-installer will prevent the new install image from clobbering preserved partitions. If instead they're preserved by rerunning the Ignition config to recreate the partitions, data loss will occur when the installer writes the larger image over the beginning of the data partition.

If the Ignition config dynamically sets the starting offset of the partition (via a zero or missing startMiB), data loss can also appear to occur if the new image is smaller than the previous one.

Currently we dynamically set the size of the bare-metal root partition, but that size is a contract with the user. We can't avoid such a contract entirely (unless we disallow adding partitions to the boot disk) but we could separate it conceptually from the image size. That would involve 1) documenting a minimum starting offset for user data partitions, and 2) enforcing it with FCCT. That approach seems brittle, though, especially since not everyone uses FCCT.

Instead, I'd be in favor of switching back to a fixed-size bare metal image. We'd need to make that fixed size equal to what we're using now, without additional room for future expansion, unless we're prepared to clobber user data partitions in the process.

The text was updated successfully, but these errors were encountered:

bgilbert · 2020-08-26T16:35:16Z

Discussed at a community meeting. Notes:

Ignition runs too late to detect the problem. coreos-installer could detect an existing *COS root partition, as a special case, and fail if the install image tries to overwrite any subsequent partition. But that only detects the problem; it doesn't allow successfully reprovisioning nodes.

If we commit to a long-term cap on the root FS size, it can still be changed later, but that's a breaking change and would need an announcement and deprecation period. People with nontrivial data on separate partitions probably have large disks and wouldn't mind allocating e.g. 20G to the OS. In practice we need ~2x the OS size to support Fedora major version bumps; we don't need to factor in package layering and user data because the user can make the root filesystem larger than the minimum. Should we set the cap at 5 GiB? 10 GiB?

Possible approaches:

Make the partition larger than the filesystem and larger than the disk image. Seems dangerous.
Ship a root partition with a well-defined (larger) size and accept the additional I/O during install.
Support partition resizing in Ignition, ship a small root partition, and have the base Ignition config automatically resize it to a well-defined minimum size before creating any user-specified partitions.

dustymabe · 2020-08-31T15:25:50Z

Discussed at a community meeting. Notes:

Ignition runs too late to detect the problem. coreos-installer could detect an existing *COS root partition, as a special case, and fail if the install image tries to overwrite any subsequent partition. But that only detects the problem; it doesn't allow successfully reprovisioning nodes.

If we commit to a long-term cap on the root FS size, it can still be changed later, but that's a breaking change and would need an announcement and deprecation period. People with nontrivial data on separate partitions probably have large disks and wouldn't mind allocating e.g. 20G to the OS. In practice we need ~2x the OS size to support Fedora major version bumps; we don't need to factor in package layering and user data because the user can make the root filesystem larger than the minimum. Should we set the cap at 5 GiB? 10 GiB?

Between those two I would choose 10 😄

Possible approaches:

Make the partition larger than the filesystem and larger than the disk image. Seems dangerous.

Is there a reason we'd need to make the partition larger than the disk image? Why not just make the disk image larger too and just make it all sparse so we're not actually transfering more bits. IIUC this would remove the "danger" from this approach. Maybe there is something I'm missing.

Ship a root partition with a well-defined (larger) size and accept the additional I/O during install.

Safest, but least ideal probably.

Support partition resizing in Ignition, ship a small root partition, and have the base Ignition config automatically resize it to a well-defined minimum size before creating any user-specified partitions.

I worry about this one a bit. I'm not worried about coreos/ignition#924 as that seems reasonable, but I do worry about doing this by default for the root filesystem on every ignition boot (first provisioning). Seems like we'd need to be extra careful to consider all of the ways people muck with their disk image during install today and make sure this doesn't introduce issues.

bgilbert · 2020-09-15T03:52:09Z

Is there a reason we'd need to make the partition larger than the disk image? Why not just make the disk image larger too and just make it all sparse so we're not actually transfering more bits.

That doesn't actually help. The image is compressed and the extra space is mostly zeroes, so we're not transferring much more data either way. But we still have to write all the zeroes to disk, because we can't risk leaving some non-zeroed bits in zeroed blocks; those blocks might be significant to the filesystem (e.g. a free bitmap). And coreos-installer doesn't know which part of the partition is past the end of the filesystem.

Support partition resizing in Ignition, ship a small root partition, and have the base Ignition config automatically resize it to a well-defined minimum size before creating any user-specified partitions.

Seems like we'd need to be extra careful to consider all of the ways people muck with their disk image during install today and make sure this doesn't introduce issues.

Hmm. What sorts of failure cases are you concerned about?

jlebon · 2020-09-22T15:35:10Z

IMO, option 3 is the cleanest approach.

Related: coreos/fedora-coreos-config#604

That PR will make coreos-growpart not do any partition resizing if the rootfs partition size is any different before and after the disks stage (implying the user has resized it already via Ignition). Obviously we'll need to rework that if we go this route here.

bnordgren · 2021-01-17T20:48:49Z

Seems like we'd need to be extra careful to consider all of the ways people muck with their disk image during install today and make sure this doesn't introduce issues.

Hmm. What sorts of failure cases are you concerned about?

So, for one, I think the defaults should result in a working system. I missed the "start_mib: 5000" warning on one of the documentation pages. I have a 16Tb Raid5 mounted on /var; one of the three members of this raid happens to be the boot disk. The only partitions specified in the fcc file were for the RAID, and all had start_mib:0 size_mib:0. Something else took care of creating the root partition, boot partition, etc. behind the scenes.

That "something else" created a 2.7Gb root partition, which of course means that updates are failing now for lack of room.

If the tooling is going to start creating partitions not specified in the ignition file, it should at least create something that meets known minimum requirements...

I think I may have been reading "other" documentation instead of the simple examples...for instance:

start_mib (integer, optional): Start position of the partition from the beginning of the disk, in mebibytes. If 0, the partition will be placed at the start of the largest block available.

I guess I kind of assumed that "largest block available" would have been calculated after a minimum root partition was allocated...

Whoops. Rescue CD time. I've never tried to shrink a software raid before. Much less tried to move the start block of one of the members. Probably the easiest thing to do is to fail the member that rides on the root disk and re-add it after editing the partition table.

bgilbert · 2021-02-11T05:58:08Z

We discussed this during the community meeting today. Notes:

We've now had a case (Upgrade fails due to minimum-free-space-percent exceeding 3% #731) where someone was bitten by this
Adding FCCT warnings
- In the general case, warnings are difficult, since FCCT doesn't know the device node for the boot device. /dev/disk/by-* symlinks don't work because the well-defined ones point to partitions, and storage.disks.[].device points to the whole-disk device.
- ACTION: We'll try to modify FCCT to at least check the case where the root filesystem is recreated in the config
Bad-config detection in the initrd
- The initrd could check for a bad config after Ignition runs (too-small root partition followed by another partition)
- Could log a warning to the journal, add a line to the MOTD, and/or fail boot
- ACTION: We'll add a warning and/or MOTD line first, notify coreos-status@, provide a migration period, and then switch to a hard failure
Re option 3, we now have resize support, but there are problems:
- The base config doesn't know the device node for the boot disk without external help
- Resize would fail if the target disk isn't large enough for the minimum root partition size, even if there's no partition created after it, and we might not want that case to fail
- A default partition section could merge poorly with a user config that specifies a different layout for the disk (including via FCCT sugar), or specifies the disk via a different device node symlink
It'd be good to improve docs around the minimum root partition size, if we can find a good place to do it

We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)

jlebon · 2021-02-11T21:35:06Z

ACTION: We'll add a warning and/or MOTD line first, notify coreos-status@, provide a migration period, and then switch to a hard failure

First part of this done in coreos/fedora-coreos-config#850. We should probably discuss migration period before merging that so we can include it in the warning message. Let's say... 3 months?

Also, I think we need an escape hatch to be safe. One idea is to require the Ignition config to explicitly specify the rootfs size if users want it less than 5G. (The issue right now is that specifying any additional partition doesn't make it obvious that it's trapping the rootfs, but requiring users to type it out will make them realize that's probably not what they want.) In that case, we assume that's what the user really wants. This also keys into fcct emitting a warning for that case.

We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)

bgilbert · 2021-02-15T22:04:18Z

3 months SGTM.

We're implicitly committing to a 5 GiB minimum and a 5 GiB recommendation. Is that what we want? Should we set e.g. a 5 GiB minimum and a 10 GiB recommendation?

I'm okay with the escape hatch you described. However, echoing what I said in coreos/fedora-coreos-config#850 (comment), I don't think it's reasonable to ship a small disk image and then fail the boot by default if the user doesn't expand it. One option is to only fail the boot if the user has explicitly hemmed in the rootfs partition, but not if the user has retained the default partition layout on a small disk. Another option is to just ship a larger image. 😛

We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)

jlebon · 2021-02-22T17:01:37Z

We're implicitly committing to a 5 GiB minimum and a 5 GiB recommendation. Is that what we want? Should we set e.g. a 5 GiB minimum and a 10 GiB recommendation?

Good point. It's tricky though because it really depends on what the workload and partitioning is going to be. E.g. 8 GiB should be really safe if all of /var is a separate partition. Maybe something like: "We recommend setting aside at least 8 GiB of the root partition for the OS itself. You should allocate more space above this based on the requirements of the intended workload." ?

jlebon · 2021-02-24T17:45:59Z

This was discussed in the community meeting:

#agreed Set a single limit of 8 GiB: we error if the rootfs is trapped with less than that, but only warn if there are no following partition

The rationale for a single number is to keep it simple.

We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)

jlebon · 2021-03-05T18:56:08Z

Wanted to circle back to option 3.

Re option 3, we now have resize support, but there are problems:

The base config doesn't know the device node for the boot disk without external help

The boot disk symlink idea I think resolves this now.

Resize would fail if the target disk isn't large enough for the minimum root partition size, even if there's no partition created after it, and we might not want that case to fail

We wouldn't hardcode a specific size in the drop-in though, just resize: true, right?

A default partition section could merge poorly with a user config that specifies a different layout for the disk (including via FCCT sugar), or specifies the disk via a different device node symlink

Yeah, that's a tricky one. Basically, it's missing the "policy" bits from coreos-growpart which ensure we don't growpart if root reprovisioning is detected.

So one half-baked here is simply to put that logic in transposefs: if we detect that the rootfs is being reprovisioned (not in-place), we delete the drop-in.

bgilbert · 2021-03-06T06:31:01Z

The boot disk symlink idea I think resolves this now.

That's only true if we only allow the user to partition the boot disk via the symlink, which would be a compat break.

We wouldn't hardcode a specific size in the drop-in though, just resize: true, right?

The goal is to allow the user to specify a config like this, right?

storage:
  disks:
    - device: /dev/vda
      partitions:
        - label: data

So after merging with the base config, they'd get:

storage:
  disks:
    - device: /dev/vda
      partitions:
        - number: 4
          label: root
          resize: true
        - label: data

...which doesn't actually force a minimum size. Under the current implementation, this config produces a root partition with the original size.

So one half-baked here is simply to put that logic in transposefs: if we detect that the rootfs is being reprovisioned (not in-place), we delete the drop-in.

Any child config really needs to understand what its parent is doing (or, at least, that its actions don't conflict with its parent) to avoid unintended results. I don't think it's great to have a base config that changes depending on what the child config is doing.

bgilbert · 2021-03-06T08:19:29Z

The boot symlink idea is #759.

jlebon · 2021-03-08T17:00:11Z

...which doesn't actually force a minimum size

Ack OK, sorry for the confusion here. I was strictly focused on the "fold growpart into Ignition" bit.

Any child config really needs to understand what its parent is doing (or, at least, that its actions don't conflict with its parent) to avoid unintended results.

We could flip it around: by default there's no base config drop-in, and we dynamically generate one if we detect that we want to grow the root partition.

I don't think it's great to have a base config that changes depending on what the child config is doing.

Honestly, I don't think it's so bad. What we're essentially doing is codifying shell code into Ignition config drop-ins so that Ignition does the work we would've done anyway. There's lots of advantages to this, including (1) keeping all the knowledge about block devices and filesystems in Ignition, (2) sizing LUKS containers and filesystems at the right size from the start, and (3) less shell code.

jlebon · 2021-03-08T17:10:23Z

So concretely, it'd be something like:

fetch stage runs
transposefs runs:
- if reprovisioning the rootfs, saves it to RAM
- if not reprovisioning, or in-place reprovisioning, generate a base Ignition config drop-in which grows the rootfs either to the end, or to at least MIN_SIZE if another partition is specified
disks stage runs
growfs purely handles filesystem growing in the no-reprovisioning case (and maybe in the future, we teach Ignition to grow filesystems too, so then this would also be handled by the disks stage)

bgilbert · 2021-03-09T04:34:22Z

Okay, so your proposal is:

If not reprovisioning the rootfs, transposefs walks through the Ignition config, resolves every storage.disks.device path back to a device node, and compares it to the device node for the root disk. If it finds a match, it dynamically generates an Ignition dropin for partition 4 of that device that sets the resize flag, and also a sizeMiB if the disk specifies an additional partition. If it doesn't find a match, it generates a dropin that only sets resize on partition 4 of the boot disk. We document that on FCOS, resize for the root partition will default to true under certain conditions, and size will default to MIN_SIZE under a subset of those conditions.
growfs still exists, solely for filesystem resizing, and runs only in the no-reprovision case. Alternatively, we teach Ignition to resize filesystems, which is functionality which in practice will only be needed for the root filesystem, but which requires it to know how to resize all supported filesystem types.

There's lots of advantages to this, including (1) keeping all the knowledge about block devices and filesystems in Ignition, (2) sizing LUKS containers and filesystems at the right size from the start, and (3) less shell code.

(1) isn't really true, since transposefs and maybe growfs would still be doing a bunch of partition and filesystem manipulation. We're already planning to address (2) via #759. And re (3), I'm not sure the above pile of JSON manipulation in shell would be a net win.

jlebon · 2021-03-09T17:22:41Z

Thanks for the sanity-check. Wanted to think through this idea.

jlebon · 2021-03-10T16:20:33Z

Was talking to @bgilbert about this. The situation now is: we've decided to put the burden of sizing the rootfs correctly at provisioning time. With the plan above to have a specific size as a contract and additional bits to warn (and eventually error in some cases) to catch violations, this issue will be considered resolved.

jlebon · 2021-04-12T15:51:59Z

I suggest leaving this issue open to track the move to erroring out.

bgilbert · 2021-06-08T23:08:22Z

The console warning from coreos/fedora-coreos-config#850 says:

############################################################################
WARNING: The root filesystem is too small. It is strongly recommended to
allocate at least 8 GiB of space to allow for upgrades. From June 2021, this
condition will trigger a failure in some cases. For more information, see:
https://docs.fedoraproject.org/en-US/fedora-coreos/storage/

You may delete this warning using:
sudo rm /etc/motd.d/60-coreos-rootfs-size.motd
############################################################################

It's June 2021, so I think we can start failing.

dustymabe · 2021-07-28T16:26:47Z

It's June 2021, so I think we can start failing.

where do we make that change?

dustymabe · 2021-10-07T21:27:14Z

bump

also, what happens if I'm on an SD card on a RPi4 that is 8G - hard fail from now on?

jlebon · 2021-10-28T13:58:48Z

bump

also, what happens if I'm on an SD card on a RPi4 that is 8G - hard fail from now on?

Only a warning. We hard fail if the rootfs is smaller than 8G and partitions were added after it (trapping the rootfs). Check out the box in https://docs.fedoraproject.org/en-US/fedora-coreos/storage/.

We've recently had our first case of a "trapped" rootfs running out of space for upgrades: coreos/fedora-coreos-tracker#731 Until we actually implement stronger behaviour for this, let's explicitly check for this case and emit a warning if we detect it. In the future, we'll look at making this a hard error by default (with an escape hatch). For more information, see: coreos/fedora-coreos-tracker#586 (comment)

bgilbert added kind/bug area/stability format/raw-disk labels Jul 29, 2020

dustymabe added the meeting topics for meetings label Jul 29, 2020

bgilbert removed the meeting topics for meetings label Aug 6, 2020

bgilbert added the jira for syncing to jira label Sep 10, 2020

bgilbert mentioned this issue Oct 1, 2020

Updated bare metal install for coreos-installer openshift/openshift-docs#25709

Merged

jlebon mentioned this issue Feb 3, 2021

Upgrade fails due to minimum-free-space-percent exceeding 3% #731

Closed

jlebon mentioned this issue Feb 11, 2021

overlay: add warning if rootfs < 8G coreos/fedora-coreos-config#850

Merged

bgilbert added the meeting topics for meetings label Feb 17, 2021

jlebon mentioned this issue Feb 22, 2021

storage: tweak note re. rootfs sizing coreos/fedora-coreos-docs#248

Merged

travier removed the meeting topics for meetings label Feb 24, 2021

jlebon mentioned this issue Mar 5, 2021

Rename coreos-growpart to ignition-ostree-growfs and teach it to grow LUKS containers coreos/fedora-coreos-config#878

Merged

jlebon mentioned this issue Mar 5, 2021

Root Filesystem Expansion #570

Open

bgilbert mentioned this issue Mar 10, 2021

Warn if the root filesystem is too small coreos/butane#211

Open

lucab mentioned this issue Apr 1, 2021

image.yaml: bump virtual disk images to 10G coreos/fedora-coreos-config#888

Merged

bgilbert mentioned this issue Jun 12, 2021

Document automated reprovisioning for bare metal coreos/fedora-coreos-docs#299

Open

bgilbert mentioned this issue Oct 18, 2021

Change EFI-System partition format from fat16 to fat32 #993

Open

bgilbert added the status/pending-action Needs action label Aug 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamically sized bare-metal image can clobber data partition on reprovision #586

Dynamically sized bare-metal image can clobber data partition on reprovision #586

bgilbert commented Jul 29, 2020

bgilbert commented Aug 26, 2020

dustymabe commented Aug 31, 2020

bgilbert commented Sep 15, 2020

jlebon commented Sep 22, 2020

bnordgren commented Jan 17, 2021

bgilbert commented Feb 11, 2021

jlebon commented Feb 11, 2021 •

edited

Loading

bgilbert commented Feb 15, 2021

jlebon commented Feb 22, 2021

jlebon commented Feb 24, 2021

jlebon commented Mar 5, 2021

bgilbert commented Mar 6, 2021

bgilbert commented Mar 6, 2021

jlebon commented Mar 8, 2021

jlebon commented Mar 8, 2021

bgilbert commented Mar 9, 2021

jlebon commented Mar 9, 2021

jlebon commented Mar 10, 2021

jlebon commented Apr 12, 2021

bgilbert commented Jun 8, 2021

dustymabe commented Jul 28, 2021

dustymabe commented Oct 7, 2021 •

edited

Loading

jlebon commented Oct 28, 2021

Dynamically sized bare-metal image can clobber data partition on reprovision #586

Dynamically sized bare-metal image can clobber data partition on reprovision #586

Comments

bgilbert commented Jul 29, 2020

Bug Report

Expected Behavior

Actual Behavior

Reproduction Steps

Other Information

bgilbert commented Aug 26, 2020

dustymabe commented Aug 31, 2020

bgilbert commented Sep 15, 2020

jlebon commented Sep 22, 2020

bnordgren commented Jan 17, 2021

bgilbert commented Feb 11, 2021

jlebon commented Feb 11, 2021 • edited Loading

bgilbert commented Feb 15, 2021

jlebon commented Feb 22, 2021

jlebon commented Feb 24, 2021

jlebon commented Mar 5, 2021

bgilbert commented Mar 6, 2021

bgilbert commented Mar 6, 2021

jlebon commented Mar 8, 2021

jlebon commented Mar 8, 2021

bgilbert commented Mar 9, 2021

jlebon commented Mar 9, 2021

jlebon commented Mar 10, 2021

jlebon commented Apr 12, 2021

bgilbert commented Jun 8, 2021

dustymabe commented Jul 28, 2021

dustymabe commented Oct 7, 2021 • edited Loading

jlebon commented Oct 28, 2021

jlebon commented Feb 11, 2021 •

edited

Loading

dustymabe commented Oct 7, 2021 •

edited

Loading