Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockdev: specifically wait for /boot to show up #155

Closed
wants to merge 1 commit into from
Closed

blockdev: specifically wait for /boot to show up #155

wants to merge 1 commit into from

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Feb 12, 2020

We want to be resilient to systems where the kernel may take a while
to send the updates out, or udevd hasn't reacted to them yet, or there's
some other IO stuttering.

So we keep running udevadm settle until the boot partition shows up.
Note we're not actually waiting a long time in the loop itself in total
(just 1s). The real waiting happens at udevadm settle (which doesn't
really do anything else than wait for udev rules to be processed). The
loop here is just to have a better chance to have udevadm settle watch
for the actual udev events we care about (see the related comment about
this that I deleted).

See also: coreos/fedora-coreos-tracker#385

We want to be resilient to systems where the kernel may take a while
to send the updates out, or udevd hasn't reacted to them yet, or there's
some other IO stuttering.

So we keep running `udevadm settle` until the boot partition shows up.
Note we're not actually waiting a long time in the loop itself in total
(just 1s). The real waiting happens at `udevadm settle` (which doesn't
really _do_ anything else than wait for udev rules to be processed). The
loop here is just to have a better chance to have `udevadm settle` watch
for the actual udev events we care about (see the related comment about
this that I deleted).

See also: https://github.com/coreos/coreos-installer/issues/154
@jlebon
Copy link
Member Author

jlebon commented Feb 12, 2020

(Haven't tested this yet...)

sleep(Duration::from_millis(100));
}

return Err("timed out waiting for boot partition to show up".into());
Copy link
Member

@cgwalters cgwalters Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're waiting a total of 1s? Maybe add that to the error; something like

let maxwait = Duration::from_secs(1);
for dur in (0..maxwait/10) {
   ...
   sleep(dur)
}
return Err("timed out waiting {} for partition label '{}' of device {}", maxwait, label, device);

@@ -225,6 +225,22 @@ pub fn try_discard_all(file: &mut File) -> Result<bool> {
ioctl_none!(blkrrpart, 0x12, 95);
ioctl_write_ptr_bad!(blkdiscard, request_code_none!(0x12, 119), [u64; 2]);

pub fn udev_settle_until_partition_appears(device: &str, label: &str) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preserve the comment that was deleted here?

Copy link
Contributor

@bgilbert bgilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notwithstanding that this wasn't the cause of coreos/fedora-coreos-tracker#385, I think this is a moderate improvement to reduce the chance of a possible race.

@@ -225,6 +225,22 @@ pub fn try_discard_all(file: &mut File) -> Result<bool> {
ioctl_none!(blkrrpart, 0x12, 95);
ioctl_write_ptr_bad!(blkdiscard, request_code_none!(0x12, 119), [u64; 2]);

pub fn udev_settle_until_partition_appears(device: &str, label: &str) -> Result<()> {
eprint!("Waiting for partition label {} to come online", label);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to report this to the user (and I'm not 100% on that) it should be in write_disk() (and an eprintln!) rather than in utility code.

Probably quotes around the {}, too.


// postprocess
if ignition.is_some() || config.firstboot_kargs.is_some() || config.platform.is_some() {
udev_settle_until_partition_appears(&config.device, "boot")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not hardcode the partition label both inline in blockdev.rs and inline here.

@@ -225,6 +225,22 @@ pub fn try_discard_all(file: &mut File) -> Result<bool> {
ioctl_none!(blkrrpart, 0x12, 95);
ioctl_write_ptr_bad!(blkdiscard, request_code_none!(0x12, 119), [u64; 2]);

pub fn udev_settle_until_partition_appears(device: &str, label: &str) -> Result<()> {
eprint!("Waiting for partition label {} to come online", label);
for _ in (0..10).rev() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do this, we might as well increase the number of iterations. Timeout leads to global failure anyway, and most systems should return on the first iteration.

@jlebon
Copy link
Member Author

jlebon commented Aug 3, 2021

I think this may still be worth doing, but I'd rather wait until we actually see this race in the wild. Let's close for now and we can revive then.

@jlebon jlebon closed this Aug 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants