-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
blockdev: specifically wait for /boot to show up #155
Conversation
We want to be resilient to systems where the kernel may take a while to send the updates out, or udevd hasn't reacted to them yet, or there's some other IO stuttering. So we keep running `udevadm settle` until the boot partition shows up. Note we're not actually waiting a long time in the loop itself in total (just 1s). The real waiting happens at `udevadm settle` (which doesn't really _do_ anything else than wait for udev rules to be processed). The loop here is just to have a better chance to have `udevadm settle` watch for the actual udev events we care about (see the related comment about this that I deleted). See also: https://github.com/coreos/coreos-installer/issues/154
(Haven't tested this yet...) |
sleep(Duration::from_millis(100)); | ||
} | ||
|
||
return Err("timed out waiting for boot partition to show up".into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're waiting a total of 1s? Maybe add that to the error; something like
let maxwait = Duration::from_secs(1);
for dur in (0..maxwait/10) {
...
sleep(dur)
}
return Err("timed out waiting {} for partition label '{}' of device {}", maxwait, label, device);
@@ -225,6 +225,22 @@ pub fn try_discard_all(file: &mut File) -> Result<bool> { | |||
ioctl_none!(blkrrpart, 0x12, 95); | |||
ioctl_write_ptr_bad!(blkdiscard, request_code_none!(0x12, 119), [u64; 2]); | |||
|
|||
pub fn udev_settle_until_partition_appears(device: &str, label: &str) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve the comment that was deleted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notwithstanding that this wasn't the cause of coreos/fedora-coreos-tracker#385, I think this is a moderate improvement to reduce the chance of a possible race.
@@ -225,6 +225,22 @@ pub fn try_discard_all(file: &mut File) -> Result<bool> { | |||
ioctl_none!(blkrrpart, 0x12, 95); | |||
ioctl_write_ptr_bad!(blkdiscard, request_code_none!(0x12, 119), [u64; 2]); | |||
|
|||
pub fn udev_settle_until_partition_appears(device: &str, label: &str) -> Result<()> { | |||
eprint!("Waiting for partition label {} to come online", label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to report this to the user (and I'm not 100% on that) it should be in write_disk()
(and an eprintln!
) rather than in utility code.
Probably quotes around the {}
, too.
|
||
// postprocess | ||
if ignition.is_some() || config.firstboot_kargs.is_some() || config.platform.is_some() { | ||
udev_settle_until_partition_appears(&config.device, "boot")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not hardcode the partition label both inline in blockdev.rs
and inline here.
@@ -225,6 +225,22 @@ pub fn try_discard_all(file: &mut File) -> Result<bool> { | |||
ioctl_none!(blkrrpart, 0x12, 95); | |||
ioctl_write_ptr_bad!(blkdiscard, request_code_none!(0x12, 119), [u64; 2]); | |||
|
|||
pub fn udev_settle_until_partition_appears(device: &str, label: &str) -> Result<()> { | |||
eprint!("Waiting for partition label {} to come online", label); | |||
for _ in (0..10).rev() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to do this, we might as well increase the number of iterations. Timeout leads to global failure anyway, and most systems should return on the first iteration.
I think this may still be worth doing, but I'd rather wait until we actually see this race in the wild. Let's close for now and we can revive then. |
We want to be resilient to systems where the kernel may take a while
to send the updates out, or udevd hasn't reacted to them yet, or there's
some other IO stuttering.
So we keep running
udevadm settle
until the boot partition shows up.Note we're not actually waiting a long time in the loop itself in total
(just 1s). The real waiting happens at
udevadm settle
(which doesn'treally do anything else than wait for udev rules to be processed). The
loop here is just to have a better chance to have
udevadm settle
watchfor the actual udev events we care about (see the related comment about
this that I deleted).
See also: coreos/fedora-coreos-tracker#385