Skip to content
This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

Add ConditionPathExists=/etc/initrd-release to targets #140

Merged
merged 1 commit into from
Dec 5, 2019

Conversation

cgwalters
Copy link
Member

I noticed that several of our units actually run twice, including
ignition-files.service (!) and ignition-ostree-mount-firstboot-sysroot.service.

Looking deeper into this, I discovered a bit of the initrd that
I didn't even know existed, which is initrd-parse-etc.service
that actually runs ExecStart=-/usr/bin/systemctl --no-block start initrd-fs.target
post switch root, after services that are part of that target
have already run (and then been shut down).

This adds a big wrinkle to my understanding of system bootup.

Then I noticed that initrd-fs.target has:
ConditionPathExists=/etc/initrd-release
which will be false post-switchroot.

The semantics here are...strange. It seems like the goal is to ensure
that the rootfs is definitely still mounted, perhaps in
very complex rootfs scenarios? In our case it will still be
mounted, we definitely don't want to rerun any of our Ignition
units.

Adding the same condition to our targets ensures that they
aren't re-run post switchroot.

@cgwalters
Copy link
Member Author

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

that actually runs ExecStart=-/usr/bin/systemctl --no-block start initrd-fs.target
post switch root, after services that are part of that target
have already run (and then been shut down).

Oh wow. That's... scary. It's weird though, that unit should only be running in the initrd itself (from bootup(7)). CL actually used to have ordering wrt this in the past. E.g. ignition-files.service had:

Before=initrd-parse-etc.service
After=initrd-root-fs.target

Seems like a systemd regression put in in the real root by mistake?

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

I think there's some kind of racy thing going on. If I boot the FCOS I just built locally normally:

[root@coreos ~]# journalctl -b 0 | grep initrd-parse-etc.service
Dec 05 16:24:29 coreos systemd[1]: initrd-parse-etc.service: Succeeded.
Dec 05 16:24:31 coreos systemd[1]: initrd-parse-etc.service: Succeeded.
[root@coreos ~]# journalctl -b 0 | grep 'files passed'
Dec 05 16:24:29 coreos ignition[780]: INFO     : files: files passed
Dec 05 16:24:31 coreos ignition[780]: INFO     : files: files passed

If I stop the boot using rd.break and then resume:

[root@coreos ~]# journalctl -b 0 | grep initrd-parse-etc.service
Dec 05 16:25:17 coreos systemd[1]: initrd-parse-etc.service: Succeeded.
[root@coreos ~]# journalctl -b 0 | grep 'files passed'
Dec 05 16:25:17 coreos ignition[780]: INFO     : files: files passed

So I think this might be systemd getting confused somehow.

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

Ohhh, I think this might just be the journal getting confused. Notice the pid is the same here:

Dec 05 16:24:29 coreos ignition[780]: INFO     : files: files passed
Dec 05 16:24:31 coreos ignition[780]: INFO     : files: files passed

And it's the same in other tests that I've run. So I don't think ignition-files.service is actually running twice. Otherwise, I would've expected to see errors like e.g. "file already exists and overwrite is off" by now.

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

And -o json-pretty reveals why:

# journalctl -b 0 --grep="files passed" -o json-pretty
{
        "SYSLOG_FACILITY" : "3",
        "_EXE" : "/usr/bin/ignition",
        "_MACHINE_ID" : "f3a1c3b65c0e4e10bf5c511499720c51",
        "_PID" : "781",
        "_GID" : "0",
        "_CAP_EFFECTIVE" : "3fffffffff",
        "_SYSTEMD_SLICE" : "system.slice",
        "__REALTIME_TIMESTAMP" : "1575563615190378",
        "SYSLOG_IDENTIFIER" : "ignition",
        "__MONOTONIC_TIMESTAMP" : "6220168",
        "_COMM" : "ignition",
        "_STREAM_ID" : "fdf84b64a8f04861b4c996a44f047be8",
        "__CURSOR" : "s=e030078e2be04702951333e34e93cfa1;i=2f2;b=96c10fee3e024fe88621d9e9b77c2d61;m=5ee988;t=598f77d9abd6a;x=c30effcfe9d>
        "_BOOT_ID" : "96c10fee3e024fe88621d9e9b77c2d61",
        "MESSAGE" : "INFO     : files: files passed",
        "_SELINUX_CONTEXT" : "kernel",
        "_SYSTEMD_CGROUP" : "/system.slice/ignition-files.service",
        "PRIORITY" : "6",
        "_SYSTEMD_INVOCATION_ID" : "1951a787387740a5b7aa64cd67ff5559",
        "_TRANSPORT" : "stdout",
        "_CMDLINE" : "/usr/bin/ignition --root=/sysroot --platform=qemu --stage=files --log-to-stdout",
        "_HOSTNAME" : "coreos",
        "_SYSTEMD_UNIT" : "ignition-files.service",
        "_UID" : "0"
}
{
        "_TRANSPORT" : "kernel",
        "_MACHINE_ID" : "d8e40229bf814913af3886c2aaa33f95",
        "SYSLOG_IDENTIFIER" : "ignition",
        "_BOOT_ID" : "96c10fee3e024fe88621d9e9b77c2d61",
        "SYSLOG_PID" : "781",
        "PRIORITY" : "6",
        "MESSAGE" : "INFO     : files: files passed",
        "SYSLOG_FACILITY" : "3",
        "__REALTIME_TIMESTAMP" : "1575563617516057",
        "__CURSOR" : "s=e030078e2be04702951333e34e93cfa1;i=38e;b=96c10fee3e024fe88621d9e9b77c2d61;m=826636;t=598f77dbe3a19;x=a712cd15e29>
        "__MONOTONIC_TIMESTAMP" : "8545846",
        "_HOSTNAME" : "coreos",
        "_SOURCE_MONOTONIC_TIMESTAMP" : "6402014"
}

It's getting logged twice, once via journald, and once via the kernel buffer I think because of:

@cgwalters
Copy link
Member Author

Hmm...this might explain why we're seeing "Missed [x] kernel messages" in the journal too - if we're in some cases logging recursively, maybe journald tries to suppress this type of cycle but can't always do it?

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

So I think what's happening there is that it depends when switch root happens vs the journal reads input from the kernel buffer. If we switch root fast enough, the new journald instance which doesn't have that knob will be reading the data that the pre-switch-root journald left while forwarding.

I noticed that several of our units appeared to run twice, including
`ignition-files.service` (!) and `ignition-ostree-mount-firstboot-sysroot.service`.

[Later investigation revealed this was actually a double-logging
issue.]

Looking deeper into this, I discovered a bit of the initrd that
I didn't even know existed, which is `initrd-parse-etc.service`
that actually runs `ExecStart=-/usr/bin/systemctl --no-block start initrd-fs.target`
*post* switch root, after services that are part of that target
have already run (and then been shut down).

This adds a big wrinkle to my understanding of system bootup.

Then I noticed that `initrd-fs.target` has:
`ConditionPathExists=/etc/initrd-release`
which will be false post-switchroot.

The semantics here are...strange.  It seems like the goal is to ensure
that the rootfs is definitely still mounted, perhaps in
very complex rootfs scenarios?  In our case it will still be
mounted, we definitely don't want to rerun any of our Ignition
units.

[We discovered the units aren't actually being rerun, but
 this still seems like a best practice]
@cgwalters cgwalters force-pushed the no-rerun-switchroot branch from c457e2a to 8800e42 Compare December 5, 2019 17:14
@cgwalters
Copy link
Member Author

Wow, thanks for that investigation. Trying this some more, it does seem like you're right, and I was just lucky with the race when I was testing out this patch.

But...hum. Then I guess my question is why isn't that unit re-running our services? [Going to add some sanity checking that it really isn't happening]

Is the reason that ConditionPathExists=/etc/initrd-release is used in a bunch of units to simply avoid breakage if they're somehow enabled by default in the real root?

I updated the commit message with the new findings, but basically I'd argue for merging anyways since this seems like a best practice.

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

I updated the commit message with the new findings, but basically I'd argue for merging anyways since this seems like a best practice.

I think I agree, though... let's just plaster it in all our units?

Is the reason that ConditionPathExists=/etc/initrd-release is used in a bunch of units to simply avoid breakage if they're somehow enabled by default in the real root?

This is a pure guess, but I think it's simply because the build system puts all the service files indiscriminately in the real root and the 01systemd-initrd module just fetches them from there.

One nice bonus of doing this is I'm pretty sure it's what allows e.g. systemctl status initd-parse-etc.service to work correctly post-switchroot. If we do this to our units too, we should be able to get that working as well.

@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

(I can investigate this and do that switch as a follow-up if folks agree. I think it'd be a nice tiny QOL improvement.)

@cgwalters
Copy link
Member Author

I think I agree, though... let's just plaster it in all our units?

Yeah.

@cgwalters cgwalters merged commit 8168e4c into coreos:master Dec 5, 2019
jlebon added a commit to jlebon/ignition-dracut that referenced this pull request Dec 5, 2019
This is standard practice for units that are only meant to run in the
initrd. It matches what e.g. `systemd` does for its initrd units. See
also: coreos#140.

I also snuck in `Documentation=` lines in there.
@jlebon
Copy link
Member

jlebon commented Dec 5, 2019

#142

jlebon added a commit that referenced this pull request Dec 6, 2019
This is standard practice for units that are only meant to run in the
initrd. It matches what e.g. `systemd` does for its initrd units. See
also: #140.

I also snuck in `Documentation=` lines in there.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants