
[sled-agent] Disk Detection, Partition Management, and U.2 formatting #2176

Merged: 43 commits from detect-disks into main, Jan 30, 2023

Conversation

smklein (Collaborator) commented on Jan 17, 2023:

Features

Enables the Sled Agent to identify physical disks, format zpools on U.2 devices, and use those zpools as allocation targets for datasets within Nexus.

For anyone familiar with the hard-coded list of zpools in sled agent's config.toml, these hook into the same mechanism, but come from real devices. As such, specifying them ahead-of-time is no longer required.

Mechanism

Implements the mechanism described in RFD 352 to detect disks, partitions, and the zpools within them.

  • Uses illumos-devinfo to query libdevinfo for device information
  • Uses libefi-illumos to query libefi for partition information
  • Uses fstyp to bridge the gap from "partition path" to "zpool name".
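
As a rough illustration of the last step, here is a minimal sketch (not the PR's actual helper) of going from a partition path to a zpool name by shelling out to fstyp; the binary location and the `name: '...'` line in the `fstyp -a` output are assumptions rather than something taken from the PR:

```rust
// Hedged sketch: bridge "partition path" -> "zpool name" by shelling out to
// fstyp. The binary path and the `name: '...'` output line are assumptions.
use std::process::Command;

fn zpool_name_of_partition(partition_path: &str) -> Result<String, String> {
    let output = Command::new("/usr/sbin/fstyp")
        .arg("-a")
        .arg(partition_path)
        .output()
        .map_err(|e| format!("failed to run fstyp: {e}"))?;
    if !output.status.success() {
        return Err(format!("fstyp reported failure for {partition_path}"));
    }
    // We parse this programmatically, so reject invalid UTF-8 rather than
    // papering over it (see the from_utf8 discussion later in this review).
    let stdout = String::from_utf8(output.stdout)
        .map_err(|e| format!("fstyp emitted non-UTF-8 output: {e}"))?;
    stdout
        .lines()
        .filter_map(|line| {
            line.trim()
                .strip_prefix("name: '")
                .and_then(|rest| rest.strip_suffix('\''))
        })
        .map(str::to_string)
        .next()
        .ok_or_else(|| format!("no zpool name in fstyp output for {partition_path}"))
}
```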

Testing

Tested manually on gimlet-sn21 on the following device layout:

Slot    Device  Path
   0       u.2  /devices/pci@ab,0/pci1022,1483@1,1/pci1b96,0@0/blkdev@w0014EE81000BC153,0
   1       u.2  /devices/pci@ab,0/pci1022,1483@1,2/pci1b96,0@0/blkdev@w0014EE81000BC67A,0
   2       u.2  /devices/pci@ab,0/pci1022,1483@1,3/pci1b96,0@0/blkdev@w0014EE81000BC5F5,0
   3       u.2  /devices/pci@ab,0/pci1022,1483@1,4/pci1b96,0@0/blkdev@w0014EE81000BC5C5,0
   4       u.2  /devices/pci@0,0/pci1022,1483@3,4/pci1b96,0@0/blkdev@w0014EE81000BC5D4,0
   5       u.2  /devices/pci@0,0/pci1022,1483@3,3/pci1b96,0@0/blkdev@w0014EE81000BC5C9,0
   6       u.2  /devices/pci@0,0/pci1022,1483@3,2/pci1b96,0@0/blkdev@w0014EE81000BC5D2,0
   7       u.2  /devices/pci@38,0/pci1022,1483@1,3/pci1b96,0@0/blkdev@w0014EE81000BC5CB,0
  17       m.2  /devices/pci@0,0/pci1022,1483@1,3/pci1344,3100@0/blkdev@w00A07501340802D1,0
  • Used nvmeadm secure-erase to fully wipe U.2 devices, observed that the sled-agent detects and correctly formats them.
  • Used zpool destroy to wipe the zpool off the U.2 device while leaving the GPT partition, observed that the sled-agent correctly detects the partition and creates a new zpool on it.
  • Tested that in all cases Nexus is notified of the new zpools.

common/src/sql/dbinit.sql (review thread resolved; outdated)
smklein changed the title from "[sled-agent] Disk Detection and Partition Management" to "[sled-agent] Disk Detection, Partition Management, and U.2 formatting" on Jan 23, 2023
smklein marked this pull request as ready for review on January 23, 2023
smklein requested review from ahl, jclulow, and jgallagher on January 24, 2023
sled-agent/src/illumos/zpool.rs (three review threads resolved; outdated)
.ensure_using_exactly_these_disks(self.inner.hardware.disks())
.await
{
warn!(log, "Failed to add disk: {e}");
Contributor:

This warning might be misleading, because there are a bunch of possible failures from ensure_using_exactly_these_disks (including at least failure to remove a disk in addition to adding one). Is failure to ensure the disk set we expect just a warning? I'm early in the PR review so maybe this will become clear later, but - what are the ramifications of continuing to run if the set of disks we think we're using aren't the disks we're actually using?

smklein (Collaborator, PR author):

Failure to ensure any disks results in an error, not a warning. However, we still try to insert/remove everything else that needs to be changed.

I didn't want "one bad U.2" to cause a whole sled to eject itself, which is sorta the motivation behind "continuing when we've already seen an error anyway".
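
The shape described here, continuing past individual failures but still surfacing an error at the end, might look roughly like the following. This is a minimal generic sketch; the real StorageManager methods and types differ:

```rust
// Minimal sketch, not the PR's actual StorageManager code: apply every disk
// change we can, remember the first failure, and report it only after the
// whole pass, so one bad U.2 doesn't stop the rest from being processed.
fn ensure_disks<D, E>(
    desired: Vec<D>,
    mut apply: impl FnMut(D) -> Result<(), E>,
) -> Result<(), E> {
    let mut first_err = None;
    for disk in desired {
        if let Err(e) = apply(disk) {
            // Keep going; surface the first error once everything was attempted.
            first_err.get_or_insert(e);
        }
    }
    match first_err {
        Some(e) => Err(e),
        None => Ok(()),
    }
}
```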

sled-agent/src/storage_manager.rs (review thread resolved)
sled-agent/src/hardware/illumos/disk.rs (two review threads resolved; outdated)
sled-agent/src/hardware/illumos/mod.rs (review thread resolved)
sled-agent/src/hardware/illumos/mod.rs (two review threads resolved; outdated)
})
.map(|(key, _)| key);
for key in keys_to_be_removed {
if let Err(e) = self.delete_disk_locked(&mut disks, &key).await {
Contributor:

I don't think this is right - at this point disks is empty (because we swapped it on line 940), so delete_disk_locked won't do anything. Fixing this seems tricky. Playing with it a bit.

smklein (Collaborator, PR author):

My short-term workaround: I'm going back to my old mechanism of updating the set of disks in place.

This desperately needs to be refactored to have better unit tests, without the tight coupling to real hardware.
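
For context, one shape the in-place update could take (a hedged sketch with plain HashMap/String stand-ins rather than the PR's actual disk types):

```rust
// Hedged sketch of updating the disk map in place rather than swapping it
// out wholesale. HashMap/String are stand-ins, not the PR's real types.
use std::collections::HashMap;

fn update_disks_in_place(
    disks: &mut HashMap<String, String>, // key -> disk state (stand-in)
    desired: &HashMap<String, String>,
) {
    // Remove disks that are no longer present in the desired set.
    let stale: Vec<String> = disks
        .keys()
        .filter(|key| !desired.contains_key(*key))
        .cloned()
        .collect();
    for key in stale {
        disks.remove(&key);
        // ... the real code would also tear down associated state here ...
    }
    // Insert any newly-observed disks.
    for (key, state) in desired {
        disks.entry(key.clone()).or_insert_with(|| state.clone());
    }
}
```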

jclulow (Collaborator) left a comment:

Hi Sean!

With my apologies for the interminable delay, I have taken an initial look through this. I think the shape is about right on the Nexus/Omicron end. There are some things about the interaction with devinfo and NVMe and programs that we're running that I think might merit some changes.

Let me know if anything I've said needs more explanation. There are a lot of complex moving pieces which I am always happy to talk through if you'd like.

Thanks for picking this up, it's extremely critical!

sled-agent/src/hardware/illumos/mod.rs (review thread resolved)
let cmd = command.arg(FSTYP).arg("-a").arg(path);

let output = execute(cmd).map_err(Error::from)?;
let stdout = String::from_utf8_lossy(&output.stdout);
Collaborator:

I don't think we should use from_utf8_lossy() here, but rather from_utf8(), and propagate an error out if one should occur. We definitely don't expect an error at this moment.

In general I think we should use String::from_utf8() (and handle failures) on any stdout output that we intend to inspect programmatically. If you're just including stderr in a diagnostic message, then the lossy version seems fine (even preferable!).
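
A tiny sketch of the distinction being suggested (the surrounding error handling is a stand-in, not the PR's code):

```rust
// Hedged illustration of the suggestion above, with a stand-in error type.
use std::process::Output;
use std::string::FromUtf8Error;

// Output we intend to parse programmatically: fail loudly on invalid UTF-8.
fn parseable_stdout(output: &Output) -> Result<String, FromUtf8Error> {
    String::from_utf8(output.stdout.clone())
}

// Output destined only for a diagnostic message: lossy conversion is fine
// (even preferable), since a replacement character beats losing the message.
fn diagnostic_stderr(output: &Output) -> String {
    String::from_utf8_lossy(&output.stderr).into_owned()
}
```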

smklein (Collaborator, PR author):

Sure, sounds good. Updated.

Comment on lines +139 to +143
// TODO: If we see a completely empty M.2, should we create
// the expected partitions? Or would it be wiser to infer
// that this indicates an unexpected error condition that
// needs mitigation?
return Err(DiskError::CannotFormatM2NotImplemented);
Collaborator:

I'm not sure, but it feels like this will be in the purview of installinator, at least initially -- at least in the cases where manufacturing has failed to get it done.

smklein (Collaborator, PR author):

Yeah, that makes sense. I figured the sled agent could support it, but I'd rather just exit early for now, since it's not a priority.

Comment on lines 275 to 276
// For some reason, libdevfs doesn't prepend "/devices" when
// referring to the path, but it still returns an absolute path.
Collaborator:

This is because libdevinfo is working with the kernel-visible device node path. The devfs file system, which exposes this tree, is traditionally mounted at /devices -- but in theory it could also be mounted elsewhere, and is a construct of the user mode environment, etc.
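
In code terms, consumers end up doing roughly the following; a one-function hedged sketch, assuming the conventional /devices mount point:

```rust
// Hedged sketch: libdevinfo returns the kernel-visible device node path
// (e.g. "/pci@0,0/.../blkdev@w...,0"); joining it onto the devfs mount
// point, conventionally /devices, yields an openable path in user space.
fn devfs_path(node_path: &str) -> String {
    format!("/devices{node_path}")
}
```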

smklein (Collaborator, PR author):

Updated the comment with this detail

sled-agent/src/hardware/illumos/mod.rs (review thread resolved; outdated)
Self { tofino: TofinoSnapshot::new() }
// Walk the device tree to capture a view of the current hardware.
fn new(log: &Logger) -> Result<Self, Error> {
let mut device_info = DevInfo::new().map_err(Error::DevInfo)?;
Collaborator:

I think we should probably be doing DevInfo::new_force_load() here, rather than new(). How often is this polling being done? Perhaps we should do a force load occasionally, or just the first time we start up. I'm concerned that we might not see all of the disks if we don't, because they might not be attached when they are not already in use by a ZFS pool, etc.

smklein (Collaborator, PR author):

We are currently scanning the hardware every five seconds (though that polling choice is arbitrary). I figured we'd eventually switch to a sysevent-style interface, and only do a full scan if we missed an event, but for now, the "scanning devinfo" tries to do very little, and hand off work to a different tokio task.

I can update this to new_force_load anyway?
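
One possible middle ground, using only the two constructors already named in this thread (a sketch, not the PR's final code; the crate path in the import is assumed):

```rust
// Sketch of the compromise discussed above: force a full devinfo load on the
// first scan (or occasionally), and take the cheaper snapshot for the routine
// five-second polls. `Error::DevInfo` is the surrounding module's error
// variant shown in the snippet above.
use illumos_devinfo::DevInfo;

fn devinfo_snapshot(force_full_scan: bool) -> Result<DevInfo, Error> {
    if force_full_scan {
        // May attach devices that nothing currently holds open, e.g. disks
        // that are not yet part of any ZFS pool.
        DevInfo::new_force_load().map_err(Error::DevInfo)
    } else {
        DevInfo::new().map_err(Error::DevInfo)
    }
}
```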

sled-agent/src/hardware/illumos/mod.rs (review thread resolved)
sled-agent/src/illumos/fstyp.rs (review thread resolved; outdated)
sled-agent/src/illumos/zpool.rs (review thread resolved)
sled-agent/src/hardware/illumos/mod.rs (review thread resolved; outdated)
smklein (Collaborator, PR author) commented on Jan 30, 2023:

(The current CI failures are due to ghcr being down)

We pretty desperately need this PR in as the foundation for some follow-up work, so I'm going to merge. I'm still very open to follow-ups and additional feedback, but I'm prioritizing urgency in this particular case.

smklein merged commit 1925085 into main on Jan 30, 2023
smklein deleted the detect-disks branch on January 30, 2023
smklein added a commit that referenced this pull request Jan 31, 2023
Depends on #2176

- Restructure sled agent's "storage manager" to be better at reporting
notifications to Nexus
- Report notifications of disk attachment and removal to Nexus
- Create DB structures to represent physical disks
- Expose some information about those disks in the public API

TODO, potentially after this PR:
- [ ] Record a better history of "which disks have been attached to
which sleds"
- [ ] Make "Physical Disks" the parents of "Zpools", rather than their
current "parent" (which is just "sled").

Part of #2036
smklein added a commit that referenced this pull request Jan 31, 2023
Depends on a refactor made in
#2176

- Uses `libdevinfo` to grab `baseboard-*` fields about Gimlets
- Plumbs that information up to Nexus

Fixes #2211

TODO:
- [ ] Add tests
- [ ] Make it easier to "inject fake values" here, possibly via sled
agent configuration
- [ ] Actually index on these values in Nexus, rather than storing them
as arbitrary strings. E.g., we should probably ensure there aren't
duplicates across the fleet.