Add a diagnostic kstat for obtaining pool status #17076

Open · wants to merge 1 commit into master from pool-status-json-kstat
Conversation


@ihoro ihoro commented Feb 20, 2025

Motivation and Context

A hung pool process can be left holding the spa config lock or the spa namespace lock. If an admin then tries to observe pool status with the traditional zpool status, the command can hang waiting for one of the locks held by the stuck process. It would be nice to observe pool status in this scenario without the risk of the inquiry hanging.

This PR is an aggregated and updated version of #16026 and #16484.

Description

This change adds /proc/spl/kstat/zfs/<poolname>/status_json.

Unlike 'zpool status', reading this kstat does not require taking the spa_namespace lock. It can be used for investigations when pools are in a hung state while holding the global locks that a traditional 'zpool status' needs in order to proceed.

The newly introduced zfs_lockless_read_enabled module parameter enables traversal of related kernel structures in cases where the required config read locks cannot be taken.

When zfs_lockless_read_enabled is set, this kstat is not safe to use while pools are undergoing configuration changes (e.g., adding/removing devices).

The idea is to follow zpool status -jp output as much as possible:

# sysctl -n kstat.zfs.p2.misc.status_json | jq
{
  "output_version": {
    "command": "kstat zpool status",
    "vers_major": 0,
    "vers_minor": 1
  },
  "pools": {
    "p2": {
      "name": "p2",
      "state": "ONLINE",
      "pool_guid": "16174423138529437443",
      "txg": "4",
      "spa_version": "5000",
      "zpl_version": "5",
      "vdevs": {
        "p2": {
          "name": "p2",
          "vdev_type": "root",
          "guid": "16174423138529437443",
          "class": "normal",
          "state": "ONLINE",
          "alloc_space": "116736",
          "total_space": "134217728",
          "def_space": "134217728",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "0",
          "vdevs": {
            "/root/tmp/p2/file1": {
              "name": "/root/tmp/p2/file1",
              "vdev_type": "file",
              "guid": "542797111474461141",
              "path": "/root/tmp/p2/file1",
              "class": "normal",
              "state": "ONLINE",
              "alloc_space": "59392",
              "total_space": "67108864",
              "def_space": "67108864",
              "rep_dev_size": "71827456",
              "phys_space": "83886080",
              "read_errors": "0",
              "write_errors": "0",
              "checksum_errors": "0",
              "slow_ios": "0"
            },
            "/root/tmp/p2/file2📌": {
              "name": "/root/tmp/p2/file2📌",
              "vdev_type": "file",
              "guid": "13616452074869266915",
              "path": "/root/tmp/p2/file2📌",
              "class": "normal",
              "state": "ONLINE",
              "alloc_space": "57344",
              "total_space": "67108864",
              "def_space": "67108864",
              "rep_dev_size": "71827456",
              "phys_space": "83886080",
              "read_errors": "0",
              "write_errors": "0",
              "checksum_errors": "0",
              "slow_ios": "0"
            }
          }
        }
      },
      "error_count": "0"
    }
  }
}
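The nested structure above can be consumed programmatically. As a minimal sketch (the helper names here are hypothetical, and the counter field names are taken from the sample output above), this walks the recursive "vdevs" mapping and reports any vdev with nonzero error counters:

```python
import json

def walk_vdevs(vdevs):
    """Recursively yield every vdev dict in a status_json 'vdevs' mapping."""
    for vdev in vdevs.values():
        yield vdev
        yield from walk_vdevs(vdev.get("vdevs", {}))

def vdevs_with_errors(status_json_text):
    """Return names of vdevs reporting nonzero read/write/checksum errors."""
    status = json.loads(status_json_text)
    bad = []
    for pool in status["pools"].values():
        for vdev in walk_vdevs(pool["vdevs"]):
            # Counters are emitted as strings in the sample output above.
            errors = sum(int(vdev.get(k, "0"))
                         for k in ("read_errors", "write_errors",
                                   "checksum_errors"))
            if errors:
                bad.append(vdev["name"])
    return bad
```

On Linux the input would come from reading /proc/spl/kstat/zfs/&lt;poolname&gt;/status_json; on FreeBSD, from the sysctl shown above.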

How Has This Been Tested?

Added a new pool_status_json.ksh automated test which compares the kstat output with zpool status -jp.
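The actual test is a ksh script, but the comparison idea can be sketched in Python: normalize both JSON trees by dropping fields that may legitimately change between two snapshots, then compare the rest. The set of volatile fields here is illustrative only, guessed from the sample output above:

```python
import json

# Fields that may differ between two snapshots of the same healthy pool
# (illustrative set; the real test may exclude different fields).
VOLATILE = {"alloc_space", "txg", "slow_ios"}

def normalize(node):
    """Recursively drop volatile fields so two status snapshots compare stably."""
    if isinstance(node, dict):
        return {k: normalize(v) for k, v in node.items() if k not in VOLATILE}
    return node

def same_status(a_text, b_text):
    """True if two status_json documents agree on all non-volatile fields."""
    return normalize(json.loads(a_text)) == normalize(json.loads(b_text))
```

This makes the comparison robust against counters that tick between running the kstat read and running zpool status -jp.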

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@gmelikov
Member

Don't mean to oppose this PR somehow, but just wanted to highlight for anybody passing by that there's already a method to get just a pool's status without locks via cat /proc/spl/kstat/zfs/POOLNAME/state.

@allanjude
Contributor

> Don't mean to oppose this PR somehow, but just wanted to highlight for anybody passing by that there's already a method to get just a pool's status without locks via cat /proc/spl/kstat/zfs/POOLNAME/state.

For clarity, that returns only a single value: ONLINE or DEGRADED. It is not something analogous to zpool status, which tells you about EACH vdev.

This kstat output does not require taking the spa_namespace
lock, as in the case for `zpool status`. It can be used for
investigations when pools are in a hung state while holding
global locks required for a traditional 'zpool status' to
proceed.

The newly introduced `zfs_lockless_read_enabled` module parameter
enables traversal of related kernel structures in cases where
the required read locks cannot be taken.

When `zfs_lockless_read_enabled` is set, this kstat is not safe
to use in conditions where pools are in the process of
configuration changes (i.e., adding/removing devices).

Co-authored-by: Don Brady <[email protected]>
Co-authored-by: Umer Saleem <[email protected]>
Signed-off-by: Igor Ostapenko <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
@ihoro ihoro force-pushed the pool-status-json-kstat branch from 938438b to 7e13585 Compare February 20, 2025 14:11
@tonyhutter
Contributor

tonyhutter commented Feb 20, 2025

I think having a module param to disable locks for these rare debug situations is a valid use case. If we go down that path, though, I think there's potentially more benefit in disabling locks for the userspace commands rather than using a kstat. That would let us use the familiar zpool status with all its nice options and formatting (and other commands). It would also be less code churn than the kstat route.

I have a rough prototype here that seems to work (lightly tested only!):
tonyhutter@a729b84

# Disable locks on 'tank'
echo -n tank > /sys/module/zfs/parameters/zfs_debug_skip_locks_on_pool

# Disable locks on all pools
echo -n '*' > /sys/module/zfs/parameters/zfs_debug_skip_locks_on_pool

zpool status
zpool status -j
...

The prototype is slightly more dangerous than the kstat version since it does traverse the spa_namespace AVL tree locklessly. But the odds that the tree changes out from under you while running zpool are extremely low, and when it does change, it does so at a time when using the kstat would be risky as well (like during an export).

Thoughts?
