
zpool events does not list DEGRADED state change for pool #12629

Closed
bitonic opened this issue Oct 10, 2021 · 5 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@bitonic
Contributor

bitonic commented Oct 10, 2021

System information

Type                  Version/Name
Distribution Name     NixOS
Distribution Version  21.05
Kernel Version        5.10.70
Architecture          x86_64
OpenZFS Version       2.0.6-1

Describe the problem you're observing

When a pool goes into the DEGRADED state due to a missing drive, the corresponding statechange event for the pool is not generated.
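
A quick way to check for this while reproducing is to follow the event stream live in a second terminal; zpool events -f prints new events as they arrive, and -v includes the full payload:

# follow ZFS events in real time; in this scenario only the vdev-level
# UNAVAIL statechange shows up, never a pool-level DEGRADED one
sudo zpool events -vf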

Describe how to reproduce the problem

My pool looks like this:

  pool: zroot
 state: ONLINE
  scan: resilvered 5.49M in 00:00:01 with 0 errors on Sun Oct 10 10:43:34 2021
config:

	NAME                                         STATE     READ WRITE CKSUM
	zroot                                        ONLINE       0     0     0
	  raidz1-0                                   ONLINE       0     0     0
	    ata-ST10000NM0568-2H5110_ZHZ54D5K-part3  ONLINE       0     0     0
	    ata-ST10000NM0568-2H5110_ZHZ54D1C-part3  ONLINE       0     0     0
	    ata-ST10000NM0568-2H5110_ZHZ54DBW-part1  ONLINE       0     0     0
	    ata-ST10000NM0568-2H5110_ZHZ54D2A-part1  ONLINE       0     0     0

errors: No known data errors

I wanted to verify that ZED sends emails correctly when the pool is degraded. To do that, I kill one of the drives like so:

sudo sh -c 'echo 0 > /sys/block/sdc/device/delete'

And the status changes to degraded as expected:

  pool: zroot
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: resilvered 15.0M in 00:00:01 with 0 errors on Sun Oct 10 10:15:55 2021
config:
 
        NAME                                         STATE     READ WRITE CKSUM
        zroot                                        DEGRADED     0     0     0
          raidz1-0                                   DEGRADED     0     0     0
            ata-ST10000NM0568-2H5110_ZHZ54D5K-part3  ONLINE       0     0     0
            ata-ST10000NM0568-2H5110_ZHZ54D1C-part3  ONLINE       0     0     0
            ata-ST10000NM0568-2H5110_ZHZ54DBW-part1  UNAVAIL      3    69     0
            ata-ST10000NM0568-2H5110_ZHZ54D2A-part1  ONLINE       0     0     0
 
errors: No known data errors
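
(Aside: to undo the test afterwards, I bring the disk back with a SCSI host rescan and online it again. The exact steps depend on the controller, and the disk may re-enumerate under a different sdX name:)

# trigger a rescan on all SCSI hosts so the deleted disk reappears
echo '- - -' | sudo tee /sys/class/scsi_host/host*/scan
# re-online the vdev and clear the error counters
sudo zpool online zroot ata-ST10000NM0568-2H5110_ZHZ54DBW-part1
sudo zpool clear zroot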

However, zpool events -v does not include a statechange event to DEGRADED for the pool. It does include an UNAVAIL statechange event for the drive itself, but the default zedlet statechange-notify.sh only catches 'DEGRADED', 'FAULTED', or 'REMOVED', so the UNAVAIL event produces no notification.

There is an argument to be made that statechange-notify.sh should also catch UNAVAIL, but I still think that the statechange event for the pool going to DEGRADED should be generated.
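
For context, the filter at the top of statechange-notify.sh looks roughly like this (paraphrased from the 2.0.x zedlet; see cmd/zed/zed.d/statechange-notify.sh for the exact code):

# the zedlet bails out early for any state other than these three,
# so the UNAVAIL event above never triggers a notification
if [ "${ZEVENT_VDEV_STATE_STR}" != "FAULTED" ] \
        && [ "${ZEVENT_VDEV_STATE_STR}" != "DEGRADED" ] \
        && [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ]; then
    exit 3
fi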

Here is the list of relevant events:

Oct 10 2021 10:39:51.709234009 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x150e03af8dd01001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xe244c57cc81be54b
                vdev = 0x7e580703404ed3ff
        (end detector)
        pool = "zroot"
        pool_guid = 0xe244c57cc81be54b
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x7e580703404ed3ff
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/ata-ST10000NM0568-2H5110_ZHZ54DBW-part1"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x150e03b1fe3
        vdev_delta_ts = 0x3cdb
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0x581f35675e2a34d0
        parent_type = "raidz"
        vdev_spare_paths = 
        vdev_spare_guids = 
        zio_err = 0x5
        zio_flags = 0xb08c1
        zio_stage = 0x1000000
        zio_pipeline = 0x1080000
        zio_delay = 0x60b2
        zio_timestamp = 0x150e03a5d84
        zio_delta = 0x74d2
        zio_priority = 0x0
        zio_offset = 0x42000
        zio_size = 0x2000
        time = 0x6162c2f7 0x2a460d59 
        eid = 0x10

Oct 10 2021 10:39:51.709234009 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x150e03c1a6d01001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xe244c57cc81be54b
                vdev = 0x7e580703404ed3ff
        (end detector)
        pool = "zroot"
        pool_guid = 0xe244c57cc81be54b
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x7e580703404ed3ff
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/ata-ST10000NM0568-2H5110_ZHZ54DBW-part1"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x150e03c03c6
        vdev_delta_ts = 0x13cdc
        vdev_read_errors = 0x1
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0x581f35675e2a34d0
        parent_type = "raidz"
        vdev_spare_paths = 
        vdev_spare_guids = 
        zio_err = 0x5
        zio_flags = 0xb08c1
        zio_stage = 0x1000000
        zio_pipeline = 0x1080000
        zio_delay = 0x130b9
        zio_timestamp = 0x150e03ac6ea
        zio_delta = 0x13c83
        zio_priority = 0x0
        zio_offset = 0x91808b82000
        zio_size = 0x2000
        time = 0x6162c2f7 0x2a460d59 
        eid = 0x11

Oct 10 2021 10:39:51.709234009 ereport.fs.zfs.io
        class = "ereport.fs.zfs.io"
        ena = 0x150e03d5c5d01001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xe244c57cc81be54b
                vdev = 0x7e580703404ed3ff
        (end detector)
        pool = "zroot"
        pool_guid = 0xe244c57cc81be54b
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x7e580703404ed3ff
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/ata-ST10000NM0568-2H5110_ZHZ54DBW-part1"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x150e03d4b23
        vdev_delta_ts = 0x25387
        vdev_read_errors = 0x2
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0x581f35675e2a34d0
        parent_type = "raidz"
        vdev_spare_paths = 
        vdev_spare_guids = 
        zio_err = 0x5
        zio_flags = 0xb08c1
        zio_stage = 0x1000000
        zio_pipeline = 0x1080000
        zio_delay = 0x24b23
        zio_timestamp = 0x150e03af79c
        zio_delta = 0x25331
        zio_priority = 0x0
        zio_offset = 0x91808bc2000
        zio_size = 0x2000
        time = 0x6162c2f7 0x2a460d59 
        eid = 0x12

Oct 10 2021 10:39:51.709234009 ereport.fs.zfs.probe_failure
        class = "ereport.fs.zfs.probe_failure"
        ena = 0x150e03e351d01001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xe244c57cc81be54b
                vdev = 0x7e580703404ed3ff
        (end detector)
        pool = "zroot"
        pool_guid = 0xe244c57cc81be54b
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x7e580703404ed3ff
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/ata-ST10000NM0568-2H5110_ZHZ54DBW-part1"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x150e03d4b23
        vdev_delta_ts = 0x25387
        vdev_read_errors = 0x3
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0x581f35675e2a34d0
        parent_type = "raidz"
        vdev_spare_paths = 
        vdev_spare_guids = 
        prev_state = 0x0
        time = 0x6162c2f7 0x2a460d59 
        eid = 0x13

Oct 10 2021 10:39:52.820242319 ereport.fs.zfs.vdev.unknown
        class = "ereport.fs.zfs.vdev.unknown"
        ena = 0x1512276b31e00001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xe244c57cc81be54b
                vdev = 0x7e580703404ed3ff
        (end detector)
        pool = "zroot"
        pool_guid = 0xe244c57cc81be54b
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x7e580703404ed3ff
        vdev_type = "disk"
        vdev_path = "/dev/disk/by-id/ata-ST10000NM0568-2H5110_ZHZ54DBW-part1"
        vdev_ashift = 0x9
        vdev_complete_ts = 0x150e09f79f7
        vdev_delta_ts = 0x3ee8
        vdev_read_errors = 0x3
        vdev_write_errors = 0x45
        vdev_cksum_errors = 0x0
        vdev_delays = 0x0
        parent_guid = 0x581f35675e2a34d0
        parent_type = "raidz"
        vdev_spare_paths = 
        vdev_spare_guids = 
        prev_state = 0x1
        time = 0x6162c2f8 0x30e3e78f 
        eid = 0x14

Oct 10 2021 10:39:52.820242319 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "zroot"
        pool_guid = 0xe244c57cc81be54b
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0x7e580703404ed3ff
        vdev_state = "UNAVAIL" (0x4)
        vdev_path = "/dev/disk/by-id/ata-ST10000NM0568-2H5110_ZHZ54DBW-part1"
        vdev_laststate = "ONLINE" (0x7)
        time = 0x6162c2f8 0x30e3e78f 
        eid = 0x15
bitonic added the Type: Defect label Oct 10, 2021
bitonic changed the title from "zpool events does not list DEGRADED state change for pol" to "zpool events does not list DEGRADED state change for pool" Oct 10, 2021
@ghost

ghost commented Oct 25, 2021

Should this have been closed? It doesn't seem like the commit would fix zpool events not listing the state change?

@bitonic
Contributor Author

bitonic commented Oct 25, 2021

@freqlabs is right, this should not have been closed. The PR is related but does not fix this issue.

@behlendorf
Contributor

behlendorf commented Oct 25, 2021

Right, that was a little overzealous. Reopening.

behlendorf reopened this Oct 25, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 10, 2022
`UNAVAIL` is maybe not quite as concerning as `DEGRADED`, but still an
event of notice, in my opinion. For example it is triggered when a
drive goes missing.

Reviewed-by: Don Brady <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Francesco Mazzoli <[email protected]>
Closes openzfs#12629
Closes openzfs#12630
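
(Judging from the commit message, the change presumably amounts to adding UNAVAIL to the zedlet filter quoted earlier, along these lines — a sketch, not the verbatim patch:)

# also treat UNAVAIL as notification-worthy
if [ "${ZEVENT_VDEV_STATE_STR}" != "FAULTED" ] \
        && [ "${ZEVENT_VDEV_STATE_STR}" != "DEGRADED" ] \
        && [ "${ZEVENT_VDEV_STATE_STR}" != "REMOVED" ] \
        && [ "${ZEVENT_VDEV_STATE_STR}" != "UNAVAIL" ]; then
    exit 3
fi
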
@stale

stale bot commented Oct 29, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label Oct 29, 2022
@behlendorf
Contributor

behlendorf commented Oct 29, 2022

Should be resolved by #12630 and #13797.

behlendorf removed the Status: Stale label Oct 29, 2022