Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mirror pool gets corrupted (error getting refcount for feature com.delphix:async_destroy & metaslab_init failed) #10161

Closed
KenanHH opened this issue Mar 27, 2020 · 1 comment
Labels
Status: Stale No recent activity for issue

Comments

@KenanHH
Copy link

KenanHH commented Mar 27, 2020

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version Gentoo Base System release 2.6
Linux Kernel 5.4.24
Architecture x86_64
ZFS Version 0.8.3-r0-gentoo
SPL Version 0.8.3-r0-gentoo

Describe the problem you're observing

Mirror pool gets corrupted after sudden power loss and one of mirrored disk removal.
Here is output of zpool import

pool: NETSTOR
id: 4777479395761719734
state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
see: http://zfsonlinux.org/msg/ZFS-8000-5E
 config:

	NETSTOR       FAULTED  corrupted data
	  mirror-0    DEGRADED
	    NETSTOR1  UNAVAIL
	    NETSTOR2  ONLINE

When I connect back NETSTOR1 disk I am able to do import pool with zpool import NETSTOR

Describe how to reproduce the problem

  • Power off the host
  • Remove one of disks
  • Try to assemble pool with out one disk

Include any warning/errors/backtraces from the system logs

dbgmsg

timestamp    message 
1585343400   spa.c:5638:spa_tryimport(): spa_tryimport: importing SYSTEM-dfa9
1585343400   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): LOADING
1585343400   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/wwn-0x515d90717c000426-part2': best uberblock found for spa $import. txg 776
1585343400   spa_misc.c:407:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=776
1585343400   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): LOADED
1585343400   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): UNLOADING
1585343400   spa.c:5490:spa_import(): spa_import: importing SYSTEM-dfa9
1585343400   spa_misc.c:407:spa_load_note(): spa_load(SYSTEM-dfa9, config trusted): LOADING
1585343400   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/wwn-0x515d90717c000426-part2': best uberblock found for spa SYSTEM-dfa9. txg 776
1585343400   spa_misc.c:407:spa_load_note(): spa_load(SYSTEM-dfa9, config untrusted): using uberblock with txg=776
1585343400   mmp.c:248:mmp_thread_start(): MMP thread started pool 'SYSTEM-dfa9' gethrtime 7377676785
1585343400   spa.c:7592:spa_async_request(): spa=SYSTEM-dfa9 async request task=1
1585343400   spa_misc.c:407:spa_load_note(): spa_load(SYSTEM-dfa9, config trusted): LOADED
1585343400   spa_history.c:316:spa_history_log_sync(): txg 778 open pool version 5000; software version unknown; uts (none) 5.4.24-commsware-00016-ge09cc20b7005 #282 SMP Wed Mar 11 15:42:49 CET 2020 x86_64
1585343400   spa.c:7592:spa_async_request(): spa=SYSTEM-dfa9 async request task=1
1585343400   spa.c:7592:spa_async_request(): spa=SYSTEM-dfa9 async request task=1
1585343400   spa.c:7592:spa_async_request(): spa=SYSTEM-dfa9 async request task=32
1585343400   spa_history.c:316:spa_history_log_sync(): txg 780 import pool version 5000; software version unknown; uts (none) 5.4.24-commsware-00016-ge09cc20b7005 #282 SMP Wed Mar 11 15:42:49 CET 2020 x86_64
1585343400   spa.c:7592:spa_async_request(): spa=SYSTEM-dfa9 async request task=1
1585343400   spa.c:7592:spa_async_request(): spa=SYSTEM-dfa9 async request task=1
1585343405   spa_history.c:305:spa_history_log_sync(): command: zpool import -N -f SYSTEM-dfa9
1585339848   spa.c:5638:spa_tryimport(): spa_tryimport: importing NETSTOR
1585339848   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): LOADING
1585339849   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000006495/1000000000
1585339849   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR2': best uberblock found for spa $import. txg 208
1585339849   spa_misc.c:407:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=208
1585339850   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000001467/1000000000
1585339850   spa_misc.c:392:spa_load_failed(): spa_load($import, config trusted): FAILED: error getting refcount for feature com.delphix:async_destroy [error=52]
1585339850   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): UNLOADING
1585339850   spa.c:5490:spa_import(): spa_import: importing NETSTOR
1585339850   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): LOADING
1585339851   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000006215/1000000000
1585339851   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR2': best uberblock found for spa NETSTOR. txg 208
1585339851   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config untrusted): using uberblock with txg=208
1585339852   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000004400/1000000000
1585339852   spa_misc.c:392:spa_load_failed(): spa_load(NETSTOR, config trusted): FAILED: error getting refcount for feature com.delphix:async_destroy [error=52]
1585339852   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): UNLOADING
1585339852   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): spa_load_retry: rewind, max txg: 207
1585339852   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): LOADING
1585339853   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000007613/1000000000
1585339853   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR2': best uberblock found for spa NETSTOR. txg 207
1585339853   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config untrusted): using uberblock with txg=207
1585339854   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000007054/1000000000
1585339854   spa_misc.c:392:spa_load_failed(): spa_load(NETSTOR, config trusted): FAILED: error getting refcount for feature com.delphix:async_destroy [error=52]
1585339854   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): UNLOADING
1585339854   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): spa_load_retry: rewind, max txg: 206
1585339854   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): LOADING
1585339855   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000001676/1000000000
1585339855   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR2': best uberblock found for spa NETSTOR. txg 206
1585339855   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config untrusted): using uberblock with txg=206
1585339856   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000001746/1000000000
1585339856   spa_misc.c:392:spa_load_failed(): spa_load(NETSTOR, config trusted): FAILED: error getting refcount for feature com.delphix:async_destroy [error=52]
1585339856   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): UNLOADING
1585339856   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): spa_load_retry: rewind, max txg: 205
1585339856   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): LOADING
1585339857   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000001466/1000000000
1585339857   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR2': best uberblock found for spa NETSTOR. txg 205
1585339857   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config untrusted): using uberblock with txg=205
1585339858   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/NETSTOR1': open error=2 timeout=1000003702/1000000000
1585339858   spa_misc.c:392:spa_load_failed(): spa_load(NETSTOR, config trusted): FAILED: error getting refcount for feature com.delphix:async_destroy [error=52]
1585339858   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): UNLOADING

Extra info

On other host with same configuration this also happened with same dbgmsg log. After connecting removed disk and importing pool, I have done the following:

  • zfs create NETSTOR/test
  • zfs snapshot NETSTOR/test@snp
  • zfs destroy -d NETSTOR/test@snp

Then I tried to replicate the problem again (power off host, remove disk, try to import pool) and this is output of dmesg:

timestamp    message 
1585308453   spa.c:5638:spa_tryimport(): spa_tryimport: importing NETSTOR
1585308453   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): LOADING
1585308454   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1': open error=2 timeout=1000000975/1000000000
1585308454   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV1-1': best uberblock found for spa $import. txg 1489
1585308454   spa_misc.c:407:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=1489
1585308455   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1': open error=2 timeout=1000002259/1000000000
1585308455   vdev.c:127:vdev_dbgmsg(): mirror-0 vdev (guid 10926999002515803230): unable to read the metaslab array [error=52]
1585308455   vdev.c:127:vdev_dbgmsg(): mirror-0 vdev (guid 10926999002515803230): vdev_load: metaslab_init failed [error=52]
1585308455   spa_misc.c:392:spa_load_failed(): spa_load($import, config trusted): FAILED: vdev_load failed [error=52]
1585308455   spa_misc.c:407:spa_load_note(): spa_load($import, config trusted): UNLOADING
1585308455   spa.c:5490:spa_import(): spa_import: importing NETSTOR
1585308455   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): LOADING
1585308456   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1': open error=2 timeout=1000001527/1000000000
1585308456   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV1-1': best uberblock found for spa NETSTOR. txg 1489
1585308456   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config untrusted): using uberblock with txg=1489
1585308457   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1': open error=2 timeout=1000002572/1000000000
1585308457   vdev.c:127:vdev_dbgmsg(): mirror-0 vdev (guid 10926999002515803230): unable to read the metaslab array [error=52]
1585308457   vdev.c:127:vdev_dbgmsg(): mirror-0 vdev (guid 10926999002515803230): vdev_load: metaslab_init failed [error=52]
1585308457   spa_misc.c:392:spa_load_failed(): spa_load(NETSTOR, config trusted): FAILED: vdev_load failed [error=52]
1585308457   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): UNLOADING
1585308457   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): spa_load_retry: rewind, max txg: 1488
1585308457   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): LOADING
1585308458   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1': open error=2 timeout=1000002043/1000000000
1585308458   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV1-1': best uberblock found for spa NETSTOR. txg 1482
1585308458   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config untrusted): using uberblock with txg=1482
1585308459   vdev.c:124:vdev_dbgmsg(): disk vdev '/dev/disk/by-partlabel/SW3-NETSTOR-SRV2-1': open error=2 timeout=1000002595/1000000000
1585308459   vdev.c:127:vdev_dbgmsg(): mirror-0 vdev (guid 10926999002515803230): unable to read the metaslab array [error=52]
1585308459   vdev.c:127:vdev_dbgmsg(): mirror-0 vdev (guid 10926999002515803230): vdev_load: metaslab_init failed [error=52]
1585308459   spa_misc.c:392:spa_load_failed(): spa_load(NETSTOR, config trusted): FAILED: vdev_load failed [error=52]
1585308459   spa_misc.c:407:spa_load_note(): spa_load(NETSTOR, config trusted): UNLOADING

I also have to say that I have tried to replicate same problem inside VM (VirtualBox Version 5.2.34_Ubuntu r133883), but with no success.

Update

After running, on imported pool, zpool initialize NETSTOR I can not replicate bug anymore. Problem is that zpool initialize takes too long (45 minutes on two disks of 60GB)

@stale
Copy link

stale bot commented Mar 30, 2021

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Mar 30, 2021
@stale stale bot closed this as completed Jun 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale No recent activity for issue
Projects
None yet
Development

No branches or pull requests

1 participant