task txg_sync blocked for more than 120 seconds. #9130

Open
AlexOwen opened this issue Aug 6, 2019 · 159 comments


AlexOwen commented Aug 6, 2019

System information

Type Version/Name
Distribution Name Proxmox
Distribution Version 6.0.1-2
Linux Kernel 5.0.18-1-pve
Architecture 64-bit Intel
ZFS Version 0.8.1-pve1
SPL Version 0.8.1-pve1

The pool is 40TB raw (8 x 5TB) in RAIDZ2. De-duplication is off.
Drives are new and verified working with badblocks (4 passes) and smart tests.
Compression is set to lz4.
This has been verified with native encryption on and off on the same hardware.
The drives are in 2 x external Thunderbolt enclosures, which would be a red flag had they not run for over a year on another system with different (same model) drives without fault.

Describe the problem you're observing

The IO hangs during long rsync jobs (e.g. 1TB+) over the network, or during shorter local ones (1GB+, one pool to another).

This causes rsync to hang, and any other programs that use the pool also go unresponsive if started (as would be expected if the IO is hung). E.g. the console is responsive over SSH, but ls on the root of the pool hangs.

When this happened on ZFS 0.7.x it kernel panicked and crashed the computer, but since 0.8.1 it only causes pool IO related tasks to hang but the server remains responsive otherwise.

This seems related to #2934, #7946, #4361, #2611 and #4011.

The system only has 8GB RAM, which may or may not be an issue, but I wouldn't expect a kernel panic regardless. Other users on the related issues report hitting the same error after increasing their RAM, or with more RAM from the start.

CPU usage is generally high (>80%) and write speeds are low throughout the transfer when the pool is encrypted, but this is expected as I believe the missing kernel symbols patch is not in this release (scheduled for 0.8.2?).

Describe how to reproduce the problem

Any long rsync job seems to crash it after a few hours; a fast, large local rsync job (1GB+, pool to pool) will crash it almost immediately.
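
For illustration, a local pool-to-pool job of the kind that triggers it might look like this (paths are hypothetical):

# archive-mode copy from one pool to another; hangs after ~1GB+ written
rsync -avh --progress /tank/data/ /backup/data/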

Include any warning/errors/backtraces from the system logs

Initially:

Aug  6 03:44:37 local kernel: [114789.544998] INFO: task txg_sync:8515 blocked for more than 120 seconds.
Aug  6 03:44:37 local kernel: [114789.545003]       Tainted: P           O      5.0.18-1-pve #1
Aug  6 03:44:37 local kernel: [114789.545004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  6 03:44:37 local kernel: [114789.545006] txg_sync        D    0  8515      2 0x80000000
Aug  6 03:44:37 local kernel: [114789.545007] Call Trace:
Aug  6 03:44:37 local kernel: [114789.545013]  __schedule+0x2d4/0x870
Aug  6 03:44:37 local kernel: [114789.545061]  ? vdev_disk_io_start+0x981/0xa30 [zfs]
Aug  6 03:44:37 local kernel: [114789.545063]  schedule+0x2c/0x70
Aug  6 03:44:37 local kernel: [114789.545064]  schedule_timeout+0x157/0x360
Aug  6 03:44:37 local kernel: [114789.545067]  ? __next_timer_interrupt+0xd0/0xd0
Aug  6 03:44:37 local kernel: [114789.545068]  io_schedule_timeout+0x1e/0x50
Aug  6 03:44:37 local kernel: [114789.545073]  __cv_timedwait_common+0x12f/0x170 [spl]
Aug  6 03:44:37 local kernel: [114789.545075]  ? wait_woken+0x80/0x80
Aug  6 03:44:37 local kernel: [114789.545078]  __cv_timedwait_io+0x19/0x20 [spl]
Aug  6 03:44:37 local kernel: [114789.545109]  zio_wait+0x13a/0x280 [zfs]
Aug  6 03:44:37 local kernel: [114789.545111]  ? _cond_resched+0x19/0x30
Aug  6 03:44:37 local kernel: [114789.545142]  vdev_config_sync+0xe8/0x240 [zfs]
Aug  6 03:44:37 local kernel: [114789.545170]  spa_sync+0xbef/0xfc0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545200]  txg_sync_thread+0x2d9/0x4c0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545258]  ? txg_thread_exit.isra.11+0x60/0x60 [zfs]
Aug  6 03:44:37 local kernel: [114789.545262]  thread_generic_wrapper+0x74/0x90 [spl]
Aug  6 03:44:37 local kernel: [114789.545264]  kthread+0x120/0x140
Aug  6 03:44:37 local kernel: [114789.545267]  ? __thread_exit+0x20/0x20 [spl]
Aug  6 03:44:37 local kernel: [114789.545269]  ? __kthread_parkme+0x70/0x70
Aug  6 03:44:37 local kernel: [114789.545270]  ret_from_fork+0x35/0x40
Aug  6 03:44:37 local kernel: [114789.545272] INFO: task rsync:4565 blocked for more than 120 seconds.
Aug  6 03:44:37 local kernel: [114789.545274]       Tainted: P           O      5.0.18-1-pve #1
Aug  6 03:44:37 local kernel: [114789.545275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  6 03:44:37 local kernel: [114789.545276] rsync           D    0  4565   9789 0x00000000
Aug  6 03:44:37 local kernel: [114789.545277] Call Trace:
Aug  6 03:44:37 local kernel: [114789.545279]  __schedule+0x2d4/0x870
Aug  6 03:44:37 local kernel: [114789.545281]  schedule+0x2c/0x70
Aug  6 03:44:37 local kernel: [114789.545281]  schedule_timeout+0x157/0x360
Aug  6 03:44:37 local kernel: [114789.545283]  ? __next_timer_interrupt+0xd0/0xd0
Aug  6 03:44:37 local kernel: [114789.545285]  io_schedule_timeout+0x1e/0x50
Aug  6 03:44:37 local kernel: [114789.545287]  __cv_timedwait_common+0x12f/0x170 [spl]
Aug  6 03:44:37 local kernel: [114789.545289]  ? wait_woken+0x80/0x80
Aug  6 03:44:37 local kernel: [114789.545291]  __cv_timedwait_io+0x19/0x20 [spl]
Aug  6 03:44:37 local kernel: [114789.545321]  zio_wait+0x13a/0x280 [zfs]
Aug  6 03:44:37 local kernel: [114789.545338]  dbuf_read+0x4e1/0xb60 [zfs]
Aug  6 03:44:37 local kernel: [114789.545357]  dmu_buf_hold+0x5d/0x90 [zfs]
Aug  6 03:44:37 local kernel: [114789.545387]  zap_lockdir+0x4e/0xb0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545417]  zap_lookup_norm+0x5d/0xc0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545447]  zap_lookup+0x16/0x20 [zfs]
Aug  6 03:44:37 local kernel: [114789.545477]  zfs_dirent_lock+0x524/0x670 [zfs]
Aug  6 03:44:37 local kernel: [114789.545506]  zfs_dirlook+0x92/0x290 [zfs]
Aug  6 03:44:37 local kernel: [114789.545536]  zfs_lookup+0x1ea/0x3a0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545564]  zpl_lookup+0xc4/0x1d0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545567]  ? __d_lookup+0x5a/0x140
Aug  6 03:44:37 local kernel: [114789.545568]  __lookup_slow+0x9b/0x150
Aug  6 03:44:37 local kernel: [114789.545570]  lookup_slow+0x3a/0x60
Aug  6 03:44:37 local kernel: [114789.545571]  walk_component+0x1bf/0x330
Aug  6 03:44:37 local kernel: [114789.545573]  ? link_path_walk.part.42+0x29e/0x540
Aug  6 03:44:37 local kernel: [114789.545574]  path_lookupat.isra.46+0x6d/0x220
Aug  6 03:44:37 local kernel: [114789.545576]  ? _cond_resched+0x19/0x30
Aug  6 03:44:37 local kernel: [114789.545576]  filename_lookup.part.60+0xa0/0x170
Aug  6 03:44:37 local kernel: [114789.545579]  ? __check_object_size+0x166/0x192
Aug  6 03:44:37 local kernel: [114789.545581]  ? strncpy_from_user+0x56/0x1b0
Aug  6 03:44:37 local kernel: [114789.545582]  user_path_at_empty+0x3e/0x50
Aug  6 03:44:37 local kernel: [114789.545583]  vfs_statx+0x76/0xe0
Aug  6 03:44:37 local kernel: [114789.545584]  ? chmod_common+0x12f/0x180
Aug  6 03:44:37 local kernel: [114789.545585]  __do_sys_newlstat+0x3d/0x70
Aug  6 03:44:37 local kernel: [114789.545586]  ? mntput+0x24/0x40
Aug  6 03:44:37 local kernel: [114789.545587]  ? path_put+0x1e/0x30
Aug  6 03:44:37 local kernel: [114789.545588]  ? do_fchmodat+0x6c/0xb0
Aug  6 03:44:37 local kernel: [114789.545589]  __x64_sys_newlstat+0x16/0x20
Aug  6 03:44:37 local kernel: [114789.545591]  do_syscall_64+0x5a/0x110
Aug  6 03:44:37 local kernel: [114789.545592]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug  6 03:44:37 local kernel: [114789.545593] RIP: 0033:0x7ffb93a7ab45
Aug  6 03:44:37 local kernel: [114789.545597] Code: Bad RIP value.
Aug  6 03:44:37 local kernel: [114789.545598] RSP: 002b:00007ffd99de4c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Aug  6 03:44:37 local kernel: [114789.545599] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb93a7ab45
Aug  6 03:44:37 local kernel: [114789.545599] RDX: 00007ffd99de4ee0 RSI: 00007ffd99de4ee0 RDI: 00007ffd99de70c0
Aug  6 03:44:37 local kernel: [114789.545600] RBP: 0000000000000000 R08: 0000564394b42750 R09: 0000000000000004
Aug  6 03:44:37 local kernel: [114789.545600] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd99de70c0
Aug  6 03:44:37 local kernel: [114789.545600] R13: 00007ffd99de4ee0 R14: 000000000015fdb0 R15: 0000000000000000
Aug  6 03:44:37 local kernel: [114789.545602] INFO: task rsync:4580 blocked for more than 120 seconds.
Aug  6 03:44:37 local kernel: [114789.545604]       Tainted: P           O      5.0.18-1-pve #1
Aug  6 03:44:37 local kernel: [114789.545605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  6 03:44:37 local kernel: [114789.545607] rsync           D    0  4580   4565 0x00000000
Aug  6 03:44:37 local kernel: [114789.545607] Call Trace:
Aug  6 03:44:37 local kernel: [114789.545609]  __schedule+0x2d4/0x870
Aug  6 03:44:37 local kernel: [114789.545627]  ? dbuf_read+0x7d4/0xb60 [zfs]
Aug  6 03:44:37 local kernel: [114789.545629]  schedule+0x2c/0x70
Aug  6 03:44:37 local kernel: [114789.545631]  cv_wait_common+0x104/0x130 [spl]
Aug  6 03:44:37 local kernel: [114789.545633]  ? wait_woken+0x80/0x80
Aug  6 03:44:37 local kernel: [114789.545635]  __cv_wait+0x15/0x20 [spl]
Aug  6 03:44:37 local kernel: [114789.545656]  dmu_tx_wait+0xbf/0x3b0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545676]  dmu_tx_assign+0x161/0x480 [zfs]
Aug  6 03:44:37 local kernel: [114789.545706]  zfs_write+0x420/0xe80 [zfs]
Aug  6 03:44:37 local kernel: [114789.545708]  ? __skb_datagram_iter+0x1a9/0x2c0
Aug  6 03:44:37 local kernel: [114789.545711]  ? unix_stream_read_generic+0x703/0x900
Aug  6 03:44:37 local kernel: [114789.545740]  zpl_write_common_iovec+0xa3/0xf0 [zfs]
Aug  6 03:44:37 local kernel: [114789.545769]  zpl_iter_write+0xed/0x120 [zfs]
Aug  6 03:44:37 local kernel: [114789.545771]  new_sync_write+0x10c/0x170
Aug  6 03:44:37 local kernel: [114789.545773]  __vfs_write+0x29/0x40
Aug  6 03:44:37 local kernel: [114789.545774]  vfs_write+0xab/0x1b0
Aug  6 03:44:37 local kernel: [114789.545775]  ksys_write+0x5c/0xd0
Aug  6 03:44:37 local kernel: [114789.545776]  __x64_sys_write+0x1a/0x20
Aug  6 03:44:37 local kernel: [114789.545778]  do_syscall_64+0x5a/0x110
Aug  6 03:44:37 local kernel: [114789.545779]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug  6 03:44:37 local kernel: [114789.545779] RIP: 0033:0x7ffb93a7b504
Aug  6 03:44:37 local kernel: [114789.545781] Code: Bad RIP value.
Aug  6 03:44:37 local kernel: [114789.545781] RSP: 002b:00007ffd99de3e68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug  6 03:44:37 local kernel: [114789.545782] RAX: ffffffffffffffda RBX: 00007ffb8efd5010 RCX: 00007ffb93a7b504
Aug  6 03:44:37 local kernel: [114789.545783] RDX: 0000000000040000 RSI: 00007ffb8efd5010 RDI: 0000000000000003
Aug  6 03:44:37 local kernel: [114789.545783] RBP: 0000000000000000 R08: 0000000000040000 R09: 00007ffb8f014ae0
Aug  6 03:44:37 local kernel: [114789.545784] R10: 000000006b388e1f R11: 0000000000000246 R12: 0000000000000003
Aug  6 03:44:37 local kernel: [114789.545784] R13: 000056438ae7edc0 R14: 0000000000003ffa R15: 0000000000003ffa

Then periodically:

Aug  6 03:50:40 local kernel: [115152.034700] INFO: task rsync:4580 blocked for more than 120 seconds.
Aug  6 03:50:40 local kernel: [115152.034705]       Tainted: P           O      5.0.18-1-pve #1
Aug  6 03:50:40 local kernel: [115152.034706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  6 03:50:40 local kernel: [115152.034708] rsync           D    0  4580   4565 0x00000000
Aug  6 03:50:40 local kernel: [115152.034709] Call Trace:
Aug  6 03:50:40 local kernel: [115152.034715]  __schedule+0x2d4/0x870
Aug  6 03:50:40 local kernel: [115152.034754]  ? dbuf_read+0x7d4/0xb60 [zfs]
Aug  6 03:50:40 local kernel: [115152.034756]  schedule+0x2c/0x70
Aug  6 03:50:40 local kernel: [115152.034761]  cv_wait_common+0x104/0x130 [spl]
Aug  6 03:50:40 local kernel: [115152.034764]  ? wait_woken+0x80/0x80
Aug  6 03:50:40 local kernel: [115152.034766]  __cv_wait+0x15/0x20 [spl]
Aug  6 03:50:40 local kernel: [115152.034787]  dmu_tx_wait+0xbf/0x3b0 [zfs]
Aug  6 03:50:40 local kernel: [115152.034808]  dmu_tx_assign+0x161/0x480 [zfs]
Aug  6 03:50:40 local kernel: [115152.034866]  zfs_write+0x420/0xe80 [zfs]
Aug  6 03:50:40 local kernel: [115152.034868]  ? __skb_datagram_iter+0x1a9/0x2c0
Aug  6 03:50:40 local kernel: [115152.034870]  ? unix_stream_read_generic+0x703/0x900
Aug  6 03:50:40 local kernel: [115152.034900]  zpl_write_common_iovec+0xa3/0xf0 [zfs]
Aug  6 03:50:40 local kernel: [115152.034928]  zpl_iter_write+0xed/0x120 [zfs]
Aug  6 03:50:40 local kernel: [115152.034930]  new_sync_write+0x10c/0x170
Aug  6 03:50:40 local kernel: [115152.034932]  __vfs_write+0x29/0x40
Aug  6 03:50:40 local kernel: [115152.034933]  vfs_write+0xab/0x1b0
Aug  6 03:50:40 local kernel: [115152.034934]  ksys_write+0x5c/0xd0
Aug  6 03:50:40 local kernel: [115152.034936]  __x64_sys_write+0x1a/0x20
Aug  6 03:50:40 local kernel: [115152.034937]  do_syscall_64+0x5a/0x110
Aug  6 03:50:40 local kernel: [115152.034939]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug  6 03:50:40 local kernel: [115152.034940] RIP: 0033:0x7ffb93a7b504
Aug  6 03:50:40 local kernel: [115152.034943] Code: Bad RIP value.
Aug  6 03:50:40 local kernel: [115152.034944] RSP: 002b:00007ffd99de3e68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug  6 03:50:40 local kernel: [115152.034945] RAX: ffffffffffffffda RBX: 00007ffb8efd5010 RCX: 00007ffb93a7b504
Aug  6 03:50:40 local kernel: [115152.034946] RDX: 0000000000040000 RSI: 00007ffb8efd5010 RDI: 0000000000000003
Aug  6 03:50:40 local kernel: [115152.034946] RBP: 0000000000000000 R08: 0000000000040000 R09: 00007ffb8f014ae0
Aug  6 03:50:40 local kernel: [115152.034947] R10: 000000006b388e1f R11: 0000000000000246 R12: 0000000000000003
Aug  6 03:50:40 local kernel: [115152.034947] R13: 000056438ae7edc0 R14: 0000000000003ffa R15: 0000000000003ffa

prgwiz commented Aug 6, 2019

We had these issues with Ubuntu 18.04
Have you tried:
zfs_prefetch_disable=1
and maybe
zfs_nocacheflush=1

I am not sure but running the rsync with "nice -n 19" may have helped.

We have not seen many issues with latest patches for Ubuntu Eoan.
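
For anyone wanting to try those tunables: they can be flipped at runtime through sysfs or persisted as module options (a sketch only; note that zfs_nocacheflush disables cache flushes and risks data loss on power failure, so whether these are safe or helpful is workload-dependent):

# runtime, takes effect immediately
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 1 > /sys/module/zfs/parameters/zfs_nocacheflush

# persistent across reboots
echo "options zfs zfs_prefetch_disable=1 zfs_nocacheflush=1" >> /etc/modprobe.d/zfs.conf

# low-priority rsync, as suggested above (paths are placeholders)
nice -n 19 rsync -a /source/ /destination/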

c0d3z3r0 (Contributor) commented Aug 7, 2019

@BrainSlayer isn't that the same problem we had with #9034? o.O

BrainSlayer (Contributor) commented Aug 8, 2019 via email

c0d3z3r0 (Contributor) commented Aug 8, 2019

> which again triggers the endless loop in memory allocation procedure I love so much

which should be fixed by our patch; so maybe we even fix a very common, recurring problem - great!


pkramme commented Nov 12, 2019

I may have the same problem, although my hang appears when I delete many snapshots (>10000). Is this the same problem? RAIDZ1 on Ubuntu 18.04 LTS with HWE kernel (5.0.0-31-generic). If this is indeed the same problem, I can send you more information, because I still have 30000 snapshots I need to delete, and every time I try deleting large amounts, ZFS hangs.
dmesg_zfs_hung_tasks_after_deleting_many_snapshots.log
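
If it is the same problem, destroying the snapshots in smaller batches might narrow down the trigger; a hypothetical sketch (tank/data is a placeholder dataset):

# destroy the 100 oldest snapshots of tank/data, one at a time
zfs list -H -t snapshot -o name -s creation tank/data | head -n 100 | xargs -n 1 zfs destroy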

@HiFiPhile

Having the same problem while the pools are under heavy IO.

System information

Type Version/Name
Distribution Name Proxmox
Distribution Version 6.0-11
Linux Kernel 5.0.21-4-pve
Architecture amd64
ZFS Version 0.8.2-pve2
SPL Version 0.8.2-pve2
CPU: Xeon E3-1285L V3
RAM: 32GB DDR3-1600 ECC
HBA: LSI9305-24i
root@pve:~# zpool status
  pool: NAS
 state: ONLINE
  scan: scrub repaired 0B in 0 days 07:58:47 with 0 errors on Sun Nov 10 08:22:49 2019
config:

        NAME                                   STATE     READ WRITE CKSUM
        NAS                                    ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            wwn-0x5000cca26bd9cc69             ONLINE       0     0     0
            ata-HGST_HUH721010ALE600_7JJ3GG4C  ONLINE       0     0     0
            ata-HGST_HUH721010ALE600_7JJ535MC  ONLINE       0     0     0

errors: No known data errors

  pool: S860
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:06:37 with 0 errors on Sun Nov 10 00:30:44 2019
config:

        NAME                                           STATE     READ WRITE CKSUM
        S860                                           ONLINE       0     0     0
          ata-Samsung_SSD_860_EVO_4TB_S3YPNB0KA00732Y  ONLINE       0     0     0
          ata-Samsung_SSD_860_EVO_4TB_S3YPNB0KA00919A  ONLINE       0     0     0

errors: No known data errors

Describe the problem you're observing

txg_sync hangs during heavy IO, in my case during an fio benchmark of the NAS and S860 pools at the same time.

Describe how to reproduce the problem

Run fio benchmark of both pools at the same time:

[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=256K
direct=0
numjobs=1
time_based=1
runtime=300

[file1]
size=50G
ioengine=libaio
iodepth=16
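
For reference, a job file like the one above can be saved (e.g. as seq-write.fio, a name chosen for illustration) and launched from both pools at once, assuming they are mounted at /NAS and /S860:

# filename= in the job file is relative, so run one fio per pool mountpoint
(cd /NAS && fio /root/seq-write.fio) &
(cd /S860 && fio /root/seq-write.fio) &
wait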

Include any warning/errors/backtraces from the system logs

[  968.292727] INFO: task txg_sync:2130 blocked for more than 120 seconds.
[  968.292762]       Tainted: P           O      5.0.21-4-pve #1
[  968.292784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  968.292815] txg_sync        D    0   2130      2 0x80000000
[  968.292837] Call Trace:
[  968.292855]  __schedule+0x2d4/0x870
[  968.292872]  schedule+0x2c/0x70
[  968.292887]  schedule_timeout+0x157/0x360
[  968.292906]  ? __next_timer_interrupt+0xd0/0xd0
[  968.292925]  io_schedule_timeout+0x1e/0x50
[  968.292946]  __cv_timedwait_common+0x12f/0x170 [spl]
[  968.292968]  ? wait_woken+0x80/0x80
[  968.292985]  __cv_timedwait_io+0x19/0x20 [spl]
[  968.293049]  zio_wait+0x13a/0x280 [zfs]
[  968.293067]  ? _cond_resched+0x19/0x30
[  968.293115]  dsl_pool_sync+0xdc/0x4f0 [zfs]
[  968.293168]  spa_sync+0x5b2/0xfc0 [zfs]
[  968.293221]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[  968.293278]  txg_sync_thread+0x2d9/0x4c0 [zfs]
[  968.293331]  ? txg_thread_exit.isra.11+0x60/0x60 [zfs]
[  968.293355]  thread_generic_wrapper+0x74/0x90 [spl]
[  968.293376]  kthread+0x120/0x140
[  968.293393]  ? __thread_exit+0x20/0x20 [spl]
[  968.293410]  ? __kthread_parkme+0x70/0x70
[  968.293428]  ret_from_fork+0x35/0x40

@PrivatePuffin (Contributor)

I've seen the same issue, and just like @BrainSlayer and @c0d3z3r0 it was due to low memory.
In my case, VMs on Proxmox without enough memory (for ZFS's taste).


seanmikhaels commented Nov 14, 2019

I have the same issue. A bandwidth-limited rsync job at 12MB/s, or a VM starting in VirtualBox, triggers this and the entire file system freezes. It doesn't seem to be memory or CPU related, as both stay low during transfers and file access. Disk IO also stays low.

OS - Debian Stretch
Kernel - 4.19.0-0.bpo.6-amd64
ZFS Version - 0.8.2-1
Pool: 3 Drive RaidZ1
L2ARC : 80GB SSD Drive.
CPU - AMD Athlon 240G
System Memory: 32GB RAM.


Nov 13 11:23:32 nas kernel: [59329.141166] INFO: task txg_sync:1228 blocked for more than 120 seconds.
Nov 13 11:23:32 nas kernel: [59329.141368] Tainted: P OE 4.19.0-0.bpo.6-amd64 #1 Debian 4.19.67-2+deb10u1~bpo9+1
Nov 13 11:23:32 nas kernel: [59329.141598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 13 11:23:32 nas kernel: [59329.141827] txg_sync D 0 1228 2 0x80000000
Nov 13 11:23:32 nas kernel: [59329.141833] Call Trace:
Nov 13 11:23:32 nas kernel: [59329.141846] ? __schedule+0x3f5/0x880
Nov 13 11:23:32 nas kernel: [59329.141920] ? spa_taskq_dispatch_ent+0x85/0xc0 [zfs]
Nov 13 11:23:32 nas kernel: [59329.141924] schedule+0x32/0x80
Nov 13 11:23:32 nas kernel: [59329.141927] schedule_timeout+0x16f/0x350
Nov 13 11:23:32 nas kernel: [59329.141931] ? __next_timer_interrupt+0xc0/0xc0
Nov 13 11:23:32 nas kernel: [59329.141934] io_schedule_timeout+0x19/0x40
Nov 13 11:23:32 nas kernel: [59329.141942] __cv_timedwait_common+0x122/0x160 [spl]
Nov 13 11:23:32 nas kernel: [59329.141947] ? remove_wait_queue+0x60/0x60
Nov 13 11:23:32 nas kernel: [59329.142006] zio_wait+0x124/0x260 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142060] dsl_pool_sync+0x106/0x520 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142118] spa_sync+0x550/0xf90 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142177] ? spa_txg_history_init_io+0xfe/0x110 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142235] txg_sync_thread+0x291/0x450 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142331] ? txg_wait_open+0xd0/0xd0 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142350] thread_generic_wrapper+0x6f/0x80 [spl]
Nov 13 11:23:32 nas kernel: [59329.142353] kthread+0xf8/0x130
Nov 13 11:23:32 nas kernel: [59329.142360] ? __thread_exit+0x20/0x20 [spl]
Nov 13 11:23:32 nas kernel: [59329.142361] ? kthread_create_worker_on_cpu+0x70/0x70
Nov 13 11:23:32 nas kernel: [59329.142364] ret_from_fork+0x22/0x40


avg-cpu: %user %nice %system %iowait %steal %idle
9.39 0.00 22.83 9.81 0.00 57.97

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdc 0.08 0.02 223.76 42.34 45460.22 650.13 346.57 0.68 2.71 3.15 0.42 1.47 39.24
sdb 0.05 0.02 146.82 42.74 45513.16 650.17 487.06 0.85 4.69 5.93 0.43 2.47 46.81
sda 0.00 0.45 0.41 488.19 0.85 46865.59 191.84 0.96 2.10 147.95 1.98 1.06 51.77
sdd 0.00 0.00 0.03 0.00 0.95 0.00 61.24 0.00 2.57 2.57 0.00 1.74 0.01
sde 0.09 0.02 230.35 42.18 45461.70 650.16 338.40 0.67 2.63 3.03 0.44 1.45 39.47
sdf 0.01 1.15 3.02 3.70 30.37 100.15 38.86 0.00 0.60 0.70 0.52 0.11 0.07

        NAME                                 STATE     READ WRITE CKSUM
        zfspool                              ONLINE       0     0     0
          raidz1-0                           ONLINE       0     0     0
            sde                              ONLINE       0     0     0
            ata-ST4000VN000-1H4168_Z301425Y  ONLINE       0     0     0
            ata-ST4000DM005-2DP166_ZDH164PY  ONLINE       0     0     0
        logs
          sdf3                               ONLINE       0     0     0

cat /proc/spl/kstat/zfs/dmu_tx
11 1 0x01 12 3264 36614795947 60113570602791
name type data
dmu_tx_assigned 4 4137795
dmu_tx_delay 4 0
dmu_tx_error 4 0
dmu_tx_suspended 4 0
dmu_tx_group 4 0
dmu_tx_memory_reserve 4 0
dmu_tx_memory_reclaim 4 0
dmu_tx_dirty_throttle 4 0
dmu_tx_dirty_delay 4 132825
dmu_tx_dirty_over_max 4 28
dmu_tx_dirty_frees_delay 4 1
dmu_tx_quota 4 0

ZFS Subsystem Report Wed Nov 13 11:30:46 2019
ARC Summary: (HEALTHY)
Memory Throttle Count: 0

ARC Misc:
Deleted: 649.80k
Mutex Misses: 21
Evict Skips: 12

ARC Size: 98.85% 15.43 GiB
Target Size: (Adaptive) 100.00% 15.61 GiB
Min Size (Hard Limit): 6.25% 998.92 MiB
Max Size (High Water): 16:1 15.61 GiB

ARC Size Breakdown:
Recently Used Cache Size: 43.61% 4.38 GiB
Frequently Used Cache Size: 56.39% 5.66 GiB

ARC Hash Breakdown:
Elements Max: 1.20M
Elements Current: 89.82% 1.08M
Collisions: 1.07M
Chain Max: 5
Chains: 117.18k

ARC Total accesses: 497.41M
Cache Hit Ratio: 99.73% 496.07M
Cache Miss Ratio: 0.27% 1.34M
Actual Hit Ratio: 99.73% 496.06M

Data Demand Efficiency: 99.17% 433.01k
Data Prefetch Efficiency: 8.47% 2.87k

CACHE HITS BY CACHE LIST:
Most Recently Used: 2.84% 14.08M
Most Frequently Used: 97.16% 481.98M
Most Recently Used Ghost: 0.05% 262.03k
Most Frequently Used Ghost: 0.00% 11.54k

CACHE HITS BY DATA TYPE:
Demand Data: 0.09% 429.42k
Prefetch Data: 0.00% 243
Demand Metadata: 99.91% 495.63M
Prefetch Metadata: 0.00% 15.26k

CACHE MISSES BY DATA TYPE:
Demand Data: 0.27% 3.59k
Prefetch Data: 0.20% 2.63k
Demand Metadata: 82.28% 1.10M
Prefetch Metadata: 17.26% 231.71k

@PrivatePuffin (Contributor)

This might all be related to #9583.


lassebm commented Dec 19, 2019

I'm experiencing a similar issue, seen with ZFS packages from Ubuntu Eoan and packages built from ba434b1, containing the fix from #9583. The issue is seen on a mirrored pool with encrypted datasets, running on a Raspberry Pi 4B, 4 GB RAM, zfs_arc_max=536870912. The Pi is running various Docker workloads backed by the dockerd ZFS storage driver. The average free memory is around 1.5 GB.
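
For context, zfs_arc_max=536870912 caps the ARC at 512 MiB; a cap like that is typically applied as a module option (a generic sketch, not necessarily how it was set here):

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=536870912

# or at runtime
echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max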

[238407.538074] INFO: task z_wr_iss:1683 blocked for more than 120 seconds.
[238407.545147]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.552411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.560642] z_wr_iss        D    0  1683      2 0x00000028
[238407.560651] Call trace:
[238407.560665]  __switch_to+0xf0/0x150
[238407.560671]  __schedule+0x2cc/0x718
[238407.560675]  schedule+0x3c/0xb8
[238407.560679]  io_schedule+0x20/0x40
[238407.560685]  rq_qos_wait+0x100/0x178
[238407.560690]  wbt_wait+0xb4/0xf0
[238407.560694]  __rq_qos_throttle+0x38/0x50
[238407.560699]  blk_mq_make_request+0x120/0x5f8
[238407.560704]  generic_make_request+0xb4/0x2d8
[238407.560708]  submit_bio+0x44/0x1b0
[238407.560885]  vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.560992]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.561093]  zio_nowait+0xd4/0x170 [zfs]
[238407.561193]  vdev_queue_io+0x24c/0x268 [zfs]
[238407.561297]  zio_vdev_io_start+0x94/0x2b8 [zfs]
[238407.561401]  zio_nowait+0xd4/0x170 [zfs]
[238407.561502]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.561605]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.561707]  zio_nowait+0xd4/0x170 [zfs]
[238407.561808]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.561907]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.562012]  zio_execute+0xac/0x110 [zfs]
[238407.562084]  taskq_thread+0x304/0x580 [spl]
[238407.562097]  kthread+0x104/0x130
[238407.562104]  ret_from_fork+0x10/0x1c
[238407.562117] INFO: task z_wr_iss_h:1685 blocked for more than 120 seconds.
[238407.569281]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.576523] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.584745] z_wr_iss_h      D    0  1685      2 0x00000028
[238407.584755] Call trace:
[238407.584768]  __switch_to+0xf0/0x150
[238407.584775]  __schedule+0x2cc/0x718
[238407.584779]  schedule+0x3c/0xb8
[238407.584783]  io_schedule+0x20/0x40
[238407.584789]  rq_qos_wait+0x100/0x178
[238407.584793]  wbt_wait+0xb4/0xf0
[238407.584797]  __rq_qos_throttle+0x38/0x50
[238407.584803]  blk_mq_make_request+0x120/0x5f8
[238407.584807]  generic_make_request+0xb4/0x2d8
[238407.584812]  submit_bio+0x44/0x1b0
[238407.584970]  vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.585065]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.585160]  zio_nowait+0xd4/0x170 [zfs]
[238407.585254]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.585348]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.585442]  zio_nowait+0xd4/0x170 [zfs]
[238407.585536]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.585630]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.585724]  zio_execute+0xac/0x110 [zfs]
[238407.585742]  taskq_thread+0x304/0x580 [spl]
[238407.585748]  kthread+0x104/0x130
[238407.585753]  ret_from_fork+0x10/0x1c
[238407.585770] INFO: task txg_sync:1802 blocked for more than 120 seconds.
[238407.592634]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.599741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.607819] txg_sync        D    0  1802      2 0x00000028
[238407.607826] Call trace:
[238407.607835]  __switch_to+0xf0/0x150
[238407.607843]  __schedule+0x2cc/0x718
[238407.607847]  schedule+0x3c/0xb8
[238407.607852]  schedule_timeout+0x9c/0x190
[238407.607856]  io_schedule_timeout+0x28/0x48
[238407.607875]  __cv_timedwait_common+0x1a8/0x1f8 [spl]
[238407.607889]  __cv_timedwait_io+0x3c/0x50 [spl]
[238407.608041]  zio_wait+0x130/0x2a0 [zfs]
[238407.608140]  dsl_pool_sync+0x3fc/0x498 [zfs]
[238407.608244]  spa_sync+0x530/0xe98 [zfs]
[238407.608341]  txg_sync_thread+0x2d0/0x488 [zfs]
[238407.608357]  thread_generic_wrapper+0x74/0xa0 [spl]
[238407.608366]  kthread+0x104/0x130
[238407.608371]  ret_from_fork+0x10/0x1c
[238407.608430] INFO: task dockerd:32620 blocked for more than 120 seconds.
[238407.615300]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.622415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.630500] dockerd         D    0 32620      1 0x00000008
[238407.630508] Call trace:
[238407.630519]  __switch_to+0xf0/0x150
[238407.630528]  __schedule+0x2cc/0x718
[238407.630532]  schedule+0x3c/0xb8
[238407.630551]  cv_wait_common+0x188/0x1b0 [spl]
[238407.630565]  __cv_wait+0x30/0x40 [spl]
[238407.630687]  zil_commit_impl+0x234/0xd68 [zfs]
[238407.630784]  zil_commit+0x48/0x70 [zfs]
[238407.630882]  zfs_fsync+0x84/0x108 [zfs]
[238407.630977]  zpl_fsync+0x7c/0xb0 [zfs]
[238407.630984]  vfs_fsync_range+0x4c/0x88
[238407.630988]  do_fsync+0x48/0x88
[238407.630993]  __arm64_sys_fsync+0x24/0x38
[238407.630998]  el0_svc_common.constprop.0+0xe0/0x1e8
[238407.631001]  el0_svc_handler+0x34/0xa0
[238407.631006]  el0_svc+0x10/0x14
[238407.631047] INFO: task mongod:27428 blocked for more than 120 seconds.
[238407.637819]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.644927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.653010] mongod          D    0 27428  25240 0x00000800
[238407.653016] Call trace:
[238407.653028]  __switch_to+0xf0/0x150
[238407.653036]  __schedule+0x2cc/0x718
[238407.653041]  schedule+0x3c/0xb8
[238407.653063]  cv_wait_common+0x188/0x1b0 [spl]
[238407.653077]  __cv_wait+0x30/0x40 [spl]
[238407.653205]  zil_commit_impl+0x234/0xd68 [zfs]
[238407.653310]  zil_commit+0x48/0x70 [zfs]
[238407.653407]  zfs_fsync+0x84/0x108 [zfs]
[238407.653502]  zpl_fsync+0x7c/0xb0 [zfs]
[238407.653508]  vfs_fsync_range+0x4c/0x88
[238407.653513]  __arm64_sys_msync+0x190/0x1f8
[238407.653518]  el0_svc_common.constprop.0+0xe0/0x1e8
[238407.653522]  el0_svc_handler+0x34/0xa0
[238407.653526]  el0_svc+0x10/0x14
[238407.653572] INFO: task mongod:32146 blocked for more than 120 seconds.
[238407.660351]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.667463] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.675546] mongod          D    0 32146  25240 0x00000800
[238407.675557] Call trace:
[238407.675565]  __switch_to+0xf0/0x150
[238407.675570]  __schedule+0x2cc/0x718
[238407.675574]  schedule+0x3c/0xb8
[238407.675592]  cv_wait_common+0x188/0x1b0 [spl]
[238407.675605]  __cv_wait+0x30/0x40 [spl]
[238407.675710]  zil_commit_impl+0x234/0xd68 [zfs]
[238407.675806]  zil_commit+0x48/0x70 [zfs]
[238407.675900]  zfs_fsync+0x84/0x108 [zfs]
[238407.675995]  zpl_fsync+0x7c/0xb0 [zfs]
[238407.676001]  vfs_fsync_range+0x4c/0x88
[238407.676005]  do_fsync+0x48/0x88
[238407.676010]  __arm64_sys_fdatasync+0x24/0x38
[238407.676015]  el0_svc_common.constprop.0+0xe0/0x1e8
[238407.676018]  el0_svc_handler+0x34/0xa0
[238407.676022]  el0_svc+0x10/0x14
[238407.676091] INFO: task z_wr_int:15893 blocked for more than 120 seconds.
[238407.683033]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.690134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.698212] z_wr_int        D    0 15893      2 0x00000028
[238407.698219] Call trace:
[238407.698231]  __switch_to+0xf0/0x150
[238407.698239]  __schedule+0x2cc/0x718
[238407.698243]  schedule+0x3c/0xb8
[238407.698247]  io_schedule+0x20/0x40
[238407.698253]  rq_qos_wait+0x100/0x178
[238407.698258]  wbt_wait+0xb4/0xf0
[238407.698263]  __rq_qos_throttle+0x38/0x50
[238407.698268]  blk_mq_make_request+0x120/0x5f8
[238407.698274]  generic_make_request+0xb4/0x2d8
[238407.698279]  submit_bio+0x44/0x1b0
[238407.698402]  vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.698500]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.698601]  zio_nowait+0xd4/0x170 [zfs]
[238407.698700]  vdev_queue_io_done+0x1f0/0x2a0 [zfs]
[238407.698795]  zio_vdev_io_done+0xec/0x220 [zfs]
[238407.698889]  zio_execute+0xac/0x110 [zfs]
[238407.698906]  taskq_thread+0x304/0x580 [spl]
[238407.698912]  kthread+0x104/0x130
[238407.698916]  ret_from_fork+0x10/0x1c
[238407.698925] INFO: task z_wr_iss_h:15990 blocked for more than 120 seconds.
[238407.706051]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.713152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.721228] z_wr_iss_h      D    0 15990      2 0x00000028
[238407.721234] Call trace:
[238407.721243]  __switch_to+0xf0/0x150
[238407.721254]  __schedule+0x2cc/0x718
[238407.721259]  schedule+0x3c/0xb8
[238407.721263]  io_schedule+0x20/0x40
[238407.721269]  rq_qos_wait+0x100/0x178
[238407.721274]  wbt_wait+0xb4/0xf0
[238407.721278]  __rq_qos_throttle+0x38/0x50
[238407.721283]  blk_mq_make_request+0x120/0x5f8
[238407.721289]  generic_make_request+0xb4/0x2d8
[238407.721294]  submit_bio+0x44/0x1b0
[238407.721401]  vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.721505]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.721604]  zio_nowait+0xd4/0x170 [zfs]
[238407.721698]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.721793]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.721887]  zio_nowait+0xd4/0x170 [zfs]
[238407.721984]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.722112]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.722212]  zio_execute+0xac/0x110 [zfs]
[238407.722229]  taskq_thread+0x304/0x580 [spl]
[238407.722237]  kthread+0x104/0x130
[238407.722242]  ret_from_fork+0x10/0x1c
[238407.722248] INFO: task z_wr_iss_h:15991 blocked for more than 121 seconds.
[238407.729367]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238407.736471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.744552] z_wr_iss_h      D    0 15991      2 0x00000028
[238407.744559] Call trace:
[238407.744572]  __switch_to+0xf0/0x150
[238407.744582]  __schedule+0x2cc/0x718
[238407.744586]  schedule+0x3c/0xb8
[238407.744589]  io_schedule+0x20/0x40
[238407.744595]  rq_qos_wait+0x100/0x178
[238407.744600]  wbt_wait+0xb4/0xf0
[238407.744604]  __rq_qos_throttle+0x38/0x50
[238407.744608]  blk_mq_make_request+0x120/0x5f8
[238407.744613]  generic_make_request+0xb4/0x2d8
[238407.744617]  submit_bio+0x44/0x1b0
[238407.744748]  vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.744844]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.744938]  zio_nowait+0xd4/0x170 [zfs]
[238407.745032]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.745126]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.745221]  zio_nowait+0xd4/0x170 [zfs]
[238407.745315]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.745409]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.745503]  zio_execute+0xac/0x110 [zfs]
[238407.745520]  taskq_thread+0x304/0x580 [spl]
[238407.745526]  kthread+0x104/0x130
[238407.745530]  ret_from_fork+0x10/0x1c
[238528.372629] INFO: task z_wr_iss:1683 blocked for more than 241 seconds.
[238528.379559]       Tainted: P         C OE     5.3.0-1014-raspi2 #16-Ubuntu
[238528.386729] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238528.394820] z_wr_iss        D    0  1683      2 0x00000028
[238528.394828] Call trace:
[238528.394841]  __switch_to+0xf0/0x150
[238528.394848]  __schedule+0x2cc/0x718
[238528.394852]  schedule+0x3c/0xb8
[238528.394855]  io_schedule+0x20/0x40
[238528.394861]  rq_qos_wait+0x100/0x178
[238528.394866]  wbt_wait+0xb4/0xf0
[238528.394871]  __rq_qos_throttle+0x38/0x50
[238528.394876]  blk_mq_make_request+0x120/0x5f8
[238528.394881]  generic_make_request+0xb4/0x2d8
[238528.394885]  submit_bio+0x44/0x1b0
[238528.395046]  vdev_disk_io_start+0x668/0x9e8 [zfs]
[238528.395142]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238528.395238]  zio_nowait+0xd4/0x170 [zfs]
[238528.395332]  vdev_queue_io+0x24c/0x268 [zfs]
[238528.395427]  zio_vdev_io_start+0x94/0x2b8 [zfs]
[238528.395521]  zio_nowait+0xd4/0x170 [zfs]
[238528.395615]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238528.395709]  zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238528.395803]  zio_nowait+0xd4/0x170 [zfs]
[238528.395897]  vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238528.395991]  zio_vdev_io_start+0x248/0x2b8 [zfs]
[238528.396084]  zio_execute+0xac/0x110 [zfs]
[238528.396103]  taskq_thread+0x304/0x580 [spl]
[238528.396109]  kthread+0x104/0x130
[238528.396113]  ret_from_fork+0x10/0x1c

PrivatePuffin (Contributor) commented Dec 19, 2019

@lassebm
Besides it being blocked IO, your stacktrace looks notably different.


lassebm commented Dec 19, 2019

@Ornias1993 I wasn't quite sure what to look for. Would you think it's worth a new issue?

@PrivatePuffin (Contributor)

@lassebm Actually, now that I take a second look at it...
The lower parts of the trace are actually quite similar...


kebian commented Jan 8, 2020

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 19.10
Linux Kernel 5.3.0-24-generic
Architecture NUC7i3DNHE Intel i3 / 8GB RAM.
ZFS Version 0.8.1-1ubuntu14
SPL Version 0.8.1-1ubuntu14

I don't know if it's related, but I have a similar issue on Ubuntu Eoan when attempting to rsync large amounts of NAS data from one machine to another. Transfer speeds are around 30MB/s, then suddenly drop to 5MB/s when these errors occur and stay at that speed. Cancelling the rsync and waiting for the IO to catch up temporarily fixes it.

[Tue Jan  7 06:06:14 2020] INFO: task txg_sync:1076 blocked for more than 120 seconds.
[Tue Jan  7 06:06:14 2020]       Tainted: P           O      5.3.0-24-generic #26-Ubuntu
[Tue Jan  7 06:06:14 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan  7 06:06:14 2020] txg_sync        D    0  1076      2 0x80004000
[Tue Jan  7 06:06:14 2020] Call Trace:
[Tue Jan  7 06:06:14 2020]  __schedule+0x2b9/0x6c0
[Tue Jan  7 06:06:14 2020]  schedule+0x42/0xb0
[Tue Jan  7 06:06:14 2020]  schedule_timeout+0x152/0x2f0
[Tue Jan  7 06:06:14 2020]  ? spa_taskq_dispatch_ent+0x4f/0x70 [zfs]
[Tue Jan  7 06:06:14 2020]  ? __next_timer_interrupt+0xe0/0xe0
[Tue Jan  7 06:06:14 2020]  io_schedule_timeout+0x1e/0x50
[Tue Jan  7 06:06:14 2020]  __cv_timedwait_common+0x15e/0x1c0 [spl]
[Tue Jan  7 06:06:14 2020]  ? wait_woken+0x80/0x80
[Tue Jan  7 06:06:14 2020]  __cv_timedwait_io+0x19/0x20 [spl]
[Tue Jan  7 06:06:14 2020]  zio_wait+0x11b/0x230 [zfs]
[Tue Jan  7 06:06:14 2020]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[Tue Jan  7 06:06:14 2020]  dsl_pool_sync+0xbc/0x410 [zfs]
[Tue Jan  7 06:06:14 2020]  spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
[Tue Jan  7 06:06:14 2020]  spa_sync+0x312/0x5b0 [zfs]
[Tue Jan  7 06:06:14 2020]  txg_sync_thread+0x279/0x310 [zfs]
[Tue Jan  7 06:06:14 2020]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Tue Jan  7 06:06:14 2020]  thread_generic_wrapper+0x83/0xa0 [spl]
[Tue Jan  7 06:06:14 2020]  kthread+0x104/0x140
[Tue Jan  7 06:06:14 2020]  ? clear_bit+0x20/0x20 [spl]
[Tue Jan  7 06:06:14 2020]  ? kthread_park+0x80/0x80
[Tue Jan  7 06:06:14 2020]  ret_from_fork+0x35/0x40

@PrivatePuffin (Contributor)

@kebian
If you fill out the same little form that's used in the issue template (OS, distro, ZFS version, etc.), it would be useful for confirmation.

kebian commented Jan 8, 2020

> @kebian
> If you fill out the same little form that's used in the issue template (OS, distro, ZFS version, etc.), it would be useful for confirmation.

Ok, I edited my above comment and added those details. Thanks.

@PrivatePuffin (Contributor)

@kebian Awesome...

8GB RAM; are you sure you aren't out of memory?
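
For anyone checking the same thing, ARC size versus its ceiling and overall free memory can be compared with something like this (a sketch; the kstat field names are from ZFS on Linux 0.8):

free -m
# 'size' is the current ARC size in bytes, 'c_max' its configured ceiling
awk '$1 == "size" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats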


bghira commented Jan 9, 2020

Can you share the disk scheduler that is in use by the underlying storage devices?

@PrivatePuffin (Contributor)

> Can you share the disk scheduler that is in use by the underlying storage devices?

Welcome to GitHub, mister bug stuff! :)
Maybe it would be relevant to explain to reporters how they can find out (not everyone is as well versed in Linux) and why it might be relevant to the issue?
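
For reporters who want to check: the active scheduler for each block device can be read from sysfs, with the bracketed entry being the one in use (a generic example; device names will vary):

for d in /sys/block/sd*/queue/scheduler; do
    echo "$d: $(cat "$d")"   # the scheduler in [brackets] is active
done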


Dinth commented May 7, 2020

I'm having the same issue on OpenMediaVault 5.3.10-1 with kernel 5.4.0-0-bpo.4-amd64 and ZFS v0.8.3-1~bpo10+1, although I remember having this issue for a long time.
It is usually triggered when I'm trying to delete or move files in the console while at the same time accessing the storage over the network.

[52320.685399] INFO: task txg_sync:856 blocked for more than 120 seconds.
[52320.685435]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52320.685465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52320.685502] txg_sync        D    0   856      2 0x80004000
[52320.685505] Call Trace:
[52320.685513]  ? __schedule+0x2e6/0x6f0
[52320.685515]  schedule+0x2f/0xa0
[52320.685524]  taskq_wait+0x7a/0xc0 [spl]
[52320.685528]  ? finish_wait+0x80/0x80
[52320.685580]  dmu_objset_sync+0x357/0x560 [zfs]
[52320.685621]  dsl_dataset_sync+0x6d/0x260 [zfs]
[52320.685663]  dsl_pool_sync+0xe8/0x510 [zfs]
[52320.685713]  spa_sync+0x5af/0xfb0 [zfs]
[52320.685764]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[52320.685813]  txg_sync_thread+0x294/0x460 [zfs]
[52320.685817]  ? __switch_to_asm+0x34/0x70
[52320.685866]  ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[52320.685871]  ? __thread_exit+0x20/0x20 [spl]
[52320.685876]  thread_generic_wrapper+0x6f/0x80 [spl]
[52320.685879]  kthread+0x112/0x130
[52320.685881]  ? kthread_park+0x80/0x80
[52320.685883]  ret_from_fork+0x35/0x40
[52320.685896] INFO: task z_rd_int:17925 blocked for more than 120 seconds.
[52320.685921]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52320.685950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52320.685977] z_rd_int        D    0 17925      2 0x80004000
[52320.685978] Call Trace:
[52320.685980]  ? __schedule+0x2e6/0x6f0
[52320.685981]  schedule+0x2f/0xa0
[52320.685982]  schedule_preempt_disabled+0xa/0x10
[52320.685984]  __mutex_lock.isra.8+0x2b5/0x4a0
[52320.686034]  vdev_queue_io_done+0x46/0x250 [zfs]
[52320.686036]  ? _cond_resched+0x15/0x30
[52320.686084]  zio_vdev_io_done+0xd3/0x1f0 [zfs]
[52320.686133]  zio_execute+0x90/0xf0 [zfs]
[52320.686139]  taskq_thread+0x2df/0x520 [spl]
[52320.686141]  ? __switch_to_asm+0x34/0x70
[52320.686143]  ? wake_up_q+0x80/0x80
[52320.686202]  ? zio_taskq_member.isra.9.constprop.14+0x70/0x70 [zfs]
[52320.686207]  ? taskq_thread_spawn+0x50/0x50 [spl]
[52320.686209]  kthread+0x112/0x130
[52320.686210]  ? kthread_park+0x80/0x80
[52320.686212]  ret_from_fork+0x35/0x40
[52441.515498] INFO: task txg_sync:856 blocked for more than 241 seconds.
[52441.515549]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52441.515578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52441.515606] txg_sync        D    0   856      2 0x80004000
[52441.515609] Call Trace:
[52441.515617]  ? __schedule+0x2e6/0x6f0
[52441.515618]  schedule+0x2f/0xa0
[52441.515627]  taskq_wait+0x7a/0xc0 [spl]
[52441.515631]  ? finish_wait+0x80/0x80
[52441.515693]  dmu_objset_sync+0x357/0x560 [zfs]
[52441.515734]  dsl_dataset_sync+0x6d/0x260 [zfs]
[52441.515776]  dsl_pool_sync+0xe8/0x510 [zfs]
[52441.515825]  spa_sync+0x5af/0xfb0 [zfs]
[52441.515876]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[52441.515926]  txg_sync_thread+0x294/0x460 [zfs]
[52441.515929]  ? __switch_to_asm+0x34/0x70
[52441.515978]  ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[52441.515984]  ? __thread_exit+0x20/0x20 [spl]
[52441.515988]  thread_generic_wrapper+0x6f/0x80 [spl]
[52441.515992]  kthread+0x112/0x130
[52441.515993]  ? kthread_park+0x80/0x80
[52441.515995]  ret_from_fork+0x35/0x40
[52441.516012] INFO: task z_rd_int:17925 blocked for more than 241 seconds.
[52441.516037]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52441.516081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52441.516115] z_rd_int        D    0 17925      2 0x80004000
[52441.516116] Call Trace:
[52441.516119]  ? __schedule+0x2e6/0x6f0
[52441.516120]  schedule+0x2f/0xa0
[52441.516121]  schedule_preempt_disabled+0xa/0x10
[52441.516123]  __mutex_lock.isra.8+0x2b5/0x4a0
[52441.516173]  vdev_queue_io_done+0x46/0x250 [zfs]
[52441.516176]  ? _cond_resched+0x15/0x30
[52441.516227]  zio_vdev_io_done+0xd3/0x1f0 [zfs]
[52441.516288]  zio_execute+0x90/0xf0 [zfs]
[52441.516299]  taskq_thread+0x2df/0x520 [spl]
[52441.516301]  ? __switch_to_asm+0x34/0x70
[52441.516302]  ? wake_up_q+0x80/0x80
[52441.516350]  ? zio_taskq_member.isra.9.constprop.14+0x70/0x70 [zfs]
[52441.516356]  ? taskq_thread_spawn+0x50/0x50 [spl]
[52441.516357]  kthread+0x112/0x130
[52441.516359]  ? kthread_park+0x80/0x80
[52441.516360]  ret_from_fork+0x35/0x40
[52562.345526] INFO: task txg_sync:856 blocked for more than 362 seconds.
[52562.345561]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52562.345590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52562.345619] txg_sync        D    0   856      2 0x80004000
[52562.345621] Call Trace:
[52562.345630]  ? __schedule+0x2e6/0x6f0
[52562.345631]  schedule+0x2f/0xa0
[52562.345640]  taskq_wait+0x7a/0xc0 [spl]
[52562.345644]  ? finish_wait+0x80/0x80
[52562.345698]  dmu_objset_sync+0x357/0x560 [zfs]
[52562.345752]  dsl_dataset_sync+0x6d/0x260 [zfs]
[52562.345792]  dsl_pool_sync+0xe8/0x510 [zfs]
[52562.345840]  spa_sync+0x5af/0xfb0 [zfs]
[52562.345889]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[52562.345941]  txg_sync_thread+0x294/0x460 [zfs]
[52562.345944]  ? __switch_to_asm+0x34/0x70
[52562.346007]  ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[52562.346012]  ? __thread_exit+0x20/0x20 [spl]
[52562.346017]  thread_generic_wrapper+0x6f/0x80 [spl]
[52562.346020]  kthread+0x112/0x130
[52562.346022]  ? kthread_park+0x80/0x80
[52562.346024]  ret_from_fork+0x35/0x40
[52562.346042] INFO: task z_rd_int:17925 blocked for more than 362 seconds.
[52562.346067]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52562.346096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52562.346123] z_rd_int        D    0 17925      2 0x80004000
[52562.346124] Call Trace:
[52562.346126]  ? __schedule+0x2e6/0x6f0
[52562.346128]  schedule+0x2f/0xa0
[52562.346129]  schedule_preempt_disabled+0xa/0x10
[52562.346130]  __mutex_lock.isra.8+0x2b5/0x4a0
[52562.346181]  vdev_queue_io_done+0x46/0x250 [zfs]
[52562.346183]  ? _cond_resched+0x15/0x30
[52562.346231]  zio_vdev_io_done+0xd3/0x1f0 [zfs]
[52562.346280]  zio_execute+0x90/0xf0 [zfs]
[52562.346286]  taskq_thread+0x2df/0x520 [spl]
[52562.346288]  ? __switch_to_asm+0x34/0x70
[52562.346290]  ? wake_up_q+0x80/0x80
[52562.346337]  ? zio_taskq_member.isra.9.constprop.14+0x70/0x70 [zfs]
[52562.346343]  ? taskq_thread_spawn+0x50/0x50 [spl]
[52562.346345]  kthread+0x112/0x130
[52562.346346]  ? kthread_park+0x80/0x80
[52562.346348]  ret_from_fork+0x35/0x40
[52594.864825] systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
[52594.864886] systemd[1]: systemd-logind.service: Killing process 1379 (systemd-logind) with signal SIGABRT.
[52598.762708] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=6/ABRT
[52598.762923] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[52598.763820] systemd[1]: systemd-journald.service: Service has no hold-off time (RestartSec=0), scheduling restart.
[52598.764533] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[52598.765565] systemd[1]: systemd-journal-flush.service: Succeeded.
[52598.765650] systemd[1]: Stopped Flush Journal to Persistent Storage.
[52598.765680] systemd[1]: Stopping Flush Journal to Persistent Storage...
[52598.765752] systemd[1]: Stopped Journal Service.
[52598.982722] systemd-journald[17953]: File /run/log/journal/e4adb6ec49c44c1db846cadde006c4e4/system.journal corrupted or uncleanly shut down, renaming and replacing.
[383032.378146] INFO: task z_iput:752 blocked for more than 120 seconds.
[383032.378185]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.379018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.379833] z_iput          D    0   752      2 0x80004000
[383032.379837] Call Trace:
[383032.379847]  ? __schedule+0x2e6/0x6f0
[383032.379849]  schedule+0x2f/0xa0
[383032.379850]  io_schedule+0x12/0x40
[383032.379860]  cv_wait_common+0xaf/0x130 [spl]
[383032.379864]  ? finish_wait+0x80/0x80
[383032.379941]  txg_wait_open+0x8a/0xd0 [zfs]
[383032.379980]  dmu_free_long_range+0x2fe/0x4b0 [zfs]
[383032.380035]  zfs_rmnode+0x262/0x330 [zfs]
[383032.380089]  ? zfs_zinactive+0xdc/0xf0 [zfs]
[383032.380141]  zfs_inactive+0x84/0x210 [zfs]
[383032.380146]  ? unmap_mapping_pages+0x5e/0x130
[383032.380196]  zpl_evict_inode+0x3c/0x50 [zfs]
[383032.380201]  evict+0xd2/0x1a0
[383032.380207]  taskq_thread+0x2df/0x520 [spl]
[383032.380210]  ? wake_up_q+0x80/0x80
[383032.380216]  ? taskq_thread_spawn+0x50/0x50 [spl]
[383032.380219]  kthread+0x112/0x130
[383032.380221]  ? kthread_park+0x80/0x80
[383032.380224]  ret_from_fork+0x35/0x40
[383032.380228] INFO: task txg_sync:856 blocked for more than 120 seconds.
[383032.380998]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.381784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.382595] txg_sync        D    0   856      2 0x80004000
[383032.382597] Call Trace:
[383032.382600]  ? __schedule+0x2e6/0x6f0
[383032.382601]  schedule+0x2f/0xa0
[383032.382603]  schedule_timeout+0x159/0x310
[383032.382608]  ? __next_timer_interrupt+0xc0/0xc0
[383032.382609]  io_schedule_timeout+0x19/0x40
[383032.382617]  __cv_timedwait_common+0x125/0x160 [spl]
[383032.382633]  ? finish_wait+0x80/0x80
[383032.382695]  zio_wait+0x13a/0x280 [zfs]
[383032.382730]  ? bplist_iterate+0x126/0x140 [zfs]
[383032.382782]  spa_sync+0x5f0/0xfb0 [zfs]
[383032.382837]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
[383032.382891]  txg_sync_thread+0x294/0x460 [zfs]
[383032.382893]  ? __switch_to_asm+0x34/0x70
[383032.382946]  ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[383032.382952]  ? __thread_exit+0x20/0x20 [spl]
[383032.382957]  thread_generic_wrapper+0x6f/0x80 [spl]
[383032.382960]  kthread+0x112/0x130
[383032.382961]  ? kthread_park+0x80/0x80
[383032.382963]  ret_from_fork+0x35/0x40
[383032.382972] INFO: task nfsd:1585 blocked for more than 120 seconds.
[383032.383754]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.384579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.385414] nfsd            D    0  1585      2 0x80004000
[383032.385415] Call Trace:
[383032.385417]  ? __schedule+0x2e6/0x6f0
[383032.385419]  ? __wake_up_common+0x7a/0x180
[383032.385420]  schedule+0x2f/0xa0
[383032.385421]  io_schedule+0x12/0x40
[383032.385426]  cv_wait_common+0xaf/0x130 [spl]
[383032.385428]  ? finish_wait+0x80/0x80
[383032.385483]  txg_wait_open+0x8a/0xd0 [zfs]
[383032.385521]  dmu_free_long_range+0x2fe/0x4b0 [zfs]
[383032.385573]  zfs_trunc+0x77/0x1f0 [zfs]
[383032.385625]  ? sa_lookup_locked+0x64/0xb0 [zfs]
[383032.385677]  zfs_freesp+0x9b/0x430 [zfs]
[383032.385729]  ? zfs_dirent_unlock+0xc0/0x140 [zfs]
[383032.385766]  ? dmu_object_size_from_db+0x22/0x80 [zfs]
[383032.385767]  ? _cond_resched+0x15/0x30
[383032.385768]  ? mutex_lock+0xe/0x30
[383032.385817]  ? rrw_exit+0x59/0x150 [zfs]
[383032.385869]  ? zfs_lookup+0x143/0x3a0 [zfs]
[383032.385919]  zpl_xattr_set_dir.isra.15+0x114/0x2a0 [zfs]
[383032.385970]  zpl_xattr_set+0x144/0x360 [zfs]
[383032.386021]  zpl_set_acl+0x14e/0x210 [zfs]
[383032.386071]  zpl_chmod_acl+0xc9/0xe0 [zfs]
[383032.386123]  zpl_setattr+0x150/0x180 [zfs]
[383032.386138]  notify_change+0x339/0x4a0
[383032.386155]  nfsd_setattr+0x3ff/0x470 [nfsd]
[383032.386176]  ? write_bytes_to_xdr_buf+0xbc/0xe0 [sunrpc]
[383032.386187]  nfsd4_setattr+0x7b/0x140 [nfsd]
[383032.386198]  nfsd4_proc_compound+0x381/0x6f0 [nfsd]
[383032.386206]  nfsd_dispatch+0x9e/0x210 [nfsd]
[383032.386221]  svc_process_common+0x342/0x6b0 [sunrpc]
[383032.386236]  ? svc_recv+0x2ff/0x980 [sunrpc]
[383032.386243]  ? nfsd_svc+0x2c0/0x2c0 [nfsd]
[383032.386251]  ? nfsd_destroy+0x50/0x50 [nfsd]
[383032.386264]  svc_process+0xd1/0x110 [sunrpc]
[383032.386272]  nfsd+0xe3/0x140 [nfsd]
[383032.386274]  kthread+0x112/0x130
[383032.386275]  ? kthread_park+0x80/0x80
[383032.386277]  ret_from_fork+0x35/0x40
[383032.386285] INFO: task mc:7614 blocked for more than 120 seconds.
[383032.387103]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.387947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.388793] mc              D    0  7614  30679 0x00000000
[383032.388794] Call Trace:
[383032.388797]  ? __schedule+0x2e6/0x6f0
[383032.388799]  ? __wake_up_common+0x7a/0x180
[383032.388800]  schedule+0x2f/0xa0
[383032.388801]  io_schedule+0x12/0x40
[383032.388805]  cv_wait_common+0xaf/0x130 [spl]
[383032.388807]  ? finish_wait+0x80/0x80
[383032.388863]  txg_wait_open+0x8a/0xd0 [zfs]
[383032.388901]  dmu_free_long_range+0x2fe/0x4b0 [zfs]
[383032.388953]  zfs_rmnode+0x262/0x330 [zfs]
[383032.389006]  ? zfs_zinactive+0xdc/0xf0 [zfs]
[383032.389058]  zfs_inactive+0x84/0x210 [zfs]
[383032.389061]  ? unmap_mapping_pages+0x5e/0x130
[383032.389111]  zpl_evict_inode+0x3c/0x50 [zfs]
[383032.389113]  evict+0xd2/0x1a0
[383032.389116]  do_unlinkat+0x263/0x300
[383032.389120]  do_syscall_64+0x52/0x160
[383032.389122]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[383032.389124] RIP: 0033:0x7fe113fadfc7
[383032.389131] Code: Bad RIP value.
[383032.389131] RSP: 002b:00007ffce7809e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[383032.389133] RAX: ffffffffffffffda RBX: 000055cdbc571140 RCX: 00007fe113fadfc7
[383032.389134] RDX: 000055cdbc56e330 RSI: 00000000ffffffff RDI: 000055cdbc543500
[383032.389134] RBP: 000055cdbc5573a0 R08: 0000000000000000 R09: 0000000000000000
[383032.389135] R10: 0000000000000008 R11: 0000000000000202 R12: 000055cdbc52fd24
[383032.389136] R13: 00007ffce7809efc R14: 000055cdbc56bd10 R15: 000000ffff000000

AlexOwen (Author) commented May 7, 2020

@Dinth, how much RAM do you have, and how big are the files you're moving (total and individual) when you see the error?

My errors went away when I increased the RAM I have (from 8GB to 32GB), but obviously that was only a workaround and not a solution. It also only happened when moving thousands of files adding up to over a terabyte across a file system boundary.


Dinth commented May 7, 2020

> @Dinth, how much RAM do you have, and how big are the files you're moving (total and individual) when you see the error?

16GB of RAM. The files I've been deleting and moving were not big. Today when this happened I was moving a 1GB file and deleting circa 200GB of files.


Dinth commented May 7, 2020

Actually, since the last time, ZFS has started doing this constantly, even after a restart, on an idling system in single-user mode with the pool unmounted. iostat -mx doesn't show anything abnormal, and short SMART tests of the drives were all successful. I have tried increasing the RAM to 24GB, but that has not fixed the issue.
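
When the pool wedges like this, the per-pool txg kstat can show whether sync is making any progress (a sketch; POOL is a placeholder, and the kstat path assumes ZFS on Linux 0.8):

POOL=tank   # replace with the actual pool name
# one row per recent txg; if the timestamps stop advancing, sync is stuck
cat /proc/spl/kstat/zfs/$POOL/txgs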


[ 1572.097836] mount.zfs       D    0  3513      1 0x00004004
[ 1572.098251] Call Trace:
[ 1572.098690]  ? __schedule+0x2e6/0x6f0
[ 1572.099100]  schedule+0x2f/0xa0
[ 1572.099510]  io_schedule+0x12/0x40
[ 1572.099925]  cv_wait_common+0xaf/0x130 [spl]
[ 1572.100322]  ? finish_wait+0x80/0x80
[ 1572.100810]  txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1572.101264]  txg_wait_synced+0xc/0x40 [zfs]
[ 1572.101737]  zil_replay+0xdd/0x120 [zfs]
[ 1572.102191]  zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1572.102647]  zfs_domount+0x38b/0x3f0 [zfs]
[ 1572.103107]  zpl_mount+0x13d/0x180 [zfs]
[ 1572.103539]  legacy_get_tree+0x27/0x40
[ 1572.103941]  vfs_get_tree+0x25/0xb0
[ 1572.104350]  do_mount+0x7ab/0x970
[ 1572.104825]  ksys_mount+0xb6/0xd0
[ 1572.105248]  __x64_sys_mount+0x21/0x30
[ 1572.105681]  do_syscall_64+0x52/0x160
[ 1572.106122]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1572.106580] RIP: 0033:0x7f7ba673afea
[ 1572.107049] Code: Bad RIP value.
[ 1572.107522] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1572.108064] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1572.108613] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1572.109151] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1572.109699] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1572.110258] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
[ 1581.262529] systemd-system-update-generator[1823]: Offline system update overridden by runlevel "single" on the kernel command line
[ 1581.618931] systemd-system-update-generator[1847]: Offline system update overridden by runlevel "single" on the kernel command line
[ 1581.916296] systemd-system-update-generator[1870]: Offline system update overridden by runlevel "single" on the kernel command line
[ 1692.928764] INFO: task mount.zfs:3513 blocked for more than 483 seconds.
[ 1692.929912]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[ 1692.930983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1692.932006] mount.zfs       D    0  3513      1 0x00004004
[ 1692.933126] Call Trace:
[ 1692.934263]  ? __schedule+0x2e6/0x6f0
[ 1692.935349]  schedule+0x2f/0xa0
[ 1692.936457]  io_schedule+0x12/0x40
[ 1692.937566]  cv_wait_common+0xaf/0x130 [spl]
[ 1692.938587]  ? finish_wait+0x80/0x80
[ 1692.939756]  txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1692.941040]  txg_wait_synced+0xc/0x40 [zfs]
[ 1692.942257]  zil_replay+0xdd/0x120 [zfs]
[ 1692.943486]  zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1692.944633]  zfs_domount+0x38b/0x3f0 [zfs]
[ 1692.945829]  zpl_mount+0x13d/0x180 [zfs]
[ 1692.946849]  legacy_get_tree+0x27/0x40
[ 1692.947852]  vfs_get_tree+0x25/0xb0
[ 1692.948917]  do_mount+0x7ab/0x970
[ 1692.949909]  ksys_mount+0xb6/0xd0
[ 1692.950917]  __x64_sys_mount+0x21/0x30
[ 1692.951919]  do_syscall_64+0x52/0x160
[ 1692.952932]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1692.953822] RIP: 0033:0x7f7ba673afea
[ 1692.954743] Code: Bad RIP value.
[ 1692.955579] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1692.956480] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1692.957369] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1692.958246] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1692.959106] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1692.959966] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
[ 1813.760838] INFO: task mount.zfs:3513 blocked for more than 604 seconds.
[ 1813.760939]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[ 1813.761065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.761196] mount.zfs       D    0  3513      1 0x00004004
[ 1813.761347] Call Trace:
[ 1813.761491]  ? __schedule+0x2e6/0x6f0
[ 1813.761641]  schedule+0x2f/0xa0
[ 1813.761794]  io_schedule+0x12/0x40
[ 1813.761978]  cv_wait_common+0xaf/0x130 [spl]
[ 1813.762168]  ? finish_wait+0x80/0x80
[ 1813.762419]  txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1813.762697]  txg_wait_synced+0xc/0x40 [zfs]
[ 1813.762964]  zil_replay+0xdd/0x120 [zfs]
[ 1813.763228]  zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1813.763508]  zfs_domount+0x38b/0x3f0 [zfs]
[ 1813.763780]  zpl_mount+0x13d/0x180 [zfs]
[ 1813.764016]  legacy_get_tree+0x27/0x40
[ 1813.764265]  vfs_get_tree+0x25/0xb0
[ 1813.764537]  do_mount+0x7ab/0x970
[ 1813.764828]  ksys_mount+0xb6/0xd0
[ 1813.765101]  __x64_sys_mount+0x21/0x30
[ 1813.765383]  do_syscall_64+0x52/0x160
[ 1813.765671]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1813.765975] RIP: 0033:0x7f7ba673afea
[ 1813.766310] Code: Bad RIP value.
[ 1813.766627] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1813.766983] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1813.767346] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1813.767723] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1813.768109] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1813.768506] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
[ 1934.593065] INFO: task mount.zfs:3513 blocked for more than 724 seconds.
[ 1934.593432]       Tainted: P           OE     5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[ 1934.593791] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1934.594165] mount.zfs       D    0  3513      1 0x00004004
[ 1934.594534] Call Trace:
[ 1934.594901]  ? __schedule+0x2e6/0x6f0
[ 1934.595261]  schedule+0x2f/0xa0
[ 1934.595624]  io_schedule+0x12/0x40
[ 1934.596018]  cv_wait_common+0xaf/0x130 [spl]
[ 1934.596392]  ? finish_wait+0x80/0x80
[ 1934.596834]  txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1934.597302]  txg_wait_synced+0xc/0x40 [zfs]
[ 1934.597723]  zil_replay+0xdd/0x120 [zfs]
[ 1934.598134]  zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1934.598554]  zfs_domount+0x38b/0x3f0 [zfs]
[ 1934.598972]  zpl_mount+0x13d/0x180 [zfs]
[ 1934.599349]  legacy_get_tree+0x27/0x40
[ 1934.599737]  vfs_get_tree+0x25/0xb0
[ 1934.600119]  do_mount+0x7ab/0x970
[ 1934.600500]  ksys_mount+0xb6/0xd0
[ 1934.600870]  __x64_sys_mount+0x21/0x30
[ 1934.601267]  do_syscall_64+0x52/0x160
[ 1934.601635]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1934.602012] RIP: 0033:0x7f7ba673afea
[ 1934.602395] Code: Bad RIP value.
[ 1934.602788] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1934.603231] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1934.603650] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1934.604065] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1934.604494] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1934.604945] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002

Linux 5.4.0-0.bpo.4-amd64 (r720-omv.wickhay) 	07/05/20 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.13    0.00    2.33   64.86    0.00   27.68

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              5.55   12.68      0.26      0.57     2.92    18.76  34.47  59.67    0.60    4.05   0.05    48.31    45.72   0.59   1.08
sdd             35.00    0.14      0.36      0.00     0.00     0.00   0.00   0.00   26.50    0.17   0.86    10.52    10.09   6.16  21.63
sdb             81.80    0.15      0.60      0.00     0.00     0.00   0.00   0.00   64.55    0.62   5.12     7.50    10.02   3.89  31.90
sdg             58.57    0.15      0.45      0.00     0.00     0.00   0.00   0.00   37.84    0.17   2.10     7.92    10.01   5.43  31.87
sde             81.44    0.12      0.60      0.00     0.00     0.00   0.00   0.00   64.16    0.62   5.07     7.50    11.63   3.95  32.21
sdc             35.10    0.15      0.36      0.00     0.00     0.00   0.00   0.34   26.67    0.15   0.87    10.52     9.44   6.16  21.71
sdf             58.76    0.15      0.45      0.00     0.00     0.00   0.00   0.00   37.55    0.15   2.09     7.92     9.64   5.45  32.10

(sda is not in a pool and on a different controller)

@scineram

scineram commented May 7, 2020

Does it import read only? Can you scrub then? What kind of drives?
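For reference, a read-only import attempt might look like the following (pool name Data taken from the next comment; a scrub then needs a normal read-write import, since scrubs cannot run on a read-only pool):

zpool import -o readonly=on Data   # import without ZIL replay or any new writes
zpool export Data
zpool import Data                  # re-import read-write
zpool scrub Data
zpool status -v Data               # watch scrub progress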

@Dinth

Dinth commented May 7, 2020

It actually freezes the computer when trying to mount ZFS during a normal boot.
Now I have managed to import the pool in single-user mode:

root@r720-omv:~# zpool status -v Data
  pool: Data
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 1 days 20:48:00 with 0 errors on Mon Apr 13 21:12:01 2020
config:

	NAME                                         STATE     READ WRITE CKSUM
	Data                                         ONLINE       0     0     0
	  raidz2-0                                   ONLINE       0     0     0
	    ata-HGST_HDS724040ALE640_PK2338P4HPDXJC  ONLINE       0     0     0
	    ata-HGST_HDS724040ALE640_PKA334PCG9L92N  ONLINE       0     0     0
	    ata-HGST_HDS724040ALE640_PK2338P4HPSVVC  ONLINE       0     0     0
	    ata-HGST_HDS724040ALE640_PKA334PCG9HBLN  ONLINE       0     0     0
	    ata-HGST_HDS724040ALE640_PK2338P4HM5K8C  ONLINE       0     0     0
	    ata-HGST_HDS724040ALE640_PK1334PEJDE30S  ONLINE       0     0     0

I have run zpool upgrade, as that's what zpool is suggesting, and I also found some posts online advising users to upgrade the pool when it freezes a kernel.
So far the upgrade has been running for 10+ minutes without any feedback; I'm not sure if that's normal?
Sorry if this is going slightly OT, but I've had this issue with the host freezing during IO for a long time, on two different hosts, and it's the first time the freeze actually resulted in an unmountable pool.
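For context, zpool upgrade itself is normally near-instant, since it only enables feature flags and then waits for a txg sync; with no arguments it is purely informational:

zpool upgrade        # read-only: lists pools with features not yet enabled
zpool upgrade Data   # enables all supported features; normally returns in seconds

so a run of 10+ minutes most likely means it is queued behind the same stuck txg sync.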

@Dinth

Dinth commented May 7, 2020

Also, iotop shows that the txg_sync process is reading 250-290K/s, taking 99.9% of the available IO.
I'm still waiting for zpool upgrade to finish.
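One way to tell whether the sync is making any progress at all is the per-pool txg kstat that OpenZFS exposes on Linux (pool name assumed):

watch -n1 tail -n 5 /proc/spl/kstat/zfs/Data/txgs   # per-txg state, dirty bytes and timings
iostat -mx 1                                        # per-disk latency, to spot a single slow vdev member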

@AlexOwen
Author

AlexOwen commented May 7, 2020 via email

@queenkjuul

queenkjuul commented May 21, 2020

Type Version/Name
Distribution Name Ubuntu
Distribution Version 20.04
Linux Kernel 5.4.0-29-generic
Architecture i5-3570k / 24GB DDR3
zfs --version zfs-0.8.3-1ubuntu12 \ zfs-kmod-0.8.3-1ubuntu12

I seem to have hit this last night. I ran zpool remove dpool sdb which was the read cache disk. I had a little shy of 8GB of available RAM out of 24GB total. sdb is a 120GB SSD.

pool: dpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	dpool       ONLINE       0     0     0
	  sdc       ONLINE       0     0     0
	  sdd       ONLINE       0     0     0
	cache
	  sdb       ONLINE       0     0     0

errors: No known data errors
qkj@holodeck:/m/media    (=^・^=)ノ
➤➤➤➤➤ sudo zpool remove dpool sdb 

After that command, pretty much everything stopped responding. Apps that were already open kept responding, and I could SSH in, but most programs hung when newly launched.

The system just hung instantly: I saw <5% CPU utilization, RAM utilization did not change, and I wasn't able to see any disk IO.
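If it recurs, it might be worth checking from a second shell (assuming zpool itself still responds) whether the removal is actually progressing. Removing a cache device is normally instantaneous, whereas removing a top-level vdev triggers an evacuation with a visible status line:

zpool status dpool        # during a top-level device removal this shows a 'remove:' progress line
zpool iostat -v dpool 2   # per-vdev IO, to confirm whether anything is moving at all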

dmesg:

[283203.538506] Mem-Info:
[283203.538509] active_anon:921936 inactive_anon:379250 isolated_anon:0
                 active_file:506522 inactive_file:537943 isolated_file:0
                 unevictable:23542 dirty:243 writeback:0 unstable:0
                 slab_reclaimable:75422 slab_unreclaimable:516165
                 mapped:250435 shmem:140701 pagetables:12987 bounce:0
                 free:286341 free_pcp:71 free_cma:0
[283203.538512] Node 0 active_anon:3687744kB inactive_anon:1517000kB active_file:2026088kB inactive_file:2151772kB unevictable:94168kB isolated(anon):0kB isolated(file):0kB mapped:1001740kB dirty:972kB writeback:0kB shmem:562804kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[283203.538513] Node 0 DMA free:15892kB min:44kB low:56kB high:68kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[283203.538516] lowmem_reserve[]: 0 3317 23834 23834 23834
[283203.538517] Node 0 DMA32 free:369576kB min:9400kB low:12796kB high:16192kB active_anon:78872kB inactive_anon:59576kB active_file:37008kB inactive_file:17960kB unevictable:368kB writepending:0kB present:3551800kB managed:3486056kB mlocked:0kB kernel_stack:1048kB pagetables:3300kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[283203.538520] lowmem_reserve[]: 0 0 20516 20516 20516
[283203.538521] Node 0 Normal free:759896kB min:58136kB low:79144kB high:100152kB active_anon:3608348kB inactive_anon:1457424kB active_file:1989308kB inactive_file:2133752kB unevictable:93956kB writepending:972kB present:21485568kB managed:21016812kB mlocked:19044kB kernel_stack:21960kB pagetables:48648kB bounce:0kB free_pcp:268kB local_pcp:68kB free_cma:0kB
[283203.538523] lowmem_reserve[]: 0 0 0 0 0
[283203.538524] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[283203.538529] Node 0 DMA32: 13708*4kB (UE) 13186*8kB (UE) 12062*16kB (UE) 516*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 369824kB
[283203.538533] Node 0 Normal: 569*4kB (UME) 1506*8kB (UME) 43245*16kB (UME) 1617*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (H) 0*4096kB = 760036kB
[283203.538538] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[283203.538538] 1188300 total pagecache pages
[283203.538540] 4030 pages in swap cache
[283203.538541] Swap cache stats: add 1236478, delete 1232441, find 912242/1105682
[283203.538542] Free swap  = 1745584kB
[283203.538542] Total swap = 2097148kB
[283203.538543] 6263340 pages RAM
[283203.538543] 0 pages HighMem/MovableOnly
[283203.538543] 133648 pages reserved
[283203.538543] 0 pages cma reserved
[283203.538544] 0 pages hwpoisoned
[287100.556647] INFO: task z_wr_iss:570 blocked for more than 120 seconds.
[287100.556650]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.556650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.556660] z_wr_iss        D    0   570      2 0x80004000
[287100.556662] Call Trace:
[287100.556667]  __schedule+0x2e3/0x740
[287100.556669]  schedule+0x42/0xb0
[287100.556671]  rwsem_down_write_slowpath+0x244/0x4d0
[287100.556731]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.556734]  down_write+0x41/0x50
[287100.556766]  dbuf_write_ready+0x1d4/0x3d0 [zfs]
[287100.556795]  arc_write_ready+0xa5/0x4b0 [zfs]
[287100.556837]  zio_ready+0x63/0x3f0 [zfs]
[287100.556878]  zio_execute+0x91/0xe0 [zfs]
[287100.556884]  taskq_thread+0x245/0x430 [spl]
[287100.556886]  ? wake_up_q+0x70/0x70
[287100.556936]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[287100.556937]  kthread+0x104/0x140
[287100.556942]  ? task_done+0x90/0x90 [spl]
[287100.556943]  ? kthread_park+0x90/0x90
[287100.556945]  ret_from_fork+0x35/0x40
[287100.556947] INFO: task z_wr_iss:571 blocked for more than 120 seconds.
[287100.556948]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.556949] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.556950] z_wr_iss        D    0   571      2 0x80004000
[287100.556951] Call Trace:
[287100.556952]  __schedule+0x2e3/0x740
[287100.556953]  schedule+0x42/0xb0
[287100.556954]  rwsem_down_write_slowpath+0x244/0x4d0
[287100.556990]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.556991]  down_write+0x41/0x50
[287100.557018]  dbuf_write_ready+0x1d4/0x3d0 [zfs]
[287100.557042]  arc_write_ready+0xa5/0x4b0 [zfs]
[287100.557081]  zio_ready+0x63/0x3f0 [zfs]
[287100.557120]  zio_execute+0x91/0xe0 [zfs]
[287100.557124]  taskq_thread+0x245/0x430 [spl]
[287100.557126]  ? wake_up_q+0x70/0x70
[287100.557164]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[287100.557165]  kthread+0x104/0x140
[287100.557169]  ? task_done+0x90/0x90 [spl]
[287100.557170]  ? kthread_park+0x90/0x90
[287100.557172]  ret_from_fork+0x35/0x40
[287100.557173] INFO: task z_wr_iss:572 blocked for more than 120 seconds.
[287100.557174]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.557174] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.557175] z_wr_iss        D    0   572      2 0x80004000
[287100.557176] Call Trace:
[287100.557177]  __schedule+0x2e3/0x740
[287100.557179]  ? sched_clock+0x9/0x10
[287100.557179]  schedule+0x42/0xb0
[287100.557181]  rwsem_down_write_slowpath+0x244/0x4d0
[287100.557213]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.557215]  down_write+0x41/0x50
[287100.557239]  dbuf_write_ready+0x1d4/0x3d0 [zfs]
[287100.557262]  arc_write_ready+0xa5/0x4b0 [zfs]
[287100.557300]  zio_ready+0x63/0x3f0 [zfs]
[287100.557337]  zio_execute+0x91/0xe0 [zfs]
[287100.557341]  taskq_thread+0x245/0x430 [spl]
[287100.557343]  ? wake_up_q+0x70/0x70
[287100.557380]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[287100.557381]  kthread+0x104/0x140
[287100.557385]  ? task_done+0x90/0x90 [spl]
[287100.557386]  ? kthread_park+0x90/0x90
[287100.557387]  ret_from_fork+0x35/0x40
[287100.557392] INFO: task txg_sync:647 blocked for more than 120 seconds.
[287100.557393]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.557394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.557394] txg_sync        D    0   647      2 0x80004000
[287100.557395] Call Trace:
[287100.557396]  __schedule+0x2e3/0x740
[287100.557397]  schedule+0x42/0xb0
[287100.557407]  schedule_timeout+0x152/0x2f0
[287100.557409]  ? __next_timer_interrupt+0xe0/0xe0
[287100.557410]  io_schedule_timeout+0x1e/0x50
[287100.557413]  __cv_timedwait_common+0x15e/0x1c0 [spl]
[287100.557415]  ? wait_woken+0x80/0x80
[287100.557418]  __cv_timedwait_io+0x19/0x20 [spl]
[287100.557480]  zio_wait+0x11b/0x230 [zfs]
[287100.557515]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.557545]  dsl_pool_sync+0xbc/0x410 [zfs]
[287100.557579]  spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
[287100.557612]  spa_sync+0x312/0x5b0 [zfs]
[287100.557646]  txg_sync_thread+0x27a/0x310 [zfs]
[287100.557680]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[287100.557684]  thread_generic_wrapper+0x83/0xa0 [spl]
[287100.557685]  kthread+0x104/0x140
[287100.557688]  ? clear_bit+0x20/0x20 [spl]
[287100.557689]  ? kthread_park+0x90/0x90
[287100.557690]  ret_from_fork+0x35/0x40
[287100.557692] INFO: task systemd-journal:983 blocked for more than 120 seconds.
[287100.557693]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.557694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.557694] systemd-journal D    0   983      1 0x00000120
[287100.557695] Call Trace:
[287100.557696]  __schedule+0x2e3/0x740
[287100.557697]  schedule+0x42/0xb0
[287100.557700]  cv_wait_common+0x133/0x180 [spl]
[287100.557701]  ? wait_woken+0x80/0x80
[287100.557704]  __cv_wait+0x15/0x20 [spl]
[287100.557740]  zfs_rangelock_enter_writer+0x41/0x150 [zfs]
[287100.557777]  zfs_rangelock_enter+0xb6/0xf0 [zfs]
[287100.557813]  zfs_trunc+0x53/0x200 [zfs]
[287100.557845]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
[287100.557877]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.557908]  ? sa_lookup+0x7c/0x90 [zfs]
[287100.557945]  zfs_freesp+0xa8/0x2b0 [zfs]
[287100.557980]  ? zfs_zaccess_common+0x78/0xa0 [zfs]
[287100.558015]  ? zfs_zaccess+0x24b/0x3e0 [zfs]
[287100.558051]  zfs_setattr+0xd76/0x2230 [zfs]
[287100.558087]  ? zpl_xattr_get_dir+0x5e/0x130 [zfs]
[287100.558090]  ? atomic_dec+0xd/0x20 [spl]
[287100.558122]  ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.558153]  ? rrw_exit+0x67/0xc0 [zfs]
[287100.558156]  ? atomic_dec_and_test+0xd/0x20 [spl]
[287100.558159]  ? put_cred+0x1b/0x30 [spl]
[287100.558160]  ? _cond_resched+0x19/0x30
[287100.558161]  ? __kmalloc_node+0x205/0x320
[287100.558198]  ? zpl_xattr_get+0xdb/0x140 [zfs]
[287100.558201]  ? spl_kmem_alloc_impl+0xa8/0x100 [spl]
[287100.558237]  zpl_setattr+0xf7/0x160 [zfs]
[287100.558239]  notify_change+0x357/0x4e0
[287100.558241]  do_truncate+0x78/0xd0
[287100.558242]  do_sys_ftruncate+0x14a/0x1c0
[287100.558244]  __x64_sys_ftruncate+0x1b/0x20
[287100.558246]  do_syscall_64+0x57/0x190
[287100.558248]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[287100.558249] RIP: 0033:0x7f03b5813ccb
[287100.558253] Code: Bad RIP value.
[287100.558253] RSP: 002b:00007ffff85220c8 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
[287100.558254] RAX: ffffffffffffffda RBX: 00007ffff8522100 RCX: 00007f03b5813ccb
[287100.558255] RDX: 0000563417553f90 RSI: 0000000004800000 RDI: 0000000000000034
[287100.558255] RBP: 0000563417611240 R08: 0000000000000001 R09: 00005634176112c4
[287100.558256] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffff85220f8
[287100.558256] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[287100.558270] INFO: task txg_sync:2005 blocked for more than 120 seconds.
[287100.558271]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.558272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.558273] txg_sync        D    0  2005      2 0x80004000
[287100.558273] Call Trace:
[287100.558275]  __schedule+0x2e3/0x740
[287100.558277]  ? __switch_to+0x7f/0x470
[287100.558278]  schedule+0x42/0xb0
[287100.558281]  cv_wait_common+0x133/0x180 [spl]
[287100.558282]  ? wait_woken+0x80/0x80
[287100.558285]  __cv_wait+0x15/0x20 [spl]
[287100.558318]  spa_config_enter+0x101/0x110 [zfs]
[287100.558352]  spa_txg_history_init_io+0x5e/0xd0 [zfs]
[287100.558385]  txg_sync_thread+0x260/0x310 [zfs]
[287100.558418]  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[287100.558422]  thread_generic_wrapper+0x83/0xa0 [spl]
[287100.558423]  kthread+0x104/0x140
[287100.558426]  ? clear_bit+0x20/0x20 [spl]
[287100.558427]  ? kthread_park+0x90/0x90
[287100.558429]  ret_from_fork+0x35/0x40
[287100.558430] INFO: task mmp:2006 blocked for more than 120 seconds.
[287100.558431]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.558432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.558432] mmp             D    0  2006      2 0x80004000
[287100.558433] Call Trace:
[287100.558434]  __schedule+0x2e3/0x740
[287100.558435]  ? __wake_up_common_lock+0x8a/0xc0
[287100.558436]  schedule+0x42/0xb0
[287100.558439]  cv_wait_common+0x133/0x180 [spl]
[287100.558440]  ? wait_woken+0x80/0x80
[287100.558443]  __cv_wait+0x15/0x20 [spl]
[287100.558475]  spa_config_enter+0x101/0x110 [zfs]
[287100.558509]  vdev_count_leaves+0x26/0x60 [zfs]
[287100.558540]  mmp_thread+0x2a4/0x5a0 [zfs]
[287100.558571]  ? mmp_write_uberblock+0x550/0x550 [zfs]
[287100.558575]  thread_generic_wrapper+0x83/0xa0 [spl]
[287100.558576]  kthread+0x104/0x140
[287100.558580]  ? clear_bit+0x20/0x20 [spl]
[287100.558580]  ? kthread_park+0x90/0x90
[287100.558582]  ret_from_fork+0x35/0x40
[287100.558627] INFO: task deluged:1671679 blocked for more than 120 seconds.
[287100.558627]       Tainted: P           O      5.4.0-29-generic #33-Ubuntu
[287100.558628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.558629] deluged         D    0 1671679      1 0x00000000
[287100.558629] Call Trace:
[287100.558630]  __schedule+0x2e3/0x740
[287100.558631]  schedule+0x42/0xb0
[287100.558634]  cv_wait_common+0x133/0x180 [spl]
[287100.558635]  ? wait_woken+0x80/0x80
[287100.558638]  __cv_wait+0x15/0x20 [spl]
[287100.558672]  spa_config_enter+0x101/0x110 [zfs]
[287100.558708]  zfs_blkptr_verify+0x267/0x440 [zfs]
[287100.558744]  zio_read+0x42/0xc0 [zfs]
[287100.558766]  ? arc_buf_destroy+0x100/0x100 [zfs]
[287100.558788]  arc_read+0x876/0x1100 [zfs]
[287100.558810]  ? dbuf_rele_and_unlock+0x4e0/0x4e0 [zfs]
[287100.558846]  ? atomic_cmpxchg+0x16/0x30 [zfs]
[287100.558869]  dbuf_read_impl+0x1d2/0x610 [zfs]
[287100.558905]  ? zio_null+0x2f/0x40 [zfs]
[287100.558927]  dbuf_read+0xcb/0x5f0 [zfs]
[287100.558931]  ? spl_kmem_free_impl+0x23/0x30 [spl]
[287100.558954]  dbuf_hold_impl_arg+0x47a/0x5e0 [zfs]
[287100.558976]  dbuf_hold_impl+0x24/0x40 [zfs]
[287100.558998]  dbuf_hold_level+0x32/0x60 [zfs]
[287100.559020]  dbuf_hold+0x16/0x20 [zfs]
[287100.559044]  dmu_buf_hold_array_by_dnode+0xee/0x490 [zfs]
[287100.559068]  dmu_read_uio_dnode+0x49/0xf0 [zfs]
[287100.559104]  ? zrl_add_impl+0x31/0xb0 [zfs]
[287100.559128]  dmu_read_uio_dbuf+0x47/0x60 [zfs]
[287100.559164]  zfs_read+0x117/0x300 [zfs]
[287100.559199]  zpl_read_common_iovec+0x99/0xe0 [zfs]
[287100.559235]  zpl_iter_read_common+0xa8/0x100 [zfs]
[287100.559271]  zpl_iter_read+0x58/0x80 [zfs]
[287100.559272]  do_iter_readv_writev+0x1a3/0x1d0
[287100.559273]  do_iter_read+0xe2/0x1a0
[287100.559275]  vfs_readv+0x6e/0xb0
[287100.559276]  ? security_file_free+0x54/0x60
[287100.559278]  ? call_rcu+0x10/0x20
[287100.559279]  ? __fput+0x150/0x260
[287100.559280]  do_preadv+0x93/0xd0
[287100.559281]  __x64_sys_preadv+0x21/0x30
[287100.559283]  do_syscall_64+0x57/0x190
[287100.559284]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[287100.559285] RIP: 0033:0x7f536403d548
[287100.559286] Code: Bad RIP value.
[287100.559287] RSP: 002b:00007f535b7fd4a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[287100.559288] RAX: ffffffffffffffda RBX: 00007f535b7fd600 RCX: 00007f536403d548
[287100.559288] RDX: 0000000000000020 RSI: 00007f535b7fd600 RDI: 0000000000000026
[287100.559289] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[287100.559289] R10: 000000000becd2d0 R11: 0000000000000246 R12: 0000000000000020
[287100.559290] R13: 0000000000000026 R14: 0000000000000020 R15: 000000000becd2d0

I had to hard reset with the power button; after a reboot, zpool showed everything healthy, and when I ran the same command (zpool remove dpool sdb) it completed instantly.

Edit: after a reboot and reloading deluged, I found that a large number of files which had been open in deluge were missing. That's on me for not stopping deluged before running the operation, but I also don't believe it should have crashed. Good thing for external backups, right? :)

@Dinth

Dinth commented Sep 5, 2020

It was fine for the past few months (although I wasn't doing any large IO) until yesterday, when I needed to delete some large files. It froze after a few files; I restarted the server, it booted normally, I tried to delete those files again, and again it froze after a few files. But now I cannot boot, as it fails to mount ZFS. So, exactly the same situation as a couple of months ago. I have increased system memory to 24GB of RAM, but it didn't help.
[Screenshot attached: 2020-09-05 at 18:17:35]

@justinclift

@rejsmont Don't suppose you're using WD NVMe sticks (as per #14793, mentioned by @no-usernames-left above)?

@carmenguarino

We are facing a similar issue on a Hetzner dedicated server. Any clue on how to troubleshoot our case further and hopefully fix it? The server has been restarted multiple times per month and the root cause seems related to this issue. We are also raising a case with Hetzner support to investigate the hardware error; please find more info below:

Type Version/Name
Debian 12.4
Kernel 6.1.0-17-amd64
ZFS Loaded module v2.1.11-1
ZFS pool version 5000
ZFS filesystem vers 5

zpool status
pool: pool0
state: ONLINE
scan: scrub repaired 0B in 1 days 17:19:33 with 0 errors on Mon Jun 10 17:44:25 2024
config:

    NAME        STATE     READ WRITE CKSUM
    pool0       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       0     0     0

errors: No known data errors

zfs list
USED AVAIL
7.84T 8.54

free -g
               total        used        free      shared  buff/cache   available
Mem:              62          35          26           0           2          27
Swap:             31           0          31

2024-07-09T00:05:01.588756+02:00 CRON[10298]: (root) CMD (if zfs mount | grep -qs "/pool0/Ebackups" ; then /usr/bin/rsnapshot daily > /home/backupuser/log/rsnapshotDaily`date +%a`.log ; else echo "ZFS not mounted. Failed Rsnapshot" ; fi)

2024-07-09T00:08:49.591141+02:00 kernel: [62833.725235] INFO: task txg_sync:1440 blocked for more than 120 seconds.
2024-07-09T00:08:49.591154+02:00 kernel: [62833.725246] Tainted: P OE 6.1.0-17-amd64 #1 Debian 6.1.69-1
2024-07-09T00:08:49.591155+02:00 kernel: [62833.725251] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2024-07-09T00:08:49.591155+02:00 kernel: [62833.725255] task:txg_sync state:D stack:0 pid:1440 ppid:2 flags:0x00004000
2024-07-09T00:08:49.591157+02:00 kernel: [62833.725264] Call Trace:
2024-07-09T00:08:49.591157+02:00 kernel: [62833.725266]
2024-07-09T00:08:49.591158+02:00 kernel: [62833.725272] __schedule+0x34d/0x9e0
2024-07-09T00:08:49.591158+02:00 kernel: [62833.725281] schedule+0x5a/0xd0
2024-07-09T00:08:49.591159+02:00 kernel: [62833.725285] schedule_timeout+0x94/0x150
2024-07-09T00:08:49.591160+02:00 kernel: [62833.725290] ? __bpf_trace_tick_stop+0x10/0x10
2024-07-09T00:08:49.591161+02:00 kernel: [62833.725296] io_schedule_timeout+0x4c/0x80
2024-07-09T00:08:49.591161+02:00 kernel: [62833.725301] __cv_timedwait_common+0x12f/0x170 [spl]
2024-07-09T00:08:49.591162+02:00 kernel: [62833.725316] ? cpuusage_read+0x10/0x10
2024-07-09T00:08:49.591163+02:00 kernel: [62833.725322] __cv_timedwait_io+0x15/0x20 [spl]
2024-07-09T00:08:49.591164+02:00 kernel: [62833.725336] zio_wait+0x136/0x2b0 [zfs]
2024-07-09T00:08:49.591164+02:00 kernel: [62833.725478] ? bplist_iterate+0x101/0x120 [zfs]
2024-07-09T00:08:49.591165+02:00 kernel: [62833.725589] spa_sync+0x5b3/0xf90 [zfs]
2024-07-09T00:08:49.591165+02:00 kernel: [62833.725721] ? mutex_lock+0xe/0x30
2024-07-09T00:08:49.591177+02:00 kernel: [62833.725725] ? spa_txg_history_init_io+0x113/0x120 [zfs]
2024-07-09T00:08:49.591178+02:00 kernel: [62833.725857] txg_sync_thread+0x227/0x3e0 [zfs]
2024-07-09T00:08:49.591179+02:00 kernel: [62833.725987] ? txg_fini+0x260/0x260 [zfs]
2024-07-09T00:08:49.591179+02:00 kernel: [62833.726109] ? __thread_exit+0x20/0x20 [spl]
2024-07-09T00:08:49.591180+02:00 kernel: [62833.726124] thread_generic_wrapper+0x5a/0x70 [spl]
2024-07-09T00:08:49.591180+02:00 kernel: [62833.726139] kthread+0xda/0x100
2024-07-09T00:08:49.591181+02:00 kernel: [62833.726144] ? kthread_complete_and_exit+0x20/0x20
2024-07-09T00:08:49.591182+02:00 kernel: [62833.726149] ret_from_fork+0x22/0x30
2024-07-09T00:08:49.591182+02:00 kernel: [62833.726157]

2024-07-09T00:20:05.431212+02:00 kernel: [63509.565341] mce: [Hardware Error]: Machine check events logged
2024-07-09T00:20:05.431223+02:00 kernel: [63509.565352] [Hardware Error]: Corrected error, no action required.
2024-07-09T00:20:05.431224+02:00 kernel: [63509.565356] [Hardware Error]: CPU:0 (17:71:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|Scrub]: 0xdc2041000000011b
2024-07-09T00:20:05.431225+02:00 kernel: [63509.565367] [Hardware Error]: Error Addr: 0x00000003c24f2c00
2024-07-09T00:20:05.431226+02:00 kernel: [63509.565370] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x004000400a801203
2024-07-09T00:20:05.431227+02:00 kernel: [63509.565376] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
2024-07-09T00:20:05.431228+02:00 kernel: [63509.565390] EDAC MC0: 1 CE on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0xf493cb offset:0x0 grain:64 syndrome:0x40)

2024-07-09T00:24:01.595919+02:00 backup-hetzner CRON[265086]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ]; then /usr/lib/zfs-linux/scrub; fi)

2024-07-09T00:26:23.063173+02:00 systemd[1]: Configuration file /etc/systemd/system/zfs-load-key-custom.service is marked executable. Please remove executable permission bits. Proceeding anyway.

cat /etc/systemd/system/zfs-load-key-custom.service
[Unit]
Description=Load ZFS encryption keys
DefaultDependencies=no
After=zfs-import.target
Before=zfs-mount.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/zfs load-key -a
StandardInput=tty-force

[Install]
WantedBy=zfs-mount.service
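As an aside, the executable-permission warning in the log above is harmless but easy to silence:

chmod 644 /etc/systemd/system/zfs-load-key-custom.service
systemctl daemon-reload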

Any help is appreciated!

@prgwiz

prgwiz commented Jul 10, 2024 via email

@bombcar

bombcar commented Aug 8, 2024

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version Latest
Kernel Version 6.6.38-gentoo-x86_64
Architecture x64
OpenZFS Version zfs-2.2.4-r0-gentoo

Describe the problem you're observing

Got the above to happen: free indicated that the kernel was using something like 50GB, and it was after using mv to copy 1.5TB from one volume to another in the same pool.

Describe how to reproduce the problem

Not sure; the above might do it, but I'm not going to try to copy 1.5TB again.

Include any warning/errors/backtraces from the system logs

Aug  2 11:37:17 tank kernel: INFO: task txg_sync:2540 blocked for more than 122 seconds.
Aug  2 11:37:17 tank kernel:       Tainted: P          IOE      6.6.38-gentoo-x86_64 #1
Aug  2 11:37:17 tank kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug  2 11:37:17 tank kernel: task:txg_sync        state:D stack:0     pid:2540  ppid:2      flags:0x00004000
Aug  2 11:37:17 tank kernel: Call Trace:
Aug  2 11:37:17 tank kernel:  <TASK>
Aug  2 11:37:17 tank kernel:  __schedule+0x3ca/0xb10
Aug  2 11:37:17 tank kernel:  schedule+0x5e/0xd0
Aug  2 11:37:17 tank kernel:  schedule_timeout+0x8b/0x160
Aug  2 11:37:17 tank kernel:  ? __pfx_process_timeout+0x10/0x10
Aug  2 11:37:17 tank kernel:  io_schedule_timeout+0x50/0x80
Aug  2 11:37:17 tank kernel:  __cv_wait_idle+0x1bf/0x200 [spl]
Aug  2 11:37:17 tank kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
Aug  2 11:37:17 tank kernel:  __cv_timedwait_io+0x19/0x20 [spl]
Aug  2 11:37:17 tank kernel:  zio_wait+0x128/0x770 [zfs]
Aug  2 11:37:17 tank kernel:  dsl_pool_sync+0xfb/0x520 [zfs]
Aug  2 11:37:17 tank kernel:  spa_sync+0x582/0x1630 [zfs]
Aug  2 11:37:17 tank kernel:  ? spa_txg_history_init_io+0x106/0x110 [zfs]
Aug  2 11:37:17 tank kernel:  txg_fini+0x478/0x610 [zfs]
Aug  2 11:37:17 tank kernel:  ? txg_fini+0x270/0x610 [zfs]
Aug  2 11:37:17 tank kernel:  ? spl_taskq_fini+0x90/0x100 [spl]
Aug  2 11:37:17 tank kernel:  spl_taskq_fini+0xee/0x100 [spl]
Aug  2 11:37:17 tank kernel:  kthread+0xe8/0x120
Aug  2 11:37:17 tank kernel:  ? __pfx_kthread+0x10/0x10
Aug  2 11:37:17 tank kernel:  ret_from_fork+0x34/0x50
Aug  2 11:37:17 tank kernel:  ? __pfx_kthread+0x10/0x10
Aug  2 11:37:17 tank kernel:  ret_from_fork_asm+0x1b/0x30
Aug  2 11:37:17 tank kernel:  </TASK>

tank    size                           10.9T                          -
tank    capacity                       65%                            -
tank    altroot                        -                              default
tank    health                         ONLINE                         -
tank    guid                           5355146826367740161            -
tank    version                        -                              default
tank    bootfs                         -                              default
tank    delegation                     on                             default
tank    autoreplace                    off                            default
tank    cachefile                      -                              default
tank    failmode                       wait                           default
tank    listsnapshots                  on                             local
tank    autoexpand                     on                             local
tank    dedupratio                     1.00x                          -
tank    free                           3.79T                          -
tank    allocated                      7.08T                          -
tank    readonly                       off                            -
tank    ashift                         12                             local
tank    comment                        -                              default
tank    expandsize                     -                              -
tank    freeing                        0                              -
tank    fragmentation                  15%                            -
tank    leaked                         0                              -
tank    multihost                      off                            default
tank    checkpoint                     -                              -
tank    load_guid                      3471020367539754646            -
tank    autotrim                       off                            default
tank    compatibility                  off                            default
tank    bcloneused                     0                              -
tank    bclonesaved                    0                              -
tank    bcloneratio                    1.00x                          -
tank    feature@async_destroy          enabled                        local
tank    feature@empty_bpobj            active                         local
tank    feature@lz4_compress           active                         local
tank    feature@multi_vdev_crash_dump  enabled                        local
tank    feature@spacemap_histogram     active                         local
tank    feature@enabled_txg            active                         local
tank    feature@hole_birth             active                         local
tank    feature@extensible_dataset     active                         local
tank    feature@embedded_data          active                         local
tank    feature@bookmarks              enabled                        local
tank    feature@filesystem_limits      enabled                        local
tank    feature@large_blocks           active                         local
tank    feature@large_dnode            active                         local
tank    feature@sha512                 enabled                        local
tank    feature@skein                  enabled                        local
tank    feature@edonr                  enabled                        local
tank    feature@userobj_accounting     active                         local
tank    feature@encryption             enabled                        local
tank    feature@project_quota          active                         local
tank    feature@device_removal         enabled                        local
tank    feature@obsolete_counts        enabled                        local
tank    feature@zpool_checkpoint       enabled                        local
tank    feature@spacemap_v2            active                         local
tank    feature@allocation_classes     enabled                        local
tank    feature@resilver_defer         enabled                        local
tank    feature@bookmark_v2            enabled                        local
tank    feature@redaction_bookmarks    enabled                        local
tank    feature@redacted_datasets      enabled                        local
tank    feature@bookmark_written       enabled                        local
tank    feature@log_spacemap           active                         local
tank    feature@livelist               enabled                        local
tank    feature@device_rebuild         enabled                        local
tank    feature@zstd_compress          active                         local
tank    feature@draid                  enabled                        local
tank    feature@zilsaxattr             active                         local
tank    feature@head_errlog            active                         local
tank    feature@blake3                 enabled                        local
tank    feature@block_cloning          enabled                        local
tank    feature@vdev_zaps_v2           active                         local

tank log # zfs get all tank/shows
NAME        PROPERTY              VALUE                  SOURCE
tank/shows  type                  filesystem             -
tank/shows  creation              Fri Aug  2  7:59 2024  -
tank/shows  used                  1.41T                  -
tank/shows  available             3.67T                  -
tank/shows  referenced            1.41T                  -
tank/shows  compressratio         1.01x                  -
tank/shows  mounted               yes                    -
tank/shows  quota                 none                   default
tank/shows  reservation           none                   default
tank/shows  recordsize            128K                   default
tank/shows  mountpoint            /tank/shows            local
tank/shows  sharenfs              off                    default
tank/shows  checksum              on                     default
tank/shows  compression           lz4                    inherited from tank
tank/shows  atime                 off                    inherited from tank
tank/shows  devices               on                     default
tank/shows  exec                  on                     default
tank/shows  setuid                on                     default
tank/shows  readonly              off                    default
tank/shows  zoned                 off                    default
tank/shows  snapdir               hidden                 default
tank/shows  aclmode               discard                default
tank/shows  aclinherit            restricted             default
tank/shows  createtxg             32809370               -
tank/shows  canmount              on                     default
tank/shows  xattr                 sa                     inherited from tank
tank/shows  copies                1                      default
tank/shows  version               5                      -
tank/shows  utf8only              off                    -
tank/shows  normalization         none                   -
tank/shows  casesensitivity       sensitive              -
tank/shows  vscan                 off                    default
tank/shows  nbmand                off                    default
tank/shows  sharesmb              off                    default
tank/shows  refquota              none                   default
tank/shows  refreservation        none                   default
tank/shows  guid                  2981344572069956086    -
tank/shows  primarycache          all                    default
tank/shows  secondarycache        all                    default
tank/shows  usedbysnapshots       0B                     -
tank/shows  usedbydataset         1.41T                  -
tank/shows  usedbychildren        0B                     -
tank/shows  usedbyrefreservation  0B                     -
tank/shows  logbias               latency                default
tank/shows  objsetid              22884                  -
tank/shows  dedup                 off                    default
tank/shows  mlslabel              none                   default
tank/shows  sync                  standard               default
tank/shows  dnodesize             auto                   inherited from tank
tank/shows  refcompressratio      1.01x                  -
tank/shows  written               1.41T                  -
tank/shows  logicalused           1.43T                  -
tank/shows  logicalreferenced     1.43T                  -
tank/shows  volmode               default                default
tank/shows  filesystem_limit      none                   default
tank/shows  snapshot_limit        none                   default
tank/shows  filesystem_count      none                   default
tank/shows  snapshot_count        none                   default
tank/shows  snapdev               hidden                 default
tank/shows  acltype               off                    default
tank/shows  context               none                   default
tank/shows  fscontext             none                   default
tank/shows  defcontext            none                   default
tank/shows  rootcontext           none                   default
tank/shows  relatime              on                     default
tank/shows  redundant_metadata    all                    default
tank/shows  overlay               on                     default
tank/shows  encryption            off                    default
tank/shows  keylocation           none                   default
tank/shows  keyformat             none                   default
tank/shows  pbkdf2iters           0                      default
tank/shows  special_small_blocks  0                      default
tank/shows  prefetch              all                    default

Note the date; the system seemed to be working fine since then, and I only noticed it because I compulsively run dmesg -c when I log in. I rebooted it and it seems to be fine (scrubbing now).

@justinclift

Ouch, so 2.2.4 still might have lurking bugs in it.

@bombcar Did the scrub show any errors on disk or other weirdness?

@kyle0r

kyle0r commented Aug 9, 2024

@bombcar what hardware are you using: drives (CMR or SMR?), connectors, controller, and backplane?

Can you share zpool status?

@bombcar

bombcar commented Aug 9, 2024

tank ~ # zpool status
  pool: backup
 state: ONLINE
config:

	NAME                                 STATE     READ WRITE CKSUM
	backup                               ONLINE       0     0     0
	  usb-TANDBERG_RDX_008630871573-0:0  ONLINE       0     0     0

errors: No known data errors

  pool: root
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:38:43 with 0 errors on Thu Aug  8 16:15:32 2024
config:

	NAME                                            STATE     READ WRITE CKSUM
	root                                            ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    ata-KINGSTON_SA400S37120G_50026B7682479661  ONLINE       0     0     0
	    ata-KINGSTON_SA400S37120G_50026B76826284DD  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 13:17:33 with 0 errors on Fri Aug  9 04:54:27 2024
config:

	NAME                                   STATE     READ WRITE CKSUM
	tank                                   ONLINE       0     0     0
	  mirror-0                             ONLINE       0     0     0
	    ata-ST12000NM0117-2GY101_ZJV33F0D  ONLINE       0     0     0
	    ata-ST12000NM0117-2GY101_ZJV374JH  ONLINE       0     0     0

errors: No known data errors

Ok that "TANDBERG" thing is a USB RDX drive, which can sometimes ... be a bit touchy, but it's been working. I don't know how to tell if the traceback came from the main pool, root pool, or the backup pool.

There were no entries in the kernel log indicating disk problems whatsoever:

Jul 30 09:43:52 tank kernel: sd 1:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 30 09:43:52 tank kernel:  sde: sde1 sde9
Jul 30 09:43:52 tank kernel: sd 1:0:0:0: [sde] Attached SCSI removable disk
Aug  1 11:52:06 tank kernel: hrtimer: interrupt took 9775 ns
Aug  2 11:37:17 tank kernel: INFO: task txg_sync:2540 blocked for more than 122 seconds.
Aug  2 11:37:17 tank kernel:       Tainted: P          IOE      6.6.38-gentoo-x86_64 #1

The system is an older Dell Chassis thing with this controller:

45: PCI 200.0: 0107 Serial Attached SCSI controller
  [Created at pci.386]
  Unique ID: B35A.SEpp8W9fkWA
  Parent ID: 3hqH.0IUtK3DrB2E
  SysFS ID: /devices/pci0000:00/0000:00:03.0/0000:02:00.0
  SysFS BusID: 0000:02:00.0
  Hardware Class: storage
  Model: "Dell 6Gbps SAS HBA Adapter"
  Vendor: pci 0x1000 "Broadcom / LSI"
  Device: pci 0x0072 "SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]"
  SubVendor: pci 0x1028 "Dell"
  SubDevice: pci 0x1f1c "6Gbps SAS HBA Adapter"
  Revision: 0x03
  Driver: "mpt3sas"
  Driver Modules: "mpt3sas"
  I/O Ports: 0xfc00-0xfcff (rw)
  Memory Range: 0xdf2b0000-0xdf2bffff (rw,non-prefetchable)
  Memory Range: 0xdf2c0000-0xdf2fffff (rw,non-prefetchable)
  Memory Range: 0xdf100000-0xdf1fffff (ro,non-prefetchable,disabled)
  IRQ: 32 (no events)
  Module Alias: "pci:v00001000d00000072sv00001028sd00001F1Cbc01sc07i00"
  Driver Info #0:
    Driver Status: mpt3sas is active
    Driver Activation Cmd: "modprobe mpt3sas"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #78 (PCI bridge)

I can get the full hwinfo but not sure if it has personal data in it.

tank/shows was created Fri Aug 2 7:59 2024 and the mv started immediately after. The copy took quite a while, so the error appears to have occurred during the copy (Aug 2 11:37:17), assuming both timestamps are in the same timezone.

The copy appears to have finished correctly; hopefully I didn't get file corruption (which I don't really have any way to detect above ZFS itself).

@kyle0r

kyle0r commented Aug 9, 2024

My memory could be wrong, but I think this timeout is more likely when writing to USB drives, because of the nature of the usb-hdd "pipeline": it can see increased latency and dropouts/disconnects. I'd wager the chances of seeing this timeout are higher when the throughput or IOPS approach the saturation point of the given device or interface.

I couldn't quickly decode the model of the drive inside the USB unit. If it's a larger capacity, it's very likely to be SMR, and SMR drives are known to have performance issues with the current OpenZFS codebase.

I know this is somewhat anecdotal feedback but that is what I can say after watching this issue for some years.

@bombcar

bombcar commented Aug 10, 2024

It’s possible, though the copy was from tank to tank and wasn’t hitting the USB at all.

@kyle0r

kyle0r commented Aug 10, 2024

It’s possible, though the copy was from tank to tank and wasn’t hitting the USB at all.

Ah, ok. So your tank zpool looks like Seagate EXOS X 12TB CMR drives.

The copy is very likely to have been fine, but performance would have suffered during the operation, especially around the time of the error. ZFS should throw checksum (CKSUM) errors if something went wrong with data integrity during the operation. I've experienced this error multiple times (dozens) and never seen data integrity issues, only performance issues.

For peace of mind, if you still have both the SRC and DST files/datasets, you could use a fast hashing algorithm like xxHash to checksum both sides. You can also look at https://github.com/psy0rz/zfs_autobackup/wiki/zfs-check
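A rough sketch of that comparison, assuming the xxhsum tool from the xxhash package and placeholder paths:

cd /tank/src && find . -type f -print0 | sort -z | xargs -0 xxhsum > /tmp/src.xxh
cd /tank/dst && find . -type f -print0 | sort -z | xargs -0 xxhsum > /tmp/dst.xxh
diff /tmp/src.xxh /tmp/dst.xxh && echo "trees match"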

So we are starting to get off topic. If you want to chat further send me a DM on reddit. Same username.

@richardm1

richardm1 commented Sep 22, 2024

Just hit this issue on TrueNAS SCALE 24.10-BETA.1. Two slow HDD mirror vdevs accessed via iSCSI; performing a large file copy from a zvol to another zvol on the same pool. The system managed to unstick itself after a minute or so. It was so hung up that my continuous arcstat sat motionless....
Failing HDD; please disregard.

@Dark3clipse

Dark3clipse commented Oct 12, 2024

Just experienced this on my system:

12x MG09 18TiB Toshiba
dell HBA expander
lsi 9211-8i HBA
128GB DDR4 RAM
16C/32T Ryzen 9 5950X

Root is on separate zfs pool with 2x SATA ssd.

NFS hung, client devices hung, restarting nfs-server hung. The host is running Proxmox VE 8.2.4; the ZFS version in use was 2.2.4. The only resolution was a forced shutdown.

@justinclift

Interesting. Proxmox has ZFS 2.2.6 in its repos. Well, that's in its no-subscription repo anyway; I'm not sure about the enterprise repo.

No idea if that would potentially help eliminate this bug though.

@Dark3clipse

Dark3clipse commented Oct 14, 2024

I had to recover the pool, as it wouldn't import after the failure. I had to use the -F option to import it, which luckily worked. This caused considerable downtime for me. I've now updated to ZFS 2.2.6, which was indeed in the repo.
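For anyone else landing here, the rewind import described above is roughly this (pool name is a placeholder; -F can discard the last few seconds of writes, so it is a last resort):

zpool import -Fn tank   # dry run: reports whether a rewind would succeed, changes nothing
zpool import -F tank    # actual rewind import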

I sure hope this won't happen again.

EDIT: occurred again 28/10/2024

@midzelis

midzelis commented Dec 9, 2024

Hitting this issue now. Using proxmox 8.3, zfs 2.2.6, kernel 6.8.12-4.

Almost exact stack as #9130 (comment)

[  616.403972] INFO: task txg_sync:2220 blocked for more than 122 seconds.
[  616.403998]       Tainted: P           O       6.8.12-4-pve #1
[  616.404011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  616.404025] task:txg_sync        state:D stack:0     pid:2220  tgid:2220  ppid:2      flags:0x00004000
[  616.404045] Call Trace:
[  616.404057]  <TASK>
[  616.404069]  __schedule+0x401/0x15e0
[  616.404085]  schedule+0x33/0x110
[  616.404095]  schedule_timeout+0x95/0x170
[  616.404107]  ? __pfx_process_timeout+0x10/0x10
[  616.404120]  io_schedule_timeout+0x51/0x80
[  616.404133]  __cv_timedwait_common+0x140/0x180 [spl]
[  616.404156]  ? __pfx_autoremove_wake_function+0x10/0x10
[  616.404171]  __cv_timedwait_io+0x19/0x30 [spl]
[  616.404193]  zio_wait+0x13a/0x2c0 [zfs]
[  616.404417]  ? bplist_iterate+0xe7/0x110 [zfs]
[  616.404633]  spa_sync+0x5c7/0x1050 [zfs]
[  616.404876]  ? spa_txg_history_init_io+0x120/0x130 [zfs]
[  616.405104]  txg_sync_thread+0x207/0x3a0 [zfs]
[  616.405320]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[  616.405530]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[  616.405555]  thread_generic_wrapper+0x5c/0x70 [spl]
[  616.405577]  kthread+0xef/0x120
[  616.405588]  ? __pfx_kthread+0x10/0x10
[  616.405598]  ret_from_fork+0x44/0x70
[  616.405609]  ? __pfx_kthread+0x10/0x10
[  616.405619]  ret_from_fork_asm+0x1b/0x30
[  616.405632]  </TASK>
[  616.405664] INFO: task zfs:7338 blocked for more than 122 seconds.
[  616.405677]       Tainted: P           O       6.8.12-4-pve #1
[  616.405688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  616.405701] task:zfs             state:D stack:0     pid:7338  tgid:7324  ppid:3251   flags:0x00004006
[  616.405719] Call Trace:
[  616.405726]  <TASK>
[  616.405732]  __schedule+0x401/0x15e0
[  616.405743]  ? ttwu_queue_wakelist+0x101/0x110
[  616.405762]  schedule+0x33/0x110
[  616.405772]  io_schedule+0x46/0x80
[  616.405782]  cv_wait_common+0xac/0x140 [spl]
[  616.406158]  ? __pfx_autoremove_wake_function+0x10/0x10
[  616.406501]  __cv_wait_io+0x18/0x30 [spl]
[  616.406862]  txg_wait_synced_impl+0xe1/0x130 [zfs]
[  616.407402]  txg_wait_synced+0x10/0x60 [zfs]
[  616.407932]  zil_replay+0xa1/0x150 [zfs]
[  616.408452]  zfsvfs_setup+0x244/0x280 [zfs]
[  616.408972]  zfs_domount+0x4b2/0x610 [zfs]
[  616.409478]  ? dbuf_rele_and_unlock+0x3a8/0x570 [zfs]
[  616.410014]  zpl_mount+0x286/0x300 [zfs]
[  616.410521]  legacy_get_tree+0x28/0x60
[  616.410845]  vfs_get_tree+0x27/0x100
[  616.411159]  path_mount+0x4f3/0xb30
[  616.411476]  ? putname+0x5b/0x80
[  616.411799]  __x64_sys_mount+0x127/0x160
[  616.412113]  x64_sys_call+0x2209/0x24b0
[  616.412427]  do_syscall_64+0x81/0x170
[  616.412749]  ? do_syscall_64+0x8d/0x170
[  616.413068]  ? __count_memcg_events+0x6f/0xe0
[  616.413388]  ? count_memcg_events.constprop.0+0x2a/0x50
[  616.413711]  ? handle_mm_fault+0xad/0x380
[  616.414041]  ? do_user_addr_fault+0x337/0x660
[  616.414371]  ? irqentry_exit_to_user_mode+0x7b/0x260
[  616.414703]  ? irqentry_exit+0x43/0x50
[  616.415045]  ? exc_page_fault+0x94/0x1b0
[  616.415380]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[  616.415714] RIP: 0033:0x7da268ad7b3a
[  616.416069] RSP: 002b:00007da2669f2648 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  616.416400] RAX: ffffffffffffffda RBX: 00007da2669f59e0 RCX: 00007da268ad7b3a
[  616.416728] RDX: 00007da268c5b211 RSI: 00007da2669f6a40 RDI: 0000645f2eaebed0
[  616.417062] RBP: 00007da2669f4710 R08: 00007da2669f36c0 R09: 00000000ffffffff
[  616.417393] R10: 0000000000000400 R11: 0000000000000246 R12: 0000645f2eaebec0
[  616.417724] R13: 00007da2669f6a40 R14: 0000645f2eaebed0 R15: 00007da2669f36c0
[  616.418063]  </TASK>

This happens during ZFS automount on boot. After I used the systemd debug shell to mask that job, I was able to start the system.
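For anyone needing the same escape hatch: booting with systemd.debug-shell=1 on the kernel command line gives an early root shell on tty9, from which the hanging unit can be masked (the unit name may vary by distro):

systemctl mask zfs-mount.service     # skip the hanging automount on subsequent boots
systemctl unmask zfs-mount.service   # undo once the pool is dealt with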

However, it happened again, triggered by copying large (4TB) files from a remote machine over a 10GbE network connection, using SMB running in an LXC host. In fact, the very first time I hit this issue, it was caused by the same file copy operation.

I'm also running zrepl, which constantly creates snapshots and performs send/recv to a remote machine. It prunes old snapshots occasionally.

Here are the dmesg logs from when it happened again, with a slightly different stack:

[56402.750527]       Tainted: P           O       6.8.12-4-pve #1
[56402.751058] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[56402.751585] task:txg_sync        state:D stack:0     pid:2220  tgid:2220  ppid:2      flags:0x00004000
[56402.752119] Call Trace:
[56402.752630]  <TASK>
[56402.753125]  __schedule+0x401/0x15e0
[56402.753608]  schedule+0x33/0x110
[56402.754088]  schedule_timeout+0x95/0x170
[56402.754550]  ? __pfx_process_timeout+0x10/0x10
[56402.755019]  io_schedule_timeout+0x51/0x80
[56402.755501]  __cv_timedwait_common+0x140/0x180 [spl]
[56402.755979]  ? __pfx_autoremove_wake_function+0x10/0x10
[56402.756449]  __cv_timedwait_io+0x19/0x30 [spl]
[56402.756874]  zio_wait+0x13a/0x2c0 [zfs]
[56402.757490]  ? bplist_iterate+0xe7/0x110 [zfs]
[56402.758098]  spa_sync+0x5c7/0x1050 [zfs]
[56402.758707]  ? spa_txg_history_init_io+0x120/0x130 [zfs]
[56402.759305]  txg_sync_thread+0x207/0x3a0 [zfs]
[56402.759905]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[56402.760498]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[56402.760898]  thread_generic_wrapper+0x5c/0x70 [spl]
[56402.761295]  kthread+0xef/0x120
[56402.761688]  ? __pfx_kthread+0x10/0x10
[56402.762069]  ret_from_fork+0x44/0x70
[56402.762443]  ? __pfx_kthread+0x10/0x10
[56402.762819]  ret_from_fork_asm+0x1b/0x30
[56402.763190]  </TASK>
[56402.763579] INFO: task smbd[192.168.4.:9738 blocked for more than 122 seconds.
[56402.763956]       Tainted: P           O       6.8.12-4-pve #1
[56402.764321] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[56402.764694] task:smbd[192.168.4. state:D stack:0     pid:9738  tgid:9738  ppid:9453   flags:0x00004000
[56402.765069] Call Trace:
[56402.765422]  <TASK>
[56402.765771]  __schedule+0x401/0x15e0
[56402.766132]  schedule+0x33/0x110
[56402.766482]  io_schedule+0x46/0x80
[56402.766838]  cv_wait_common+0xac/0x140 [spl]
[56402.767198]  ? __pfx_autoremove_wake_function+0x10/0x10
[56402.767564]  __cv_wait_io+0x18/0x30 [spl]
[56402.767930]  txg_wait_synced_impl+0xe1/0x130 [zfs]
[56402.768493]  txg_wait_synced+0x10/0x60 [zfs]
[56402.769047]  dmu_tx_wait+0x2eb/0x460 [zfs]
[56402.769659]  dmu_tx_assign+0x162/0x510 [zfs]
[56402.770226]  zfs_write+0x4a0/0xd70 [zfs]
[56402.770804]  zpl_iter_write+0x11b/0x1a0 [zfs]
[56402.771364]  do_iter_readv_writev+0x149/0x1d0
[56402.771721]  vfs_writev+0x185/0x3e0
[56402.772081]  ? ksys_read+0xe6/0x100
[56402.772435]  do_writev+0x81/0x170
[56402.772791]  ? do_writev+0x81/0x170
[56402.773152]  __x64_sys_writev+0x1c/0x30
[56402.773501]  x64_sys_call+0x1b01/0x24b0
[56402.773852]  do_syscall_64+0x81/0x170
[56402.774197]  ? do_compat_epoll_pwait.part.0+0xf/0x90
[56402.774540]  ? __x64_sys_epoll_pwait+0x91/0x150
[56402.774905]  ? syscall_exit_to_user_mode+0x86/0x260
[56402.775257]  ? do_syscall_64+0x8d/0x170
[56402.775600]  ? irqentry_exit+0x43/0x50
[56402.775945]  ? exc_page_fault+0x94/0x1b0
[56402.776282]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[56402.776624] RIP: 0033:0x7480be1b4862
[56402.776984] RSP: 002b:00007fff12cfaed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
[56402.777325] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007480be1b4862
[56402.777660] RDX: 0000000000000002 RSI: 00007fff12cfaf30 RDI: 000000000000000f
[56402.777999] RBP: 00007480bde16b90 R08: 0000000000000000 R09: 0000000000000000
[56402.778330] R10: 0000000000000000 R11: 0000000000000246 R12: 00007480bdc8dac0
[56402.778657] R13: 00000000000000b4 R14: 00007480bdc8d980 R15: 0000000000000000
[56402.778988]  </TASK>
[56402.779395] INFO: task kworker/u65:0:728007 blocked for more than 122 seconds.
[56402.779743]       Tainted: P           O       6.8.12-4-pve #1
[56402.780094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[56402.780438] task:kworker/u65:0   state:D stack:0     pid:728007 tgid:728007 ppid:2      flags:0x00004000
[56402.780797] Workqueue: writeback wb_workfn (flush-zfs-19)
[56402.781154] Call Trace:
[56402.781513]  <TASK>
[56402.781872]  __schedule+0x401/0x15e0
[56402.782229]  schedule+0x33/0x110
[56402.782583]  io_schedule+0x46/0x80
[56402.782941]  cv_wait_common+0xac/0x140 [spl]
[56402.783305]  ? __pfx_autoremove_wake_function+0x10/0x10
[56402.783662]  __cv_wait_io+0x18/0x30 [spl]
[56402.784031]  txg_wait_synced_impl+0xe1/0x130 [zfs]
[56402.784610]  txg_wait_synced+0x10/0x60 [zfs]
[56402.785194]  dmu_tx_wait+0x2eb/0x460 [zfs]
[56402.785774]  dmu_tx_assign+0x162/0x510 [zfs]
[56402.786337]  zfs_putpage+0x269/0x900 [zfs]
[56402.786885]  zpl_putfolio+0x3e/0x60 [zfs]
[56402.787419]  write_cache_pages+0x169/0x400
[56402.787769]  ? __update_load_avg_se+0x276/0x310
[56402.788128]  ? __pfx_zpl_putfolio+0x10/0x10 [zfs]
[56402.788664]  zpl_writepages+0xb0/0x1d0 [zfs]
[56402.789192]  do_writepages+0xcd/0x1f0
[56402.789536]  ? wakeup_preempt+0x68/0x80
[56402.789895]  ? probe_sched_wakeup+0x2d/0x60
[56402.790229]  ? ttwu_do_activate+0x148/0x250
[56402.790557]  __writeback_single_inode+0x44/0x370
[56402.790892]  writeback_sb_inodes+0x211/0x510
[56402.791229]  __writeback_inodes_wb+0x54/0x100
[56402.791556]  ? queue_io+0x115/0x120
[56402.791885]  wb_writeback+0x2df/0x350
[56402.792208]  wb_workfn+0x368/0x4d0
[56402.792529]  ? __schedule+0x409/0x15e0
[56402.792854]  ? add_timer+0x20/0x40
[56402.793173]  process_one_work+0x173/0x350
[56402.793494]  worker_thread+0x306/0x440
[56402.793818]  ? __pfx_worker_thread+0x10/0x10
[56402.794138]  kthread+0xef/0x120
[56402.794464]  ? __pfx_kthread+0x10/0x10
[56402.794794]  ret_from_fork+0x44/0x70
[56402.795119]  ? __pfx_kthread+0x10/0x10
[56402.795454]  ret_from_fork_asm+0x1b/0x30
[56402.795787]  </TASK>
[56525.628368] INFO: task smbd:732195 blocked for more than 122 seconds.
[56525.628826]       Tainted: P           O       6.8.12-4-pve #1
[56525.629172] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[56525.629527] task:smbd            state:D stack:0     pid:732195 tgid:732195 ppid:9453   flags:0x00000000
[56525.629883] Call Trace:
[56525.630237]  <TASK>
[56525.630585]  __schedule+0x401/0x15e0
[56525.630936]  ? __legitimize_path+0x2d/0x70
[56525.631302]  schedule+0x33/0x110
[56525.631654]  schedule_preempt_disabled+0x15/0x30
[56525.632010]  rwsem_down_write_slowpath+0x392/0x6a0
[56525.632366]  down_write+0x5c/0x80
[56525.632733]  path_openat+0x369/0x1190
[56525.633087]  ? __alloc_pages+0x251/0x1320
[56525.633406]  do_filp_open+0xaf/0x170
[56525.633609]  do_sys_openat2+0xb3/0xe0
[56525.633807]  __x64_sys_open+0x6e/0xa0
[56525.633999]  x64_sys_call+0x1a2f/0x24b0
[56525.634195]  do_syscall_64+0x81/0x170
[56525.634383]  ? __do_sys_newlstat+0x7f/0xc0
[56525.634569]  ? __count_memcg_events+0x6f/0xe0
[56525.634756]  ? count_memcg_events.constprop.0+0x2a/0x50
[56525.634938]  ? handle_mm_fault+0xad/0x380
[56525.635116]  ? do_user_addr_fault+0x337/0x660
[56525.635417]  ? irqentry_exit_to_user_mode+0x7b/0x260
[56525.635715]  ? irqentry_exit+0x43/0x50
[56525.636032]  ? exc_page_fault+0x94/0x1b0
[56525.636336]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[56525.636639] RIP: 0033:0x7480be1b4862
[56525.636967] RSP: 002b:00007fff12cfb0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[56525.637282] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007480be1b4862
[56525.637593] RDX: 00000000000001a4 RSI: 0000000000008842 RDI: 00007fff12cfb280
[56525.637917] RBP: 00007480bde16b90 R08: 0000000000000000 R09: 0000000000000000
[56525.638232] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff12cfb212
[56525.638545] R13: 00007480bdf5861a R14: 00007fff12cfb280 R15: 00007480bd77ef00
[56525.638863]  </TASK>
[56525.639176] INFO: task smbd:732196 blocked for more than 122 seconds.
[56525.639503]       Tainted: P           O       6.8.12-4-pve #1
[56525.639827] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[56525.640158] task:smbd            state:D stack:0     pid:732196 tgid:732196 ppid:9453   flags:0x00000000
[56525.640504] Call Trace:
[56525.640837]  <TASK>
[56525.641169]  __schedule+0x401/0x15e0
[56525.641516]  schedule+0x33/0x110
[56525.641850]  schedule_preempt_disabled+0x15/0x30
[56525.642192]  rwsem_down_write_slowpath+0x392/0x6a0
[56525.642529]  down_write+0x5c/0x80
[56525.642863]  path_openat+0x369/0x1190
[56525.643212]  ? kmem_cache_alloc+0x133/0x370
[56525.643553]  do_filp_open+0xaf/0x170
[56525.643889]  do_sys_openat2+0xb3/0xe0
[56525.644226]  __x64_sys_open+0x6e/0xa0
[56525.644558]  x64_sys_call+0x1a2f/0x24b0
[56525.644891]  do_syscall_64+0x81/0x170
[56525.645230]  ? __alloc_pages+0x251/0x1320
[56525.645563]  ? mutex_lock+0x12/0x50
[56525.645898]  ? rrw_exit+0x72/0x170 [zfs]
[56525.646466]  ? xas_find+0x6e/0x1d0
[56525.646800]  ? next_uptodate_folio+0x93/0x290
[56525.647147]  ? filemap_map_pages+0x4b8/0x5b0
[56525.647484]  ? __mod_lruvec_state+0x36/0x50
[56525.647818]  ? __lruvec_stat_mod_folio+0x70/0xc0
[56525.648151]  ? do_fault+0x269/0x4c0
[56525.648488]  ? __handle_mm_fault+0x88f/0xf20
[56525.648819]  ? syscall_exit_to_user_mode+0x86/0x260
[56525.649149]  ? __count_memcg_events+0x6f/0xe0
[56525.649476]  ? count_memcg_events.constprop.0+0x2a/0x50
[56525.649796]  ? handle_mm_fault+0xad/0x380
[56525.650111]  ? do_user_addr_fault+0x337/0x660
[56525.650433]  ? irqentry_exit_to_user_mode+0x7b/0x260
[56525.650752]  ? irqentry_exit+0x43/0x50
[56525.651068]  ? exc_page_fault+0x94/0x1b0
[56525.651400]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[56525.651727] RIP: 0033:0x7480be1b4862
[56525.652059] RSP: 002b:00007fff12cfb0e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[56525.652396] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007480be1b4862
[56525.652731] RDX: 00000000000001a4 RSI: 0000000000008842 RDI: 00007fff12cfb280
[56525.653067] RBP: 00007480bde16b90 R08: 0000000000000000 R09: 0000000000000000
[56525.653410] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff12cfb212
[56525.653749] R13: 00007480bdf5861a R14: 00007fff12cfb280 R15: 00007480bd77ef00
[56525.654094]  </TASK>
[56525.654439] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

Has anyone figured out a cause or a workaround? I'm about to restore/rebuild my pool from backup. I'm considering switching to FreeBSD, as some comments above mention this is not a problem there.

@justinclift

Just to check an idea, is that system using ZFS encryption?

@midzelis

Yes, encrypted. Using an LSI HBA and an Intel RES2SV240 E91267-203 expander, SAS drives, 128GB of ECC RAM, a Ryzen 9 5950X, and an ASRock Taichi X570.

  *-sas
       description: Serial Attached SCSI controller
       product: SAS2308 PCI-Express Fusion-MPT SAS-2
       vendor: Broadcom / LSI
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: scsi2
       version: 05
       width: 64 bits
       clock: 33MHz
       capabilities: sas pm pciexpress vpd msi msix bus_master cap_list rom
       configuration: driver=mpt3sas latency=0
       resources: irq:36 ioport:e000(size=256) memory:fa740000-fa74ffff memory:fa700000-fa73ffff memory:fa600000-fa6fffff

 *-disk:0
       description: SCSI Disk
       product: ST6000NM0285
       vendor: SEAGATE
       physical id: 0.0.0
       bus info: scsi@2:0.0.0
       logical name: /dev/sda
       version: EF03
       serial: ZAD3X2RQ0000C8322M8G
       size: 5589GiB (6001GB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=18a24dce-b98f-ef45-9205-8efa82cc57bf logicalsectorsize=512 sectorsize=4096
  *-disk:1
       description: SCSI Disk
       product: ST6000NM0285
       vendor: SEAGATE
       physical id: 0.1.0
       bus info: scsi@2:0.1.0
       logical name: /dev/sdb
       version: EF03
       serial: ZAD3M0CB0000C831A4E6
       size: 5589GiB (6001GB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=810276cf-bda3-4444-99fa-9d571697f09e logicalsectorsize=512 sectorsize=4096
  *-disk:4
       description: SCSI Disk
       product: ST20000NM002D
       vendor: SEAGATE
       physical id: 0.2.0
       bus info: scsi@2:0.2.0
       logical name: /dev/sdc
       version: E005
       serial: ZVTC7YHP0000W349P64L
       size: 18TiB (20TB)
       capacity: 50TiB (55TB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=7 guid=5a6a1d68-aea8-e043-8fd1-cef90df2e8d6 logicalsectorsize=512 sectorsize=4096
  *-disk:5
       description: SCSI Disk
       product: ST6000NM0285
       vendor: SEAGATE
       physical id: 0.3.0
       bus info: scsi@2:0.3.0
       logical name: /dev/sdd
       version: EF03
       serial: ZAD3X2090000C8322K8H
       size: 5589GiB (6001GB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=4415cd6c-b43e-e748-82e5-c081117a74b9 logicalsectorsize=512 sectorsize=4096
  *-disk:6
       description: SCSI Disk
       product: ST6000NM0285
       vendor: SEAGATE
       physical id: 0.4.0
       bus info: scsi@2:0.4.0
       logical name: /dev/sde
       version: EF03
       serial: ZAD3X2DT0000C829C3AN
       size: 5589GiB (6001GB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=b9fc580e-6dea-f84e-8b8c-5dc2509abe14 logicalsectorsize=512 sectorsize=4096
  *-disk:7
       description: SCSI Disk
       product: ST6000NM0285
       vendor: SEAGATE
       physical id: 0.5.0
       bus info: scsi@2:0.5.0
       logical name: /dev/sdf
       version: EF03
       serial: ZAD3VRYK0000C8299ASB
       size: 5589GiB (6001GB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=f48324f7-0925-8643-a68c-cc5c0d89c398 logicalsectorsize=512 sectorsize=4096
  *-disk:8
       description: SCSI Disk
       product: ST6000NM0285
       vendor: SEAGATE
       physical id: 0.6.0
       bus info: scsi@2:0.6.0
       logical name: /dev/sdg
       version: EF03
       serial: ZAD3X2EB0000C829A6Z9
       size: 5589GiB (6001GB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=6 guid=63b326c9-b694-ac4c-8bc4-73e25809e839 logicalsectorsize=512 sectorsize=4096
  *-disk:9
       description: SCSI Disk
       product: ST20000NM002D
       vendor: SEAGATE
       physical id: 0.7.0
       bus info: scsi@2:0.7.0
       logical name: /dev/sdh
       version: E005
       serial: ZVTEMZ8N0000R4366QP8
       size: 18TiB (20TB)
       capacity: 47TiB (51TB)
       capabilities: 7200rpm gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=7 guid=f1364d18-18b5-b740-938f-ae8fe7139a77 logicalsectorsize=512 sectorsize=4096
               total        used        free      shared  buff/cache   available
Mem:           125Gi        55Gi        69Gi        51Mi       1.8Gi        70Gi
Swap:             0B          0B          0B
NAME                                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
LUNA                                 34.5T  27.3T  7.25T        -         -    19%    79%  1.00x    ONLINE  -
  indirect-0                             -      -      -        -         -      -      -      -    ONLINE
  indirect-1                             -      -      -        -         -      -      -      -    ONLINE
  mirror-2                           5.45T  5.39T  62.8G        -         -    73%  98.9%      -    ONLINE
    wwn-0x5000c50095461a0f           5.46T      -      -        -         -      -      -      -    ONLINE
    wwn-0x5000c50095463d37           5.46T      -      -        -         -      -      -      -    ONLINE
  mirror-3                           18.2T  14.3T  3.84T        -         -     9%  78.9%      -    ONLINE
    wwn-0x5000c500f29e63cf           18.2T      -      -        -         -      -      -      -    ONLINE
    scsi-35000c500f34a58cf           18.2T      -      -        -         -      -      -      -    ONLINE
  mirror-4                           5.45T  4.82T   649G        -         -    15%  88.4%      -    ONLINE
    wwn-0x5000c500954613af           5.46T      -      -        -         -      -      -      -    ONLINE
    wwn-0x5000c5009546181b           5.46T      -      -        -         -      -      -      -    ONLINE
  mirror-5                           5.45T  2.74T  2.71T        -         -     3%  50.3%      -    ONLINE
    wwn-0x5000c5009545fb13           5.46T      -      -        -         -      -      -      -    ONLINE
    wwn-0x5000c50095461003           5.46T      -      -        -         -      -      -      -    ONLINE
arcstat 1
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
21:34:30     0       0     0       0     0      0    0    43G    62G    65G
21:34:31    17      17   100       0     0      0    0    43G    62G    65G
21:34:32     3       3   100       0     0      0    0    43G    62G    65G
21:34:33     3       3   100       0     0      0    0    43G    62G    65G
21:34:34   985       5   100     980   100      0    0    43G    62G    65G
21:34:35     7       7   100       0     0      0    0    43G    62G    65G
21:34:36     3       3   100       0     0      0    0    43G    62G    65G
21:34:37   179      41   100     138   100      0    0    43G    62G    65G
21:34:38     5       3   100       2   100      0    0    43G    62G    65G

@justinclift

Interesting. I've seen other people (credible ones) mention that using ZFS encryption + ZFS send/recv (i.e. to ship the snapshots to a remote system) is buggy and not at all a good idea for production systems.

I'd kind of hoped that the bug fixes in ZFS 2.2.6 might have somehow fixed those particular bugs, but it sounds like there are still problems hanging around. ☹️

For my systems, that meant switching them to using LUKS/cryptsetup for the encryption part of things instead (still using ZFS snapshots). That's been working in small scale production for several months now.
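
For reference, a minimal sketch of that layout, assuming a single drive /dev/sdX and a pool named tank (both placeholders):

  cryptsetup luksFormat /dev/sdX             # encrypt the raw device
  cryptsetup open /dev/sdX crypt_sdX         # unlock; creates /dev/mapper/crypt_sdX
  zpool create tank /dev/mapper/crypt_sdX    # build the pool on the mapped device

ZFS then sees an ordinary block device, so snapshots and send/recv behave as usual; native encryption is simply out of the picture.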

You might need to look into doing something similar too to try and work around the problem.

Sorry I don't have better ideas for resolving this. 😦

@1e100

1e100 commented Dec 10, 2024

This doesn't seem to be related to encryption necessarily, nor does it seem to be related to send/receive. It seems like some kind of buffering race condition: it only happens when you write a ton of data to a vdev. I ran into this when copying data from a couple of sources (just plain file copies, no send/receive) to a vdev, saturating its write throughput. It doesn't happen right away, but eventually you start to see errors about things getting stuck, the timeout gets adjusted upwards again and again, and then you get a fault on the vdev. In my case I worked around this by limiting the speed at which the data comes in. Or maybe I just got lucky and the problem did not reappear by chance.
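
For anyone wanting to try the same workaround, a sketch using rsync's built-in rate limit (the 100m figure is arbitrary; pick something below the vdev's sustained write throughput, and note the size suffix needs rsync 3.1+):

  rsync -a --bwlimit=100m /source/ /pool/dest/   # cap the transfer at ~100 MiB/s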

@markusdd

Exactly. We also observed this. We were running GitLab on an affected server, and any time large cleanup tasks were happening or a lot of CI artifacts came flying in, we would see this.
As I described many posts above, switching distros (and hence kernel versions) improved the situation, but ultimately this looks like a bug either in the kernel or in the interaction of OpenZFS with it.
But this can only be solved if both sides come together and really start digging into it; it will never get solved here by posting more symptom reports. This is a condition that should never occur.

@Dinth

Dinth commented Dec 10, 2024

This doesn't seem to be related to encryption necessarily, nor does it seem to be related to send/receive

Indeed, I am not using snapshots or encryption and I'm still affected. Rarely, but still.

@midzelis

If it's not related to encryption, then it's worse. I suspect this may have caused permanent corruption in the pool. If @1e100 is correct and it's related to writing at or above the vdev's capacity, then why would it also occur during boot, during zfs mount -a? Additionally, I was previously able to max out my vdev at 700MB/sec over 10GbE doing simple rsyncs and file copies over SMB with no problems. However, both times this happened, the pool was only writing out at about 200-300MB/sec. In this case, I was using Final Cut Pro to transcode 8K raw files to ProRes directly to the SMB share. What may be interesting is that this would generally produce a single very large file (3.5TB+), written sequentially.

I've used ZFS for 10+yrs now, starting with FreeNAS 8, and this has totally shaken my belief in this filesystem.

I need to unblock myself as quickly as possible. Here's what I plan to try:

  1. Proxmox has a 6.11 kernel in their test repository. Will give this a try.
  2. If that doesn't work, I will set up TrueNAS Core 13 and see how that works.

After I finish my video edits, I'll redo my primary pool from a backup, this time without encryption. I'll keep encryption on my remote backup server, and not use raw sends like I was doing previously. Of course, this all assumes that TrueNAS Core works - if it doesn't, I'm going to have to rethink using ZFS.

@prgwiz

prgwiz commented Dec 11, 2024 via email

@howels

howels commented Dec 13, 2024

I did notice this on some of my systems. I think the pattern is that I was seeing it mainly on systems with integrated SATA ports. When I swapped to LSI SAS HBAs instead of the motherboard ports, this went away, so maybe ZFS is heavily using some queuing feature that the onboard SATA controller does not implement correctly.

@midzelis

midzelis commented Dec 23, 2024

I tried doing what @prgwiz suggested. It didn't help. FWIW, @howels - I was not using SATA ports; all my drives are in fact SAS drives.

Last update: I was able to trigger it when deleting folders directly from the host (rm -rf on the host itself). This command completed. Simultaneously, I was transferring data to the pool over SMB, and hit the problem. I tried to restart the host, the process hung (trying to kill the LXC container running SMB), and I hard reset. The pool came back, but the data I had deleted "successfully" reappeared. I'm now wondering if running SMB in LXC with an lxc.mount.entry configuration like lxc.mount.entry: /LUNA LUNA none rbind,create=dir,optional 0 0 is related. The full config is:

lxc.cgroup2.devices.allow: a
lxc.cap.drop:
lxc.cgroup2.devices.allow: c 188:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.mount.entry: /dev/serial/by-id  dev/serial/by-id  none bind,optional,create=dir
lxc.mount.entry: /dev/ttyUSB0       dev/ttyUSB0       none bind,optional,create=file
lxc.mount.entry: /dev/ttyUSB1       dev/ttyUSB1       none bind,optional,create=file
lxc.mount.entry: /dev/ttyACM0       dev/ttyACM0       none bind,optional,create=file
lxc.mount.entry: /dev/ttyACM1       dev/ttyACM1       none bind,optional,create=file
lxc.mount.entry: /LUNA LUNA none rbind,create=dir,optional 0 0
lxc.mount.entry: /TMP_POOL TMP_POOL none rbind,create=dir,optional 0 0

It did seem related to only one dataset - however, I didn't want to chance it, so I decided to nuke my primary pool and am currently restoring from my backup pool. 24h to go! A nice side effect is rebalancing the data across my drives after adding some new capacity.

I decided to remove encryption while I was at it, just in case it was a problem. If it happens again, I'll run SMB/NFS directly on the host instead of in a container/cgroups to see if that's related.

@prgwiz

prgwiz commented Dec 23, 2024 via email

@ryao
Contributor

ryao commented Jan 25, 2025

There are a few different issues here. I will try to respond to each as it is clear this slipped through the cracks.

@AlexOwen The txg_sync thread waiting excessively is often a symptom of the underlying storage being slow at servicing IOs. The historically most common cause of this would be a disk having trouble getting sector ECC to return a sensible value. Many disks will retry reads for an excessively long time before deciding the sector is bad and reporting it to ZFS. It would seem counter-intuitive for this to block the txg_sync thread because the txg_sync thread primarily writes, but at a low level, ZFS will do copy on write and if it only knows a partial amount of what must be written, then it must read the existing record on disk to get the remainder before it can do a write. This can happen more often than you would think given that various metadata updates only affect partial records.

For example, if we are modifying a partial record on a large file, we must read each level of the indirect block tree until we know where the original record is. Then we need to write that out, plus new records for each level of the indirect block tree. Thankfully, these will be cached from the initial reads, sparing us the need to read them again, but if the drive enters error correction on sectors on a few of these levels, the reads needed for us to be able to begin the writes that the txg_sync thread does can be delayed by a significant amount of time. If each one is delayed by 30 seconds, and there are 5 levels, then that would easily exceed 120 seconds and cause a report that the txg_sync thread has hung.

In any case, something like this could be what happened on your machine. You might get some hint about this by looking at SMART data, but usually, it won't report how many errors it corrected and only reports reallocated sector counts, which makes this problem a headache to diagnose. It is possible on some drives to set Error Recovery Control to address this issue:

https://en.wikipedia.org/wiki/Error_recovery_control

The recommendation to set error recovery control is documented here:

https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Hardware.html#error-recovery-control

Documentation on how to set this is here:

https://www.smartmontools.org/wiki/FAQ#WhatiserrorrecoverycontrolERCandwhyitisimportanttoenableitfortheSATAdisksinRAID

I wrote the original recommendation to set it to 0.1 in the documentation and I suggest setting it to that for performance purposes.
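
For drives that support it, a sketch of checking and setting ERC with smartmontools (scterc takes units of 100 ms, so 1,1 matches the 0.1 second recommendation; many drives forget the setting on power cycle, so it needs reapplying at boot):

  smartctl -l scterc /dev/sdX        # show the current read/write ERC timeouts
  smartctl -l scterc,1,1 /dev/sdX    # set both timeouts to 0.1 seconds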

Interestingly, Wikipedia claims that ZFS has handling for this, but I am not actually aware of any such feature. I have been less active for the past few years, so either the feature was added when I was not looking or Wikipedia is wrong. My suggestion is to follow the documentation on this one, as the others should have modified it if there were any change in this area. Unfortunately, not all drives support error recovery control, but there are still some things you can do, so keep reading.

Regular scrubs might help with this. It will force the drive to attempt to read all written sectors. That will cause delays if it is slow at reading (unless you set error recovery control), but after a while, it will either self correct or pass an error to ZFS, which will correct the issue. After the scrub has finished, any sectors in a state that causes delays should be fixed.
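
A sketch of that remediation loop, using the LUNA pool from the listing above as an example:

  zpool scrub LUNA        # forces a read of every allocated sector
  zpool status -v LUNA    # watch progress and any errors that were corrected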

If regular scrubs do not work, I suggest watching the IO completion times of your drives. One is likely much slower than the others. You can sort of use iostat to do this, but it generates averages rather than histograms. What you really want is a tool that generates histograms so you can catch outliers. biolatency from bcc could work for doing this:

https://github.com/iovisor/bcc/blob/master/tools/biolatency.py

Alternatively, you could skip to using this utility to identify the bad drive upfront and then try using a scrub and see if it helps to remediate the drive. If things continue to happen even after a scrub and you have identified the bad drive, you should just replace the drive.
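
A sketch of running it with a long summary window, assuming the bcc tools are installed (on Debian-based systems the script is typically named biolatency-bpfcc):

  biolatency -D -m 120 1    # one 120-second summary, per disk (-D), in milliseconds (-m)

A disk whose histogram has buckets out in the hundreds of milliseconds or beyond, while the others cluster low, is the outlier to investigate.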

That said, I am not ruling out the possibility of a bug, but I think we should try to rule out the more obvious explanations. Others posting that these issues are not present on non-Linux operating systems suggests the possibility that there is a Linux bug that is affecting us. This would give similar symptoms to what I described here and likely would be detectable using the biolatency tool, although in the case of a Linux bug, I suspect we would see many drives having high outlier IO completion times rather than just one.

There were also a few replies saying that this affects Proxmox but not Ubuntu, and that changing the IO elevator helps. Interestingly, ZFS sets the IO elevator to noop when given the disks directly, while Proxmox might be doing its own partitioning, which would prevent ZFS from setting the IO elevator. This ordinarily would not be the end of the world, but it might be related to the issues people are having. What IO elevator is set at /sys/block/$BDEV/queue/scheduler for your drives? Does setting it to none help?
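
A quick check-and-set sketch, assuming /dev/sda (the active elevator is shown in brackets):

  cat /sys/block/sda/queue/scheduler                    # e.g. [mq-deadline] none
  echo none | sudo tee /sys/block/sda/queue/scheduler   # hand scheduling decisions to the device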

There is a report that replacing an SMR drive with a non-SMR drive solved this problem. SMR could cause sluggishness whenever ZFS tries to write to a sector. Are you using any SMR drives?

Finally, are you using data deduplication?

@HiFiPhile @seanmikhaels @kebian @rkorzeniewski @Blackclaws @lukaszaoralek @kyle0r @laurensbl @MatthiasKuehneEllerhold @rgrunbla @sotiris-bos @carmenguarino @bombcar @midzelis Your issues look the same as the original poster's and my advice is the same for you. It is possible that there are multiple issues causing the same symptoms, but I cannot tell your issues apart from the original poster's issue based on what you reported.

@lassebm @rejsmont Your issue is different. In your case, ZFS submitted the request to the block layer and it got stuck. This issue is either in the block layer or your hardware. I suggest filing a separate issue for this.

@Dinth I see from the history that your issue is deduplication related, which is another mechanism by which the txg_sync thread can be delayed.

@queenkjuul @foxx Your issues are different than the others here. You might have hit a bug, although I have a suspicion this bug has been fixed since you posted.

@SpikedCola I really wish I had seen this 4 years ago. Since you ran a scrub and it did not address the issue, my suggestion would have been to skip the advice I gave to the original poster and use a tool such as biolatency from BCC to monitor IO completion times.

@kirscheGIT @0x3333 Your issues are similar to the others, but different. The txg_sync thread is making forward progress faster than others' here, but not fast enough for the zpool command to unblock on it. My advice is the same as it is for the others.

@dm17 USB flash drives tend to be bad at random IO. I suspect the one you have is exceptionally bad at it. I suggest getting a different USB flash drive. You will likely have problems with this one for anything other than writing large files to it using FAT32. I suggest filing a separate issue for this to discuss this there. Maybe we could make this better, but some USB flash drives are really terrible and there is not much hope.

@kjelderg You did not post a backtrace, but you said you had almost exactly the same problem as @rkorzeniewski. My suggestion for you would have been the same as it is for him, although upon reading your later remark that switching to FreeBSD fixed the issue, I am beginning to wonder if there is a Linux bug that is affecting us. :/

@matti Unfortunately, I need more information than "same here" to give advice, but if it really is the same backtrace, then my remarks to the original poster apply.

@khinsen Your remark makes me think there might be a Linux bug affecting us. The biolatency tool I suggested could help prove it.

@Red-Eyed Your backtrace is slightly different than the others. Python did a synchronous write, which must wait for past transaction groups to have finished so that the ZIL record for it will be read upon resume if there is a power loss. However, the transaction group commits are running slowly. They are not slow enough to trigger a warning for the txg_sync thread, but they are slow enough to cause this synchronous write to issue a warning. I suggest filing a separate issue with @kirscheGIT and @0x3333. I believe your issues are either the same or very similar.

@ThatCoffeeGuy This affecting Proxmox but not Ubuntu is a very interesting data point. Thank you.

@agrevtcev This is an interesting datapoint as well. Thank you.

@phjr That is a great data point. Thanks.

@markusdd The data points on a pool scrub and on Ubuntu Linux not having this issue when Rocky Linux did are interesting. The first means something other than drive error correction is at work on your system. Did you partition the drives yourself or did you let ZFS handle partitioning? If you did partitioning yourself, what IO elevator is in use? I wonder if the IO elevator being used on Rocky Linux and on Ubuntu are different. Does setting it to none make things better on Rocky Linux?

@kobuki Hearing that this affects you on Ubuntu is interesting considering that a number of others stated that switching to Ubuntu solved their problem. I suspect that you are having a different underlying problem than the others, but the symptoms are the same. My advice is to try what I suggested to the original poster.

@Real-Ztrawberry You and @MatthiasKuehneEllerhold are correct that this is typically the result of the storage hardware running slowly. The reasons for this can vary, but a common cause would be error correction, which is why I am suggesting people try setting up error recovery control if their drives support it.

@ananthb That is a good data point on this affecting Linux 5.4.y. Did you partition the drives yourself or did you let ZFS handle the partitioning? If you partitioned them yourself, what IO elevator is being used?

@furicle Your issue is distinct from the one others are reporting here. Does running echo 0 | sudo tee /sys/module/zfs/parameters/zfs_dmu_offset_next_sync make a difference?

@dperelman That is a good data point on the USB hub. As for it causing ZFS to be unrecoverable, there is not much that we can do when a pool is on a single drive and that drive hangs. That issue hangs all filesystems, not just ZFS. In any case, your single drive hanging issue is a distinct problem from what the majority of others are having here, although it might be related to the issues that most of the other raspberry pi users are having. That said, I suggest you file a separate issue with your suggestion that we be more robust in this scenario. Please include as much as you can with your suggestion.

@ms-jpq The data not being there when you force a reboot is because the transaction group commit is what contains that data and forcing a reboot before it has finished means that it never happened. This does not solve the problem of it running slowly however. I suggest using biolatency to see what is happening in terms of your disks' IO completion times. As for send/recv of a zvol being ridiculously slow, does running echo 0 | sudo tee /sys/module/zfs/parameters/zfs_dmu_offset_next_sync make a difference?

@c-imp11 Unfortunately, I need more data than that to provide a useful reply, although perhaps what I suggested for the others might help you.

@4xoc Nobody running FreeBSD has reported this issue. It is Linux exclusive, and it seems to largely only affect certain Linux distributions. Ubuntu is largely unaffected. There is one person who claimed to be affected on Ubuntu, but I suspect his issue is different than the others and has a different underlying cause, with the same symptoms.

@ryao ryao self-assigned this Jan 26, 2025
@ryao
Contributor

ryao commented Jan 26, 2025

If anyone is affected by txg_sync being blocked for more than 120 seconds, here is a list of questions:

  • Do you use hard drives or solid state drives? If you use hard drives, do you have Error Recovery Control set on your drives and are any of these drives SMR drives?
  • How are the drives connected to the machine? If USB is used, try bypassing USB hubs.
  • Have you run a scrub recently? Is the scrub affected? Does running a scrub help at all?
  • Do you use data deduplication?
  • If this issue occurs when running send/receive, does echo 0 | sudo tee /sys/module/zfs/parameters/zfs_dmu_offset_next_sync help?
  • Did ZFS partition the drives, or did you? If you partitioned them or do not know, what IO elevator are the drives using? You can check in /sys/block/$BDEV/queue/scheduler for each drive (see the sketch after this list). Does setting it to none/noop help?
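
To answer the elevator question for every drive at once, a one-line sketch (assumes whole-disk sdX device names):

  grep . /sys/block/sd*/queue/scheduler    # prints each drive's elevators, with the active one in brackets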

If you can readily reproduce the problem on your machine, please install iovisor/bcc and use the biolatency tool to see if any drives have abnormal IO latency outliers. You will recognize them if you see them, since they will be >100ms and possibly even in the 10s of seconds. Note that you will need to specify a long period for biolatency's summaries to catch extremely long IO hangs (for example, 120 seconds would catch most cases of the latencies being in the 10s of seconds).

If one drive has abnormal IO latency outliers, does replacing it fix the problem? If all drives have abnormal IO latency outliers, then that indicates a serious problem between the drives and ZFS (e.g. bad controller, bad IO elevator, bad storage driver). Please let me know if you are in a situation where all drives have IO latency outliers.

@ryao
Contributor

ryao commented Jan 26, 2025

I was curious whether the biolatency utility would catch latencies caused by the Linux IO elevator, so I checked the kernel sources. It turns out that its instrumentation point sits before the Linux IO elevator is given IOs, so biolatency should catch latencies caused by the elevator. Specifically, the starting tracepoint is trace_block_io_start(), which is in this call stack (based on the Linux 6.12.6 sources):

trace_block_io_start
blk_account_io_start
blk_mq_bio_to_request
blk_mq_submit_bio
__submit_bio
__submit_bio_noacct_mq
submit_bio_noacct_nocheck
submit_bio_noacct
submit_bio
vdev_submit_bio_impl
...

This assumes that device mapper, MD, ZRAM, BRD, multipath and a few other extremely obscure things are not being used under ZFS. Those use a different call path and they do not use the trace point.

The full list as found by cscope is:

0 nfblock.c     93 .submit_bio = nfhd_submit_bio,
1 simdisk.c    143 .submit_bio = simdisk_submit_bio,
2 brd.c        283 .submit_bio = brd_submit_bio,
3 drbd_main.c  135 .submit_bio = drbd_submit_bio,
4 n64cart.c    107 .submit_bio = n64cart_submit_bio,
5 pktcdvd.c   2606 .submit_bio = pkt_submit_bio,
6 ps3vram.c    604 .submit_bio = ps3vram_submit_bio,
7 zram_drv.c  2298 .submit_bio = zram_submit_bio,
8 super.c      762 .submit_bio = cached_dev_submit_bio,
9 super.c      770 .submit_bio = flash_dev_submit_bio,
a dm.c        3670 .submit_bio = dm_submit_bio,
b md.c        7992 .submit_bio = md_submit_bio,
c btt.c       1492 .submit_bio = btt_submit_bio,
d pmem.c       291 .submit_bio = pmem_submit_bio,
e multipath.c  535 .submit_bio = nvme_ns_head_submit_bio,
f dcssblk.c     43 .submit_bio = dcssblk_submit_bio,

Interestingly, zvols also qualify to be in that list. biolatency does not instrument bdev_start_io_acct(), which is what most of the things in that list use.
