arc: avoid possible deadlock in arc_read #17071

Open
wants to merge 1 commit into
base: master

ixhamza (Member) commented on Feb 19, 2025

Motivation and Context

During L2ARC device removal, l2arc_evict() can take locks in the reverse order from arc_read(): it runs with the config lock already held as writer and then acquires a hash lock, whereas arc_read() acquires the hash lock first and then the config lock as reader. To avoid a deadlock, if the attempt to acquire the config lock (reader) fails in arc_read(), release the hash lock, wait for the config lock, and retry from the beginning.

Description

While I have not been able to reproduce the issue locally, I decoded the following kernel trace from a customer's logs that resulted in a complete system lockup during L2ARC vdev removal:

L2Cache Remove Context

l2arc_evict() => Tries to acquire the `hash_lock` mutex, which arc_read() already holds in the context below
    l2arc_remove_vdev()
        spa_load_l2cache()
            spa_vdev_remove() => Acquired spa_config_lock, spa_namespace_lock
                zfs_ioc_vdev_remove()

ZFS Write context

spa_config_enter() => Waits for spa_config_lock, which is held by the spa_vdev_remove() context above
    zfs_blkptr_verify()
        arc_read() => Acquires the hash_lock mutex
            dbuf_read_impl()
                dbuf_read()
                    dmu_tx_check_ioerr()
                        dmu_tx_count_write()
                            dmu_tx_hold_write_by_dnode()
                                zfs_write()
                                    zpl_iter_write()

How Has This Been Tested?

  • CI Testing

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

In l2arc_evict(), the config lock may be acquired in reverse order
(e.g., first the config lock (writer), then a hash lock) unlike in
arc_read() during scenarios like L2ARC device removal. To avoid
deadlocks, if the attempt to acquire the config lock (reader) fails
in arc_read(), release the hash lock, wait for the config lock, and
retry from the beginning.

Signed-off-by: Ameer Hamza <[email protected]>
amotin added the "Status: Code Review Needed" (Ready for review and testing) label on Feb 20, 2025