BSOD during physical disk removal from zpool #206
Comments
Looks like you are onto something here. I wonder if I can get some people more familiar with the ZIL to take a peek.
@lundman sure, thank you.
Hi @lundman, did you get any response from the community?
openzfsonwindows#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_commit_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fallback to `txg_wait_synced()` after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (openzfs#11097), so we can safely disregard that issue for now.

Signed-off-by: Richard Yao <[email protected]>
@arun-kv I just heard about this today. That said, you found a data race inside the ZIL. I imagine that we did not detect this on other platforms because the schedulers on those platforms made losing this race unlikely. I opened openzfs#14514 with a patch that should prevent this from happening.
openzfsonwindows#206 found that it is possible to trip `VERIFY(list_is_empty(&lwb->lwb_itxs))` when a `zil_commit()` is delayed by the scheduler long enough for a parallel `zil_suspend()` operation to exit `zil_commit_impl()`. This is a data race. To prevent this, we introduce a `zilog->zl_suspend_lock` rwlock to ensure that all outstanding `zil_commit()` operations finish before `zil_suspend()` begins and that subsequent operations fallback to `txg_wait_synced()` after `zil_suspend()` has begun.

On `PREEMPT_RT` Linux kernels, the `rw_enter()` implementation suffers from writer starvation. This means that a ZIL intensive system can delay `zil_suspend()` indefinitely. This is a pre-existing problem that affects everything that uses rw locks, so it needs to be addressed in the SPL. However, builds against `PREEMPT_RT` Linux kernels are currently broken due to a GPL symbol issue (#11097), so we can safely disregard that issue for now.

Reported-by: Arun KV <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Richard Yao <[email protected]>
Closes #14514
The solution for this was revised after feedback and then merged to master as openzfs/zfs@4c856fb.
Thanks @ryao
@lundman when can we merge this into the Windows repo?
I'm doing a rebase right now - just fighting git at the moment |
@lundman I suspect that we can close this. I would close it for you, but I do not have privileges to close it.
Since the original fix was reverted, I've created an alternative one: openzfs#14979. Reviewers and testers are welcome.
@amotin: could you give a short summary of why the fix was reverted upstream? I don't see the commit of the reversal.
@sskras There were some deadlock reports. I haven't investigated what caused them, but I implemented it in a different way without introducing new locks. See: openzfs#14790.
Describe the problem you're observing
`VERIFY(list_is_empty(&lwb->lwb_itxs))` fails in `zil_free_lwb` while running `zpool.exe remove zpool-name "physical disk name"`.
Include any warning/errors/backtraces from the system logs
Describe how to reproduce the problem
I was able to reproduce the issue by introducing some sleeps in the code and removing the `if (zilog->zl_suspend > 0)` check from `zil_commit`.
I see that `zilog->zl_suspend` is not protected by any lock. Can this cause a race condition in which `zil_commit` misses the update that `zil_suspend` makes to the `zilog->zl_suspend` variable?
I removed the `if (zilog->zl_suspend > 0)` check to see how the system behaves when `zl_suspend` is not properly protected.
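To make the concern concrete, here is a heavily simplified model of the check-then-act race being asked about. It is not the reproduction patch and not the real ZIL code path: the `toy_log_t` type and all function names are hypothetical, and the interleaving is replayed deterministically in a single thread instead of relying on the scheduler.

```c
#include <stdbool.h>

/* Hypothetical model: no lock protects either field. */
typedef struct toy_log {
	int suspend;		/* models zilog->zl_suspend */
	int pending_itxs;	/* models the length of lwb->lwb_itxs */
} toy_log_t;

/* Commit, step 1: unlocked check of the suspend counter. */
bool
commit_check(const toy_log_t *lg)
{
	return (lg->suspend == 0);
}

/* Commit, step 2: queue an itx based on the earlier check result. */
void
commit_act(toy_log_t *lg)
{
	lg->pending_itxs++;
}

/* Suspend: mark the log suspended. */
void
suspend_begin(toy_log_t *lg)
{
	lg->suspend++;
}

/* Models the VERIFY: the itx list must be empty when the lwb is freed. */
bool
free_lwb_ok(const toy_log_t *lg)
{
	return (lg->pending_itxs == 0);
}

/* Replays the losing interleaving; returns the result of the final check. */
bool
replay_bad_interleaving(void)
{
	toy_log_t lg = { 0, 0 };

	bool go = commit_check(&lg);	/* commit sees suspend == 0 */
	suspend_begin(&lg);		/* suspend runs in the gap */
	if (go)
		commit_act(&lg);	/* commit proceeds on a stale view */
	return (free_lwb_ok(&lg));	/* false: the list is not empty */
}
```

In this model `replay_bad_interleaving()` returns false, which corresponds to the failed `VERIFY`. Closing the window requires that the check and the dependent work happen atomically with respect to suspend.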
Below is the patch I used to reproduce the issue:
Steps to reproduce the issue: