ZFS doesn't respect Linux kernel CPU isolation mechanisms #8908
Comments
Minimal quick and dirty patch that appears to work for me here: sjuxax@7c2a896
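(The commit itself isn't quoted here, but the general shape of a taskq-side fix is easy to sketch: round-robin the bound threads over the housekeeping cpumask instead of over all online CPUs. This is a sketch only; `last_cpu` is an illustrative module-local cursor, and the pre-5.18 `HK_FLAG_DOMAIN` spelling is assumed.)

```c
#include <linux/cpumask.h>
#include <linux/kthread.h>
#include <linux/sched/isolation.h>

/*
 * Sketch: round-robin taskq threads over housekeeping (non-isolated)
 * CPUs only, instead of over every online CPU.
 */
static int last_cpu = -1;

static void
example_bind_taskq_thread(struct task_struct *tsk)
{
	const struct cpumask *hk = housekeeping_cpumask(HK_FLAG_DOMAIN);
	int cpu;

	cpu = cpumask_next(last_cpu, hk);
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(hk);

	last_cpu = cpu;
	kthread_bind(tsk, cpu);
}
```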
@sjuxax your observations are correct. The other place you would have to do this is in __thread_create() in module/spl/spl-thread.c. You can see a very primitive example here:
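A rough illustration of what that could look like for dedicated threads follows. This is a sketch, not the actual spl-thread.c code; names and error handling are illustrative, the pre-5.18 `HK_FLAG_DOMAIN` spelling is assumed, and note that `housekeeping_cpumask()` and `kthread_bind_mask()` are GPL-exported symbols, which is presumably part of the license trouble mentioned further down the thread.

```c
#include <linux/kthread.h>
#include <linux/sched/isolation.h>

/*
 * Sketch: create a kernel thread and, before it first runs, restrict
 * it to the housekeeping (non-isolated) cpumask.
 */
static struct task_struct *
example_thread_create(int (*func)(void *), void *data, const char *name)
{
	struct task_struct *tsk;

	tsk = kthread_create(func, data, "%s", name);
	if (IS_ERR(tsk))
		return (NULL);

	/*
	 * Binding must happen before the first wakeup; on kernels where
	 * kthread_bind_mask() is not available to the module,
	 * set_cpus_allowed_ptr() is a possible fallback.
	 */
	kthread_bind_mask(tsk, housekeeping_cpumask(HK_FLAG_DOMAIN));
	wake_up_process(tsk);

	return (tsk);
}
```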
@sjuxax would you mind opening a PR with the proposed fix for taskqs and dedicated threads? Then we can get you some better feedback and shouldn't lose track of this again.
@behlendorf Would it additionally be worth having a cpulist as an SPL module parameter that would bind those threads to defined CPUs? Something along the lines of the sketch below.
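A hypothetical sketch of such a parameter; the name `spl_taskq_cpulist` is invented for illustration and is not an existing SPL option. It uses the kernel's `cpulist_parse()` to turn a string like "0-7,16-23" into a cpumask.

```c
#include <linux/cpumask.h>
#include <linux/module.h>

/* Hypothetical parameter, e.g. modprobe spl spl_taskq_cpulist=0-7,16-23 */
static char *spl_taskq_cpulist = NULL;
module_param(spl_taskq_cpulist, charp, 0444);
MODULE_PARM_DESC(spl_taskq_cpulist,
	"CPUs that taskq threads may be bound to (cpulist format)");

static cpumask_var_t spl_taskq_cpumask;

static int
spl_taskq_cpumask_init(void)
{
	if (!zalloc_cpumask_var(&spl_taskq_cpumask, GFP_KERNEL))
		return (-ENOMEM);

	/* Default to every online CPU when the parameter is unset. */
	if (spl_taskq_cpulist == NULL) {
		cpumask_copy(spl_taskq_cpumask, cpu_online_mask);
		return (0);
	}

	return (cpulist_parse(spl_taskq_cpulist, spl_taskq_cpumask));
}
```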
Has there been any progress on fixing this defect, please?
The CPU hotplugging code changes the relevant code:
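For context, hotplug support of this kind registers dynamic callbacks so per-CPU taskq threads can follow CPUs as they come and go. A rough sketch of that registration pattern, with illustrative names rather than the actual spl-taskq.c symbols:

```c
#include <linux/cpuhotplug.h>

/* Illustrative callbacks: spawn/park a per-CPU taskq thread. */
static int
example_taskq_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	/* Create or rebind a thread now that 'cpu' is online. */
	return (0);
}

static int
example_taskq_cpu_prep_down(unsigned int cpu, struct hlist_node *node)
{
	/* Migrate or park the thread before 'cpu' goes offline. */
	return (0);
}

static enum cpuhp_state example_hp_state;

static int
example_hp_register(void)
{
	int ret;

	/* CPUHP_AP_ONLINE_DYN allocates a dynamic state slot. */
	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
	    "example/taskq:online",
	    example_taskq_cpu_online, example_taskq_cpu_prep_down);
	if (ret < 0)
		return (ret);

	example_hp_state = ret;
	return (0);
}
```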
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
Has this been fixed?
Interesting, I saw your reply via email and tried it myself to confirm. I am on Archlinux here using:
My boot arguments were:
I opened htop on one screen and could already see that only cores 0,1,2 + 12,13,14 were given work by my host. At this point I used pv to read some data and could see the work landing only on those cores. I tried another pv from data in an encrypted dataset and, while the read speed was expectedly slower, it still only executed on the 6 CPU threads which were not isolated. I don't know why your situation is behaving differently.
Hello, thanks for the quick and detailed reply! I forgot to mention that I am running NixOS unstable. I tried to adapt my system as far as possible to your kernel parameters; now I have the following cmdline (hashes and PCI IDs removed for readability): However, I took a look in htop, and ZFS is not the only kernel process using that core, so likely something is entirely wrong on my part. So, this is clearly some sort of user error on my side. If you have any suggestions or ideas, I would of course be very thankful nonetheless.
Okay, so I think I figured it out, although the reasons why it is the way it is are beyond my understanding. To make a long story short: if I leave a CPU core between 8 and 15 for the kernel, it uses that core; otherwise it will just assign a random one at boot time and be stuck with it. Thank you!
I concur with @Jauchi that with isolcpus=0-7, z_trim_int still managed to get scheduled onto cpu 0. I'll dig deeper if I have time.
I just tested (on a fully updated Ubuntu 24.04) with the config shared by @IvanVolosyuk (with the exception of one setting). Had to give the box back, so I ran out of time to test any more; if I can get a box with a similar CPU I'll have another go. My objective was to allow ZFS to use only cores 8-15 and 24-31 (one of those AMD X3D CPUs with the chunky L3 cache on only those cores).
Yeah, I also have a 7950X3D CPU, and I've been using this config for a while without issues. My first CCD is running qemu/kvm with realtime priority (except for CPU/thread 0, which is left alone as Linux still schedules some things on it). I don't use isolcpus, but I do pin vCPU and I/O threads in QEMU, and steer interrupt lines in the Linux kernel away from that CCD. I can observe ZFS only using the second CCD in 'top' when doing heavy zstd compression in ZFS. If you plan on buying that CPU, I would advise against it if you plan to do kvm+vfio: https://www.reddit.com/r/VFIO/comments/194ndu7/anyone_experiencing_host_random_reboots_using/
How do we get this reopened and/or raise a separate bug?
@behlendorf can we reopen this, as the issue still exists? I understand that there are some technical / license issues to make it happen.
System information
Describe the problem you're observing
`module/spl/spl-taskq.c` contains this code:

Thus, kthreads spawn either with the default cpumask or, if `spl_taskq_thread_bind=1` is set on module import, are bound to CPUs without regard for their availability to the scheduler. This can be a substantial source of latency, which is not acceptable on many systems that use the `isolcpus` boot parameter to isolate designated "real-time" cores.

While `spl_taskq_thread_bind=1` prevents latency from thread migration on/off RT CPUs, it can make things substantially worse by locking the threads to arbitrary cores in a way that can't be changed with `taskset`, leaving the RT CPUs permanently saddled with the kthread for its full lifecycle.

Ideally, the modular CPU selection would be replaced with something that uses the kernel's housekeeping API in `include/linux/sched/isolation.h` to get the cpumask of non-isolated CPUs and use `kthread_create_on_cpu` in `spl_kthread_create` and/or `kthread_bind_mask` to schedule and bind threads across non-RT cores only. Note, however, this is an incomplete solution because the kernel's interface to get an `isolcpus` cpumask has changed several times across the versions supported by ZFS.
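To illustrate why that interface churn bites: the `HK_*` constants are enum values, so a portable module can't probe them with the preprocessor directly and would lean on configure-time checks instead. A sketch of the resulting compatibility shim, where the `HAVE_*` macros are assumed to come from such checks (they are not real ZFS symbols), and version boundaries are approximate:

```c
#include <linux/cpumask.h>
#include <linux/sched/isolation.h>

/*
 * Sketch of a compatibility wrapper for "give me the non-isolated
 * CPUs". The HAVE_* macros are hypothetical configure-time results.
 */
static const struct cpumask *
spl_nonisolated_cpumask(void)
{
#if defined(HAVE_HOUSEKEEPING_HK_TYPE)		/* ~5.18 and later */
	return (housekeeping_cpumask(HK_TYPE_DOMAIN));
#elif defined(HAVE_HOUSEKEEPING_HK_FLAG)	/* ~4.17 through 5.17 */
	return (housekeeping_cpumask(HK_FLAG_DOMAIN));
#else						/* no housekeeping API */
	return (cpu_possible_mask);
#endif
}
```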
Various hacks can be done to try to prevent unbound kthreads from using isolated cores, and threads not bound with `spl_taskq_thread_bind` can be moved, but these solutions are iffy and incomplete at best. It would be great if ZFS respected `isolcpus` from the start.

Describe how to reproduce the problem
Boot with `isolcpus`, capture a trace of the RT CPUs with `perf sched record` or other tracing mechanisms, and observe ZFS-spawned kthreads coming on and off isolated cores. This is the primary remaining source of latency on my local system.

Include any warning/errors/backtraces from the system logs