[Linux] psutil.cpu_frequency() is slow #1851

Closed
marxin opened this issue Oct 14, 2020 · 5 comments · Fixed by #1852

@marxin
Contributor

marxin commented Oct 14, 2020

Platform

  • openSUSE Tumbleweed
  • psutil version: 5.7.0
  • Python 3.8.5

Calling psutil.cpu_freq() takes quite a long time on my machine:

$ cpuinfo
Python Version: 3.8.5.final.0 (64 bit)
Cpuinfo Version: (5, 0, 0)
Vendor ID: AuthenticAMD
Hardware Raw: 
Brand: AMD Ryzen 7 2700X Eight-Core Processor
Hz Advertised: 2.1495 GHz
Hz Actual: 2.1495 GHz
Hz Advertised Raw: (2149457000, 0)
Hz Actual Raw: (2149457000, 0)
Arch: X86_64
Bits: 64
Count: 16
Raw Arch String: x86_64
L1 Data Cache Size: 256 KiB
L1 Instruction Cache Size: 512 KiB
L2 Cache Size: 4 MiB
L2 Cache Line Size: 6
L2 Cache Associativity: 0x200
L3 Cache Size: 512 KB
Stepping: 2
Model: 8
Family: 23
Processor Type: 
Extended Model: 
Extended Family: 8
Flags: 3dnowprefetch, abm, adx, aes, aperfmperf, apic, arat, avic, avx, avx2, bmi1, bmi2, bpext, clflush, clflushopt, clzero, cmov, cmp_legacy, constant_tsc, cpb, cpuid, cr8_legacy, cx16, cx8, dbx, de, decodeassists, extapic, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fxsr, fxsr_opt, ht, hw_pstate, ibpb, irperf, lahf_lm, lbrv, lm, mca, mce, misalignsse, mmx, mmxext, monitor, movbe, msr, mtrr, mwaitx, nonstop_tsc, nopl, npt, nrip_save, nx, osvw, osxsave, overflow_recov, pae, pat, pausefilter, pci_l2i, pclmulqdq, pdpe1gb, perfctr_core, perfctr_llc, perfctr_nb, pfthreshold, pge, pni, popcnt, pse, pse36, rdrand, rdrnd, rdseed, rdtscp, rep_good, sep, sev, sha, sha_ni, skinit, smap, smca, sme, smep, ssbd, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, succor, svm, svm_lock, syscall, tce, topoext, tsc, tsc_scale, v_vmsave_vmload, vgif, vmcb_clean, vme, vmmcall, wdt, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves
$ time python3 -c "import psutil; print(psutil.cpu_freq())"
scpufreq(current=1962.3736875, min=2200.0, max=3700.0)

real	0m0.386s
user	0m0.043s
sys	0m0.022s

It's even slower on an EPYC machine:

$ cpuinfo
Python Version: 3.8.3.final.0 (64 bit)
Cpuinfo Version: (5, 0, 0)
Vendor ID: AuthenticAMD
Hardware Raw: 
Brand: AMD EPYC 7702 64-Core Processor
Hz Advertised: 1.7936 GHz
Hz Actual: 1.7936 GHz
Hz Advertised Raw: (1793646000, 0)
Hz Actual Raw: (1793646000, 0)
Arch: X86_64
Bits: 64
Count: 128
Raw Arch String: x86_64
L1 Data Cache Size: 2 MiB
L1 Instruction Cache Size: 2 MiB
L2 Cache Size: 32 MiB
L2 Cache Line Size: 6
L2 Cache Associativity: 0x200
L3 Cache Size: 512 KB
Stepping: 
Model: 49
Family: 23
Processor Type: 
Extended Model: 3
Extended Family: 8
Flags: 3dnowext, 3dnowprefetch, abm, adx, aes, amd_ppin, aperfmperf, apic, arat, avic, avx, avx2, bmi1, bmi2, bpext, cat_l3, cdp_l3, clflush, clflushopt, clwb, clzero, cmov, cmp_legacy, constant_tsc, cpb, cpuid, cqm, cqm_llc, cqm_mbm_local, cqm_mbm_total, cqm_occup_llc, cr8_legacy, cx16, cx8, dbx, de, decodeassists, extapic, extd_apicid, f16c, flushbyasid, fma, fpu, fsgsbase, fxsr, fxsr_opt, ht, hw_pstate, ibpb, ibrs, ibs, irperf, lahf_lm, lbrv, lm, mba, mca, mce, misalignsse, mmx, mmxext, monitor, movbe, msr, mtrr, mwaitx, nonstop_tsc, nopl, npt, nrip_save, nx, osvw, osxsave, overflow_recov, pae, pat, pausefilter, pci_l2i, pclmulqdq, pdpe1gb, perfctr_core, perfctr_llc, perfctr_nb, pfthreshold, pge, pni, popcnt, pqe, pqm, pse, pse36, rdpid, rdpru, rdrand, rdrnd, rdseed, rdt_a, rdtscp, rep_good, sep, sha, sha_ni, skinit, smap, smca, smep, ssbd, sse, sse2, sse4_1, sse4_2, sse4a, ssse3, stibp, succor, svm, svm_lock, syscall, tce, topoext, tsc, tsc_scale, umip, v_vmsave_vmload, vgif, vmcb_clean, vme, vmmcall, wbnoinvd, wdt, xgetbv1, xsave, xsavec, xsaveerptr, xsaveopt, xsaves
$ time python3 -c "import psutil; print(psutil.cpu_freq())"
scpufreq(current=1794.9649843749994, min=1500.0, max=2000.0)

real	0m2.630s
user	0m0.062s
sys	0m0.034s
marxin added the bug label Oct 14, 2020
@marxin
Contributor Author

marxin commented Oct 15, 2020

Strace from the machine:
cpu_freq.strace.txt

Apparently the first read of scaling_cur_freq in /sys/devices/system/cpu/cpufreq/policy%d/ takes about 20 ms:

time cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq && time cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
1793665

real	0m0.019s
user	0m0.001s
sys	0m0.001s
1793665

real	0m0.002s
user	0m0.001s
sys	0m0.001s

Subsequent reads of scaling_min_freq and scaling_max_freq are then fast.
With one slow read per CPU, we easily get to 128 * 0.02 s = 2.56 s.
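To reproduce the per-policy cost from Python without going through psutil, one can time a single read of each scaling_cur_freq file. This is a minimal sketch: the helper name time_cur_freq_reads is hypothetical, and it assumes the policy*/scaling_cur_freq layout used in the commands above.

```python
import glob
import time


def time_cur_freq_reads(pattern="/sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq"):
    """Read each file matching `pattern` once and time it.

    Returns a list of (path, value_as_string, seconds).
    sorted() is lexicographic (policy10 before policy2), which is fine for timing.
    """
    results = []
    for path in sorted(glob.glob(pattern)):
        start = time.perf_counter()
        with open(path) as f:
            value = f.read().strip()
        results.append((path, value, time.perf_counter() - start))
    return results
```

Usage: `sum(sec for _, _, sec in time_cur_freq_reads())` gives the total time spent; running it twice in a row and comparing the totals should show whether only the first read is slow on a given machine.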

@marxin
Contributor Author

marxin commented Oct 15, 2020

I noticed that /proc/cpuinfo is not the preferred way, but it's much faster:

time cat /proc/cpuinfo | grep 'cpu MHz'
cpu MHz		: 1795.782
cpu MHz		: 1795.591
cpu MHz		: 1794.905
cpu MHz		: 1792.459
...
cpu MHz		: 1796.074

real	0m0.009s
user	0m0.003s
sys	0m0.009s
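For reference, extracting those values in Python takes a single pass over the file. A minimal sketch (both function names are hypothetical); the "cpu MHz" field may be missing on some architectures, in which case the result is simply an empty list.

```python
def parse_cpuinfo_mhz(text):
    """Return the 'cpu MHz' values found in /proc/cpuinfo content, in order."""
    values = []
    for line in text.splitlines():
        # Lines look like "cpu MHz\t\t: 1795.782"; split on the first colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))  # float() ignores surrounding whitespace
    return values


def read_cpuinfo_mhz(path="/proc/cpuinfo"):
    """Read /proc/cpuinfo once and return its 'cpu MHz' values."""
    with open(path) as f:
        return parse_cpuinfo_mhz(f.read())
```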

Adding @giampaolo.

@marxin
Contributor Author

marxin commented Oct 15, 2020

@gghh may be interested as well.

@giampaolo
Owner

> I noticed that /proc/cpuinfo is not the preferred way, but it's much faster:

We also need min and max frequencies though. This is what I get with 8 CPUs:

$ time python3 -c "import psutil; psutil.cpu_freq()"

real	0m0,082s
user	0m0,074s
sys	0m0,008s

Not very fast indeed. Perhaps there's another method to do this (ioctl()?).

@marxin
Contributor Author

marxin commented Oct 15, 2020

> We also need min and max frequencies though. This is what I get with 8 CPUs:

Note that these can be read quickly:

$ time cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq && time cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
2000000

real	0m0.001s
user	0m0.001s
sys	0m0.000s
2000000

real	0m0.001s
user	0m0.001s
sys	0m0.000s

According to @gghh, reading scaling_cur_freq takes more time because it requires reading special registers, APERF and MPERF, which are two counters: MPERF increments at a constant (known) rate, while APERF increments at the effective speed of the CPU.
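Given that description, the effective frequency over an interval follows from the ratio of the two counter deltas, scaled by the reference rate at which MPERF ticks: f_eff = f_ref * ΔAPERF / ΔMPERF. A minimal sketch of just that arithmetic, assuming the two counter samples are already available; actually reading the registers is not shown, and effective_freq_khz is a hypothetical name. The delta form also shows why such an estimate needs two samples taken some time apart.

```python
def effective_freq_khz(ref_khz, aperf_start, aperf_end, mperf_start, mperf_end):
    """Estimate the effective frequency from two APERF/MPERF samples.

    MPERF ticks at the constant reference rate and APERF at the actual rate,
    so over the same interval: f_eff = f_ref * dAPERF / dMPERF.
    """
    d_mperf = mperf_end - mperf_start
    if d_mperf <= 0:
        raise ValueError("MPERF did not advance between the two samples")
    return ref_khz * (aperf_end - aperf_start) / d_mperf
```

For example, with a 2,000,000 kHz reference, ΔAPERF = 900 and ΔMPERF = 1000 give 1,800,000 kHz: the core ran at 90% of the reference rate during the interval.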

By contrast, /proc/cpuinfo is simple and fast to obtain (it's cached).

@giampaolo: So maybe we can combine these two approaches?
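A rough sketch of what combining the two could look like: take the cached "cpu MHz" values from /proc/cpuinfo as the current frequency, and read only the fast scaling_min_freq/scaling_max_freq files from each policy directory. This is an illustration under assumptions, not the patch that was eventually merged: the function names are hypothetical, the scpufreq namedtuple merely mirrors the field names printed above, and pairing the N-th cpuinfo entry with policyN assumes one policy per CPU. When the counts differ, the sketch falls back to the slow scaling_cur_freq read.

```python
import os
import re
from collections import namedtuple

# Hypothetical result type mirroring the fields psutil prints above.
scpufreq = namedtuple("scpufreq", ["current", "min", "max"])


def _cat_khz(path):
    """Read one integer kHz value from a sysfs file."""
    with open(path) as f:
        return int(f.read().strip())


def _policy_number(path):
    """Numeric suffix of a '.../policyN' directory, for natural sorting."""
    return int(re.search(r"(\d+)$", path).group(1))


def cpu_freq_combined(cpuinfo_mhz, policy_dirs, cat_khz=_cat_khz):
    """Combine cached /proc/cpuinfo MHz values with sysfs min/max (kHz)."""
    use_cached = len(cpuinfo_mhz) == len(policy_dirs)
    result = []
    for i, d in enumerate(sorted(policy_dirs, key=_policy_number)):
        if use_cached:
            current = cpuinfo_mhz[i]  # cheap: already in MHz
        else:
            # Counts differ (e.g. one policy spans several CPUs): use the
            # slow but unambiguous per-policy read instead.
            current = cat_khz(os.path.join(d, "scaling_cur_freq")) / 1000
        result.append(scpufreq(
            current,
            cat_khz(os.path.join(d, "scaling_min_freq")) / 1000,
            cat_khz(os.path.join(d, "scaling_max_freq")) / 1000,
        ))
    return result
```

On a real system, policy_dirs could come from `glob.glob("/sys/devices/system/cpu/cpufreq/policy[0-9]*")` and cpuinfo_mhz from the "cpu MHz" lines of /proc/cpuinfo. sysfs reports kHz, hence the division by 1000 to match the MHz used by /proc/cpuinfo.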

marxin added a commit to marxin/psutil that referenced this issue Oct 15, 2020
The change is about using /proc/cpuinfo when available. It provides
cached values for frequencies and one can fill up minimum and maximum
frequency from /sys/devices/system/cpu/cpufreq/policy/* sub-system
(which is fast).

Fixes giampaolo#1851.
giampaolo added a commit that referenced this issue Dec 29, 2020
Micro optimization in reference to #1852 and #1851.
Use glob.glob(), which internally relies on os.scandir()
in order to list /sys/devices/system/cpu/cpufreq files.
In doing so, we avoid os.path.exists() for each CPU, which
internally uses os.stat().

Signed-off-by: Giampaolo Rodola <[email protected]>
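Roughly, the difference described in this commit: checking a guessed path for every CPU costs one os.stat() per CPU via os.path.exists(), while glob.glob() lists the cpufreq directory instead. A minimal sketch of both variants (function names are hypothetical; the path is the one used throughout this thread); both should return the same set of policy directories.

```python
import glob
import os

CPUFREQ = "/sys/devices/system/cpu/cpufreq"


def policy_dirs_exists(ncpus, root=CPUFREQ):
    """One os.path.exists() (i.e. one stat) per guessed policy directory."""
    candidates = (os.path.join(root, f"policy{i}") for i in range(ncpus))
    return [p for p in candidates if os.path.exists(p)]


def policy_dirs_glob(root=CPUFREQ):
    """List the directory once; glob relies on os.scandir() internally."""
    return glob.glob(os.path.join(root, "policy[0-9]*"))
```

Usage: `policy_dirs_exists(os.cpu_count())` versus `policy_dirs_glob()`; note that glob returns paths in arbitrary order, so sort them if the order matters.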
marxin added a commit to marxin/psutil that referenced this issue Jan 6, 2021
The change is about using /proc/cpuinfo when available. It provides
cached values for frequencies and one can fill up minimum and maximum
frequency from /sys/devices/system/cpu/cpufreq/policy/* sub-system
(which is fast).

Fixes giampaolo#1851.
giampaolo pushed a commit that referenced this issue Jan 7, 2021
The change is about using /proc/cpuinfo when available. It provides
cached values for frequencies and one can fill up minimum and maximum
frequency from /sys/devices/system/cpu/cpufreq/policy/* sub-system
(which is fast).

Fixes #1851.
giampaolo added a commit that referenced this issue Jan 7, 2021
Signed-off-by: Giampaolo Rodola <[email protected]>
2 participants