-
-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Parallelism CPU calculation inside Kubernetes and Docker with limits #799
fix: Parallelism CPU calculation inside Kubernetes and Docker with limits #799
Conversation
The value of /sys/fs/cgroup/cpu/cpu.cfs_quota_us is not in milliseconds and cannot be simply divided by 1000 to determine the CPU limit. As per kernel documentation[^1], the cpu limit can be determined by dividing that value by /sys/fs/cgroup/cpu/cpu.cfs_period_us. [^1]: https://docs.kernel.org/scheduler/sched-bwc.html
WalkthroughThe changes modify the Changes
Sequence DiagramsequenceDiagram
participant Script
participant get_cpu_num Function
participant cgroup
Script->>get_cpu_num Function: Call function
get_cpu_num Function->>cgroup: Read cpu_quota
get_cpu_num Function->>cgroup: Read cpu_period
get_cpu_num Function->>get_cpu_num Function: Calculate cpu_num = cpu_quota / cpu_period
alt cpu_num < 1
get_cpu_num Function-->>Script: Return 1
else cpu_num >= 1
get_cpu_num Function-->>Script: Return cpu_num
end
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
hooks/_common.sh (1)
Line range hint
200-243
: Consider adding documentation about cgroup versions.The implementation looks good, but consider adding comments to explain:
- The kernel documentation reference for CPU quota calculation
- The differences between cgroup v1 (
/sys/fs/cgroup/cpu/
) and v2 (/sys/fs/cgroup/
) pathsfunction common::get_cpu_num { local -r parallelism_ci_cpu_cores=$1 + # According to kernel documentation, CPU quota should be calculated as cpu.cfs_quota_us/cpu.cfs_period_us + # See: https://www.kernel.org/doc/html/latest/scheduler/sched-bwc.html#cpu-bandwidth-control local cpu_quota cpu_period cpu_num local millicpu + # Check for cgroup v1 paths and exclude WSL which has misleading cgroup values if [[ -f /sys/fs/cgroup/cpu/cpu.cfs_quota_us && ! -f /proc/sys/fs/binfmt_misc/WSLInterop ]]; then
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hooks/_common.sh
(2 hunks)
🔇 Additional comments (2)
hooks/_common.sh (2)
200-206
: LGTM! Clear variable declarations and proper WSL detection.The code correctly handles edge cases by checking for WSL environment, which could provide misleading cgroup values.
Line range hint
208-243
: Correct implementation of CPU quota calculation.The fix properly implements the CPU core calculation by using
cpu.cfs_period_us
as per kernel documentation. This resolves the bug where CPU limits were incorrectly calculated in Kubernetes environments.Key improvements:
- Uses
cpu_quota/cpu_period
instead of the incorrectmillicpu/1000
- Properly handles edge cases:
- No CPU limits (
cpu_quota = -1
)- Values less than 1
- Custom CPU core specifications via
parallelism_ci_cpu_cores
hooks/_common.sh
Outdated
cpu_period=$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us) | ||
cpu_num=$((cpu_quota / cpu_period)) | ||
if ((cpu_num < 1)); then | ||
echo 1 | ||
else | ||
echo $cpu_num | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daniel-sampliner Do you think that there's no need for echo $((millicpu / 1000))
as e.g. a fallback if /sys/fs/cgroup/cpu/cpu.cfs_period_us
does not exist (in case it may not exist at all)? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I don't believe it's possible for cpu.cfs_period_us
to not exist if cpu.cfs_quota_us
does exist and is > 0. But I also don't know for certain.
It definitely can't hurt to fall back to something if cpu.cfs_period_us
doesn't exist, but I don't think millicpu / 1000
would be the right value. Perhaps like this to default to 1 CPU when cpu.cfs_period_us
doesn't exist
cpu_period=$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us) | |
cpu_num=$((cpu_quota / cpu_period)) | |
if ((cpu_num < 1)); then | |
echo 1 | |
else | |
echo $cpu_num | |
fi | |
cpu_period=$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us || echo -1) | |
cpu_num=$((cpu_quota / cpu_period)) | |
[[ $cpu_num -lt 1 ]] && echo 1 || echo $cpu_num |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the above commit suggestion makes sense to me.
@MaxymVlasov WDYT?
Co-authored-by: George L. Yermulnik <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
hooks/_common.sh (1)
238-239
: Consider adding a comment explaining the CPU calculation formula.While the implementation is correct, adding a brief comment explaining why we divide
cpu_quota
bycpu_period
would help future maintainers understand the logic.+ # Calculate CPU cores by dividing quota by period as per cgroup v1 documentation cpu_num=$((cpu_quota / cpu_period)) [[ $cpu_num -lt 1 ]] && echo 1 || echo $cpu_num
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hooks/_common.sh
(2 hunks)
🔇 Additional comments (3)
hooks/_common.sh (3)
200-207
: LGTM! Improved CPU calculation logic.The changes correctly implement the CPU calculation by using
cpu.cfs_quota_us
andcpu.cfs_period_us
values, which aligns with the kernel documentation for accurate CPU limit calculations in Kubernetes environments.
Line range hint
209-237
: Comprehensive error handling for edge cases.The implementation properly handles various scenarios:
- When no CPU limits are set (
cpu_quota = -1
)- When
cpu_period
is invalid (< 1
)- When running in DinD without limits
- When
parallelism_ci_cpu_cores
is provided but invalidThe warning messages are clear and provide actionable guidance to users.
Line range hint
200-241
: Verify the impact on different container runtimes.The changes handle Kubernetes and Docker environments differently. Let's verify that the CPU calculation works correctly across different container runtimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @MaxymVlasov WDYT?
Co-authored-by: George L. Yermulnik <[email protected]>
hooks/_common.sh
Outdated
return | ||
fi | ||
|
||
if [[ -f /sys/fs/cgroup/cpu.max ]]; then | ||
# Inside Linux (Docker?) container | ||
local millicpu | ||
millicpu=$(cut -d' ' -f1 /sys/fs/cgroup/cpu.max) | ||
|
||
if [[ $millicpu == max ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding thread above about fallbacking to millicpu
In docker we have next picture:
https://docs.docker.com/engine/containers/resource_constraints/#configure-the-default-cfs-scheduler
➜ docker run -ti --cpus="1.5" alpine sh
/ # grep ^ /dev/null /sys/fs/cgroup/cpu*
/sys/fs/cgroup/cpu.idle:0
/sys/fs/cgroup/cpu.max:150000 100000
Where 150000 is cpu-quota
and 100000 is cpu-period
150000 / 1000 (millicpu) != 1.5
, so, I'll fix it in this PR too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great findings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
hooks/_common.sh (1)
205-211
: Accurate Retrieval of CPU Quota and Period Values
The updates correctly read thecpu_quota
from/sys/fs/cgroup/cpu/cpu.cfs_quota_us
and attempt to retrievecpu_period
from/sys/fs/cgroup/cpu/cpu.cfs_period_us
. This ensures that the calculation later uses the proper values from cgroup settings. For consistency and clarity, consider using the same file input redirection style for both values.- cpu_period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us 2> /dev/null || echo "$cpu_quota") + cpu_period=$(< /sys/fs/cgroup/cpu/cpu.cfs_period_us 2> /dev/null || echo "$cpu_quota")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hooks/_common.sh
(3 hunks)
🔇 Additional comments (3)
hooks/_common.sh (3)
Line range hint
213-243
: Robust Fallback and Validation for Kubernetes CPU Limits
The conditional block properly checks if eithercpu_quota
is-1
orcpu_period
is less than 1. This branch then either uses the providedparallelism_ci_cpu_cores
(after validating it as a positive integer) or falls back gracefully by echoing1
with an informative warning. This robust handling is well aligned with the PR objective of accurate CPU calculation in Kubernetes environments.
249-259
: Proper Handling of cgroup v2 Environments
The new block that handles the/sys/fs/cgroup/cpu.max
file correctly addresses scenarios (typically in cgroup v2) where the quota might be expressed differently. The check for a quota value of"max"
and the accompanying condition oncpu_period
ensure that the function falls back to usingnproc
when limits are not defined. This implementation cleanly extends support to environments using cgroup v2.
Line range hint
191-259
: Overall, CPU Limit Calculation Fix Is Correct and Well-Documented
The modifications incommon::get_cpu_num
update the CPU calculation to use the formulacpu_quota / cpu_period
, which now aligns with the kernel documentation and addresses the previously incorrect division by 1000. The function properly distinguishes between cgroup v1 and v2 while offering robust fallbacks and informative warnings. This fix effectively meets the PR objectives of accurate CPU limit detection in Kubernetes.
## [1.97.1](v1.97.0...v1.97.1) (2025-02-01) ### Bug Fixes * Parallelism CPU calculation inside Kubernetes and Docker with limits ([#799](#799)) ([58a89a1](58a89a1))
This PR is included in version 1.97.1 🎉 |
The value of /sys/fs/cgroup/cpu/cpu.cfs_quota_us is not in milliseconds and cannot be simply divided by 1000 to determine the CPU limit. As per kernel documentation1, the cpu limit can be determined by dividing that value by /sys/fs/cgroup/cpu/cpu.cfs_period_us.
Put an
x
into the box if that apply:Description of your changes
Fixes the max CPU calculation when run inside kubernetes.
How can we test changes
--parallelism-limit
, confirm that pre-commit is only spawning N-1 concurrent processes.Summary by CodeRabbit
Footnotes
https://docs.kernel.org/scheduler/sched-bwc.html ↩