Optimize CMake multi-core build parameter logic #523
fixed issue #215 |
Thank you for your contribution, we will test it. |
@miaooo0000OOOO thanks for the contribution. I just compiled a binary .whl release for https://github.com/ubergarm/r1-ktransformers-guide/blob/main/README.zh.md. However, in testing, I do see 8x processes early in the build for [...] Here are some screenshots to help show: |
@ubergarm Thank you for testing. I successfully reproduced the issue you encountered. My current hypothesis is that [...] First, I attempted setting the [...] Next, I reviewed the NVCC documentation and set the multi-core compilation parameter [...] (see NVCC Documentation §4.2.5.8). Finally, I ran [...]
This generated a Ninja build file. Inspecting build.ninja shows that only three files need to be compiled:
build .../ktransformers/build/temp.linux-x86_64-cpython-311/ktransformers/ktransformers_ext/cuda/binding.o: compile .../ktransformers/ktransformers/ktransformers_ext/cuda/binding.cpp
build .../ktransformers/build/temp.linux-x86_64-cpython-311/ktransformers/ktransformers_ext/cuda/custom_gguf/dequant.o: cuda_compile .../ktransformers/ktransformers/ktransformers_ext/cuda/custom_gguf/dequant.cu
build .../ktransformers/build/temp.linux-x86_64-cpython-311/ktransformers/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.o: cuda_compile .../ktransformers/ktransformers/ktransformers_ext/cuda/gptq_marlin/gptq_marlin.cu
I suspect the root cause is the small number of compilation targets: with only three files, at most three compile jobs can run at once, so the remaining cores stay idle. (Translated by DeepSeek R1) |
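One way to check the "too few targets" hypothesis is to count the build edges in the generated build.ninja, since the number of edges is an upper bound on how many compile jobs can run concurrently. The following is a minimal sketch and not part of this PR: count_build_edges is a hypothetical helper, and the regular expression assumes the simple "build <output>: <rule> <inputs>" form shown in the excerpt above.

```python
import re

# Matches lines of the form "build <outputs>: <rule> <inputs>".
# Assumption: the colon that ends the output list is followed by whitespace,
# as in the excerpt above; escaped colons inside paths are not handled specially.
_BUILD_EDGE = re.compile(r"build\s+.*?:\s+(\S+)")


def count_build_edges(ninja_text):
    """Return a dict mapping each Ninja rule name to its number of build edges."""
    counts = {}
    for line in ninja_text.splitlines():
        match = _BUILD_EDGE.match(line)
        if match:
            rule = match.group(1)
            counts[rule] = counts.get(rule, 0) + 1
    return counts


if __name__ == "__main__":
    # Paths are elided with "..." exactly as in the comment above.
    sample = "\n".join([
        "build .../cuda/binding.o: compile .../cuda/binding.cpp",
        "build .../cuda/custom_gguf/dequant.o: cuda_compile .../cuda/custom_gguf/dequant.cu",
        "build .../cuda/gptq_marlin/gptq_marlin.o: cuda_compile .../cuda/gptq_marlin/gptq_marlin.cu",
    ])
    edges = count_build_edges(sample)
    print(edges, "-> at most", sum(edges.values()), "concurrent compile jobs")
```

If the total is smaller than the requested parallelism level, raising the level further cannot help; only splitting the sources into more translation units can.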
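Late in the build there is one large file that contains many templates. Compilation of a single file cannot be parallelized for now; we will try to split it into several files later. |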
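The following content was generated by AI and reviewed by a human.
The changed code selects the parallel build parameter automatically, which can speed up builds while keeping them compatible across platforms. Its behavior in different scenarios is described below: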
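1. The user has set the CMAKE_BUILD_PARALLEL_LEVEL environment variable
The changed code block is skipped and the user-specified value is used directly.
Example:
export CMAKE_BUILD_PARALLEL_LEVEL=4
pip install ktransformers --no-build-isolation
Effect: the build behaves as with --parallel=4, fully respecting the user's configuration.
2. The user has not set CMAKE_BUILD_PARALLEL_LEVEL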
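Scenario 2.1: the self.parallel attribute exists and is valid
The value of self.parallel takes precedence.
Example: if self.parallel = 8, then --parallel=8 is added.
Applies when: the user has explicitly specified the parallelism by other means (such as a command-line argument).
Scenario 2.2: self.parallel does not exist, or is None or 0
The number of logical CPU cores (including hyper-threads) is detected automatically and used as the parallelism level.
Example: with 8 logical cores, --parallel=8 is added; if detection fails, --parallel=1 is added (safe build).
3. Cross-platform behavior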
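Linux/macOS (Make/Ninja)
--parallel=N → the underlying tool receives -jN (for both Make and Ninja).
Effect: all available cores are used, which speeds up the build.
Windows (MSBuild)
--parallel=N → the /m option of MSBuild.
Effect: multi-process build that avoids the compatibility problems of passing -jN directly.
4. Edge case handling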
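Scenario 4.1: CPU core count detection fails
The code sets cpu_count to 1 and falls back to a single-threaded build.
Example: some virtualized environments or old hardware may not report a CPU count; the build then proceeds safely with one job.
Scenario 4.2: the user explicitly disables parallelism
Set CMAKE_BUILD_PARALLEL_LEVEL=1 or self.parallel=1.
Effect: forces a single-threaded build, which is useful for debugging or resource-constrained environments.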
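The selection rules in scenarios 1, 2.1, 2.2 and 4.1 can be sketched as a single function. This is a minimal illustration under stated assumptions, not the code from this PR: parallel_build_args, its env and detect parameters, and the returned list are hypothetical names chosen so the logic can be tested in isolation, and the --parallel=N spelling simply follows the description above.

```python
import os


def parallel_build_args(env, parallel=None, detect=os.cpu_count):
    """Return the extra arguments to append to a `cmake --build` command.

    env      -- mapping of environment variables (for example os.environ)
    parallel -- stand-in for the self.parallel attribute; None or 0 means unset
    detect   -- callable returning the logical CPU count, or None if unknown
    """
    # Scenario 1: the user set CMAKE_BUILD_PARALLEL_LEVEL, so add nothing
    # and let CMake read the environment variable itself.
    if "CMAKE_BUILD_PARALLEL_LEVEL" in env:
        return []

    # Scenario 2.1: an explicit, non-zero parallel value wins.
    if parallel:
        return [f"--parallel={parallel}"]

    # Scenario 2.2 / 4.1: auto-detect, falling back to 1 when detection fails.
    try:
        cpu_count = detect() or 1
    except NotImplementedError:
        cpu_count = 1
    return [f"--parallel={cpu_count}"]


if __name__ == "__main__":
    # Uses the real environment and the real CPU count of this machine.
    print(parallel_build_args(os.environ))
```

Passing the environment and the detector as parameters is only a design choice for this sketch: it lets each scenario be exercised without modifying os.environ or depending on the host's core count.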
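Summary
By adapting the multi-core build parameter to the environment, this change establishes the following priority order:
CMAKE_BUILD_PARALLEL_LEVEL > self.parallel > auto-detection.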