To make the CMake build more readable, stable, and easy to use, we have a few tasks we'd like to work on. This GitHub issue tracks that progress. Planned changes and investigations:
Have vllm-flash-attn use ExternalProject. Currently vllm-flash-attn uses the parent CMake scope, which creates many footguns since it lives in a separate repo. Using ExternalProject means vllm-flash-attn will be run in a separate CMake scope/process.
Warn that PTX builds are not currently supported (since "[CI/Build] Per file CUDA Archs (improve wheel size and dev build times)" #8845). Currently, a +PTX in TORCH_CUDA_ARCH_LIST is ignored; we should warn when this is the case. Alternatively, we could add support for PTX builds, although this is generally not desirable: PTX increases the wheel size by quite a bit (PTX is larger than SASS), and we already build for all currently supported arches.
Rename define_gpu_extension_target. It is now used for CPU extensions too, so the name is misleading.
Potentially build both C++ and CUDA extensions when building for CUDA, and use the torch dispatcher to dispatch between the two (see "[Kernel] Factor registrations" #8424).
Look into removing early returns in CMakeLists.txt (potentially moving each backend into its own file).
Add a CI test of local builds, i.e. pip install -e .
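As a rough illustration of the PTX warning item above, here is a minimal Python sketch of the detection logic. It is not taken from the vLLM build: the function names `find_ptx_entries` and `warn_if_ptx_requested` are hypothetical, and the real check might well live in the CMake files rather than in Python. It only assumes that TORCH_CUDA_ARCH_LIST is a space- or semicolon-separated list such as "7.5 8.0+PTX".

```python
import os
import re
import warnings


def find_ptx_entries(arch_list: str) -> list:
    """Return the entries of a TORCH_CUDA_ARCH_LIST-style string that request PTX.

    Hypothetical helper for illustration; not part of vLLM or PyTorch.
    """
    # Entries may be separated by spaces or semicolons, e.g. "7.5 8.0+PTX".
    entries = [e for e in re.split(r"[;\s]+", arch_list.strip()) if e]
    return [e for e in entries if e.upper().endswith("+PTX")]


def warn_if_ptx_requested(arch_list: str) -> bool:
    """Emit a warning if any entry asks for PTX; return True if a warning was issued."""
    ptx_entries = find_ptx_entries(arch_list)
    if ptx_entries:
        warnings.warn(
            "PTX builds are not currently supported; ignoring +PTX in: "
            + ", ".join(ptx_entries)
        )
        return True
    return False


if __name__ == "__main__":
    # Check the value from the environment, treating an unset variable as empty.
    warn_if_ptx_requested(os.environ.get("TORCH_CUDA_ARCH_LIST", ""))
```

Returning a boolean in addition to warning keeps the check easy to exercise in a CI test, which fits the local-build testing item in the list.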
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!