Only call rocblas_initialize for versions < 4 to eliminate unnecessary VRAM allocation on some AMD cards #11080
@@ -93,6 +93,7 @@ let
  rocmBuildInputs = with rocmPackages; [
    clr
    hipblas
    rocblas
  ];

  vulkanBuildInputs = [
@@ -120,6 +120,20 @@ static cudaError_t ggml_cuda_device_malloc(void ** ptr, size_t size, int device)
}

static ggml_cuda_device_info ggml_cuda_init() {
#ifdef __HIP_PLATFORM_AMD__
    // Workaround for a rocBLAS bug when using multiple graphics cards:
    // https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1346
    {
        char version_string[64];
        version_string[0] = '\0';
        const rocblas_status status = rocblas_get_version_string(version_string, sizeof(version_string));
        if (status != rocblas_status_success || version_string[0] < '4') {
Review comment: I don't like this too much, as it will of course fail if rocBLAS changes its version to 10.0.0 or whatever. I think we should make a bit more effort to parse this properly. Looking at rocBLAS (https://github.com/ROCm/rocBLAS/blob/59825a7367a24eed4e7e8a483820592089eaf17e/library/src/buildinfo.cpp#L29), it seems we would be on the safe side to use however currently common.h is not used outside of the clients/examples and contains code that makes no sense in the backend.
            rocblas_initialize();
            CUDA_CHECK(cudaDeviceSynchronize());
        }
    }
#endif

    ggml_cuda_device_info info = {};

    cudaError_t err = cudaGetDeviceCount(&info.device_count);
Review comment: While this is fine, it would be slightly better here to use rocblas_get_version_string_size to let rocBLAS tell you how big the buffer needs to be.
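To illustrate the parsing concern raised in review, here is a minimal sketch of reading the major version as a number instead of comparing the first character. Both helper names (`parse_major_version`, `needs_explicit_rocblas_init`) are hypothetical and not part of this PR, and the sketch assumes the version string starts with a decimal major version followed by a dot, as in "4.0.0".

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not from the PR): returns the leading major version
// of a string such as "4.0.0.abcdef", or -1 if no number can be parsed.
static int parse_major_version(const char * version_string) {
    // Reject null, empty, and non-digit starts so strtol cannot
    // silently accept leading whitespace or a sign.
    if (version_string == nullptr || version_string[0] < '0' || version_string[0] > '9') {
        return -1;
    }
    char * end = nullptr;
    errno = 0;
    const long major = std::strtol(version_string, &end, 10);
    if (errno != 0 || end == version_string || major < 0 || major > INT_MAX) {
        return -1;
    }
    return static_cast<int>(major);
}

// Hypothetical helper: mirrors the intent of the diff, where the workaround
// runs for versions below 4 and also when the version cannot be determined.
static bool needs_explicit_rocblas_init(const char * version_string) {
    return parse_major_version(version_string) < 4;
}
```

With this approach "10.0.0" is read as major version 10, whereas the character comparison `version_string[0] < '4'` would treat it as older than 4. The buffer passed in could be sized via `rocblas_get_version_string_size`, as suggested above, rather than a fixed 64 bytes; that call is left out here so the sketch compiles without rocBLAS.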