CUDA: fix scratch buffers being allocated on non-main device #3220

JohannesGaessler · 2023-09-16T19:52:39Z

Fixes #3163.

The issue, from what I can tell, is that ggml_cuda_assign_scratch_offset does not set the device to the main device before allocating memory. As a consequence, the scratch buffers can end up on other devices, which later causes an error. This PR simply adds a call to ggml_cuda_set_device before any of the other CUDA calls.
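To illustrate the failure mode described above, here is a minimal host-only sketch. It does not call the real CUDA runtime: mock_set_device, mock_malloc and Allocation are hypothetical stand-ins for cudaSetDevice, cudaMalloc and a device pointer, and the value of g_main_device is assumed. The only behaviour modelled is that an allocation lands on whichever device is current at the time of the call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the CUDA runtime; this is not the real API.
static int g_current_device = 0;   // models the calling thread's current device
const int  g_main_device    = 0;   // name taken from the diff; value assumed

struct Allocation {
    int         device;  // device that was current when the allocation happened
    std::size_t size;    // requested size in bytes
};

void mock_set_device(int device) {
    g_current_device = device;
}

Allocation mock_malloc(std::size_t size) {
    // As with cudaMalloc, the memory belongs to the current device.
    return Allocation{g_current_device, size};
}

// Before the fix: allocate on whatever device happens to be current.
Allocation alloc_scratch_unfixed(std::size_t size) {
    return mock_malloc(size);
}

// After the fix: select the main device first, then allocate.
Allocation alloc_scratch_fixed(std::size_t size) {
    mock_set_device(g_main_device);
    Allocation a = mock_malloc(size);
    assert(a.device == g_main_device);
    return a;
}
```

If some earlier work left device 1 selected, alloc_scratch_unfixed returns an allocation tagged with device 1, while alloc_scratch_fixed always returns one tagged with the main device.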

slaren · 2023-09-17T00:07:02Z

ggml-cuda.cu

+
+    ggml_cuda_set_device(g_main_device);
+


Could this be moved inside the if block below? It seems to be the only CUDA op relevant here.
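A rough sketch of what this suggestion amounts to, again host-only and under stated assumptions: ScratchState, select_device, allocate_buffer and assign_scratch are illustrative names and are not taken from ggml-cuda.cu. The device switch sits next to the one-time allocation, so calls that find the buffer already allocated leave the device state alone.

```cpp
#include <cassert>

// Hypothetical host-only model; none of these names come from ggml-cuda.cu.
struct ScratchState {
    int  current_device   = 0;      // models the thread's current device
    int  set_device_calls = 0;      // counts device switches, for illustration
    bool allocated        = false;  // models "scratch buffer pointer is non-null"
    int  buffer_device    = -1;     // device the buffer landed on, -1 if none
};

void select_device(ScratchState & s, int device) {
    s.current_device = device;
    s.set_device_calls++;
}

void allocate_buffer(ScratchState & s) {
    // The buffer is tagged with whichever device is current right now.
    s.allocated     = true;
    s.buffer_device = s.current_device;
}

void assign_scratch(ScratchState & s, int main_device) {
    if (!s.allocated) {
        // Switch devices only on the path that actually allocates.
        select_device(s, main_device);
        allocate_buffer(s);
    }
    assert(s.allocated);
}
```

Calling assign_scratch twice on the same state performs a single device switch and a single allocation, and the buffer ends up on the main device regardless of which device was current beforehand.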

This was referenced Sep 17, 2023

CUDA: enable peer access between devices #2470

Merged

CUDA illegal memory access | AWS #3163

Closed

slaren reviewed Sep 17, 2023

CUDA: fix scratch malloced on non-main device

391dab7

JohannesGaessler force-pushed the cuda-fix-malloc-wrong-device branch from 70ffed6 to 391dab7 on September 17, 2023 10:23

slaren approved these changes Sep 17, 2023

JohannesGaessler merged commit 578d8c8 into ggml-org:master Sep 17, 2023

pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this pull request Sep 26, 2023

CUDA: fix scratch malloced on non-main device (ggml-org#3220)

3488793
