"CUDA error" when set resolution higher than 1280 x 1280 #156
Comments
I switched to an RTX 5000 Ada (48G) and the model fails the same way. Please help!
It seems to be an error in the way matrix multiplications are performed in ggml. Does it work if you run it on the CPU only?
@XienXX cmake .. -DSD_CUBLAS=OFF
I can replicate this with hipBLAS. 768x768 works, 768x1024 works, 1024x1024 fails, 1280x1280 fails. EDIT: Actually, that seems to happen only with the v1.5 model. SDXL works fine at 1280x1280.
Same here with hipBLAS on an RX 7900 XT. The largest image I managed to generate with SD 1.5 is 960x1024, while with SDXL I managed a 1920x1920 picture before encountering the same issue.
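The working and failing sizes reported in this thread can be compared with some rough arithmetic. The sketch below is a hypothesis, not a confirmed diagnosis: it assumes the failing matrix multiplication is the self-attention score computation at the first attention level of the U-Net, and that a 32-bit signed element count is involved somewhere. The head counts (8 for SD 1.5, 5 for SD 2.x, 10 for SDXL), the extra 2x downscale before SDXL's first attention level, and the helper names are assumptions introduced for illustration; none of them come from the logs in this issue.

```python
INT32_MAX = 2**31 - 1

def attention_score_elements(width, height, heads, extra_downscale=1):
    """Element count of a (heads x tokens x tokens) attention score tensor."""
    # The VAE shrinks each image side by 8; extra_downscale models an
    # attention level that sits below the first U-Net resolution.
    scale = 8 * extra_downscale
    tokens = (width // scale) * (height // scale)
    return heads * tokens * tokens

def fits_int32(width, height, heads, extra_downscale=1):
    """True if the element count still fits in a signed 32-bit integer."""
    return attention_score_elements(width, height, heads, extra_downscale) <= INT32_MAX

# (label, width, height, assumed heads, assumed extra downscale)
cases = [
    ("SD 2.x 1024x1280", 1024, 1280, 5, 1),   # reported: works
    ("SD 2.x 1280x1280", 1280, 1280, 5, 1),   # reported: fails
    ("SD 1.5 960x1024", 960, 1024, 8, 1),     # reported: works
    ("SD 1.5 1024x1024", 1024, 1024, 8, 1),   # reported: fails
    ("SDXL 1920x1920", 1920, 1920, 10, 2),    # reported: works
]

for label, w, h, heads, down in cases:
    n = attention_score_elements(w, h, heads, down)
    verdict = "fits" if n <= INT32_MAX else "exceeds"
    print(f"{label}: {n:,} elements, {verdict} INT32_MAX")
```

Under these assumptions, every size reported as working stays below 2^31 elements and every size reported as failing reaches or exceeds it, which would also explain why more VRAM does not help. This only shows that the reports are consistent with an element-count limit; it does not identify where in ggml or cuBLAS such a limit would apply.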
CUDA Version:12.3
GPU: RTX 4080 16G
The model works fine at resolutions up to 1024 x 1024, but if I set it to 1280x1280 or above, the kernel launch fails. See below:
1280x1280 resolution, failed:
PS D:\xien\stable-diffusion.cpp\build\bin\Release> .\sd.exe -m ../v2-1_768-nonema-pruned.safetensors --type f16 -p "a lovely cat" -H 1280 -W 1280
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:137 - loading model from '../v2-1_768-nonema-pruned.safetensors'
[INFO ] model.cpp:641 - load ../v2-1_768-nonema-pruned.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:163 - Stable Diffusion 2.x
[INFO ] stable-diffusion.cpp:169 - Stable Diffusion weight type: f16
[INFO ] stable-diffusion.cpp:268 - total memory buffer size = 2450.99MB (clip 684.18MB, unet 1662.34MB, vae 104.47MB)
[INFO ] stable-diffusion.cpp:270 - loading model from '../v2-1_768-nonema-pruned.safetensors' completed, taking 2.67s
[INFO ] stable-diffusion.cpp:282 - running in v-prediction mode
[INFO ] stable-diffusion.cpp:1182 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 28 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 42
|> | 0/20 - 0.00it/sCUDA error: the function failed to launch on the GPU
current device: 0, in function ggml_cuda_op_mul_mat_cublas at D:\xien\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:7650
cublasSgemm_v2(g_cublas_handles[id], CUBLAS_OP_T, CUBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha, src0_ddf_i, ne00, src1_ddf1_i, ne10, &beta, dst_dd_i, ldc)
GGML_ASSERT: D:\xien\stable-diffusion.cpp\ggml\src\ggml-cuda.cu:226: !"CUDA error"
1280x1024 resolution, worked:
PS D:\xien\stable-diffusion.cpp\build\bin\Release> .\sd.exe -m ../v2-1_768-nonema-pruned.safetensors --type f16 -p "a lovely cat" -H 1280 -W 1024
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4080, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:137 - loading model from '../v2-1_768-nonema-pruned.safetensors'
[INFO ] model.cpp:641 - load ../v2-1_768-nonema-pruned.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:163 - Stable Diffusion 2.x
[INFO ] stable-diffusion.cpp:169 - Stable Diffusion weight type: f16
[INFO ] stable-diffusion.cpp:268 - total memory buffer size = 2450.99MB (clip 684.18MB, unet 1662.34MB, vae 104.47MB)
[INFO ] stable-diffusion.cpp:270 - loading model from '../v2-1_768-nonema-pruned.safetensors' completed, taking 2.69s
[INFO ] stable-diffusion.cpp:282 - running in v-prediction mode
[INFO ] stable-diffusion.cpp:1182 - apply_loras completed, taking 0.00s
[INFO ] stable-diffusion.cpp:1221 - get_learned_condition completed, taking 30 ms
[INFO ] stable-diffusion.cpp:1231 - sampling using Euler A method
[INFO ] stable-diffusion.cpp:1235 - generating image: 1/1 - seed 42
|==================================================| 20/20 - 1.08it/s
[INFO ] stable-diffusion.cpp:1247 - sampling completed, taking 19.60s
[INFO ] stable-diffusion.cpp:1255 - generating 1 latent images completed, taking 19.61s
[INFO ] stable-diffusion.cpp:1257 - decoding 1 latents
[INFO ] stable-diffusion.cpp:1267 - latent 1 decoded, taking 1.45s
[INFO ] stable-diffusion.cpp:1271 - decode_first_stage completed, taking 1.45s
[INFO ] stable-diffusion.cpp:1290 - txt2img completed in 21.09s
save result image to 'output.png'