[SYCL] Fix SYCL `im2col` and `convert` Overflow with Large Dims #9052

zhentaoyu · 2024-08-16T02:46:31Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

changes

- Downsample `global_range` for `im2col` when OH and OW are large (this happens in stable-diffusion), and sync with the related CUDA code.
- Change `int k` to `int64_t k` in convert.
- Downsample `global_range` for `convert_unary` when `k` is large (this also happens in stable-diffusion).
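To illustrate the idea behind these changes, here is a minimal CPU-only sketch. It is not the code from this PR: `capped_num_groups` and `count_visited` are hypothetical names, and the assumption is that the launch range must fit in a signed 32-bit `int`. The number of work-groups is capped so that `groups * block_size` stays within that limit, and each simulated work-item then strides over the remaining elements using an `int64_t` index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not the PR's code): number of work-groups to launch so
// that groups * block_size never exceeds max_range.
inline int64_t capped_num_groups(int64_t total, int64_t block_size,
                                 int64_t max_range = std::numeric_limits<int>::max()) {
    assert(total > 0 && block_size > 0 && block_size <= max_range);
    const int64_t wanted = (total + block_size - 1) / block_size; // ceil division
    const int64_t limit  = max_range / block_size;                // largest safe group count
    return std::min(wanted, limit);
}

// CPU stand-in for a kernel: every simulated work-item starts at its global id
// and strides by the global range, so all elements are still covered even
// when the range was capped.
inline int64_t count_visited(int64_t total, int64_t block_size, int64_t max_range) {
    const int64_t global_range = capped_num_groups(total, block_size, max_range) * block_size;
    int64_t visited = 0;
    for (int64_t gid = 0; gid < global_range; ++gid) {
        for (int64_t i = gid; i < total; i += global_range) { // 64-bit index, not int
            ++visited;
        }
    }
    return visited;
}
```

With a block size of 256 (an arbitrary choice for this sketch) and the 2415919104 elements implied by the failing test shapes, the uncapped range would exceed `INT_MAX`, while the capped range of 8388607 groups does not.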

ggml/src/ggml-sycl/im2col.cpp

airMeng · 2024-08-19T01:54:08Z

@joeatodd @OuadiElfarouki if no issues from your side, I will merge soon

NeoZhangJianyu · 2024-08-19T05:44:34Z

Is it possible to add a unit test case for this case?

zhentaoyu · 2024-08-19T06:39:17Z

> Is it possible to add a unit test case for this case?

Yes. I will add some im2col test cases in test-backend-ops.cpp, and I will try to add a ggml_get_to_fp32_sycl test case in mul_mat. Is that OK for you, @NeoZhangJianyu?

Alcpz · 2024-08-19T10:37:22Z

@airMeng @zhentaoyu

This PR seems to cause failures on both an A100 and an Arc A770:

A100 test-backend-ops log:

IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 1073741824)
failed to allocate tensors [SYCL0]
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 9663676416, max buffer size: 1073741824) failed to allocate tensors [SYCL0]
MUL_MAT(type_a=f16,type_b=f16,m=512,n=262144,k=9216,bs=[1,1],nr=[1,1]): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 1073741824)
failed to allocate tensors [SYCL0]

Arc A770 log:

IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): failed to allocate tensors [SYCL0] 
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): failed to allocate tensors [SYCL0]   
CONV_TRANSPOSE_1D(ne_input=[197,32,1,1],ne_kernel=[16,32,32,1],s0=1,p0=0,d0=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104)
MUL_MAT(type_a=f16,type_b=f16,m=512,n=262144,k=9216,bs=[1,1],nr=[1,1]): failed to allocate tensors [SYCL0]
MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104
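For reference, the tensor sizes in these logs can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, it uses the standard convolution output-size formula, and it assumes the same stride, padding and dilation in both spatial dimensions (as in the failing cases). It shows that the 2D `im2col` result for `ne_input=[1024,1024,256,1]` and `ne_kernel=[3,3,256,1]` has more elements than a 32-bit `int` can index.

```cpp
#include <cassert>
#include <cstdint>

// Standard convolution output-size formula for one spatial dimension.
inline int64_t conv_out_size(int64_t in, int64_t kernel, int64_t stride,
                             int64_t pad, int64_t dilation) {
    assert(stride > 0);
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

// Hypothetical helper: element count of a 2D im2col result, i.e. IC*KH*KW
// values for each of the OW*OH*N output positions.
inline int64_t im2col_2d_elements(int64_t iw, int64_t ih, int64_t ic, int64_t n,
                                  int64_t kw, int64_t kh,
                                  int64_t s, int64_t p, int64_t d) {
    const int64_t ow = conv_out_size(iw, kw, s, p, d);
    const int64_t oh = conv_out_size(ih, kh, s, p, d);
    return ic * kh * kw * ow * oh * n; // all arithmetic in 64 bits
}
```

For the shapes above this gives 2415919104 elements, which is 4831838208 bytes as f16 and 9663676416 bytes as f32, matching the reported tensor sizes. Both exceed the 1073741824-byte and 4294959104-byte maximum buffer sizes reported in the logs.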

slaren · 2024-08-19T13:35:10Z

tests/test-backend-ops.cpp

+#ifdef GGML_USE_SYCL
+    // test cases for 2D im2col with large input H and W
+    test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F16, {1024, 1024, 256, 1}, {3, 3, 256, 1}, 1, 1, 1, 1, 1, 1, true));
+    test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F32, {1024, 1024, 256, 1}, {3, 3, 256, 1}, 1, 1, 1, 1, 1, 1, true));
+#endif


Do not add backend-specific code or tests to test-backend-ops.

I hit an error on an Arc A770 (16 GB):

IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104)
failed to allocate tensors [SYCL0]
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor is too large to fit in a SYCL0 buffer (tensor size: 9663676416, max buffer size: 4294959104)

OK, I'll try to fix it.

Hi @NeoZhangJianyu, @Alcpz. After discussing with @airMeng, I decided to comment out these new test cases, since the backend_buffer limit cannot be handled well on the Arc A770 for now, and they may break other backends' tests for the same reason. I have tested them successfully on an Intel(R) Data Center GPU Max 1100 (SYCL backend, this branch) and an NV A30 (CUDA backend, master and this branch). As for the Arc A770, I will find some time to verify stable-diffusion models with large output images and try to fix it. Thanks.

Hi @slaren, could you please take a look at the newest commit, 9262dcb, when you have time?

@zhentaoyu
I'd like to see your improvement to this feature.

It's OK not to add a new test case for it, as long as the existing cases pass.

Thank you!

Signed-off-by: zhentaoyu <[email protected]>

NeoZhangJianyu · 2024-08-20T14:51:36Z

> Is it possible to add a unit test case for this case?

> Yes. I will add some im2col test_cases in test-backend-ops.cpp. And try to add ggml_get_to_fp32_sycl test case in mul_mat. Is that ok for you @NeoZhangJianyu?

Yes, it's OK!

…ganov#9052)

* sycl: fix im2col overflow and sync with cuda
* sycl: fix convert overflow
* sycl: fix convert and dequantize
* sycl: fix ib in dmmv
* sycl:refine convert
* sycl: move downsample global_range into common
* test: add im2col and convert test cases
* test: make new cases only in sycl
* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <[email protected]>

airMeng approved these changes Aug 16, 2024

airMeng reviewed Aug 16, 2024

ggml/src/ggml-sycl/im2col.cpp (outdated, resolved)

github-actions bot added the ggml and SYCL labels Aug 16, 2024

airMeng approved these changes Aug 16, 2024

airMeng requested a review from joeatodd August 16, 2024 05:59

zhentaoyu force-pushed the fix_sycl_overflow branch from dd61a02 to 94c1ec9 Compare August 19, 2024 01:51

github-actions bot added the testing label Aug 19, 2024

slaren requested changes Aug 19, 2024

zhentaoyu added 9 commits August 20, 2024 17:15

- d36d654 sycl: fix im2col overflow and sync with cuda
- 9a9f7c9 sycl: fix convert overflow
- 3ecfbcf sycl: fix convert and dequantize
- bd960a6 sycl: fix ib in dmmv
- df3f1c1 sycl:refine convert
- 8bd46e8 sycl: move downsample global_range into common
- f351ac5 test: add im2col and convert test cases
- 4f5c138 test: make new cases only in sycl
- 9262dcb test: comment new test_cases for only local testing

zhentaoyu force-pushed the fix_sycl_overflow branch from 940f74d to 9262dcb Compare August 20, 2024 09:17

slaren approved these changes Aug 20, 2024

NeoZhangJianyu approved these changes Aug 20, 2024

NeoZhangJianyu merged commit 4f8d19f into ggerganov:master Aug 20, 2024
52 checks passed
