Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Fix SYCL im2col and convert Overflow with Large Dims #9052

Merged
merged 9 commits into from
Aug 20, 2024

Conversation

zhentaoyu
Copy link
Contributor

changes

  • downsample global_range for im2col when OH and OW are bigger (happens in stable-diffusion) and sync with related cuda code
  • make convert int k to int64_t k
  • downsample global_range for convert_unary when k is large (happens in stable-diffusion).

@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Aug 16, 2024
@airMeng airMeng requested a review from joeatodd August 16, 2024 05:59
@airMeng
Copy link
Collaborator

airMeng commented Aug 19, 2024

@joeatodd @OuadiElfarouki if no issues from your side, I will merge soon

@NeoZhangJianyu
Copy link
Collaborator

Is it possible to add a unit test case for this case?

@zhentaoyu
Copy link
Contributor Author

zhentaoyu commented Aug 19, 2024

Is it possible to add a unit test case for this case?

Yes. I will add some im2col test_cases in test-backend-ops.cpp. And try to add ggml_get_to_fp32_sycl test case in mul_mat. Is that ok for you @NeoZhangJianyu?

@github-actions github-actions bot added the testing Everything test related label Aug 19, 2024
@Alcpz
Copy link
Collaborator

Alcpz commented Aug 19, 2024

@airMeng @zhentaoyu

This PR seems to create failures in both A100 and an Arc A770:

A100 test-backend-ops log:

IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 1073741824) failed to allocate tensors [SYCL0] 
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 9663676416, max buffer size: 1073741824) failed to allocate tensors [SYCL0]
MUL_MAT(type_a=f16,type_b=f16,m=512,n=262144,k=9216,bs=[1,1],nr=[1,1]): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 1073741824)
failed to allocate tensors [SYCL0]

Arc 770 Log:

IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): failed to allocate tensors [SYCL0] 
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): failed to allocate tensors [SYCL0]   
CONV_TRANSPOSE_1D(ne_input=[197,32,1,1],ne_kernel=[16,32,32,1],s0=1,p0=0,d0=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104)
MUL_MAT(type_a=f16,type_b=f16,m=512,n=262144,k=9216,bs=[1,1],nr=[1,1]): failed to allocate tensors [SYCL0]   MUL_MAT_ID(type_a=f32,type_b=f32,n_mats=4,n_used=1,b=0,m=512,n=1,k=256): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104

Comment on lines 2148 to 2152
#ifdef GGML_USE_SYCL
// test cases for 2D im2col with large input H and W
test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F16, {1024, 1024, 256, 1}, {3, 3, 256, 1}, 1, 1, 1, 1, 1, 1, true));
test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F32, {1024, 1024, 256, 1}, {3, 3, 256, 1}, 1, 1, 1, 1, 1, 1, true));
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not add backend-specific code or tests to test-backend-ops.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meet an error on Arc770 (16GB)

 IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104)
failed to allocate tensors [SYCL0]   IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor  is too large to fit in a SYCL0 buffer (tensor size: 9663676416, max buffer size: 4294959104)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll try to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @NeoZhangJianyu, @Alcpz. After discussing with @airMeng, I decide to comment these new test cases since the backend_buffer thing cannot be handled well in ARC770 for now. And it may break other backends tests as well for the same reason. I have tested them successfully in Intel(R) Data Center GPU Max 1100 (sycl backend, this branch) and NV A30 (cuda backend, master and this branch). As for ARC-770, I will find some time to verify stable-diffusion models with large output img and try to fix. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @slaren, Could you please take a look at the newest commit 9262dcb if you have free time?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhentaoyu
I'd like to see your improvement in this feature.

It's OK, not add new case for it if you make the existed cases are passed.

Thank you!

@NeoZhangJianyu
Copy link
Collaborator

Is it possible to add a unit test case for this case?

Yes. I will add some im2col test_cases in test-backend-ops.cpp. And try to add ggml_get_to_fp32_sycl test case in mul_mat. Is that ok for you @NeoZhangJianyu?

Yes, it's OK!

@NeoZhangJianyu NeoZhangJianyu merged commit 4f8d19f into ggerganov:master Aug 20, 2024
52 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
…ganov#9052)

* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert overflow

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <[email protected]>

* sycl:refine convert

Signed-off-by: zhentaoyu <[email protected]>

* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <[email protected]>

* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <[email protected]>

* test: make new cases only in sycl

Signed-off-by: zhentaoyu <[email protected]>

* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
…ganov#9052)

* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert overflow

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <[email protected]>

* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <[email protected]>

* sycl:refine convert

Signed-off-by: zhentaoyu <[email protected]>

* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <[email protected]>

* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <[email protected]>

* test: make new cases only in sycl

Signed-off-by: zhentaoyu <[email protected]>

* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <[email protected]>

---------

Signed-off-by: zhentaoyu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants