[SYCL] Fix SYCL im2col and convert Overflow with Large Dims #9052
dd61a02 to 94c1ec9
@joeatodd @OuadiElfarouki if there are no issues from your side, I will merge soon.
Is it possible to add a unit test for this case?
Yes, I will add some.
This PR seems to cause failures on both an A100 and an Arc A770:
A100 test-backend-ops log:
Arc A770 log:
tests/test-backend-ops.cpp
Outdated
#ifdef GGML_USE_SYCL
// test cases for 2D im2col with large input H and W
test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F16, {1024, 1024, 256, 1}, {3, 3, 256, 1}, 1, 1, 1, 1, 1, 1, true));
test_cases.emplace_back(new test_im2col(GGML_TYPE_F32, GGML_TYPE_F16, GGML_TYPE_F32, {1024, 1024, 256, 1}, {3, 3, 256, 1}, 1, 1, 1, 1, 1, 1, true));
#endif
Do not add backend-specific code or tests to `test-backend-ops`.
I hit an error on Arc770 (16GB):
IM2COL(type_input=f32,type_kernel=f16,dst_type=f16,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor is too large to fit in a SYCL0 buffer (tensor size: 4831838208, max buffer size: 4294959104)
failed to allocate tensors [SYCL0]
IM2COL(type_input=f32,type_kernel=f16,dst_type=f32,ne_input=[1024,1024,256,1],ne_kernel=[3,3,256,1],s0=1,s1=1,p0=1,p1=1,d0=1,d1=1,is_2D=1): ggml_backend_alloc_ctx_tensors_from_buft: tensor is too large to fit in a SYCL0 buffer (tensor size: 9663676416, max buffer size: 4294959104)
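The two tensor sizes in this log can be reproduced by hand, which also shows why a 32-bit index cannot address this shape. The sketch below is only an illustration, not code from the PR: `conv_out_size`, `Im2colShape` and `im2col_dst_elements` are hypothetical names, and it assumes the usual convolution output-size formula and an im2col destination that stores IC*KH*KW values per output position.

```cpp
#include <cstdint>
#include <limits>

// Output extent along one axis, using the usual convolution size formula
// (assumed here; check it against the backend you are testing).
constexpr int64_t conv_out_size(int64_t in, int64_t kernel, int64_t stride,
                                int64_t pad, int64_t dilation) {
    return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
}

// Hypothetical container for the parameters printed in the log line.
struct Im2colShape {
    int64_t iw, ih, ic, n;   // ne_input  = [IW, IH, IC, N]
    int64_t kw, kh;          // ne_kernel = [KW, KH, IC, 1]
    int64_t s0, s1, p0, p1, d0, d1;
};

// Number of elements in the 2D im2col destination, computed in 64 bits.
constexpr int64_t im2col_dst_elements(const Im2colShape& s) {
    return s.ic * s.kh * s.kw
         * conv_out_size(s.iw, s.kw, s.s0, s.p0, s.d0)   // OW
         * conv_out_size(s.ih, s.kh, s.s1, s.p1, s.d1)   // OH
         * s.n;
}

// Shape taken from the failing test case above.
constexpr Im2colShape kLogShape{1024, 1024, 256, 1, 3, 3, 1, 1, 1, 1, 1, 1};
constexpr int64_t kElems     = im2col_dst_elements(kLogShape);
constexpr int64_t kMaxBuffer = 4294959104LL;  // "max buffer size" from the log

static_assert(kElems == 2415919104LL, "IC*KH*KW*OW*OH*N");
static_assert(kElems * 2 == 4831838208LL, "f16 destination size from the log");
static_assert(kElems * 4 == 9663676416LL, "f32 destination size from the log");
static_assert(kElems * 2 > kMaxBuffer, "even the f16 destination exceeds the buffer limit");
static_assert(kElems > std::numeric_limits<int32_t>::max(),
              "the element count alone does not fit in a 32-bit int");
```

Under these assumptions the allocation failure and the index overflow are two separate effects of the same shape: the first is a per-buffer limit on the device, the second is the reason the kernels need 64-bit indices.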
ok, I'll try to fix it.
Hi @NeoZhangJianyu, @Alcpz. After discussing with @airMeng, I decided to comment out these new test cases, since the `backend_buffer` issue cannot be handled well on ARC770 for now, and it may break other backends' tests for the same reason. I have tested them successfully on Intel(R) Data Center GPU Max 1100 (sycl backend, this branch) and NV A30 (cuda backend, master and this branch). As for ARC-770, I will find some time to verify stable-diffusion models with large output images and try to fix it. Thanks.
@zhentaoyu
I'd like to see your improvement in this feature.
It's OK not to add new cases for it, as long as the existing cases pass.
Thank you!
Signed-off-by: zhentaoyu <[email protected]>
940f74d to 9262dcb
Yes, it's OK!
…ganov#9052)
* sycl: fix im2col overflow and sync with cuda
Signed-off-by: zhentaoyu <[email protected]>
* sycl: fix convert overflow
Signed-off-by: zhentaoyu <[email protected]>
* sycl: fix convert and dequantize
Signed-off-by: zhentaoyu <[email protected]>
* sycl: fix ib in dmmv
Signed-off-by: zhentaoyu <[email protected]>
* sycl:refine convert
Signed-off-by: zhentaoyu <[email protected]>
* sycl: move downsample global_range into common
Signed-off-by: zhentaoyu <[email protected]>
* test: add im2col and convert test cases
Signed-off-by: zhentaoyu <[email protected]>
* test: make new cases only in sycl
Signed-off-by: zhentaoyu <[email protected]>
* test: comment new test_cases for only local testing
Signed-off-by: zhentaoyu <[email protected]>
---------
Signed-off-by: zhentaoyu <[email protected]>
changes
- fix `im2col` overflow when `OH` and `OW` are bigger (happens in stable-diffusion) and sync with the related cuda code
- convert `int k` to `int64_t k`
- fix `convert_unary` overflow when `k` is large (happens in stable-diffusion)
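The effect of widening `int k` to `int64_t k` can be illustrated with a host-side stand-in for an element-wise convert kernel. This is a minimal sketch under stated assumptions, not the SYCL code from this PR: `num_blocks`, `overflows_int32` and `convert_unary_sketch` are hypothetical names, and the nested loops only imitate how work-groups and work-items would cover `k` elements.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: work-groups needed to cover k elements (ceiling division).
// Done in 64 bits so that k + block_size - 1 cannot wrap for large k.
constexpr int64_t num_blocks(int64_t k, int64_t block_size) {
    return (k + block_size - 1) / block_size;
}

// True when an element count cannot be represented by a 32-bit signed int.
constexpr bool overflows_int32(int64_t k) {
    return k > std::numeric_limits<int32_t>::max();
}

// Host-side imitation of an element-wise convert kernel.
// b plays the role of the work-group id, t the local id inside the group.
template <typename Src, typename Dst>
void convert_unary_sketch(const Src* x, Dst* y, int64_t k, int64_t block_size) {
    const int64_t blocks = num_blocks(k, block_size);
    for (int64_t b = 0; b < blocks; ++b) {
        for (int64_t t = 0; t < block_size; ++t) {
            // 64-bit flat index: b * block_size would exceed the int range
            // once k passes 2^31 - 1 if it were computed in 32 bits.
            const int64_t i = b * block_size + t;
            if (i >= k) {
                return;  // tail of the last block, nothing left to convert
            }
            y[i] = static_cast<Dst>(x[i]);
        }
    }
}
```

For example, a destination of 2415919104 elements (the 1024x1024x256 input with a 3x3 kernel discussed in the review) makes `overflows_int32` return true, while `num_blocks(2415919104LL, 256)` is still a modest 9437184, so only the flat index, not the block count, needs the wider type in that case.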