According to https://github.com/intel/compute-runtime/blob/master/programmers-guide/ALLOCATIONS_GREATER_THAN_4GB.md, there are currently ways to make allocations larger than 4GB on devices that follow the standard Intel stateful addressing model. To do so, however, you must be able to pass CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL or ze_relaxed_allocation_limits_exp_desc_t through OpenCL or Level Zero, respectively. Unfortunately, there doesn't seem to be a way to do this through SYCL right now. This applies to anything in the SYCL backend that would use zeMemAllocDevice, zeMemAllocShared and zeMemAllocHost for Level Zero, and clCreateBuffer, clCreateBufferWithProperties, clCreateBufferWithPropertiesINTEL, clSVMAlloc, clSharedMemAllocINTEL, clDeviceMemAllocINTEL and clHostMemAllocINTEL for OpenCL.
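To make the gap concrete, here is a minimal sketch of the decision a SYCL runtime (or a downstream library) would have to make before calling into the backend. It is not taken from any existing implementation: `needs_relaxed_limits` and `kDefaultMaxAllocBytes` are hypothetical names, and the 4 GiB threshold is an assumption based on the limit described above. A real implementation would query the device for its actual maximum allocation size.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the default per-allocation limit is 4 GiB, as described in this
// issue. Real code should query the device limit rather than hard-code it.
constexpr std::uint64_t kDefaultMaxAllocBytes = 4ull * 1024 * 1024 * 1024;

// Hypothetical helper: returns true when a request is larger than the default
// limit, i.e. when the backend call would need
// CL_MEM_ALLOW_UNRESTRICTED_SIZE_INTEL (OpenCL) or a chained
// ze_relaxed_allocation_limits_exp_desc_t (Level Zero) to succeed.
constexpr bool needs_relaxed_limits(
    std::uint64_t bytes,
    std::uint64_t max_alloc = kDefaultMaxAllocBytes) {
    return bytes > max_alloc;
}

// Compile-time checks of the boundary behaviour.
static_assert(!needs_relaxed_limits(1ull << 30), "1 GiB fits the default limit");
static_assert(!needs_relaxed_limits(4ull << 30), "exactly 4 GiB is not over the limit");
static_assert(needs_relaxed_limits(6ull << 30), "6 GiB exceeds the default limit");
```

The missing piece in SYCL is what happens when this returns true: today there is no way for the caller to ask the backend to attach the relaxed-limit flag or descriptor to the allocation.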
Since the compiler here is essentially what takes in SYCL and emits Level Zero or OpenCL code for various Intel projects, I think this is the right place to discuss this. Unfortunately, I'm not sure what it would take for this to happen. Would this become a non-standard extension to SYCL, like a vendor extension, or would something like this need to be standardized? I am opening this because it seems to be affecting downstream packages like oneDNN here and Intel Extension for Pytorch here, which use SYCL to make their allocations and are hitting this limitation. IPEX is choosing to limit allocations to 4GB and disallow anything larger, which I don't think is a good solution given that there are valid use cases for more than 4GB, even if it involves a performance penalty. I hope this can be considered and some path forward can be found. Thank you.
simonlui changed the title from "Allow for stateless addressing flags for >4GB for devices to be passed through SYCL" to "Allow for stateless addressing flags for >4GB allocations for devices to be passed through SYCL" on Aug 23, 2023
I don't doubt that this would allow you to pass the required compile flags for >4GB allocations. But according to the document I linked, that doesn't solve the problem of passing the flags I mentioned, which are needed for the allocation to work correctly. I also don't personally have an application that would use this; it is more or less a gap I identified, given the problems I had with this limitation when using Intel's Extension for Pytorch and frequently running into the 4GB memory limit. That is why I submitted this report.