
issues with retrieving large image data from GPU #344

Closed

pr4deepr opened this issue Feb 21, 2025 · 16 comments

@pr4deepr
Member

Hi @haesleinhuepf
Hope you have been well.
I'm following up on an issue we had with large image data

BioimageAnalysisCoreWEHI/napari_lattice#105

When I run this code

import numpy as np
import pyclesperanto_prototype as cle

np.random.seed(0)
image = np.random.randint(0, 255, (401, 2304, 512), dtype=np.uint16)

angle_in_degrees = 35
voxel_size_x = 0.101
voxel_size_y = 0.101
voxel_size_z = 1.2

deskewed = cle.deskew_x(image,
                        angle_in_degrees=angle_in_degrees,
                        voxel_size_x=voxel_size_x,
                        voxel_size_y=voxel_size_y,
                        voxel_size_z=voxel_size_z)
deskewed.shape

(294, 2304, 5184)
The deskew process works.

When I pull the image from the GPU,

deskewed_np = cle.pull_zyx(deskewed)

I get this error

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 deskewed_np = cle.pull_zyx(deskewed)

File ~\AppData\Roaming\Python\Python39\site-packages\pyclesperanto_prototype\_tier0\_pull.py:36, in pull_zyx(gpu_array)
     31 import warnings
     32 warnings.warn(
     33     "Deprecated: `pull_zyx()` is now deprecated as it does the same as `pull()`.",
     34     DeprecationWarning
     35 )
---> 36 return pull(gpu_array)

File ~\AppData\Roaming\Python\Python39\site-packages\pyclesperanto_prototype\_tier0\_pull.py:28, in pull(gpu_array)
      4 def pull(gpu_array):
      5     """Returns an image from GPU memory as numpy compatible array
      6 
      7     .. deprecated:: 0.6.0
   (...)
     26     .. [1] https://clij.github.io/clij2-docs/reference_pull
     27     """
---> 28     return Backend.get_instance().get().asarray(gpu_array)

File ~\AppData\Roaming\Python\Python39\site-packages\pyclesperanto_prototype\_tier0\_opencl_backend.py:18, in OpenCLBackend.asarray(self, image)
     17 def asarray(self, image):
---> 18     return np.asarray(image)

File ~\AppData\Roaming\Python\Python39\site-packages\pyclesperanto_prototype\_tier0\_pycl.py:111, in OCLArray.__array__(self, dtype)
    109 def __array__(self, dtype=None):
    110     if dtype is None:
--> 111         return self.get()
    112     else:
    113         return self.get().astype(dtype)

File c:\ProgramData\Miniconda3\envs\napari_lattice_ilastik\lib\site-packages\pyopencl\array.py:921, in Array.get(self, queue, ary, async_, **kwargs)
    915 if async_:
    916     warn("calling pyopencl.Array.get with 'async_=True' is deprecated. "
    917             "Please use pyopencl.Array.get_async for asynchronous "
    918             "device-to-host transfers",
    919             DeprecationWarning, stacklevel=2)
--> 921 ary, event1 = self._get(queue=queue, ary=ary, async_=async_, **kwargs)
    923 return ary

File c:\ProgramData\Miniconda3\envs\napari_lattice_ilastik\lib\site-packages\pyopencl\array.py:880, in Array._get(self, queue, ary, async_, **kwargs)
    874     raise ValueError("Cannot copy array to host. "
    875             "Array has no queue. Use "
    876             "'new_array = array.with_queue(queue)' "
    877             "to associate one.")
    879 if self.size:
--> 880     event1 = cl.enqueue_copy(queue, ary, self.base_data,
    881             src_offset=self.offset,
    882             wait_for=self.events, is_blocking=not async_)
    884     self.add_event(event1)
    885 else:

File c:\ProgramData\Miniconda3\envs\napari_lattice_ilastik\lib\site-packages\pyopencl\__init__.py:2013, in enqueue_copy(queue, dest, src, **kwargs)
   2006             warn("The 'device_offset' argument of enqueue_copy "
   2007                     "is deprecated. Use 'src_offset' instead. "
   2008                     "'dst_offset' will stop working in 2023.x.",
   2009                     DeprecationWarning, stacklevel=2)
   2011             kwargs["src_offset"] = device_offset
-> 2013         return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
   2015 elif src.type in _IMAGE_MEM_OBJ_TYPES:
   2016     origin = kwargs.pop("origin")

RuntimeError: clEnqueueReadBuffer failed: OUT_OF_RESOURCES

It looks like we are running out of memory, but I assumed that since the image is already on the GPU, this shouldn't be a problem?

Cheers
Pradeep

@haesleinhuepf
Member

Hi @pr4deepr ,

I'm good and hope you are too! I have a suspicion, but I want to check if the AI knows as well ;-)

git-bob, please comment on potential reasons for the OpenCL error OUT_OF_RESOURCES and ways to further debug such an error.


This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

The OUT_OF_RESOURCES error is happening during transfer from GPU back to CPU memory, not during the actual deskewing computation. This suggests the host system is running out of memory when trying to allocate space for the large deskewed array (~13 GB in float32).

A few things that could help debug this:

  1. Print available GPU memory before and after deskewing:
print(f"GPU memory before: {cle.get_gpu().get_info(cl.device_info.GLOBAL_MEM_SIZE)/(1024**3):.1f} GB")
deskewed = cle.deskew_x(image, ...)
print(f"GPU memory after: {cle.get_gpu().get_info(cl.device_info.GLOBAL_MEM_SIZE)/(1024**3):.1f} GB")
  2. Try pulling smaller chunks of the deskewed data at a time instead of all at once (though this would require lower-level PyOpenCL calls)

@haesleinhuepf - would you know if there's a way to check host memory availability through PyOpenCL or if there are any existing methods in pyclesperanto for chunked data transfer?

@haesleinhuepf
Member

I agree with git-bob (even though the code snippet might not work) ☺️. The error might be related to lack of CPU memory. @pr4deepr could you print out the size of the output image before pulling it and check if that much CPU memory is available?

@pr4deepr
Member Author

If you're talking about the RAM, the PC has about 700 GB of RAM, so there's plenty available.

@haesleinhuepf
Member

I'm wondering what other resources could be depleted.

git-bob answer with a list


This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

The following OpenCL resources could be depleted and cause the OUT_OF_RESOURCES error:

  1. Pin-able host memory - memory that can be mapped for DMA transfer
  2. Command queue resources
  3. Memory mapping resources
  4. Memory fragmentation on GPU
  5. Kernel execution resources
  6. Memory bank/channel limitations

To investigate further, I suggest checking if using cle.push and cle.pull with smaller chunks of the image works. We could try transferring slices individually rather than the full array at once.

@haesleinhuepf - would you know if there are OpenCL command queue or pinned memory limitations we should consider here?

@haesleinhuepf
Member

(294, 2304, 5184)
The deskew process works.

At that point, the deskewing might not have happened yet, because OpenCL queues work like lazy dask processing. The image is 13 GB in size if it is of type float. @pr4deepr what GPU are you using?

@pr4deepr
Member Author

It's an NVIDIA Quadro RTX 6000 with 24 GB of RAM.

@pr4deepr
Member Author

Is it to do with the fact that the image has to be under a certain size to work with the GPU? I forget the exact details.

It does work with smaller images, e.g. half the size in xy. I assumed it would throw a memory error when I run the 'cle.deskew' command; I wasn't used to getting an error when pulling the image back.
If it's like lazy dask processing, that makes more sense.

@haesleinhuepf
Member

I assumed it would throw a memory error when I run 'cle.deskew' command,

It starts computing and may return an error when you ask to do the next operation on the data, e.g. when pulling it.

Could you please run this to print out specs of the GPU(s)?

import pyclesperanto_prototype as cle
print(cle.cl_info())

I presume it outputs a line starting with MAX_MEM_ALLOC_SIZE and ending with a value of about 6 GB. That means the maximum image size is 6 GB. Note that in float type every pixel consists of 4 bytes, and an unsigned 16-bit integer has 2 bytes per pixel.
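As a quick back-of-the-envelope check (just a sketch; the shape is the deskewed output reported above, and the ~6 GB limit is what we expect MAX_MEM_ALLOC_SIZE to report):

import numpy as np

output_shape = (294, 2304, 5184)                # deskewed output shape (z, y, x) from above
needed_bytes = int(np.prod(output_shape)) * 4   # 4 bytes per pixel in float32
print(f"float32 output: {needed_bytes / 1024**3:.1f} GiB")          # ~13 GiB

max_alloc = 6 * 1024**3                         # assumed ~6 GB MAX_MEM_ALLOC_SIZE
print(f"fits in a single allocation: {needed_bytes <= max_alloc}")  # False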

Also this might be helpful to see which GPU you are using (just for completeness):

import pyclesperanto_prototype as cle
print(cle.get_device())

@thawn
Collaborator

thawn commented Feb 21, 2025

FYI, on a numpy array, you can use image.nbytes to get the actual memory consumption of the array.
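For instance (a minimal sketch using the input shape from above):

import numpy as np

image = np.zeros((401, 2304, 512), dtype=np.uint16)  # same shape/dtype as the raw input stack
print(f"{image.nbytes / 1024**3:.2f} GiB")            # ~0.88 GiB on the host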

@StRigaud
Member

I presume it outputs a line starting with MAX_MEM_ALLOC_SIZE and ending with about 6 GB. That means the maximum image size is 6 GB.

I am pretty sure that the issue is the GPU RAM, on two levels:

  • the maximum allocation size is too small (usually it is the GPU RAM / 4, so ~6 GB for a 24 GB GPU)
  • the total amount of GPU RAM is not sufficient: the image will be used as input and output, so you would need to store roughly 26 GB (about two times your image). That leaves you with no memory available, which is clearly not enough for a valid process.

If you want to process your image with this GPU, you would need to tile it, for example as in the sketch below.
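A minimal tiling sketch (untested; it assumes that tiling along y is safe here because deskew_x shears the stack only in the x–z plane, and the helper name and tile count are made up for illustration): split the (z, y, x) input along y, deskew each slab, pull it, and reassemble on the host.

import numpy as np
import pyclesperanto_prototype as cle

def deskew_x_in_y_tiles(image, n_tiles=4, **deskew_kwargs):
    # Deskew a (z, y, x) stack slab-by-slab along y so that each GPU buffer
    # stays below MAX_MEM_ALLOC_SIZE, then reassemble the result on the host.
    slabs = []
    for tile in np.array_split(image, n_tiles, axis=1):    # y is axis 1
        deskewed_tile = cle.deskew_x(tile, **deskew_kwargs)
        slabs.append(cle.pull(deskewed_tile))               # pull each slab back to CPU memory
    return np.concatenate(slabs, axis=1)

# deskewed_np = deskew_x_in_y_tiles(image, n_tiles=4,
#                                   angle_in_degrees=35,
#                                   voxel_size_x=0.101,
#                                   voxel_size_y=0.101,
#                                   voxel_size_z=1.2)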

@pr4deepr
Member Author

pr4deepr commented Feb 21, 2025

Thanks @thawn and @StRigaud

Output of cle.cl_info():

Oclgrind
EXTENSIONS:cl_khr_icd
EXTENSIONS_WITH_VERSION:[<pyopencl._cl.NameVersion object at 0x00000147C20A9070>]
HOST_TIMER_RESOLUTION:0
NAME:Oclgrind
NUMERIC_VERSION:12582912
PROFILE:FULL_PROFILE
VENDOR:Oclgrind
VERSION:OpenCL 1.2 (Oclgrind 21.10)


    Oclgrind Simulator
       ADDRESS_BITS:64
       ATOMIC_FENCE_CAPABILITIES:19
       ATOMIC_MEMORY_CAPABILITIES:17
       ATTRIBUTE_ASYNC_ENGINE_COUNT_NV:None
       AVAILABLE:1
       AVAILABLE_ASYNC_QUEUES_AMD:None
       BOARD_NAME_AMD:None
       BUILT_IN_KERNELS:
       BUILT_IN_KERNELS_WITH_VERSION:[]
       COMPILER_AVAILABLE:1
       COMPUTE_CAPABILITY_MAJOR_NV:None
       COMPUTE_CAPABILITY_MINOR_NV:None
       DEVICE_ENQUEUE_CAPABILITIES:0
       DOUBLE_FP_CONFIG:63
       DRIVER_VERSION:Oclgrind 21.10
       ENDIAN_LITTLE:1
       ERROR_CORRECTION_SUPPORT:0
       EXECUTION_CAPABILITIES:3
       EXTENSIONS: cl_khr_spir cl_khr_3d_image_writes cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64
       EXTENSIONS_WITH_VERSION:[<pyopencl._cl.NameVersion object at 0x00000147C20A9070>, <pyopencl._cl.NameVersion object at 0x00000147C20A9170>, <pyopencl._cl.NameVersion object at 0x00000147C20D95F0>, <pyopencl._cl.NameVersion object at 0x00000147C20D9B30>, <pyopencl._cl.NameVersion object at 0x00000147C20D99B0>, <pyopencl._cl.NameVersion object at 0x00000147C20D94B0>, <pyopencl._cl.NameVersion object at 0x00000147C20D9D30>, <pyopencl._cl.NameVersion object at 0x00000147C219C430>, <pyopencl._cl.NameVersion object at 0x00000147C219C630>, <pyopencl._cl.NameVersion object at 0x00000147C20D9BF0>]
       EXT_MEM_PADDING_IN_BYTES_QCOM:None
       GENERIC_ADDRESS_SPACE_SUPPORT:0
       GFXIP_MAJOR_AMD:None
       GFXIP_MINOR_AMD:None
       GLOBAL_FREE_MEMORY_AMD:None
       GLOBAL_MEM_CACHELINE_SIZE:0
       GLOBAL_MEM_CACHE_SIZE:0
       GLOBAL_MEM_CACHE_TYPE:0
       GLOBAL_MEM_CHANNELS_AMD:None
       GLOBAL_MEM_CHANNEL_BANKS_AMD:None
       GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD:None
       GLOBAL_MEM_SIZE:134217728
       GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE:134217728
       GPU_OVERLAP_NV:None
       HALF_FP_CONFIG:None
       HOST_UNIFIED_MEMORY:0
       ILS_WITH_VERSION:[]
       IL_VERSION:
       IMAGE2D_MAX_HEIGHT:8192
       IMAGE2D_MAX_WIDTH:8192
       IMAGE3D_MAX_DEPTH:2048
       IMAGE3D_MAX_HEIGHT:2048
       IMAGE3D_MAX_WIDTH:2048
       IMAGE_BASE_ADDRESS_ALIGNMENT:None
       IMAGE_MAX_ARRAY_SIZE:2048
       IMAGE_MAX_BUFFER_SIZE:65536
       IMAGE_PITCH_ALIGNMENT:None
       IMAGE_SUPPORT:1
       INTEGRATED_MEMORY_NV:None
       KERNEL_EXEC_TIMEOUT_NV:None
       LINKER_AVAILABLE:1
       LOCAL_MEM_BANKS_AMD:None
       LOCAL_MEM_SIZE:32768
       LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD:None
       LOCAL_MEM_TYPE:1
       MAX_CLOCK_FREQUENCY:1
       MAX_COMPUTE_UNITS:1
       MAX_CONSTANT_ARGS:1024
       MAX_CONSTANT_BUFFER_SIZE:65536
       MAX_GLOBAL_VARIABLE_SIZE:65536
       MAX_MEM_ALLOC_SIZE:134217728
       MAX_NUM_SUB_GROUPS:0
       MAX_ON_DEVICE_EVENTS:0
       MAX_ON_DEVICE_QUEUES:0
       MAX_PARAMETER_SIZE:1024
       MAX_PIPE_ARGS:0
       MAX_READ_IMAGE_ARGS:128
       MAX_READ_WRITE_IMAGE_ARGS:64
       MAX_SAMPLERS:16
       MAX_WORK_GROUP_SIZE:1024
       MAX_WORK_GROUP_SIZE_AMD:None
       MAX_WORK_ITEM_DIMENSIONS:3
       MAX_WORK_ITEM_SIZES:[1024, 1024, 1024]
       MAX_WRITE_IMAGE_ARGS:64
       MEM_BASE_ADDR_ALIGN:1024
       ME_VERSION_INTEL:None
       MIN_DATA_TYPE_ALIGN_SIZE:1
       NAME:Oclgrind Simulator
       NATIVE_VECTOR_WIDTH_CHAR:1
       NATIVE_VECTOR_WIDTH_DOUBLE:1
       NATIVE_VECTOR_WIDTH_FLOAT:1
       NATIVE_VECTOR_WIDTH_HALF:0
       NATIVE_VECTOR_WIDTH_INT:1
       NATIVE_VECTOR_WIDTH_LONG:1
       NATIVE_VECTOR_WIDTH_SHORT:1
       NON_UNIFORM_WORK_GROUP_SUPPORT:1
       NUMERIC_VERSION:12582912
       NUM_SIMULTANEOUS_INTEROPS_INTEL:None
       OPENCL_C_ALL_VERSIONS:[<pyopencl._cl.NameVersion object at 0x00000147C20A9570>, <pyopencl._cl.NameVersion object at 0x00000147C219C530>, <pyopencl._cl.NameVersion object at 0x00000147C219C6B0>, <pyopencl._cl.NameVersion object at 0x00000147C219C670>]
       OPENCL_C_FEATURES:[]
       OPENCL_C_VERSION:OpenCL C 1.2 (Oclgrind 21.10)
       PAGE_SIZE_QCOM:None
       PARTITION_AFFINITY_DOMAIN:[0]
       PARTITION_MAX_SUB_DEVICES:0
       PARTITION_PROPERTIES:[0]
       PARTITION_TYPE:[0]
       PCIE_ID_AMD:None
       PCI_BUS_ID_NV:None
       PCI_DOMAIN_ID_NV:None
       PCI_SLOT_ID_NV:None
       PIPE_MAX_ACTIVE_RESERVATIONS:0
       PIPE_MAX_PACKET_SIZE:0
       PIPE_SUPPORT:0
       PLATFORM:<pyopencl.Platform 'Oclgrind' at 0x147bf7ce160>
       PREFERRED_CONSTANT_BUFFER_SIZE_AMD:None
       PREFERRED_GLOBAL_ATOMIC_ALIGNMENT:0
       PREFERRED_INTEROP_USER_SYNC:1
       PREFERRED_LOCAL_ATOMIC_ALIGNMENT:0
       PREFERRED_PLATFORM_ATOMIC_ALIGNMENT:0
       PREFERRED_VECTOR_WIDTH_CHAR:1
       PREFERRED_VECTOR_WIDTH_DOUBLE:1
       PREFERRED_VECTOR_WIDTH_FLOAT:1
       PREFERRED_VECTOR_WIDTH_HALF:0
       PREFERRED_VECTOR_WIDTH_INT:1
       PREFERRED_VECTOR_WIDTH_LONG:1
       PREFERRED_VECTOR_WIDTH_SHORT:1
       PREFERRED_WORK_GROUP_SIZE_AMD:None
       PREFERRED_WORK_GROUP_SIZE_MULTIPLE:1
       PRINTF_BUFFER_SIZE:None
       PROFILE:FULL_PROFILE
       PROFILING_TIMER_OFFSET_AMD:None
       PROFILING_TIMER_RESOLUTION:1000
       QUEUE_ON_DEVICE_MAX_SIZE:0
       QUEUE_ON_DEVICE_PREFERRED_SIZE:0
       QUEUE_ON_DEVICE_PROPERTIES:0
       QUEUE_ON_HOST_PROPERTIES:3
       QUEUE_PROPERTIES:3
       REFERENCE_COUNT:1
       REGISTERS_PER_BLOCK_NV:None
       SIMD_INSTRUCTION_WIDTH_AMD:None
       SIMD_PER_COMPUTE_UNIT_AMD:None
       SIMD_WIDTH_AMD:None
       SIMULTANEOUS_INTEROPS_INTEL:None
       SINGLE_FP_CONFIG:7
       SPIR_VERSIONS:1.2
       SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS:0
       SVM_CAPABILITIES:0
       THREAD_TRACE_SUPPORTED_AMD:None
       TOPOLOGY_AMD:None
       TYPE:15
       VENDOR:Oclgrind
       VENDOR_ID:66
       VERSION:OpenCL 1.2 (Oclgrind 21.10)
       WARP_SIZE_NV:None
       WAVEFRONT_WIDTH_AMD:None
       WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT:0




NVIDIA CUDA
EXTENSIONS:cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
EXTENSIONS_WITH_VERSION:[<pyopencl._cl.NameVersion object at 0x00000147C20D99B0>, <pyopencl._cl.NameVersion object at 0x00000147C20D94B0>, <pyopencl._cl.NameVersion object at 0x00000147C20D9D30>, <pyopencl._cl.NameVersion object at 0x00000147C20D9BF0>, <pyopencl._cl.NameVersion object at 0x00000147C20A9570>, <pyopencl._cl.NameVersion object at 0x00000147C20A9170>, <pyopencl._cl.NameVersion object at 0x00000147C20A91F0>, <pyopencl._cl.NameVersion object at 0x00000147C20A95F0>, <pyopencl._cl.NameVersion object at 0x00000147C219C6F0>, <pyopencl._cl.NameVersion object at 0x00000147C20A9070>, <pyopencl._cl.NameVersion object at 0x00000147C219C830>, <pyopencl._cl.NameVersion object at 0x00000147C219C530>, <pyopencl._cl.NameVersion object at 0x00000147C219C6B0>, <pyopencl._cl.NameVersion object at 0x00000147C219C670>, <pyopencl._cl.NameVersion object at 0x00000147C219C430>, <pyopencl._cl.NameVersion object at 0x00000147C219C630>, <pyopencl._cl.NameVersion object at 0x00000147C219CA30>, <pyopencl._cl.NameVersion object at 0x00000147C219C8F0>, <pyopencl._cl.NameVersion object at 0x00000147C219C930>, <pyopencl._cl.NameVersion object at 0x00000147C219C9B0>, <pyopencl._cl.NameVersion object at 0x00000147C219CDB0>, <pyopencl._cl.NameVersion object at 0x00000147C219CDF0>, <pyopencl._cl.NameVersion object at 0x00000147C219CE30>, <pyopencl._cl.NameVersion object at 0x00000147C219CEF0>, <pyopencl._cl.NameVersion object at 0x00000147C219C8B0>, <pyopencl._cl.NameVersion object at 0x00000147C219C970>, <pyopencl._cl.NameVersion object at 0x00000147C219C9F0>]
HOST_TIMER_RESOLUTION:0
NAME:NVIDIA CUDA
NUMERIC_VERSION:12582912
PROFILE:FULL_PROFILE
VENDOR:NVIDIA Corporation
VERSION:OpenCL 3.0 CUDA 12.4.89


    Quadro RTX 6000
       ADDRESS_BITS:64
       ATOMIC_FENCE_CAPABILITIES:19
       ATOMIC_MEMORY_CAPABILITIES:17
       ATTRIBUTE_ASYNC_ENGINE_COUNT_NV:6
       AVAILABLE:1
       AVAILABLE_ASYNC_QUEUES_AMD:None
       BOARD_NAME_AMD:None
       BUILT_IN_KERNELS:
       BUILT_IN_KERNELS_WITH_VERSION:[]
       COMPILER_AVAILABLE:1
       COMPUTE_CAPABILITY_MAJOR_NV:7
       COMPUTE_CAPABILITY_MINOR_NV:5
       DEVICE_ENQUEUE_CAPABILITIES:0
       DOUBLE_FP_CONFIG:63
       DRIVER_VERSION:551.78
       ENDIAN_LITTLE:1
       ERROR_CORRECTION_SUPPORT:1
       EXECUTION_CAPABILITIES:1
       EXTENSIONS:cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
       EXTENSIONS_WITH_VERSION:[<pyopencl._cl.NameVersion object at 0x00000147C20D9D30>, <pyopencl._cl.NameVersion object at 0x00000147C20D9BF0>, <pyopencl._cl.NameVersion object at 0x00000147C20A9570>, <pyopencl._cl.NameVersion object at 0x00000147C20A9170>, <pyopencl._cl.NameVersion object at 0x00000147C20A91F0>, <pyopencl._cl.NameVersion object at 0x00000147C20A9070>, <pyopencl._cl.NameVersion object at 0x00000147C219C4F0>, <pyopencl._cl.NameVersion object at 0x00000147C219CAB0>, <pyopencl._cl.NameVersion object at 0x00000147C219C6F0>, <pyopencl._cl.NameVersion object at 0x00000147C20A95F0>, <pyopencl._cl.NameVersion object at 0x00000147C219C830>, <pyopencl._cl.NameVersion object at 0x00000147C219C530>, <pyopencl._cl.NameVersion object at 0x00000147C219C6B0>, <pyopencl._cl.NameVersion object at 0x00000147C219C670>, <pyopencl._cl.NameVersion object at 0x00000147C219C430>, <pyopencl._cl.NameVersion object at 0x00000147C219C630>, <pyopencl._cl.NameVersion object at 0x00000147C219CA30>, <pyopencl._cl.NameVersion object at 0x00000147C219C8F0>, <pyopencl._cl.NameVersion object at 0x00000147C219C930>, <pyopencl._cl.NameVersion object at 0x00000147C219C9B0>, <pyopencl._cl.NameVersion object at 0x00000147C219CDB0>, <pyopencl._cl.NameVersion object at 0x00000147C219CDF0>, <pyopencl._cl.NameVersion object at 0x00000147C219CE30>, <pyopencl._cl.NameVersion object at 0x00000147C219CEF0>, <pyopencl._cl.NameVersion object at 0x00000147C219C8B0>, <pyopencl._cl.NameVersion object at 0x00000147C219C970>, <pyopencl._cl.NameVersion object at 0x00000147C219C9F0>]
       EXT_MEM_PADDING_IN_BYTES_QCOM:None
       GENERIC_ADDRESS_SPACE_SUPPORT:0
       GFXIP_MAJOR_AMD:None
       GFXIP_MINOR_AMD:None
       GLOBAL_FREE_MEMORY_AMD:None
       GLOBAL_MEM_CACHELINE_SIZE:128
       GLOBAL_MEM_CACHE_SIZE:2359296
       GLOBAL_MEM_CACHE_TYPE:2
       GLOBAL_MEM_CHANNELS_AMD:None
       GLOBAL_MEM_CHANNEL_BANKS_AMD:None
       GLOBAL_MEM_CHANNEL_BANK_WIDTH_AMD:None
       GLOBAL_MEM_SIZE:24158928896
       GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE:0
       GPU_OVERLAP_NV:1
       HALF_FP_CONFIG:None
       HOST_UNIFIED_MEMORY:0
       ILS_WITH_VERSION:[]
       IL_VERSION:
       IMAGE2D_MAX_HEIGHT:32768
       IMAGE2D_MAX_WIDTH:32768
       IMAGE3D_MAX_DEPTH:16384
       IMAGE3D_MAX_HEIGHT:16384
       IMAGE3D_MAX_WIDTH:16384
       IMAGE_BASE_ADDRESS_ALIGNMENT:None
       IMAGE_MAX_ARRAY_SIZE:2048
       IMAGE_MAX_BUFFER_SIZE:268435456
       IMAGE_PITCH_ALIGNMENT:None
       IMAGE_SUPPORT:1
       INTEGRATED_MEMORY_NV:0
       KERNEL_EXEC_TIMEOUT_NV:1
       LINKER_AVAILABLE:1
       LOCAL_MEM_BANKS_AMD:None
       LOCAL_MEM_SIZE:49152
       LOCAL_MEM_SIZE_PER_COMPUTE_UNIT_AMD:None
       LOCAL_MEM_TYPE:1
       MAX_CLOCK_FREQUENCY:1620
       MAX_COMPUTE_UNITS:72
       MAX_CONSTANT_ARGS:9
       MAX_CONSTANT_BUFFER_SIZE:65536
       MAX_GLOBAL_VARIABLE_SIZE:0
       MAX_MEM_ALLOC_SIZE:6039732224
       MAX_NUM_SUB_GROUPS:0
       MAX_ON_DEVICE_EVENTS:0
       MAX_ON_DEVICE_QUEUES:0
       MAX_PARAMETER_SIZE:32764
       MAX_PIPE_ARGS:0
       MAX_READ_IMAGE_ARGS:256
       MAX_READ_WRITE_IMAGE_ARGS:0
       MAX_SAMPLERS:32
       MAX_WORK_GROUP_SIZE:1024
       MAX_WORK_GROUP_SIZE_AMD:None
       MAX_WORK_ITEM_DIMENSIONS:3
       MAX_WORK_ITEM_SIZES:[1024, 1024, 64]
       MAX_WRITE_IMAGE_ARGS:32
       MEM_BASE_ADDR_ALIGN:4096
       ME_VERSION_INTEL:None
       MIN_DATA_TYPE_ALIGN_SIZE:128
       NAME:Quadro RTX 6000
       NATIVE_VECTOR_WIDTH_CHAR:1
       NATIVE_VECTOR_WIDTH_DOUBLE:1
       NATIVE_VECTOR_WIDTH_FLOAT:1
       NATIVE_VECTOR_WIDTH_HALF:0
       NATIVE_VECTOR_WIDTH_INT:1
       NATIVE_VECTOR_WIDTH_LONG:1
       NATIVE_VECTOR_WIDTH_SHORT:1
       NON_UNIFORM_WORK_GROUP_SUPPORT:0
       NUMERIC_VERSION:12582912
       NUM_SIMULTANEOUS_INTEROPS_INTEL:None
       OPENCL_C_ALL_VERSIONS:[<pyopencl._cl.NameVersion object at 0x00000147C9A22D30>, <pyopencl._cl.NameVersion object at 0x00000147C21A1630>, <pyopencl._cl.NameVersion object at 0x00000147C21A16B0>, <pyopencl._cl.NameVersion object at 0x00000147C21A12B0>]
       OPENCL_C_FEATURES:[<pyopencl._cl.NameVersion object at 0x00000147C21A1330>, <pyopencl._cl.NameVersion object at 0x00000147C21A14B0>, <pyopencl._cl.NameVersion object at 0x00000147C21A19B0>, <pyopencl._cl.NameVersion object at 0x00000147C21A1AF0>]
       OPENCL_C_VERSION:OpenCL C 1.2 
       PAGE_SIZE_QCOM:None
       PARTITION_AFFINITY_DOMAIN:[0]
       PARTITION_MAX_SUB_DEVICES:1
       PARTITION_PROPERTIES:[0]
       PARTITION_TYPE:[0]
       PCIE_ID_AMD:None
       PCI_BUS_ID_NV:27
       PCI_DOMAIN_ID_NV:0
       PCI_SLOT_ID_NV:0
       PIPE_MAX_ACTIVE_RESERVATIONS:0
       PIPE_MAX_PACKET_SIZE:0
       PIPE_SUPPORT:0
       PLATFORM:<pyopencl.Platform 'NVIDIA CUDA' at 0x147c2417b50>
       PREFERRED_CONSTANT_BUFFER_SIZE_AMD:None
       PREFERRED_GLOBAL_ATOMIC_ALIGNMENT:0
       PREFERRED_INTEROP_USER_SYNC:0
       PREFERRED_LOCAL_ATOMIC_ALIGNMENT:0
       PREFERRED_PLATFORM_ATOMIC_ALIGNMENT:0
       PREFERRED_VECTOR_WIDTH_CHAR:1
       PREFERRED_VECTOR_WIDTH_DOUBLE:1
       PREFERRED_VECTOR_WIDTH_FLOAT:1
       PREFERRED_VECTOR_WIDTH_HALF:0
       PREFERRED_VECTOR_WIDTH_INT:1
       PREFERRED_VECTOR_WIDTH_LONG:1
       PREFERRED_VECTOR_WIDTH_SHORT:1
       PREFERRED_WORK_GROUP_SIZE_AMD:None
       PREFERRED_WORK_GROUP_SIZE_MULTIPLE:32
       PRINTF_BUFFER_SIZE:None
       PROFILE:FULL_PROFILE
       PROFILING_TIMER_OFFSET_AMD:None
       PROFILING_TIMER_RESOLUTION:1000
       QUEUE_ON_DEVICE_MAX_SIZE:0
       QUEUE_ON_DEVICE_PREFERRED_SIZE:0
       QUEUE_ON_DEVICE_PROPERTIES:0
       QUEUE_ON_HOST_PROPERTIES:3
       QUEUE_PROPERTIES:3
       REFERENCE_COUNT:1
       REGISTERS_PER_BLOCK_NV:65536
       SIMD_INSTRUCTION_WIDTH_AMD:None
       SIMD_PER_COMPUTE_UNIT_AMD:None
       SIMD_WIDTH_AMD:None
       SIMULTANEOUS_INTEROPS_INTEL:None
       SINGLE_FP_CONFIG:191
       SPIR_VERSIONS:None
       SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS:0
       SVM_CAPABILITIES:1
       THREAD_TRACE_SUPPORTED_AMD:None
       TOPOLOGY_AMD:None
       TYPE:4
       VENDOR:NVIDIA Corporation
       VENDOR_ID:4318
       VERSION:OpenCL 3.0 CUDA
       WARP_SIZE_NV:32
       WAVEFRONT_WIDTH_AMD:None
       WORK_GROUP_COLLECTIVE_FUNCTIONS_SUPPORT:0




Current device: Quadro RTX 6000

Output of cle.get_device():

<Quadro RTX 6000 on Platform: NVIDIA CUDA (1 refs)>

Yeah, the entry for the graphics card is:

MAX_MEM_ALLOC_SIZE:6039732224

so, 6 GB.

If I wanted to handle this error in the code, can I use the condition:
if the output image in float32 is > MAX_MEM_ALLOC_SIZE, then the image should be tiled?

@haesleinhuepf
Member

It should be possible to process the output string of the cl_info function using Python and regular expressions, to build an if-statement that checks whether the number in the line starting with MAX_MEM_ALLOC_SIZE is larger than 6*1024*1024*1024.

git-bob comment how to do this


This message was generated by git-bob (version: 0.19.0, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To parse the maximum allocation size from the cl_info output, you could do something like this:

import re
import pyclesperanto_prototype as cle

def get_max_allocation_size():
    # Note: cl_info() lists all OpenCL devices; this returns the first
    # MAX_MEM_ALLOC_SIZE entry, which may belong to another device
    # (e.g. a simulator) if more than one device is installed.
    info = cle.cl_info()
    match = re.search(r'MAX_MEM_ALLOC_SIZE:(\d+)', info)
    if match:
        return int(match.group(1))
    return None

# Example usage (output_shape is the shape of the deskewed result, e.g. deskewed.shape):
max_size = get_max_allocation_size()
needed_size = output_shape[0] * output_shape[1] * output_shape[2] * 4  # float32 = 4 bytes per pixel
if needed_size > max_size:
    print(f"Output image size ({needed_size/1e9:.1f} GB) exceeds maximum GPU allocation size ({max_size/1e9:.1f} GB)")
    # Handle tiling here

This would help catch the allocation limit before attempting the operation.
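An alternative that avoids parsing the info string (a sketch; it assumes the device object returned by cle.get_device() exposes the underlying pyopencl device as .device):

import pyopencl as cl
import pyclesperanto_prototype as cle

ocl_device = cle.get_device().device  # assumed attribute holding the pyopencl.Device
max_alloc = ocl_device.get_info(cl.device_info.MAX_MEM_ALLOC_SIZE)
print(f"MAX_MEM_ALLOC_SIZE: {max_alloc / 1024**3:.1f} GiB")  # ~5.6 GiB for the Quadro RTX 6000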

@pr4deepr
Member Author

Thanks a lot for the help!!
Good job git-bob!!
