
Large images can cause OutOfMemoryError in dataloader #1

Open
RossM opened this issue Nov 12, 2023 · 1 comment
RossM commented Nov 12, 2023

When loading image files, the image is moved to the GPU before preprocessing such as resizing and cropping. This can result in a CUDA out-of-memory error if the image is large enough. Preprocessing should be done on the CPU, and the data should only be moved to the GPU when it is needed to run the NN.
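For illustration, a minimal sketch of that ordering (not the actual OneTrainer/mgds code; the 512-pixel sizes are placeholders): decode and resize on the CPU, then move only the small, cropped result to the GPU.

    import torch
    from torchvision import transforms
    from torchvision.io import read_image

    # Resize/crop on the CPU so the full-resolution image never touches GPU memory.
    preprocess = transforms.Compose([
        transforms.Resize(512, antialias=True),
        transforms.CenterCrop(512),
        transforms.ConvertImageDtype(torch.float32),
    ])

    def load_and_preprocess(path: str, device: torch.device) -> torch.Tensor:
        image = read_image(path)   # uint8 CHW tensor, decoded into CPU memory
        image = preprocess(image)  # resizing and cropping happen in CPU memory
        return image.to(device)    # only the cropped tensor is moved to the GPU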

Sample stack trace:

  File "C:\Users\rossm\Source\Repos\OneTrainer\modules\dataLoader\mixin\DataLoaderMgdsMixin.py", line 25, in _create_mgds
    ds = MGDS(
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 357, in __init__
    self.loading_pipeline.start()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 302, in start
    module.start_next_epoch()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 1185, in start_next_epoch
    item[name] = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 770, in get_item
    previous_item = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 481, in get_item
    image = resize(image)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\transforms.py", line 361, in forward
    return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\functional.py", line 492, in resize
    return F_t.resize(img, size=output_size, interpolation=interpolation.value, antialias=antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 462, in resize
    img, need_cast, need_squeeze, out_dtype = _cast_squeeze_in(img, [torch.float32, torch.float64])
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 528, in _cast_squeeze_in
    img = img.to(req_dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 568.00 MiB. GPU 0 has a total capacty of 16.00 GiB of which 0 bytes is free. Of the allocated memory 13.50 GiB is allocated by PyTorch, and 1.46 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Nerogar (Owner) commented Nov 12, 2023

That's a good point. At the moment it's intentionally done on the GPU, because processing on the CPU is a lot slower. Maybe a switch would be best. But even then, some parts of the pipeline (like VAE encoding) still need to be done on the GPU.
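A rough sketch of what such a switch might look like (the preprocess_on_gpu flag and helper below are hypothetical, not an existing OneTrainer/mgds option): resizing runs on whichever device the user selects, while GPU-only stages such as VAE encoding still receive a tensor on the GPU.

    import torch

    def preprocess_image(image: torch.Tensor, resize, preprocess_on_gpu: bool) -> torch.Tensor:
        # Hypothetical switch: resize on the GPU for speed, or on the CPU to avoid
        # allocating the full-resolution image in GPU memory.
        device = torch.device("cuda") if preprocess_on_gpu else torch.device("cpu")
        resized = resize(image.to(device))
        # Later stages (e.g. VAE encoding) still need the tensor on the GPU.
        return resized.to("cuda")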
