
Large images can cause OutOfMemoryError in dataloader #1

Open
RossM opened this issue Nov 12, 2023 · 1 comment
RossM commented Nov 12, 2023

When loading image files, the image is moved to the GPU before preprocessing such as resizing and cropping. This can result in a CUDA out-of-memory error if the image is large enough. Preprocessing should be done on the CPU, and the data should only be moved to the GPU when it is needed to run the NN.
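For illustration, a minimal sketch of that ordering (not the actual OneTrainer/mgds code; the 512-pixel sizes are placeholders): decode and resize on the CPU, then move only the small, cropped result to the GPU.

    import torch
    from torchvision import transforms
    from torchvision.io import read_image

    # Resize/crop on the CPU so the full-resolution image never touches GPU memory.
    preprocess = transforms.Compose([
        transforms.Resize(512, antialias=True),
        transforms.CenterCrop(512),
        transforms.ConvertImageDtype(torch.float32),
    ])

    def load_and_preprocess(path: str, device: torch.device) -> torch.Tensor:
        image = read_image(path)   # uint8 CHW tensor, decoded into CPU memory
        image = preprocess(image)  # resizing and cropping happen in CPU memory
        return image.to(device)    # only the cropped tensor is moved to the GPU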

Sample stack trace:

  File "C:\Users\rossm\Source\Repos\OneTrainer\modules\dataLoader\mixin\DataLoaderMgdsMixin.py", line 25, in _create_mgds
    ds = MGDS(
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 357, in __init__
    self.loading_pipeline.start()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 302, in start
    module.start_next_epoch()
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 1185, in start_next_epoch
    item[name] = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 770, in get_item
    previous_item = self.get_previous_item(name, index)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
    item = module.get_item(index, item_name)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 481, in get_item
    image = resize(image)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\transforms.py", line 361, in forward
    return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\functional.py", line 492, in resize
    return F_t.resize(img, size=output_size, interpolation=interpolation.value, antialias=antialias)
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 462, in resize
    img, need_cast, need_squeeze, out_dtype = _cast_squeeze_in(img, [torch.float32, torch.float64])
  File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 528, in _cast_squeeze_in
    img = img.to(req_dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 568.00 MiB. GPU 0 has a total capacty of 16.00 GiB of which 0 bytes is free. Of the allocated memory 13.50 GiB is allocated by PyTorch, and 1.46 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Nerogar (Owner) commented Nov 12, 2023

That's a good point. At the moment it's intentionally done on the GPU, because processing on the CPU is a lot slower. Maybe a switch would be best. But even then, some parts of the pipeline (like VAE encoding) still need to be done on the GPU.
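A rough sketch of what such a switch might look like (the preprocess_on_gpu flag and helper below are hypothetical, not an existing OneTrainer/mgds option): resizing runs on whichever device the user selects, while GPU-only stages such as VAE encoding still receive a tensor on the GPU.

    import torch

    def preprocess_image(image: torch.Tensor, resize, preprocess_on_gpu: bool) -> torch.Tensor:
        # Hypothetical switch: resize on the GPU for speed, or on the CPU to avoid
        # allocating the full-resolution image in GPU memory.
        device = torch.device("cuda") if preprocess_on_gpu else torch.device("cpu")
        resized = resize(image.to(device))
        # Later stages (e.g. VAE encoding) still need the tensor on the GPU.
        return resized.to("cuda")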
