When loading image files, the image is moved to the GPU before preprocessing such as resizing and cropping is performed. If the image is large enough, this can trigger a CUDA out-of-memory error. Preprocessing should be done on the CPU, and the data moved to the GPU only when needed to run the neural network.
Sample stack trace:
File "C:\Users\rossm\Source\Repos\OneTrainer\modules\dataLoader\mixin\DataLoaderMgdsMixin.py", line 25, in _create_mgds
ds = MGDS(
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 357, in __init__
self.loading_pipeline.start()
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 302, in start
module.start_next_epoch()
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 1185, in start_next_epoch
item[name] = self.get_previous_item(name, index)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
item = module.get_item(index, item_name)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 770, in get_item
previous_item = self.get_previous_item(name, index)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\MGDS.py", line 51, in get_previous_item
item = module.get_item(index, item_name)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\mgds\GenericDataLoaderModules.py", line 481, in get_item
image = resize(image)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\transforms.py", line 361, in forward
return F.resize(img, self.size, self.interpolation, self.max_size, self.antialias)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\functional.py", line 492, in resize
return F_t.resize(img, size=output_size, interpolation=interpolation.value, antialias=antialias)
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 462, in resize
img, need_cast, need_squeeze, out_dtype = _cast_squeeze_in(img, [torch.float32, torch.float64])
File "C:\Users\rossm\Source\Repos\OneTrainer\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 528, in _cast_squeeze_in
img = img.to(req_dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 568.00 MiB. GPU 0 has a total capacty of 16.00 GiB of which 0 bytes is free. Of the allocated memory 13.50 GiB is allocated by PyTorch, and 1.46 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
That's a good point. At the moment it's intentionally done on the GPU, because processing on the CPU is a lot slower. Maybe a switch would be best. But even then, some parts of the pipeline (like VAE encoding) still need to run on the GPU.
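The CPU-side approach suggested above could look roughly like the following. This is a minimal sketch, not OneTrainer's actual code: `preprocess_cpu`, the tensor shapes, and the target size are all illustrative. The idea is that the large full-resolution intermediate stays in system RAM, and only the small resized tensor is transferred to VRAM.

```python
import torch
import torch.nn.functional as F

def preprocess_cpu(image: torch.Tensor, size: int) -> torch.Tensor:
    # image: CHW uint8 tensor on the CPU; the resize runs in system RAM,
    # so a very large source image never touches GPU memory.
    img = image.unsqueeze(0).float()  # NCHW float, required by interpolate
    img = F.interpolate(img, size=(size, size), mode="bilinear", antialias=True)
    return img.squeeze(0)

device = "cuda" if torch.cuda.is_available() else "cpu"
big = torch.zeros(3, 6000, 4000, dtype=torch.uint8)  # large image, held on CPU
small = preprocess_cpu(big, 512).to(device)  # only ~3 MiB is moved to the device
```

A switch as discussed above could simply choose whether `preprocess_cpu` receives a CPU or a GPU tensor, trading preprocessing speed against peak VRAM usage; GPU-only stages such as VAE encoding would still run on the device afterwards.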