I'm hoping to integrate nvImageCodec with PyTorch DataLoaders (torch.utils.data.DataLoader, FFCV's DataLoader, or LitData's DataLoader), but I'm struggling.
If I include the decoder as a transform to be used in my dataset.__getitem__ method, I get the dreaded cudaErrorInitializationError:
RuntimeError: Unhandled CUDA error: cudaErrorInitializationError initialization error
I can have my dataset return the raw image bytes and apply the decoder to the whole list of bytes, which is fast, but then I have to loop over the decoded items to turn them into PyTorch tensors. That loop is slow because it runs over the entire batch sequentially in the main process (not in parallel workers), and this single step is slow enough to negate the advantage of using nvimgcodec.Decoder().
class CustomDataSet(StreamingDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        return sample

dataset = CustomDataSet(...)
dataloader = DataLoader(dataset, ...)

for batch in tqdm(dataloader):
    imgs = decoder.decode(batch['image'])
    imgs = [torch.tensor(img).moveaxis(-1, 0) for img in imgs]  # need to do this in the worker processes
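A note on that conversion loop: torch.tensor(img) always copies, while torch.as_tensor(img) can wrap a source that exposes an array interface without copying (decoded nvimgcodec images advertise __cuda_array_interface__, so this may be worth trying; treat that as an assumption to verify against your version). The copy-vs-view distinction behind the two calls, illustrated with stdlib buffers so it runs without a GPU:

```python
from array import array

pixels = array('B', [10, 20, 30, 40])   # pretend this is decoder output

view = memoryview(pixels)   # wraps the existing buffer, no copy ("as_tensor"-like)
copy = bytes(pixels)        # materializes a new buffer ("tensor"-like)

pixels[0] = 99              # mutate the underlying buffer
```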
I also tried to have my dataset return decode sources with ROIs, but this fails because DecodeSource is not pickleable.
class CustomDataset(StreamingDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)  # <- whatever you returned from the DatasetOptimizer prepare_item method
        roi = nvimgcodec.Region(0, 0, 224, 224)  # replace with random crop later...
        sample['image'] = nvimgcodec.DecodeSource(sample['image'], roi)
        return sample
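One workaround for the pickleability problem (a sketch with hypothetical helper names; the actual nvimgcodec calls appear only in comments): have __getitem__ return plain bytes plus a ROI tuple, which cross the worker-to-main-process boundary fine, and construct the DecodeSource objects in the main process just before decoding:

```python
import pickle

def getitem(raw_bytes):
    # Worker side: return only picklable pieces (bytes plus a plain
    # tuple), not a DecodeSource, so the sample survives DataLoader IPC.
    roi = (0, 0, 224, 224)  # placeholder; replace with a random crop later
    return {'image': raw_bytes, 'roi': roi}

def to_decode_sources(batch):
    # Main-process side: wrap each pair just before decoding. With
    # nvimgcodec this would be
    #   nvimgcodec.DecodeSource(s['image'], nvimgcodec.Region(*s['roi']))
    # Here we just pair them up to show the data flow without a GPU.
    return [(s['image'], s['roi']) for s in batch]

batch = [getitem(b'fake-jpeg-%d' % i) for i in range(4)]
restored = pickle.loads(pickle.dumps(batch))  # the worker -> main hop
sources = to_decode_sources(restored)
```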
In any case, I've checked open bugs/issues, and the docs, and I can't find a good example of using nvimgcodec in the context of a dataloader with parallel workers. Any guidance or suggestions for how to handle this would be greatly appreciated.
Check for duplicates
I have searched the open bugs/issues and have found no duplicates for this bug report
Running multiple CUDA contexts (which is what happens when PyTorch data loaders run workers in separate processes) will not give good performance. We are currently working on supporting free-threaded Python (https://docs.python.org/3/howto/free-threading-python.html), which will allow us to run samples from separate threads (not processes) sharing a single CUDA context.
We are also working on an alternative solution that doesn't require free-threaded Python and will allow running multi-process data loaders while keeping the GPU-accelerated processing in a single process. We will let you know once we have something to test.
That being said, I believe it should not fail with cudaErrorInitializationError. My guess is that the decoder instance is being created in __init__ and then transferred to a separate process. Can you try moving the initialization of the decoder to its first use, so that it gets initialized inside each worker? Something like this:
class CustomDataSet(StreamingDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.decoder = None  # do not initialize here

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        if self.decoder is None:
            self.decoder = nvimgcodec.Decoder()
        sample['image'] = self.decoder(sample['image'])
        return sample
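The lazy-init pattern above can be sanity-checked without a GPU; this sketch substitutes a dummy class for nvimgcodec.Decoder to show that the heavy object is built exactly once, on first use (which in a DataLoader would be inside the worker process), and never at construction time in the parent:

```python
class FakeDecoder:
    # Stand-in for nvimgcodec.Decoder so the pattern runs anywhere;
    # counts constructions so we can see initialization happens once.
    instances = 0
    def __init__(self):
        FakeDecoder.instances += 1
    def __call__(self, data):
        return data.upper()

class LazyDataset:
    def __init__(self, items):
        self.items = items
        self.decoder = None               # do not initialize here
    def __getitem__(self, idx):
        if self.decoder is None:          # first use -> built in the worker
            self.decoder = FakeDecoder()
        return self.decoder(self.items[idx])

ds = LazyDataset(['a', 'b', 'c'])
built_at_init = FakeDecoder.instances     # nothing built during __init__
out = [ds[i] for i in range(3)]
```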