
How to integrate with PyTorch DataLoaders #23

Open · 1 task done
grez72 opened this issue Oct 28, 2024 · 3 comments

Labels: question (Further information is requested)


grez72 commented Oct 28, 2024

Describe the question.

Hi,

I'm hoping to integrate nvImageCodec with PyTorch DataLoaders (torch.utils.data.DataLoader, FFCV's DataLoader, or LitData's DataLoader), but I'm struggling.

If I include the decoder as a transform used in my dataset's __getitem__ method, I get the dreaded cudaErrorInitializationError:
RuntimeError: Unhandled CUDA error: cudaErrorInitializationError initialization error

class CustomDataSet(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.decoder = nvimgcodec.Decoder()  # raises cudaErrorInitializationError in workers

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        sample['image'] = self.decoder(sample['image'])
        return sample

I can have my dataset return the raw image bytes and apply the decoder to the list of bytes in the main process, which is fast, but then I have to loop over the decoded images to turn them into PyTorch tensors. That loop runs sequentially over the entire batch (not in parallel workers), and this single step is slow enough to negate the advantage of using nvimgcodec.Decoder().

class CustomDataSet(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        return sample

dataset = CustomDataSet(...)
dataloader = DataLoader(dataset, ...)

for batch in tqdm(dataloader):
    imgs = decoder.decode(batch['image'])
    imgs = [torch.tensor(img).moveaxis(-1, 0) for img in imgs]  # need to do this in the worker processes
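A minimal sketch of a zero-copy alternative for that last step, assuming the decoded images expose __cuda_array_interface__ (which torch.as_tensor can consume directly, making the per-image loop a cheap metadata operation rather than a copy):

import torch

for batch in tqdm(dataloader):
    imgs = decoder.decode(batch['image'])
    # torch.as_tensor wraps the device memory without copying, assuming each
    # decoded image implements __cuda_array_interface__; permute is metadata-only
    imgs = [torch.as_tensor(img, device='cuda').permute(2, 0, 1) for img in imgs]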

I also tried to have my dataset return DecodeSource objects with ROIs, but this fails because DecodeSource is not pickleable.

class CustomDataset(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)  # <- whatever you returned from the DatasetOptimizer prepare_item method
        roi = nvimgcodec.Region(0, 0, 224, 224)  # replace with random crop later...
        sample['image'] = nvimgcodec.DecodeSource(sample['image'], roi)
        return sample
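A possible workaround sketch for the pickling failure: let the workers return the raw bytes plus a plain (picklable) ROI tuple, and build the DecodeSource objects in the main process right before decoding. The 'roi' key and the list-preserving collate are illustrative assumptions, not a confirmed API pattern:

class CustomDataset(StreamingDataset):

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        sample['roi'] = (0, 0, 224, 224)  # plain tuple, pickles fine
        return sample

# main process; assumes a collate_fn that keeps 'roi' as per-sample tuples
for batch in dataloader:
    srcs = [nvimgcodec.DecodeSource(data, nvimgcodec.Region(*roi))
            for data, roi in zip(batch['image'], batch['roi'])]
    imgs = decoder.decode(srcs)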

In any case, I've checked the open bugs/issues and the docs, and I can't find a good example of using nvimgcodec in the context of a data loader with parallel workers. Any guidance or suggestions for how to handle this would be greatly appreciated.

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
grez72 added the question label Oct 28, 2024
jantonguirao (Collaborator) commented:

Running multiple CUDA contexts (which is what happens when you run PyTorch data loaders in separate processes) will not give good performance. We are currently working on supporting free-threaded Python (https://docs.python.org/3/howto/free-threading-python.html), which will allow us to run samples on separate threads (not processes) sharing a single CUDA context.

We are also working on an alternative solution that doesn't require free-threaded Python and that will allow running multi-process data loaders while keeping the GPU-accelerated processing in a single process. We will let you know once we have something to test.

That being said, I believe it should not fail with cudaErrorInitializationError. My guess is that the decoder instance is created at init time and then transferred to a separate process. Can you try moving the initialization of the decoder to first use, so that it gets initialized once per worker? Something like this:

class CustomDataSet(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.decoder = None  # do not initialize here

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        if self.decoder is None:
            self.decoder = nvimgcodec.Decoder()  # created lazily, inside the worker process
        sample['image'] = self.decoder(sample['image'])
        return sample
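For completeness, the same per-worker initialization can be sketched with the DataLoader's worker_init_fn hook (untested here, shown only as a possible variant):

from torch.utils.data import DataLoader, get_worker_info

def worker_init_fn(worker_id):
    # runs inside each worker process, so the decoder (and its CUDA state)
    # is created there rather than inherited from the parent
    get_worker_info().dataset.decoder = nvimgcodec.Decoder()

dataloader = DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)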

jantonguirao self-assigned this Oct 29, 2024
Harry-675 commented Nov 5, 2024

Same question, and the suggestion doesn't work. @jantonguirao [screenshot attached]

jantonguirao (Collaborator) commented:

@grez72 @Harry-675 To investigate this further, I'd have to look at a full code sample. Can you provide a minimal reproduction script? Thanks.
