
How to integrate with PyTorch DataLoaders #23

Open · 1 task done
grez72 opened this issue Oct 28, 2024 · 3 comments

Labels: question (Further information is requested)


grez72 commented Oct 28, 2024

Describe the question.

Hi,

I'm hoping to integrate nvImageCodec with PyTorch DataLoaders (torch.utils.data.DataLoader, FFCV's DataLoader, or LitData's DataLoader), but I'm struggling.

If I include the decoder as a transform used in my dataset's __getitem__ method, I get the dreaded cudaErrorInitializationError:
RuntimeError: Unhandled CUDA error: cudaErrorInitializationError initialization error

class CustomDataSet(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.decoder = nvimgcodec.Decoder()  # raises cudaErrorInitializationError in workers

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        sample['image'] = self.decoder(sample['image'])
        return sample

I can have my dataset return the raw image bytes and apply the decoder to the list of bytes in the main process, which is fast, but then I have to loop over the decoded images to turn them into PyTorch tensors. That loop runs sequentially over the entire batch (not in parallel workers), and this single step is slow enough to negate the advantage of using nvimgcodec.Decoder().

class CustomDataSet(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        return sample

dataset = CustomDataSet(...)
dataloader = DataLoader(dataset, ...)

for batch in tqdm(dataloader):
    imgs = decoder.decode(batch['image'])
    imgs = [torch.tensor(img).moveaxis(-1, 0) for img in imgs]  # need to do this in the worker processes
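A minimal sketch of a zero-copy alternative for that last step, assuming the decoded images expose __cuda_array_interface__ (which torch.as_tensor can consume directly, making the per-image loop a cheap metadata operation rather than a copy):

import torch

for batch in tqdm(dataloader):
    imgs = decoder.decode(batch['image'])
    # torch.as_tensor wraps the device memory without copying, assuming each
    # decoded image implements __cuda_array_interface__; permute is metadata-only
    imgs = [torch.as_tensor(img, device='cuda').permute(2, 0, 1) for img in imgs]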

I also tried to have my dataset return DecodeSource objects with ROIs, but this fails because DecodeSource is not pickleable.

class CustomDataset(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)  # <- whatever you returned from the DatasetOptimizer prepare_item method
        roi = nvimgcodec.Region(0, 0, 224, 224)  # replace with random crop later...
        sample['image'] = nvimgcodec.DecodeSource(sample['image'], roi)
        return sample
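A possible workaround sketch for the pickling failure: let the workers return the raw bytes plus a plain (picklable) ROI tuple, and build the DecodeSource objects in the main process right before decoding. The 'roi' key and the list-preserving collate are illustrative assumptions, not a confirmed API pattern:

class CustomDataset(StreamingDataset):

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        sample['roi'] = (0, 0, 224, 224)  # plain tuple, pickles fine
        return sample

# main process; assumes a collate_fn that keeps 'roi' as per-sample tuples
for batch in dataloader:
    srcs = [nvimgcodec.DecodeSource(data, nvimgcodec.Region(*roi))
            for data, roi in zip(batch['image'], batch['roi'])]
    imgs = decoder.decode(srcs)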

In any case, I've checked the open bugs/issues and the docs, and I can't find a good example of using nvimgcodec in the context of a data loader with parallel workers. Any guidance or suggestions for how to handle this would be greatly appreciated.

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
grez72 added the question label Oct 28, 2024
jantonguirao (Collaborator) commented:

Running multiple CUDA contexts (which is what happens when you run PyTorch data loaders in separate processes) will not give good performance. We are currently working on supporting free-threaded Python (https://docs.python.org/3/howto/free-threading-python.html), which will allow us to run samples on separate threads (not processes) sharing a single CUDA context.

We are also working on an alternative solution that doesn't require free-threaded Python and that will allow running multi-process data loaders while keeping the GPU-accelerated processing in a single process. We will let you know once we have something to test.

That being said, I believe it should not fail with cudaErrorInitializationError. My guess is that the decoder instance is created at init time and then transferred to a separate process. Can you try moving the initialization of the decoder to first use, so that it gets initialized once per worker? Something like this:

class CustomDataSet(StreamingDataset):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.decoder = None  # do not initialize here

    def __getitem__(self, idx):
        sample = super().__getitem__(idx)
        if self.decoder is None:
            self.decoder = nvimgcodec.Decoder()  # created lazily, inside the worker process
        sample['image'] = self.decoder(sample['image'])
        return sample
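For completeness, the same per-worker initialization can be sketched with the DataLoader's worker_init_fn hook (untested here, shown only as a possible variant):

from torch.utils.data import DataLoader, get_worker_info

def worker_init_fn(worker_id):
    # runs inside each worker process, so the decoder (and its CUDA state)
    # is created there rather than inherited from the parent
    get_worker_info().dataset.decoder = nvimgcodec.Decoder()

dataloader = DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)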

jantonguirao self-assigned this Oct 29, 2024
Harry-675 commented Nov 5, 2024

Same question, and the suggestion doesn't work. @jantonguirao [screenshot attached]

jantonguirao (Collaborator) commented:

@grez72 @Harry-675 To investigate this further, I'd have to look at a full code sample. Can you provide a minimal reproduction script? Thanks.
