How to predict with 1000s of dataloaders? #19388
Unanswered. HadiSDev asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule.
Replies: 0 comments
I have more than 2,000 dataloaders, each wrapping a dataset backed by its own dataframe.
The model is a pretrained BERT model, and I want to create a trainer that takes all of these dataloaders and calls predict on them.
The outputs must stay separate: I cannot mix predictions from different dataloaders, and I need to save the predictions individually for each dataset.
The problem is that I get seemingly random CUDA OOM errors after the 20th, 50th, 100th or 200th dataloader, so the predict step never finishes. It seems like the predict function was never made to handle this many dataloaders. I tried running it on 1 or 2 GPUs (L4), but still no success :(
The only option I have left is to run on 1 GPU and process the dataloaders in batches: predict on one batch of dataloaders, save the predictions, then send the next batch. This does not really work with a multi-GPU setup, though.
Any ideas what I am doing wrong?
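For reference, here is a minimal sketch of the batched workaround described above. It is only an illustration under stated assumptions, not how Lightning handles this internally: `predict_fn` and `save_fn` are hypothetical callables (for example, `predict_fn` could wrap a call to the trainer's predict method), and `chunk_size` is an arbitrary value that would need tuning.

```python
import gc
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_chunks(
    named_loaders: Sequence[Tuple[str, Any]],
    predict_fn: Callable[[List[Any]], List[Any]],  # hypothetical: one output per loader, in order
    save_fn: Callable[[str, Any], None],           # hypothetical: persists one dataset's predictions
    chunk_size: int = 16,                          # assumed value, tune for available memory
) -> List[str]:
    """Predict on a few dataloaders at a time and save each result separately."""
    saved: List[str] = []
    for chunk in chunked(named_loaders, chunk_size):
        names = [name for name, _ in chunk]
        loaders = [loader for _, loader in chunk]

        outputs = predict_fn(loaders)
        if len(outputs) != len(loaders):
            raise RuntimeError("predict_fn must return one output per dataloader")

        # Save per dataset so predictions from different dataloaders never mix.
        for name, output in zip(names, outputs):
            save_fn(name, output)
            saved.append(name)

        # Drop references before the next chunk so memory can be reclaimed.
        # On a GPU one might also call torch.cuda.empty_cache() here (assumption).
        del outputs
        gc.collect()
    return saved
```

With this shape, only the predictions for one chunk are held in memory at a time, and each dataset's output is written under its own name before the next chunk starts.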