Should the LightningModule have a property for a datamodule? #11765

ananthsub · 2022-02-05T23:25:24Z

Data has multiple potential sources

This leads to some confusing logic for both users and the framework over how to access the dataloaders being used. Currently, the LightningModule and DataModule are treated completely independently. Both are passed as arguments to the trainer like so:

dm = MyDataModule()
lm = MyLightningModule()
trainer = Trainer(...)
trainer.fit(lm, dm)

One issue that arises is when the model needs to know about the dataloader. This has come up recently with LR schedulers. Currently, users have to reach into the trainer to get the instance:

self.trainer.datamodule.train_dataloader()

or

self.trainer.train_dataloader

But neither of these is obvious, and neither is guaranteed to work at every point where it might be called.
The former fails when no datamodule is used at all, because trainer.datamodule is then None.
The latter fails when the train dataloader has not been instantiated yet.
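To make the two failure modes concrete, here is a minimal sketch of the guards a user ends up writing today. It runs against plain stand-in objects rather than real Lightning classes, and `find_train_dataloader` is a hypothetical helper name, not part of any library.

```python
from types import SimpleNamespace
from typing import Any, Optional


def find_train_dataloader(module: Any) -> Optional[Any]:
    """Return the train dataloader reachable from `module`, or None.

    Hypothetical helper: it only mirrors the two lookups quoted above.
    """
    trainer = getattr(module, "trainer", None)
    if trainer is None:
        return None  # the module is not attached to a trainer

    # Path 1: go through the datamodule, which is absent when none was used.
    datamodule = getattr(trainer, "datamodule", None)
    if datamodule is not None:
        return datamodule.train_dataloader()

    # Path 2: the trainer's own reference, which may not be populated yet.
    return getattr(trainer, "train_dataloader", None)


# Stand-ins for a trainer with and without a datamodule.
fake_dm = SimpleNamespace(train_dataloader=lambda: ["batch0", "batch1"])
with_dm = SimpleNamespace(
    trainer=SimpleNamespace(datamodule=fake_dm, train_dataloader=None)
)
without_dm = SimpleNamespace(
    trainer=SimpleNamespace(datamodule=None, train_dataloader=None)
)
```

Every caller has to handle the None result, which is the confusion described above.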

I think giving the LightningModule a datamodule property could simplify both of these cases, while still providing good modularity and encapsulation.
Note: this would not be a requirement for how to use these components. This would only be a recommendation, especially for users looking to organize their code.

Changes to LightningModule

def __init__(self):
    self._datamodule: Optional[LightningDataModule] = None

@property
def datamodule(self) -> Optional[LightningDataModule]:
    return self._datamodule

@datamodule.setter
def datamodule(self, datamodule: Optional[LightningDataModule]) -> None:
    self._datamodule = datamodule


def train_dataloader(self) -> TRAIN_DATALOADERS:
    # Delegate to the attached datamodule when one has been set.
    if self._datamodule is not None:
        return self._datamodule.train_dataloader()
    raise NotImplementedError()

Intended use case:

dm = MyDataModule()
lm = MyLightningModule()
lm.datamodule = dm
trainer = Trainer(...)
trainer.fit(lm)
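As a rough illustration of how this helps the LR scheduler case, the sketch below computes the number of training steps directly from the attached datamodule. `ToyDataModule`, `ToyModule` and `total_training_steps` are made-up stand-ins for this example, not Lightning classes or methods.

```python
from typing import List, Optional


class ToyDataModule:
    """Stand-in for a datamodule; not a Lightning class."""

    def __init__(self, num_samples: int, batch_size: int) -> None:
        self.num_samples = num_samples
        self.batch_size = batch_size

    def train_dataloader(self) -> List[List[int]]:
        # A plain list of index batches plays the role of a DataLoader here.
        indices = list(range(self.num_samples))
        return [
            indices[start:start + self.batch_size]
            for start in range(0, self.num_samples, self.batch_size)
        ]


class ToyModule:
    """Stand-in for a module that carries a reference to its datamodule."""

    def __init__(self) -> None:
        self.datamodule: Optional[ToyDataModule] = None

    def total_training_steps(self, max_epochs: int) -> int:
        # Hypothetical helper: the step count an LR scheduler would need.
        if self.datamodule is None:
            raise RuntimeError("no datamodule attached")
        return len(self.datamodule.train_dataloader()) * max_epochs


dm = ToyDataModule(num_samples=10, batch_size=4)
lm = ToyModule()
lm.datamodule = dm
steps = lm.total_training_steps(max_epochs=5)  # 3 batches per epoch * 5 epochs
```

The module never touches the trainer here, so the lookup works as soon as the datamodule is assigned.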

We can even enable this with already-instantiated dataloaders, for a consistent experience:

lm = MyLightningModule()
dm = DataModule.from_dataloaders(train_dataloaders, val_dataloaders, test_dataloaders, predict_dataloaders)
lm.datamodule = dm
trainer = Trainer(...)
trainer.fit(lm)
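The `from_dataloaders` constructor above is part of the proposal, and I have not confirmed that it exists in the library. One way it might behave is sketched below with a hypothetical `LoaderBackedDataModule` that simply hands back the loaders it was given.

```python
from typing import Any, Optional


class LoaderBackedDataModule:
    """Hypothetical wrapper around dataloaders that already exist."""

    def __init__(
        self,
        train: Optional[Any] = None,
        val: Optional[Any] = None,
        test: Optional[Any] = None,
        predict: Optional[Any] = None,
    ) -> None:
        self._loaders = {"train": train, "val": val, "test": test, "predict": predict}

    @classmethod
    def from_dataloaders(
        cls,
        train_dataloaders: Optional[Any] = None,
        val_dataloaders: Optional[Any] = None,
        test_dataloaders: Optional[Any] = None,
        predict_dataloaders: Optional[Any] = None,
    ) -> "LoaderBackedDataModule":
        # Assumed signature, mirroring the argument order used in the post.
        return cls(train_dataloaders, val_dataloaders, test_dataloaders, predict_dataloaders)

    def _get(self, stage: str) -> Any:
        loader = self._loaders[stage]
        if loader is None:
            raise NotImplementedError(f"no {stage} dataloader was provided")
        return loader

    def train_dataloader(self) -> Any:
        return self._get("train")

    def val_dataloader(self) -> Any:
        return self._get("val")

    def test_dataloader(self) -> Any:
        return self._get("test")

    def predict_dataloader(self) -> Any:
        return self._get("predict")


train_batches = [[0, 1], [2, 3]]  # stand-in for a real DataLoader
wrapped = LoaderBackedDataModule.from_dataloaders(train_batches)
```

With a wrapper like this, the module sees the same interface whether the data came from a full datamodule or from loose dataloaders.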

Why not do this?

Nothing prevents users from doing this already! We absolutely don't need API changes to enable this.
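For instance, a user could get the same behavior today with a small mixin in their own code. This is only a sketch: `DataModuleMixin` and `MyModel` are hypothetical names, and in a real project `MyModel` would presumably also inherit from LightningModule, which is left out here so the example runs on its own.

```python
from typing import Any, Optional


class DataModuleMixin:
    """Hypothetical user-side mixin; nothing here is framework API."""

    _datamodule: Optional[Any] = None

    @property
    def datamodule(self) -> Optional[Any]:
        return self._datamodule

    @datamodule.setter
    def datamodule(self, value: Optional[Any]) -> None:
        self._datamodule = value

    def train_dataloader(self) -> Any:
        # Delegate to the attached datamodule when one has been set.
        if self._datamodule is not None:
            return self._datamodule.train_dataloader()
        raise NotImplementedError(
            "attach a datamodule or override train_dataloader()"
        )


class MyModel(DataModuleMixin):
    """Assumption: real code would list LightningModule as a second base."""


class FakeDataModule:
    """Stand-in that returns a list instead of a DataLoader."""

    def train_dataloader(self) -> list:
        return ["batch0", "batch1"]


model = MyModel()
model.datamodule = FakeDataModule()
batches = model.train_dataloader()
```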

So should we do anything? One option would be to document this usage in a style guide.
