Best practices: CLI and Loading DataModule from config.yaml #10956

cheind · 2021-12-06T13:38:14Z

cheind
Dec 6, 2021

Hey,

I've been using both the PyTorch Lightning CLI and jsonargparse for quite a while. Yet I haven't found a simple way to instantiate a specific DataModule whose parameters are set in a config.yaml that was used during training. I have two workarounds, both unsatisfying:

Workaround 1 - Reuse CLI

Define an instantiate-only CLI and use .datamodule

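import pytorch_lightning as pl
from pytorch_lightning.utilities.cli import LightningCLI

class InstantiateOnlyLightningCLI(LightningCLI):  # probably unnecessary in current RC: run=False
    def fit(self) -> None:
        # Skip training; only the instantiated objects are needed.
        return None

cli = InstantiateOnlyLightningCLI(
    wave.WaveNet,  # wave is a project-specific module
    pl.LightningDataModule,
    subclass_mode_model=False,
    subclass_mode_data=True,
)
datamodule = cli.datamodule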

Downsides

trainer and model are instantiated although not used
--help is populated with many unneeded parameters

Workaround 2 - Load Yaml directly

When the concrete DataModule class is known in advance:

import yaml

with open(config, "r") as f:
    plcfg = yaml.safe_load(f)
datamodule = SpecificDataModule(**plcfg["data"]["init_args"])

Downsides

No parameter validation
No support for class arguments (e.g. transformation classes)
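To make the second downside concrete, here is a minimal sketch of what handling class arguments by hand would involve. The `instantiate` helper is hypothetical (it is not part of Lightning or jsonargparse), and two standard-library classes stand in for a datamodule and its transform so the snippet runs on its own. It resolves nested `class_path`/`init_args` entries recursively but still performs no parameter validation.

```python
import importlib
from datetime import timedelta


def instantiate(spec):
    """Recursively build an object from a {"class_path", "init_args"} dict.

    Hypothetical helper for illustration only; it does no type validation.
    """
    if isinstance(spec, dict) and "class_path" in spec:
        module_name, _, class_name = spec["class_path"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        init_args = spec.get("init_args", {})
        # Nested specs (e.g. a transform passed to a datamodule) are built first.
        kwargs = {name: instantiate(value) for name, value in init_args.items()}
        return cls(**kwargs)
    return spec  # plain values are passed through unchanged


# Stand-in config: stdlib classes play the role of a datamodule and its transform.
config = {
    "class_path": "datetime.timezone",
    "init_args": {
        "offset": {"class_path": "datetime.timedelta", "init_args": {"hours": 2}},
    },
}

tz = instantiate(config)
```

A mistyped key in `init_args` would only surface as a `TypeError` from the constructor, and a wrong value type might not surface at all, which is the validation gap mentioned above.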

Does anyone have a better solution?

Answered by mauvilsa

Dec 7, 2021


tchaton · 2021-12-06T17:04:32Z

tchaton
Dec 6, 2021
Maintainer

@mauvilsa

0 replies

mauvilsa · 2021-12-07T05:44:51Z

mauvilsa
Dec 7, 2021

This is similar to #10363. You can use jsonargparse directly to create a parser and instantiate. You can do the following:

import pytorch_lightning as pl
from jsonargparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--model', type=dict) # to ignore model
parser.add_argument('--data', type=pl.LightningDataModule)
config = parser.parse_path('config.yaml')
config_init = parser.instantiate_classes(config)

The instantiated data module will be in config_init.data. In the pytorch-lightning source code, arguments are added slightly differently, but this argparse style should be familiar to more people. For reference, this is where Lightning does it for subclass mode: https://github.com/PyTorchLightning/pytorch-lightning/blob/a7aed2af7a0de344c4a8eac32f9a86a36a3eaeec/pytorch_lightning/utilities/cli.py#L164

5 replies

adosar May 3, 2024

I am trying to load back my custom DataModule but I get the following error:

Subclass types expect one of:
- a class path (str)
- a dict with class_path entry
- a dict without class_path but with init_args entry (class path given previously)

The above exception was the direct cause of the following exception:

mauvilsa May 4, 2024

This is not enough information to know what is wrong with your code.

mauvilsa May 4, 2024

A small note about this answer. Recently load_from_checkpoint support for LightningCLI was implemented. It will be available in the next minor lightning/pytorch-lightning release. Loading a checkpoint is not exactly the same as instantiating only from a config file. But quite likely people arriving here might also want to load the trained weights. So worth mentioning here.

adosar May 4, 2024

@mauvilsa

Recently load_from_checkpoint support for LightningCLI was implemented.

This is exciting news! Currently, the workaround I am using is the following:

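import yaml
import lightning as L
from lightning.pytorch.cli import LightningArgumentParser

# PointNetLit and PCDDataModule are project-specific classes.
def load_trainer_model_dm(config):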
    r"""
    Load back trainer, model and datamodule after training.

    This function assumes that all we need is to perform inference.

    Parameters
    ----------
    config: str
        Path to the configuration file.
    """
    with open(config, 'r') as f:
        config_dict = yaml.safe_load(f)

    # These entries are not needed for inference.
    config_dict['trainer']['logger'] = False
    del config_dict['seed_everything'], config_dict['ckpt_path']

    parser = LightningArgumentParser()
    parser.add_class_arguments(PointNetLit, 'model', fail_untyped=False)
    parser.add_class_arguments(PCDDataModule, 'data', fail_untyped=False)
    parser.add_class_arguments(L.Trainer, 'trainer', fail_untyped=False)
    config = parser.parse_object(config_dict)
    objects = parser.instantiate_classes(config)

    return objects.trainer, objects.model, objects.data

Of course, this only instantiates the model and doesn't load back the trained weights. However, for tasks like inference or performance evaluation we can just use the ckpt_path of the Trainer:

trainer, litmodel, dm = load_trainer_model_dm('experiments/lightning_logs/version_0/config.yaml')
trainer.test(litmodel, dm, ckpt_path='experiments/lightning_logs/version_0/checkpoints/best.ckpt')
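One fragile spot in the workaround above is the `del` statement, which raises `KeyError` if the saved config happens to lack one of the keys. A minimal sketch of a more tolerant variant follows; `prepare_for_inference` is a hypothetical helper name and the sample dictionary is made up for illustration.

```python
import copy


def prepare_for_inference(config_dict, drop=("seed_everything", "ckpt_path")):
    """Return a copy of a loaded config with training-only entries removed.

    Hypothetical helper; the key names mirror the workaround above.
    """
    cfg = copy.deepcopy(config_dict)  # leave the caller's dict untouched
    cfg.setdefault("trainer", {})["logger"] = False
    for key in drop:
        cfg.pop(key, None)  # no error if the key is absent
    return cfg


# Made-up config that has no 'ckpt_path' entry.
saved = {"seed_everything": 42, "trainer": {"max_epochs": 10}, "data": {"batch_size": 8}}
cleaned = prepare_for_inference(saved)
```

The returned dictionary can then be passed to `parser.parse_object` exactly as in the function above.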

mauvilsa May 4, 2024

Forgot to add the link #18105

cheind · 2021-12-07T10:10:00Z

cheind
Dec 7, 2021
Author

@mauvilsa thanks for the reply. In the end, to load from a PyTorch Lightning config you need the following:

from typing import Any

import jsonargparse

parser = jsonargparse.ArgumentParser()
parser.add_argument("--model", type=dict)  # to ignore model
parser.add_argument("--trainer", type=dict)  # to ignore trainer
parser.add_argument("--data", type=datasets.MNISTDataModule)
parser.add_argument("--config", action=jsonargparse.ActionConfigFile)
parser.add_argument("--seed_everything", type=Any) # ignore

config = parser.parse_args()
config_init = parser.instantiate_classes(config)

For now :) I believe jsonargparse does not support parse_known_args, which would let us avoid specifying every additional field that PL adds to the config?

3 replies

mauvilsa Dec 7, 2021

The potential additional keys depend on the user, e.g. configurable callbacks could have been added. To ignore all the default ones that LightningCLI defines, you could do:

from typing import Any

from jsonargparse import SUPPRESS

for key in ["model", "trainer", "seed_everything", "optimizer", "lr_scheduler"]:
    parser.add_argument(f"--{key}", type=Any, help=SUPPRESS)

It is a design decision that jsonargparse will not support parse_known_args. Suppose parse_known_args were used and someone mistyped the name of a parameter inside data.init_args in a config. The parsing would not fail, and nobody would notice that the default value of that parameter is being used. It is true, though, that jsonargparse should offer a way to explicitly say which keys to ignore, instead of this solution, which is kind of a hack.
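The failure mode described here can be reproduced with the standard library's argparse, which does offer `parse_known_args`. The sketch below uses plain argparse as a stand-in, not jsonargparse, and the `--batch_size` option is a made-up example.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--batch_size", type=int, default=32)

# Note the typo: "--batch_sise" instead of "--batch_size".
argv = ["--batch_sise", "8"]

# Lenient parsing: the typo lands in `unknown` and the default is used silently.
known, unknown = parser.parse_known_args(argv)

# Strict parsing: the same input is rejected (argparse exits with an error).
try:
    parser.parse_args(argv)
    strict_rejected = False
except SystemExit:
    strict_rejected = True
```

With lenient parsing, `known.batch_size` keeps its default while the intended value sits unnoticed in `unknown`; strict parsing surfaces the mistake immediately.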

adosar May 3, 2024

How is one supposed to use this code block in a Jupyter notebook? parser.parse_args() requires the arguments to be provided on the command line.

mauvilsa May 4, 2024

In jsonargparse the parser has multiple methods to parse from: command line arguments, an object (dict), environment variables, a path to a config file, or a raw config in a string; see ArgumentParser. It all depends on what you want to do. Also note that in a Jupyter notebook it is possible to override command line arguments via sys.argv and environment variables via os.environ.
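As a rough sketch of the notebook options, the snippet below uses the standard library's argparse as a stand-in so that it runs without extra dependencies. The assumption is that a jsonargparse parser behaves the same way for these two calls, since it follows the argparse interface; the `--config` option here is a plain string for illustration, not jsonargparse's ActionConfigFile.

```python
import argparse
import sys
from unittest import mock

parser = argparse.ArgumentParser(prog="notebook-demo")
parser.add_argument("--config", type=str, default=None)

# Option 1: pass the argument list explicitly instead of relying on sys.argv.
ns_explicit = parser.parse_args(["--config", "config.yaml"])

# Option 2: temporarily override sys.argv, e.g. inside a notebook cell.
with mock.patch.object(sys, "argv", ["notebook-demo", "--config", "config.yaml"]):
    ns_patched = parser.parse_args()
```

Passing the list explicitly is usually the cleaner choice in a notebook, because the kernel's own sys.argv contains launcher arguments that the parser would otherwise try to interpret.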
