
Argparse usability issues #3720

Closed · mauvilsa opened this issue Sep 29, 2020 · 16 comments · Fixed by #4492
Labels: discussion (In a discussion stage), won't fix (This will not be worked on)

@mauvilsa (Contributor)

🚀 Feature

Improve the usability of the argparse functionality by allowing the use of argparse extensions.

Motivation

Currently, pytorch-lightning uses argparse as parser = ArgumentParser(parents=[parent_parser]), which prevents many behaviors that users might want. As the most basic example, someone implements a trainer.py tool and writes:

from argparse import ArgumentParser
from pytorch_lightning import Trainer

parser = ArgumentParser(description='My cool tool that uses pytorch-lightning')
parser = Trainer.add_argparse_args(parser)
args = parser.parse_args()

If you then run trainer.py --help, the description of the parser is not shown. This is because the parser that add_argparse_args returns is not the one that the user created: it is a new one that copies only the defined arguments, everything else being lost. Another feature lost is the ability to specify a formatter_class, for example with parser = ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter).
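A minimal standalone sketch of the problem (argument name hypothetical): argparse's parents mechanism copies only the argument definitions, not the description or formatter_class.

from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

parent = ArgumentParser(
    description='My cool tool that uses pytorch-lightning',
    formatter_class=ArgumentDefaultsHelpFormatter,
)
parent.add_argument('--max_epochs', type=int, default=1000)

# Roughly what add_argparse_args does internally: build a new parser
# that inherits only the arguments of the parent.
child = ArgumentParser(parents=[parent], add_help=False)
child.print_help()  # description gone, defaults no longer shown in help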

Another consequence of the way argparse is currently used is that it is not possible to use argparse-like parsers. The argparse module is rather limited, so it would be nice to be able to use extensions of it. More specifically, I would really like to use jsonargparse with pytorch-lightning. This would, for example, make it trivial to parse arguments from a JSON or YAML configuration file. When there are as many configurable options as the Trainer class has, a config file is simply a must. Reinventing the wheel by adding config file support to pytorch-lightning would not make much sense; it is much better to use an existing package that already provides this.

I tried changing the pytorch-lightning code from parser = ArgumentParser(parents=[parent_parser]) to parser = parent_parser and using a jsonargparse parser. Unfortunately it failed with AttributeError: 'bool' object has no attribute 'lower'. Without looking at the source code, I imagine support for bool arguments has been added, since argparse does not provide it. However, jsonargparse already supports defining arguments with type=bool.
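The error is consistent with a string-to-bool conversion helper that assumes its input is a string. A hypothetical helper along these lines (not necessarily lightning's actual code) reproduces the failure when it receives a real bool instead of a command-line string:

def str_to_bool(val):
    # Fine when argparse passes the raw command-line string...
    if val.lower() in ('y', 'yes', 'true', '1'):
        return True
    if val.lower() in ('n', 'no', 'false', '0'):
        return False
    raise ValueError(f'invalid truth value {val!r}')

str_to_bool('true')  # -> True
# str_to_bool(True)  # AttributeError: 'bool' object has no attribute 'lower'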

Honestly, I can't imagine any benefit or good reason for using argparse's parents option here. If there is one, please comment.

Pitch

I propose the following:

  • Internally use jsonargparse, including its bool support.
  • Add arguments directly to the parser provided to add_argparse_args if that parser is a jsonargparse parser.
  • To stay backwards compatible, have add_argparse_args return a new parser created with the parents option if the provided parser is not a jsonargparse parser (see the sketch after this list).
  • Change the documentation to recommend jsonargparse and describe its support for config files.
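A rough sketch of what the second and third points could look like (illustrative only, not a final implementation):

import argparse
import jsonargparse

def add_argparse_args(cls, parent_parser):
    if isinstance(parent_parser, jsonargparse.ArgumentParser):
        # jsonargparse parser: add arguments in place, preserving the
        # user's description, formatter_class, etc.
        parser = parent_parser
    else:
        # Plain argparse: keep the current parents-based behavior for
        # backwards compatibility.
        parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
    # ... here cls's arguments would be added to `parser` ...
    return parser

Since jsonargparse's ArgumentParser subclasses argparse's, the jsonargparse check must come first.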

To pitch further: I would be more than willing to implement this and create a pull request for it. I am the developer of jsonargparse and work continuously on machine learning. I also think pytorch-lightning is an awesome project, and I plan to use it a lot and hopefully contribute to it.

@mauvilsa added the feature and help wanted labels Sep 29, 2020
@github-actions (bot)

Hi! Thanks for your contribution, great first issue!

@carmocca (Contributor) commented Sep 29, 2020

Add arguments directly to the parser provided to add_argparse_args

I had the same problem when using an argparse._ArgumentGroup as the input. #3076

To stay backwards compatible, have add_argparse_args return a new parser created with the parents option if the provided parser is not a jsonargparse parser.

It should use the parents option if the provided parser is an argparse.ArgumentParser. This way both cases are covered 😄

Now, about using jsonargparse: @nateraw has been working on improving the CLI arguments API. For now, it is available in pytorch-lightning-bolts.

If core chooses to support jsonargparse, I suspect it would be through bolts as well.

@ananyahjha93 (Contributor)

@nateraw has been working on a few issues with the current argparse setup.

@edenlightning modified the milestones: 1.1, 1.0 Sep 29, 2020
@edenlightning removed the help wanted label Oct 5, 2020
@edenlightning modified the milestones: 1.0, 1.1 Oct 7, 2020
@mauvilsa (Contributor, Author)

@nateraw please take a look at this issue and comment. I have looked at what you have done in lightning-bolts; however, with that it still would not be possible to use an extension of argparse that provides more features than the standard argparse.

@nateraw (Contributor) commented Oct 15, 2020

@mauvilsa I'm not sure I'd be comfortable adding a 3rd-party package to do this without full confidence in its stability, especially now that we're on >1.0.0... That being said, I'll dive into your codebase to get your perspective. 😄

I'd be much more comfortable implementing the solution directly into lightning. That way all the tests live in one place.

@mauvilsa (Contributor, Author) commented Oct 16, 2020

@nateraw nice to have you here now 👍. I am not sure why you think it is better to reimplement things directly into lightning. Modularity and separation of concerns is a basic software engineering principle. pytorch-lightning is not an argument parsing library. It makes much more sense to reuse existing software whose sole purpose is parsing arguments. If you go on and implement an extension of argparse in lightning, what will you do when people ask for config file support? Are you going to implement it from scratch and keep people waiting until it is done, well tested, and stable? Such code will end up not being used, because people will simply find alternatives. Another example is that the Trainer class has some complex arguments. Take for instance gpus, which can be None, int, str, or list[int]. Are you going to implement this too in your extension of argparse? Both of these examples, and much more, are either already implemented or very easy to implement in jsonargparse.
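For illustration, such a type can be declared with jsonargparse's type-hint support roughly like this (exact syntax and behavior depend on the jsonargparse version):

from typing import List, Optional, Union
from jsonargparse import ArgumentParser

parser = ArgumentParser()
# Accepts None, an int, a str, or a list of ints, like Trainer's gpus.
parser.add_argument('--gpus', type=Optional[Union[int, str, List[int]]], default=None)

print(parser.parse_args(['--gpus', '2']).gpus)       # 2
print(parser.parse_args(['--gpus', '[1, 3]']).gpus)  # [1, 3]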

Regarding stability and testing: at the company where I work we use jsonargparse heavily, and many other people already use it in their projects despite it not yet being popular. Being used in many projects means it has been tested quite a lot, not to mention that there are of course many unit tests. Furthermore, jsonargparse is an extension of argparse, so a large part of its stability comes from its objective of being a mostly argparse-compatible replacement. If lightning uses jsonargparse, it should also include unit tests to verify that the correct behavior is obtained, much as it should for all of lightning's other dependencies.

In a few minutes I will add an example here to better illustrate the benefits of using jsonargparse.

@mauvilsa (Contributor, Author)

In the master branch of the jsonargparse github repo you can find the latest changes I have been working on; these are needed to test the code below. Installing with the all extras requirement is necessary to get all optional dependencies and enable all features, i.e. pip install -e .[all] or pip install WHEEL_FILE[all].

Without integrating jsonargparse into lightning, a way to add the parameters of the Trainer class and parse them would look like the following train.py code snippet:

#!/usr/bin/env python3

from jsonargparse import ArgumentParser, ActionConfigFile
from pytorch_lightning import Trainer

parser = ArgumentParser(description='My pytorch-lightning-based training tool.')
parser.add_argument('--cfg', action=ActionConfigFile, help='Config file.')
parser.add_argument('--op1', default='val1', help='Tool option 1.')
# Add all Trainer.__init__ parameters as a 'train' argument group.
parser.add_class_arguments(Trainer, 'train', as_group=True)

cfg = parser.parse_args()
print(parser.dump(cfg, skip_none=False))

Some of the features that this already provides are:

  • Arguments of Trainer are added to a group in a nested namespace.
  • Parsing arguments from a config file.
  • Support for almost all arguments of the Trainer class, including some complex types.
  • The tool's help includes descriptions (taken from docstrings) and default values.

The following assumes the code above has been saved to a file called train.py and its execution bit has been set. train.py just parses the arguments and prints them, so an initial config file can be created by running ./train.py > config.yaml. You can then modify the config and run the tool as ./train.py --cfg config.yaml.

An example of a complex argument is gpus, which can be a list of ints. It can be provided, for example, by running ./train.py --train.gpus '[1, 3]'. This is also supported in the config file.
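For example, the corresponding part of config.yaml could look like this (exact keys depend on the Trainer version):

op1: val1
train:
  gpus: [1, 3]
  max_epochs: 100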

Be sure you look at the help by running ./train.py --help.

Some of these features are still work in progress, but I plan to make them stable and well covered by unit tests rather soon. All of this will be included in the v3.0.0 release.

@mauvilsa (Contributor, Author) commented Oct 16, 2020

@nateraw regarding your proposal in issue #4119: it would certainly be convenient to have something like that to standardize how arguments are added and to simplify doing so. But as you can imagine, I would suggest using jsonargparse; it could be an extension of jsonargparse's ArgumentParser. I would only propose a few minor changes, as follows:

from pytorch_lightning.utils import ArgumentParser
from my_models import MyModel
from my_datamodules import MyDM

parser = ArgumentParser()
parser.add_trainer_args('train')
parser.add_datamodule_args(MyDM, 'data')
parser.add_model_args(MyModel, 'model')
cfg = parser.parse_args()

The differences are:

  • Trainer arguments are added explicitly. I would vote against adding them by default: it is a common use case to have one tool to train and another to predict, and the predict tool should not include the train arguments.

  • Each add_*_args method can optionally receive the name of the nested namespace, so that people can change things depending on their use case.

  • Internally, each add_*_args call would add its arguments to a separate group. The config file option would also be added by default with some standard key, e.g. --cfg.

@carmocca (Contributor)

Without integrating jsonargparse into lightning

What would this integration look like?

Be sure you look at the help by running ./train.py --help.

It doesn't seem to work (using current master):

Traceback (most recent call last):
  File "train.py", line 8, in <module>
    parser.add_class_arguments(Trainer, 'train', as_group=True)
  File "/home/jsonargparse/jsonargparse.py", line 259, in add_class_arguments
    group.add_argument(arg, **kwargs)
  File "/home/jsonargparse/jsonargparse.py", line 197, in add_argument
    action = super().add_argument(*args, **kwargs)
  File "/usr/lib/python3.8/argparse.py", line 1380, in add_argument
    action = action_class(**kwargs)
  File "/home/jsonargparse/jsonargparse.py", line 1589, in __call__
    if 'help' in kwargs and '%s' in kwargs['help']:
TypeError: argument of type 'NoneType' is not iterable

Some of the features that this already provides are:
* Arguments of Trainer are added to a group in a nested namespace.
* Parsing arguments from a config file.
* Support for almost all arguments of the Trainer class, including some complex types.
* The tool's help includes descriptions (taken from docstrings) and default values.

This is sweet! Including the argument descriptions is a pending TODO. Also, adding to groups in a nested namespace is exactly what I wanted.

* Parsing arguments from a config file.

I know there has been some effort already on integrating Lightning with Hydra to configure projects, so there may be some clashing here (cc @tchaton ?).

The differences are:

  • ...

I agree with these comments.

@mauvilsa What are the disadvantages of just having a jsonargparse API for Lightning in pl-bolts without changing Lightning's code?

@mauvilsa (Contributor, Author)

Without integrating jsonargparse into lightning

What would this integration look like?

It would be very simple: mostly the code in train.py with a wrapper. Something like:

from jsonargparse import ArgumentParser, ActionConfigFile
from pytorch_lightning import Trainer

class LightningArgumentParser(ArgumentParser):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.add_argument('--cfg', action=ActionConfigFile, help='Config file.')

    def add_trainer_args(self, parent='train'):
        self.add_class_arguments(Trainer, parent, as_group=True)

    def add_datamodule_args(...

Be sure you look at the help by running ./train.py --help.

It doesn't seem to work (using current master):

Probably I broke it with the code I wrote rather hastily this morning. If you can, try it with Python 3.6, which is the only version I have tested so far. I will investigate what the issue is and let you know.

* Parsing arguments from a config file.

I know there has been some effort already on integrating Lightning with Hydra to configure projects, so there may be some clashing here (cc @tchaton ?).

I am not too familiar with Hydra. I guess its scope is bigger than argument parsing and config files, so there could certainly be some overlap here.

@mauvilsa What are the disadvantages of just having a jsonargparse API for Lightning in pl-bolts without changing Lightning's code?

As you can see, jsonargparse can be used without integrating it into lightning or bolts. The benefit I see in integrating it is that it standardizes the format of config files, much like lightning is standardizing where different parts of the code are placed. To me this clearly fits better in lightning than in bolts, since the focus of bolts is models, callbacks, and datasets for research. I am personally not currently interested in bolts, since unfortunately I don't have much time for research. Why should users be forced to install bolts just to use the argparse feature?

What is the issue with changing lightning's code? It wouldn't interfere with anything core, and it would be a big improvement over the argparse support that is already there.

@mauvilsa (Contributor, Author)

@carmocca I have now tried the code using Python 3.8 and did not get any issues. What exactly did you do? Was it in a new virtual env? Which version of pytorch-lightning?

@carmocca (Contributor)

@carmocca I have now tried the code using Python 3.8 and did not get any issues. What exactly did you do? Was it in a new virtual env? Which version of pytorch-lightning?

Sorry for the confusion. I was reusing a previous venv, which caused the issue. It works with a clean environment!

What is the issue with changing lightning's code?

No issue on my part, just asking to assess the different options 😄

@nateraw (Contributor) commented Oct 23, 2020

I am not sure why you think it is better to reimplement things directly into lightning. Modularity and separation of concerns is a basic software engineering principle. pytorch-lightning is not an argument parsing library.

Lightning is a collection of tools to make machine learning with PyTorch easier. I believe it's not out of the question for us to come up with a solution for argument parsing that solves the basic argparse problems related to lightning.

As you can see jsonargparse could be used without integrating in lightning or bolts.

I think jsonargparse should stay separate. If a user wants a more feature-rich argument parser/config handler, a third-party package should be able to handle that by itself.

That being said, I did read through your code. I really like where jsonargparse is headed. 😄 I'm excited to see more examples from you that show all the possibilities of using it with lightning.


Now, as per our discussion above on the API, I wrote a draft PR that implements the API we discussed. I'm not convinced this is right either (after playing with it); I will be discussing it with the core team on Monday so I can bring more people into this discussion.

As a bonus, I used the same argparser code from the above draft PR in this repo to show a real example of how I'm using it.

@edenlightning added the discussion label and removed the feature label Oct 30, 2020
@edenlightning removed this from the 1.1 milestone Oct 30, 2020
@tchaton added the won't fix label Nov 10, 2020
@stale bot closed this as completed Nov 17, 2020
@pzelasko

The issue is closed but it doesn't seem to be resolved. I just tried to use PL with jsonargparse, and it fails at the stage of config reading. What is the officially endorsed way of using configs for argparse together with PL? I looked in the docs, but they only mention argparse integration, nothing about configs, and 3rd-party modules don't seem to work together with PL...

@carmocca (Contributor) commented Apr 6, 2021

@pzelasko The jsonargparse integration is now merged into master. You can read about it at https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_cli.html

Note that it is in beta. Feel free to open issues if you encounter any bugs or limitations.
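A minimal usage sketch (MyModel being your own LightningModule; import path as in the linked docs):

from pytorch_lightning.utilities.cli import LightningCLI
from my_models import MyModel  # your own LightningModule

# Builds a jsonargparse-based CLI with trainer/model arguments and
# config file support, then runs fit, e.g.:
#   python trainer.py --config config.yaml
cli = LightningCLI(MyModel)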

@pzelasko commented Apr 6, 2021

Thanks! Will try it out.
