Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support dataclass inheritance #287

Open
gautiervarjo opened this issue May 11, 2023 · 9 comments
Open

Support dataclass inheritance #287

gautiervarjo opened this issue May 11, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@gautiervarjo
Copy link

gautiervarjo commented May 11, 2023

🚀 Feature request

The documentation describes how dataclass-like classes are "not expected to have subclasses", and therefore do not accept specifying a class_path. However dataclasses.dataclass does support inheritance: python docs

Would it be feasible to add support for this? I can't say whether this makes sense for all dataclass-like objects, or just the plain python ones.

My use-case for dataclass+inheritance is pretty basic, my main reason for using dataclasses is reducing boilerplate of writing __init__/__str__/etc functions.

To reproduce

A working example with plain classes

import dataclasses
import jsonargparse

class Base:
    def __init__(self, name: str) -> None:
        self.name = name

class Derived(Base):
    def __init__(self) -> None:
        super().__init__(name="derived plain")

parser = jsonargparse.ArgumentParser()
parser.add_subclass_arguments(Base, "thing")
print(parser.parse_args(["--thing.class_path=Derived"]))

A failing example with dataclasses

import dataclasses
import jsonargparse

@dataclasses.dataclass
class DataclassBase:
    name: str

@dataclasses.dataclass
class DataclassDerived(DataclassBase):
    def __init__(self) -> None:
        super().__init__(name="derived dataclass")

# Does not work, jsonargparse assumes no inheritance for dataclass-like types.
parser = jsonargparse.ArgumentParser()

parser.add_subclass_arguments(DataclassBase, "thing")
# ^ this raises: "ValueError: Not allowed for dataclass-like classes."

print(parser.parse_args(["--thing.class_path=DataclassDerived"]))

A failing example with CLI + config

# Config
thing:
  class_path: Derived
# ... snip ...: same definition of DataclassBase and DataclassDerived.

def main(thing: DataclassBase) -> None:
    pass

jsonargparse.CLI(main)
# Passing the config above to this CLI produces:
# usage: parser.py [-h] [--config CONFIG] [--print_config[=flags]] [--thing CONFIG] --thing.name NAME
# parser.py: error: Configuration check failed :: Key "thing.name" is required but not included in config object or its value is None.

Expected behavior

The failing dataclass snippet should behave exactly like the working plain-class one.

Environment

  • jsonargparse version: 4.21.1
  • Python version: 3.8
  • How jsonargparse was installed: installed via Pants/PEX, using the requirement line jsonargparse[signatures]==4.21.1.
  • OS: Linux
@gautiervarjo gautiervarjo added the bug Something isn't working label May 11, 2023
@mauvilsa mauvilsa added enhancement New feature or request and removed bug Something isn't working labels May 13, 2023
@mauvilsa
Copy link
Member

@gautiervarjo thank you for the proposal!

I do have in my list of things to improve, a way for the developer to specify which classes to consider or not as subclasses or as dataclass-like. Though, I haven't decided yet how this will be.

A couple of minor notes until this kind of support is implemented.

  1. A use case could be to accept only a limited number of dataclasses, instead of full subclass support. In this case it can be done as:

    parser.add_argument("--thing", type=Union[DataclassBase, DataclassDerived])

    or with CLI as:

    def main(thing: Union[DataclassBase, DataclassDerived]) -> None:
  1. A class that is a mixture of a dataclass and a normal class is not considered dataclass-like. Thus, the subclass behavior can be enabled by doing:

    class NormalClassBase:
        pass
    
    @dataclasses.dataclass
    class DataclassBase(NormalClassBase):
        name: str

@gautiervarjo
Copy link
Author

Thank you for the tips! These seem very low-effort to implement so they're definitely good enough for my use-case 🙂

@CM-BF
Copy link

CM-BF commented Jan 29, 2024

@mauvilsa Thank you for your workarounds! I would like to share one more user case: using Equinox in JAX, since every equinox Module is a dataclass.

@mauvilsa
Copy link
Member

mauvilsa commented Jun 9, 2024

This feature request also applies to pydantic classes as mentioned in #503.

@rusmux
Copy link

rusmux commented Jun 10, 2024

@mauvilsa Are there going to be any updates in the near future? I'm particularly interested in supporting pydantic.BaseModel, as above workarounds do not work for it

@mauvilsa
Copy link
Member

@rusmux thanks for letting me know about your interest. I do consider interest to decide on which new features to work on. Though I can't really say if soon I would work on this one since it is not simple.

@perepichka
Copy link

perepichka commented Oct 22, 2024

A similar use-case, which used to work as of version 4.20.0, was hierarchical Optional dataclasses:

With the following setup:

from dataclasses import dataclass

@dataclass
class B:
    param_1: int = 10
    param_2: str = "b"

@dataclass
class Bb(B):
    pass

and

import jsonargparse

from hierarchy_test import B
from typing import Optional


def test(
    b: Optional[B],
):
    print("success")


if __name__ == "__main__":
    jsonargparse.CLI(test)

with the following YAML:

b:
  class_path: hierarchy_test.Bb
  init_args:
    param_1: 1
    param_2: "dd"

This runs with no error on 4.20.0, and fails on 4.33.2

This issue has been preventing me from upgrading to a newer Pytorch Lightning. I've got a quasi-solution by explicitly defining a Union of types as suggested by @mauvilsa . In my production env this involves dynamically creating a Union type based on all subclasses found through a search of a section of the codebase, not the most elegant thing.

@mauvilsa
Copy link
Member

An update. I have been thinking about this for some time and I want to resolve it hopefully soon. My current idea is to remove the distinction between subclasses and dataclass-like. But do so without any breaking changes, otherwise this would need a major release which I don't see happening soon. No breaking changes would mean that accessing values in a parsed Namespace should still work like currently for both cases, so for dataclasses namespace.key should work, and for subclasses namespace.init_args.key should also work. Furthermore, when serializing dataclasses which are the exact class of the type (i.e. not subclasses of the base class) should serialize without init_args.

Clearly this behavior is not ideal, but would be the default in order to avoid breaking changes. And there would be a setting to change this default behavior when serializing. Not sure what it should be. Probably no difference between dataclasses and subclasses. Could be that if the class_path is the same as the base class, assuming the type is a single class and not a union, then no class_path and init_args is included. Or could be always have class_path and init_args.

It is not easy to describe exactly what I have in mind and all the implications it could have. Nonetheless, if you observe any potential issues with this initial idea, please speak out.

@mauvilsa
Copy link
Member

mauvilsa commented Nov 8, 2024

I have been working on this and it seems it will not be possible without breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants