Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for returning TypedDict for dataclasses.asdict #8583

Open
wants to merge 46 commits into
base: master
Choose a base branch
from

Conversation

syastrov
Copy link
Contributor

There are a couple of limitations that would be nice to remove, but I am hoping this is enough for an initial version.

Subclasses of list/dict/tuple are transformed into the base class rather than a new version of the subclass.
This is because the transformation happens in the get_function_hook plugin hook which only has access to the CheckerPluginInterface and so can't (without hacking) add things to the symbol table.
Besides that, there could be violations of variance/constraints if the new type is just constructed without being checked.

Also, NamedTuples found within dataclasses are transformed into Any.
This is because the new namedtuple needs a partial_fallback generated/added to the symbol table (e.g. same problem as the above).
Supporting generating new NamedTuples properly would probably require refactoring NamedTupleAnalyzer.build_namedtuple_typeinfo to make it reusable.

As far as I know, I was looking for some way of somehow adding new types to the symbol table from CheckerPluginInterface and then hoping that I can trigger a new type-checking pass so that variance/constraints are checked on the newly added types. But I'm sure there's a better way to do it.

Relates to #5152

@syastrov syastrov force-pushed the dataclasses-asdict branch 7 times, most recently from 957420a to 03dfc4a Compare March 26, 2020 12:45
@syastrov
Copy link
Contributor Author

I had previously made a "draft" PR at #8339. This supersedes that.

This should be ready for review now.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! General approach looks reasonable, here I have some comments.

reveal_type(result['staff']) # N: Revealed type is 'Any'

[typing fixtures/typing-full.pyi]
[builtins fixtures/tuple.pyi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add couple fine grained (daemon) test cases. The tricky part is that when type of a dataclass attribute changes, then the call to asdict() should be reprocessed (put it in a different module). I am not 100% sure we already have enough extra dependencies added by the plugin to guarantee this, so such tests will be really useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one. It revealed that there was no dependency on the fallback type for TypedDict. I tried to add that. Since it depends on python version which TypedDict fallback to use (I think?), I had to add a dependency on all modules (e.g. typing/typing_extensions/mypy_extensions) to the default plugin. Doesn't seem very nice, but I'm not sure of the right way to do it. Do you have any ideas?

Do you have any ideas for more fine-grained tests to add?

@syastrov
Copy link
Contributor Author

Thanks for the review @ilevkivskyi. I've addressed your comments.

Also, I'm not sure if it makes sense to make the TypedDicts total or not (I've left them non-total because it's more lenient).

@syastrov
Copy link
Contributor Author

Oops, looks like tests got messed up.. I'll have a look

@syastrov
Copy link
Contributor Author

syastrov commented Apr 2, 2020

I decided to make asdict return TypedDict only for Python >= 3.8, since TypedDict has only become standard since that version. I think it makes sense, but it also solves this problem: If I were to try to include a dependency on mypy_extensions or typing_extensions in get_additional_deps, but they were not installed, then the user would get an error about a missing dependency (besides that, there was an issue with using mypy_extensions that could have meant I would have had to add more fixtures to a lot of unit tests, since it was indirectly using tuple).

I also changed the TypedDict to be total. I figure that most use-cases would not be deleting keys from the TypedDict returned by asdict, so it's useful to know that the keys are there (as they will be to start with). But if the user wants to modify the dict, of course it's possible with a type-ignore or cast to ordinary dict. I hope this makes sense.

Pending your re-review of course, as far as I can see, the only thing missing is some more fine-grained tests (though I'm unsure what makes sense to test there).

Please let me know if there are any other issues :)

@syastrov syastrov requested a review from ilevkivskyi April 2, 2020 19:18
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updates! This looks almost ready, here are few more comments.

def get_additional_deps(self, file: MypyFile) -> List[Tuple[int, str, int]]:
if self.python_version >= (3, 8):
# Add module needed for anonymous TypedDict (used to support dataclasses.asdict)
return [(10, "typing", -1)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is needed? I think typing module is always loaded. It may be needed to add only dependency on typing_extensions, which is available on all versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems adding that dependency on typing_extensions means a lot of test fixtures need updating (it's the tuple builtin missing, which I believe is transitively imported from typing_extensions).

@syastrov
Copy link
Contributor Author

syastrov commented Apr 8, 2020

Thanks @ilevkivskyi for the re-review.

I tried to address your comments (thanks, I now think I understand how the fine-grained test works). I have no idea if the code copied from SemanticAnalyzer regarding named_type_or_none is logically correct in the TypeChecker context, but I'll take your word for it.

I could not get the new fine-grained test of asdict to pass (Edit: there are no errors, when I expect one, so it seems as though it still thinks the expression is a str, so valid). I tried to debug it a bit and the asdict function hook is called twice. But it's also called twice when I remove the modification to c.py from the test. So it seems like it should be called at least 3 times, and there's some missing dependency that prevents it from being called again? Any tips how to proceed?

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 9, 2020

If something doesn't get reported properly in fine-grained incremental mode, this is often related to missing fine-grained dependencies. We have test cases in test-data/unit/deps.test that deal with these. You can try adding a test case there and somebody from the core team can review the dependencies if it's not clear if they are correct.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more comments here. For the fine grained tests, see what Jukka said.

mypy/checker.py Outdated
@@ -4538,6 +4538,21 @@ def named_type(self, name: str) -> Instance:
any_type = AnyType(TypeOfAny.from_omitted_generics)
return Instance(node, [any_type] * len(node.defn.type_vars))

def named_type_or_none(self, qualified_name: str,
args: Optional[List[Type]] = None) -> Optional[Instance]:
sym = self.lookup_qualified(qualified_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one no better than it was. You need to copy/duplicate the important logic. In particular, my whole idea was to not use local lookup functions, but a global lookup like lookup_fully_qualified_or_none() in semantic analyzer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I misunderstood your comment. Not really knowing how the checker or semantic analyzer work internally, I don't really think I am qualified to make this change. However, I will try.

I have copied the lookup_fully_qualified_or_none function to checker (and only modified it to raise KeyError if the result is None) and used it in named_type_or_none.
I am not sure if this makes sense and whether the docstring/TODO comments should be altered/removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, late-night coding mistake :) I removed the part about raising KeyError, and it returns None now instead if it can't find the name of course.

@syastrov
Copy link
Contributor Author

syastrov commented Jun 3, 2020

Thanks for all your time reviewing -- II'm going to try to pick this up again.

@JukkaL wrote:

If something doesn't get reported properly in fine-grained incremental mode, this is often related to missing fine-grained dependencies. We have test cases in test-data/unit/deps.test that deal with these. You can try adding a test case there and somebody from the core team can review the dependencies if it's not clear if they are correct.

Thanks for the pointer. I am not sure from which module it makes sense to try and output dependencies for.

I tried with this example:

[case testDataclassAsdictDeps]
# flags: --python-version 3.8
from dataclasses import dataclass
from b import AttributeInOtherModule

@dataclass
class MyDataclass:
    attr: AttributeInOtherModule

my_var: MyDataclass

[file b.py]
AttributeInOtherModule = str

[typing fixtures/typing-typeddict.pyi]
[builtins fixtures/fine_grained.pyi]

[out]

Output is:

  <m.MyDataclass> -> <m.my_var>, m, m.MyDataclass (diff)
  <m.my_var> -> m                               (diff)
  <b.AttributeInOtherModule> -> m               (diff)
  <b> -> m                                      (diff)
  <dataclasses.dataclass> -> m                  (diff)
  <dataclasses> -> m                            (diff)

Shouldn't b.AttributeInOtherModule affect m.MyDataclass? Or am I misinterpreting how this should work?

@syastrov
Copy link
Contributor Author

syastrov commented Jun 9, 2020

I made another experiment with outputting deps.

[case testDataclassesAsdictDeps]
# flags: --python-version 3.8
from dataclasses import asdict
from a import my_var
x = my_var
x['attr'] + "foo"

[file a.py]
from dataclasses import dataclass
from b import AttributeInOtherModule

@dataclass
class MyDataclass:
    attr: AttributeInOtherModule

my_var: MyDataclass

[file b.py]
AttributeInOtherModule = str

[typing fixtures/typing-typeddict.pyi]
[builtins fixtures/fine_grained.pyi]

[out]

Output:

Actual:
  main:5: error: Value of type "MyDataclass" is not indexable (diff)
  <m.x> -> m                                    (diff)
  <a.MyDataclass.__getitem__> -> m              (diff)
  <a.MyDataclass> -> <m.x>                      (diff)
  <a.my_var> -> m                               (diff)
  <a> -> m                                      (diff)
  <dataclasses.asdict> -> m                     (diff)
  <dataclasses> -> m                            (diff)

Here, I think it makes sense that <a.MyDataclass> -> <m.x> appears.

But when changing the code to

x = asdict(my_var)

the output is:

  <m.x> -> m                                    (diff)
  <a.my_var> -> m                               (diff)
  <a> -> m                                      (diff)
  <dataclasses.asdict> -> m                     (diff)
  <dataclasses> -> m                            (diff)

After trying to add an add_plugin_dependency method to the checker, I seem to get one of the fine-grained tests to pass, but not the other (FineGrainedCacheSuite is the one passing now).

My guess is that because the dataclass attributes are loaded from the cache, and the failing test is testing incremental checking when there is no cache, which fails of course because the cache is empty. However, I'm not sure what to do about it. Populating the dataclass attribute metadata is done during semantic analysis by the dataclass plugin as part of a class decorator hook. I would like to somehow add a dependency to get that phase to run again. I tried adding a dependency on both the dataclass itself and a wildcard trigger on the module defining the dataclass, but that does not seem to help.

I feel like I am getting very close to getting this working.
Any pointers would be much appreciated if you @ilevkivskyi or @JukkaL have the time.
Thanks :)

@syastrov syastrov requested a review from ilevkivskyi June 9, 2020 20:14
# Conflicts:
#	mypy/plugin.py
#	mypy/plugins/common.py
#	mypy/plugins/dataclasses.py
#	mypy/plugins/default.py
#	mypy/semanal_typeddict.py
#	test-data/unit/check-dataclasses.test
@github-actions

This comment has been minimized.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super happy with the copying of all those lookup related stuff. But this is kind of an existing problem, so I will not block this PR on this issue. If there are no objections from other maintainers, they can merge it.

Thanks for working on this!

@ilevkivskyi
Copy link
Member

Oh, btw, it looks like there are now some merge conflicts that need to be fixed (again).

@97littleleaf11 97littleleaf11 self-requested a review November 15, 2021 10:41
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@97littleleaf11
Copy link
Collaborator

97littleleaf11 commented Jan 19, 2022

I fixed the test errors caused by lookup functions, which are refactored several month ago. However the mypy primer results is confusing. I just skip the unfound sym node as a workaround.

@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

poetry (https://github.com/python-poetry/poetry)
+ src/poetry/config/source.py:15: error: Incompatible return value type (got "TypedDict({'name': str, 'url': str, 'default': bool, 'secondary': bool})", expected "Dict[str, Union[str, bool]]")

rotki (https://github.com/rotki/rotki)
+ rotkehlchen/chain/ethereum/structures.py:71: error: Incompatible return value type (got "TypedDict({'event_type': Union[Literal['deposit'], Literal['withdrawal'], Literal['interest'], Literal['borrow'], Literal['repay'], Literal['liquidation']], 'block_number': int, 'timestamp': Timestamp, 'tx_hash': str, 'log_index': int})", expected "Dict[str, Any]")

zulip (https://github.com/zulip/zulip)
+ zerver/tornado/event_queue.py:976: error: TypedDict key "user_id" cannot be deleted  [misc]
+ zerver/tornado/event_queue.py:977: error: "mentioned_user_group_id" is not a valid TypedDict key; expected one of ("user_id", "online_push_enabled", "pm_email_notify", "pm_push_notify", "mention_email_notify", ...)  [misc]

core (https://github.com/home-assistant/core)
+ homeassistant/config_entries.py:1406: error: Argument 1 to "async_step_discovery" of "ConfigFlow" has incompatible type "TypedDict({'host': str, 'port': Optional[int], 'hostname': str, 'type': str, 'name': str, 'properties': Dict[str, Any]})"; expected "Dict[str, Any]"  [arg-type]
+ homeassistant/config_entries.py:1412: error: Argument 1 to "async_step_discovery" of "ConfigFlow" has incompatible type "TypedDict({'topic': str, 'payload': Union[str, bytes], 'qos': int, 'retain': bool, 'subscribed_topic': str, 'timestamp': datetime})"; expected "Dict[str, Any]"  [arg-type]
+ homeassistant/config_entries.py:1418: error: Argument 1 to "async_step_discovery" of "ConfigFlow" has incompatible type "TypedDict({})"; expected "Dict[str, Any]"  [arg-type]
+ homeassistant/config_entries.py:1424: error: Argument 1 to "async_step_discovery" of "ConfigFlow" has incompatible type "TypedDict({'host': str, 'port': Optional[int], 'hostname': str, 'type': str, 'name': str, 'properties': Dict[str, Any]})"; expected "Dict[str, Any]"  [arg-type]
+ homeassistant/config_entries.py:1430: error: Argument 1 to "async_step_discovery" of "ConfigFlow" has incompatible type "TypedDict({'ip': str, 'hostname': str, 'macaddress': str})"; expected "Dict[str, Any]"  [arg-type]
+ homeassistant/config_entries.py:1436: error: Argument 1 to "async_step_discovery" of "ConfigFlow" has incompatible type "TypedDict({'device': str, 'vid': str, 'pid': str, 'serial_number': Optional[str], 'manufacturer': Optional[str], 'description': Optional[str]})"; expected "Dict[str, Any]"  [arg-type]
+ homeassistant/components/recorder/statistics.py:177: error: Incompatible return value type (got "TypedDict({'type': str, 'data': Optional[Dict[str, Optional[str]]]})", expected "Dict[Any, Any]")  [return-value]
+ homeassistant/components/energy/validate.py:70: error: Incompatible return value type (got "TypedDict({'energy_sources': List[List[TypedDict({'type': str, 'identifier': str, 'value': Optional[Any]})]], 'device_consumption': List[List[TypedDict({'type': str, 'identifier': str, 'value': Optional[Any]})]]})", expected "Dict[Any, Any]")  [return-value]

@97littleleaf11
Copy link
Collaborator

The incompatible cases with TypedDict looks like false-positive.

@syastrov
Copy link
Contributor Author

@ilevkivskyi Thank you for the review 👍
@97littleleaf11 Thanks so much for bringing this up to date/fixing issues :)

The stack trace in the mypy primer result mentions the key upnp which seems to be used in the class _UpnpServiceDescription which is inherited from by another dataclass (SsdpServiceInfo) along with other classes. Perhaps cause is that the (multiple) inheritance is not handled correctly? (https://github.com/home-assistant/core/blob/67c35652f0d49dc56e1e934fab3b2c51f5f82592/homeassistant/components/ssdp/__init__.py#L94)?
Do you think making a proper fix for this is a blocker for this PR to be merged?

Regarding false positives: Despite providing more strictness, this PR introduces some false-positives to existing code, which either uses an overly-precise type Dict or mutates the resulting dict. In the case of code which mutates the resulting dict, you could cast to dict or MutableMapping as a workaround, or rework the code to construct a new dict based on the result from asdict (optionally, using a separately defined TypedDict for more precision). It's arguably very few false positives, though.

@97littleleaf11
Copy link
Collaborator

97littleleaf11 commented Jan 20, 2022

Perhaps cause is that the (multiple) inheritance is not handled correctly?

Could you please add a minimal reproduce if possible?

Do you see this as an issue? Is it enough to warrant making a flag for asdict returning TypedDict, disabled by default?

cast to dict or MutableMapping as a workaround

Yeah it's a typical solution. TypedDict is designed to be not comparible with Dict[str, Any] in mypy. My concern is that asdict currently is already typed as Dict[str, Any] in typeshed as well as other real-world projects. cc @JukkaL

@zerlok
Copy link

zerlok commented Sep 4, 2024

I think it's a good feature for transforming one dataclass to another with simple line DataclassTwo(**asdict(DataclassOne(...))) that still support typing.

P.S. In my case, I would like to tell DataclassTwo constructor to ignore unknown fields from DataclassOne. I'll try to make another feature request for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants