Slow typechecking on nested TypedDict with union members #17231
Comments
Ran some more tests with a larger set of types, and it looks like the issue might be memory related. I am seeing Python max out the memory on my system, causing heavy swapping, while the process sits at 100% CPU, probably GCing constantly.
Any idea what the root cause could be, how we could work around it, or how we could help contribute a fix? We'd like to improve Pulumi's Python SDKs by supporting TypedDict, but this performance issue means we'd have to work around it for Mypy users, likely by conditionally typing these as untyped dictionaries for Mypy, which is rather unfortunate.

    if not MYPY:
        class DeploymentArgsDict(TypedDict):
            api_version: NotRequired[Input[str]]
            kind: NotRequired[Input[str]]
            metadata: NotRequired[Input['ObjectMetaArgsDict']]
            ...
    else:
        DeploymentArgsDict: TypeAlias = Mapping[str, Any]
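A minimal, runnable sketch of the conditional-typing workaround described above. It assumes mypy treats a module-level variable named `MYPY` as always true during analysis (while it is plainly `False` at runtime), so mypy only ever sees the `else` branch. The field set is trimmed, `total=False` stands in for `NotRequired`, the `Input` wrapper is left out to keep the snippet self-contained, and `describe` is a hypothetical helper added only to show usage.

```python
from typing import Any, Mapping, TypedDict

# Assumption: mypy special-cases the name MYPY and treats it as true while
# type checking; at runtime it is an ordinary False value.
MYPY = False

if not MYPY:
    # Runtime branch: precise TypedDict definitions.
    class ObjectMetaArgsDict(TypedDict, total=False):
        name: str

    class DeploymentArgsDict(TypedDict, total=False):
        api_version: str
        kind: str
        metadata: ObjectMetaArgsDict
else:
    # Mypy branch: loosely typed aliases, so the nested TypedDicts are skipped.
    ObjectMetaArgsDict = Mapping[str, Any]
    DeploymentArgsDict = Mapping[str, Any]


def describe(args: DeploymentArgsDict) -> str:
    """Hypothetical helper: return 'kind/name' for a deployment-like mapping."""
    metadata = args.get("metadata", {})
    return f"{args.get('kind', '?')}/{metadata.get('name', '?')}"
```

The trade-off is the one noted in the comment: mypy users lose key and value checking for these dictionaries entirely.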
Because of the number of
Does mypy internally expand all of these TypedDict definitions? If so, I'm curious why. Pyright internally builds one object for each class. There are only 438 of them in the code sample, which isn't that many. Each internal object refers to the other objects as needed. It doesn't do any expansion.
To be clear here, I didn't check how Mypy internally represents such types. In PyCharm, we represent TypedDicts as
There seem to be some easy improvements we can make to speed up the handling of nested TypedDicts. I don't think there's any deep reason why they'd have to be this slow. I'll look into this -- if it's easy enough, the next mypy release (to be out in a week or two) could include some optimizations.
#17842 fixes some bottlenecks.
If TypedDict A has multiple items that refer to TypedDict B, don't duplicate the types representing B during type expansion (or generally when translating types). If TypedDicts are deeply nested, this could result in a lot of redundant type objects.
Example where this could matter (assume B is a big TypedDict):
```
class B(TypedDict):
    ...

class A(TypedDict):
    a: B
    b: B
    c: B
    ...
    z: B
```
Also deduplicate large unions. It's common to have aliases that are defined as large unions, and again we want to avoid duplicating these unions.
This may help with #17231, but this fix may not be sufficient.
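To illustrate why avoiding duplication matters, here is a toy sketch of type translation with and without sharing. It is not mypy's implementation: `ToyTypedDict`, `translate_naive`, and `translate_shared` are hypothetical names, and the cache keyed on `id()` is just one simple way to reuse an already translated object.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ToyTypedDict:
    """Toy stand-in for a checker's internal TypedDict type (not mypy's class)."""
    name: str
    items: Tuple[Tuple[str, Union["ToyTypedDict", str]], ...]


Node = Union[ToyTypedDict, str]


def translate_naive(t: Node, created: List[ToyTypedDict]) -> Node:
    """Rebuild the tree, copying a referenced TypedDict every time it is seen."""
    if isinstance(t, str):
        return t
    new = ToyTypedDict(
        t.name, tuple((k, translate_naive(v, created)) for k, v in t.items)
    )
    created.append(new)
    return new


def translate_shared(
    t: Node, created: List[ToyTypedDict], cache: Dict[int, ToyTypedDict]
) -> Node:
    """Same translation, but reuse the result for an already translated TypedDict."""
    if isinstance(t, str):
        return t
    cached = cache.get(id(t))
    if cached is not None:
        return cached
    new = ToyTypedDict(
        t.name, tuple((k, translate_shared(v, created, cache)) for k, v in t.items)
    )
    created.append(new)
    cache[id(t)] = new
    return new


# B is referenced three times by A, and A twice by Top.
B = ToyTypedDict("B", (("x", "int"), ("y", "str")))
A = ToyTypedDict("A", tuple((key, B) for key in "abc"))
Top = ToyTypedDict("Top", (("p", A), ("q", A)))

naive_created: List[ToyTypedDict] = []
naive_result = translate_naive(Top, naive_created)

shared_created: List[ToyTypedDict] = []
shared_result = translate_shared(Top, shared_created, {})
```

In the naive version the number of created objects multiplies with each nesting level, while the shared version creates one object per distinct TypedDict. Both results compare equal structurally.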
Bug Report
For Pulumi we are looking into generating types using TypedDict to model cloud APIs. For example, for Kubernetes we have something representing a Deployment.
Pulumi has a notion of inputs and outputs, and the `Input` type used in the above example looks like this:
`Output` does a lot of things, but for the purposes of this repro all that matters is that it's a generic type.
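A minimal sketch of what such an `Input` alias could look like, inferred from the description in this report (a union that includes `Awaitable[T]` and shrinks to two members without it). The `Output` class here is a hypothetical stub, and the member order is an assumption.

```python
from typing import Awaitable, Generic, TypeVar, Union, get_args

T = TypeVar("T")


class Output(Generic[T]):
    """Hypothetical stub for Pulumi's Output; only its genericity matters here."""

    def __init__(self, value: T) -> None:
        self.value = value


# Three-member generic union: a plain value, an awaitable of it, or an Output of it.
# The member order is an assumption made for this sketch.
Input = Union[T, Awaitable[T], Output[T]]

# Subscripting the alias substitutes T in every member.
members = get_args(Input[str])
```

Every field annotated with `Input[...]` therefore carries a three-way union, and nesting one `Input` inside another multiplies the alternatives a checker has to consider.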
The K8S types can nest pretty deeply, and I suspect a combination of having nested literals along with the `Union` via the `Input` type is causing slowness here. Example:
If I drop `Awaitable[T]` from the union to reduce it to two members, typechecking completes in 2 seconds. With it present, it takes 40 seconds.
This is a simplified example, and the actual code has another union layered on top. In that case we run out of memory.
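The shape described above, nested TypedDicts whose fields are each wrapped in the `Input` union and then populated from a nested dict literal, can be sketched as follows. This is a small illustration only, not the actual repro (which lives in the linked repository). `Output` is a hypothetical stub, `total=False` stands in for `NotRequired`, and the `name`, `labels`, `spec`, and `replicas` fields are assumed for the example.

```python
from typing import Awaitable, Generic, Mapping, TypedDict, TypeVar, Union

T = TypeVar("T")


class Output(Generic[T]):
    """Hypothetical stub for Pulumi's Output; only its genericity matters here."""


# Assumed three-member union, as described in the report.
Input = Union[T, Awaitable[T], Output[T]]


class ObjectMetaArgsDict(TypedDict, total=False):
    name: Input[str]
    labels: Input[Mapping[str, Input[str]]]


class DeploymentSpecArgsDict(TypedDict, total=False):
    replicas: Input[int]


class DeploymentArgsDict(TypedDict, total=False):
    api_version: Input[str]
    kind: Input[str]
    # Each nested TypedDict is itself wrapped in the union.
    metadata: Input[ObjectMetaArgsDict]
    spec: Input[DeploymentSpecArgsDict]


# A nested literal: the checker must match each inner dict against every
# member of the enclosing union before it can settle on the TypedDict.
deployment: DeploymentArgsDict = {
    "api_version": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {"replicas": 2},
}
```

With hundreds of such classes nested several levels deep, the number of union alternatives to check grows quickly, which is consistent with the slowdown reported here.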
To Reproduce
I have created a repro here https://github.com/julienp/typeddict-performance
Expected Behavior
It takes a second or two to typecheck.
Actual Behavior
It takes ~40 seconds on my machine
Your Environment
`mypy.ini` (and other config files): none