Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Implicit instance attributes #15811

Merged
merged 11 commits into from
Feb 3, 2025

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Jan 29, 2025

Summary

Add support for implicitly-defined instance attributes, i.e. support type inference for cases like this:

class C:
    def __init__(self) -> None:
        self.x: int = 1
        self.y = None

reveal_type(C().x)  # int
reveal_type(C().y)  # Unknown | None

A lot of things have been intentionally left out of this PR:

  • Type inference for self
  • Support attributes which are implicitly "declared" via their parameter types (self.x = param)
  • Tuple-unpacking attribute assignments: self.x, self.y = …
  • Diagnostic for conflicting declared types
  • Attributes in statically-known-to-be-false branches are still visible:
    class C:
        def __init__(self) -> None:
            if 2 + 3 < 4:
                self.x: str = "a"
    
    # # TODO: Ideally, this would result in a `unresolved-attribute` error. But mypy and pyright
    # do not support this either (for conditions that can only be resolved to `False` in type
    # inference), so it does not seem to be particularly important.
    reveal_type(C().x)  # revealed: str

Part of: #14164

Benchmarks

Codspeed reports no change in a cold-cache benchmark, and a -1% regression in the incremental benchmark. On black's src folder, I don't see a statistically significant difference between the branches (Reminder to self: always set the CPU frequency scaling governor to performance when benchmarking on a laptop):

Command Mean [ms] Min [ms] Max [ms] Relative
./red_knot_main check --project /home/shark/black/src 133.7 ± 9.5 126.7 164.7 1.01 ± 0.08
./red_knot_feature check --project /home/shark/black/src 132.2 ± 5.1 118.1 140.9 1.00

Test Plan

Updated and new Markdown tests

@sharkdp sharkdp added the red-knot Multi-file analysis & type inference label Jan 29, 2025
@sharkdp sharkdp force-pushed the david/implicit-instance-attributes branch 5 times, most recently from 49bec2f to 362eb8a Compare January 31, 2025 13:27
@sharkdp sharkdp marked this pull request as ready for review January 31, 2025 13:28
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review not complete yet, just submitting partway through

@@ -646,6 +813,90 @@ reveal_type(b"foo".join) # revealed: @Todo(bound method)
reveal_type(b"foo".endswith) # revealed: @Todo(bound method)
```

## Instance attribute edge cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about assignments to attributes of the first argument of a classmethod or staticmethod?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are not (properly) supported yet. We do have one pre-existing test for attribute assignments in @classmethods in this file, but we do not yet find these implicitly-declared attributes when calling .member on the class object. With this PR, we're potentially in a weird state where we find those attributes on instances, but not on the class object. If that's acceptable, I'll open an issue and add that to my goals for next week. Otherwise, I can also include it in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the functionality scope of this PR needs to be expanded, this can definitely be a follow-up. I was more just thinking that some TODO tests around it would clarify the intention and fit in well with the existing tests in this PR, which are already a mix of "complete" and "TODO". But there's certainly no need for that, the tests can come with the feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll note this down as a task for a follow-up.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

/// Tries to find declarations/bindings of an instance attribute named `name` that are only
/// "implicitly" defined in a method of the class that corresponds to `class_body_scope`.
fn implicit_instance_attribute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess at this point we do attribute lookup fully lazily, and without any Salsa caching. I wonder if there's enough work repeated here to make some Salsa caching of attribute types worth it? But we can evaluate this separately.

Copy link
Member

@MichaReiser MichaReiser Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to have salsa caching here to avoid invalidating types from module a if the class is defined in module b. Or is it guaranteed that this method is only ever called from within the same module (or if it is called across-modules, that it goes through a salsa query?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need some help answering these questions.

Or is it guaranteed that this method is only ever called from within the same module

I don't think so.

or if it is called across-modules, that it goes through a salsa query?

This function is only called from Class::instance_member (via own_instance_member), which in turn can be called from:

  • Type::member
  • TypeInferenceBuilder::infer_attribute_expression

The former (Type::member) is called from all sorts of different places.

Copy link
Member

@MichaReiser MichaReiser Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Yeah, in that case, I think implicit_instance_attribute or own_instance_member or member should be a salsa query because: semantic_index changes whenever the file where the class is declared changes, and nothing is constraining member calls to be from the same file.

An alternative is to introduce a attribute_assignments(db, class_body_scope) query that returns the assignments just for that scope. That might be cheaper but it still has the benefit that implicit_instance_attribute only gets invalidated when the attribute assignments in that specific scope changed.

We use a similar pattern for use_def_map and symbol_table where those queries return a specific "slice" of the SemanticIndex which is less likely to change.

Let me know if that helps or if you need some more input (also happy to discuss on Discord/sync)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other reviewers: I implemented something that I discussed with Micha.

Removing salsa::tracked from either attribute_assignments or the inline infer_expression_type query makes the newly added test fail.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, finished review, only had one more comment.

This is, as usual, awesome work.

crates/red_knot_python_semantic/src/types.rs Show resolved Hide resolved

/// Tries to find declarations/bindings of an instance attribute named `name` that are only
/// "implicitly" defined in a method of the class that corresponds to `class_body_scope`.
fn implicit_instance_attribute(
Copy link
Member

@MichaReiser MichaReiser Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to have salsa caching here to avoid invalidating types from module a if the class is defined in module b. Or is it guaranteed that this method is only ever called from within the same module (or if it is called across-modules, that it goes through a salsa query?)

crates/red_knot_python_semantic/src/semantic_index.rs Outdated Show resolved Hide resolved
@sharkdp sharkdp force-pushed the david/implicit-instance-attributes branch from 128679e to 4ed94c6 Compare February 3, 2025 11:04
Comment on lines +4150 to +4158
// We use a separate salsa query here to prevent unrelated changes in the AST of an external
// file from triggering re-evaluations of downstream queries.
// See the `dependency_implicit_instance_attribute` test for more information.
#[salsa::tracked]
fn infer_expression_type<'db>(db: &'db dyn Db, expression: Expression<'db>) -> Type<'db> {
let inference = infer_expression_types(db, expression);
let expr_scope = expression.scope(db);
inference.expression_type(expression.node_ref(db).scoped_expression_id(db, expr_scope))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this would be useful to have as a general query and if it should be defined next to infer_expression_types instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar thought. It could certainly be used in more places, but I wasn't sure if we want the additional query in those places? I'll note it down as a task and open a small PR after this has been merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is intuitive to me why attribute_assignments should be a query, but it's less intuitive to me why this should be a query. I don't see what additional dependencies are introduced here above those that the underlying infer_expression_types query would have anyway. So I guess this is not about dependencies, but rather about a smaller returned value so backdating can be more effective?

It feels intuitively to me that we would get more caching value with fewer cached memos if we placed the salsa query caching at the level of Class::own_instance_member or Type::member, rather than at such a fine-grained level that does such little work over the underlying infer_expression_types. But we should of course validate any such changes with observed performance differences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the expression.node_ref. It accesses the AST node from the module where the class is declared, meaning the enclosing query has to re-run whenever the declaring module changes -- which we don't want.

Doing it here is mainly about avoiding calling a query (and caching the value which is expensive) in cases where we don't need to. Although I admit, I don't have any numbers on whether caching here is a significantly lower number than caching at the own_instance_member level.

@sharkdp sharkdp merged commit 102c2ee into main Feb 3, 2025
21 checks passed
@sharkdp sharkdp deleted the david/implicit-instance-attributes branch February 3, 2025 18:34
dcreager added a commit that referenced this pull request Feb 4, 2025
* main: (66 commits)
  [red-knot] Use ternary decision diagrams (TDDs) for visibility constraints (#15861)
  [`pyupgrade`] Rename private type parameters in PEP 695 generics (`UP049`) (#15862)
  Simplify the `StringFlags` trait (#15944)
  [`flake8-pyi`] Make `PYI019` autofixable for `.py` files in preview mode as well as stubs (#15889)
  Docs (`linter.md`): clarify that Python files are always searched for in subdirectories (#15882)
  [`flake8-pyi`] Make PEP-695 functions with multiple type parameters fixable by PYI019 again (#15938)
  [red-knot] Use unambiguous invalid-syntax-construct for suppression comment test (#15933)
  Make `Binding::range()` point to the range of a type parameter's name, not the full type parameter (#15935)
  Update black deviations (#15928)
  [red-knot] MDTest: Fix line numbers in error messages (#15932)
  Preserve triple quotes and prefixes for strings (#15818)
  [red-knot] Hand-written MDTest parser (#15926)
  [`pylint`] Fix missing parens in unsafe fix for `unnecessary-dunder-call` (`PLC2801`) (#15762)
  nit: docs for ignore & select (#15883)
  [airflow] `BashOperator` has been moved to `airflow.providers.standard.operators.bash.BashOperator` (AIR302) (#15922)
  [`flake8-logging`] `.exception()` and `exc_info=` outside exception handlers (`LOG004`, `LOG014`) (#15799)
  [red-knot] Enforce specifying paths for mdtest code blocks in a separate preceding line (#15890)
  [red-knot] Internal refactoring of visibility constraints API (#15913)
  [red-knot] Implicit instance attributes (#15811)
  [`flake8-comprehensions`] Handle extraneous parentheses around list comprehension (`C403`) (#15877)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants