Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ImmutableValue Hashable, and add more type safety to fields #10932

Closed

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Oct 9, 2020

Problem

It's (arguably) not clear what the distinction is between compute_value and sanitize_raw_value. After seeing that they do the same thing, and that our ImmutableValue wasn't checking hashability, I thought there might be some room to simplify some of our Field definitions.

Solution

  • Make pants.engine.target.ImmutableValue = typing.Hashable, and make Field a generic type.
    • The hashability is now automatically enforced for the default and value fields!
    • The types of default and value in subclasses are also now verified to be compatible with the type parameter for Field (i.e. it's not possible to override them with the wrong type in a subclass).
    • Also, default is now type-checked to be compatible with value.
  • Rename PrimitiveField.compute_value to sanitize_raw_value, AsyncField.sanitized_raw_value to value, and move both the field and the classmethod into the base Field struct.
    • This allows removing a type: ignore[attr-defined] in the Target class where we deal with just the Field superclass type.
  • Split out the TriBoolField type for where the None state is desired.

@cosmicexplorer cosmicexplorer requested review from stuhood, benjyw, Eric-Arellano and gshuflin and removed request for stuhood October 9, 2020 14:54
@cosmicexplorer cosmicexplorer force-pushed the use-hashable-value branch 4 times, most recently from 5771a6b to 44fc0c4 Compare October 12, 2020 03:03
@coveralls
Copy link

coveralls commented Oct 12, 2020

Coverage Status

Coverage remained the same at 0.0% when pulling 0176f27 on cosmicexplorer:use-hashable-value into b32f1d1 on pantsbuild:master.

@cosmicexplorer cosmicexplorer force-pushed the use-hashable-value branch 3 times, most recently from aefb724 to bb51afe Compare October 12, 2020 07:56
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Danny! Lots of excellent improvements here.

I originally wanted a very strong separation between PrimitiveField vs. AsyncField, but it's become clear from the past few months that that separation was fairly artificial and is not worth maintaining.

Comment on lines 33 to 35
@classmethod
def compute_value(cls, raw_value: Optional[str], *, address: Address) -> str:
value = cast(str, super().compute_value(raw_value, address=address))
def sanitize_raw_value(cls, raw_value: Optional[str], *, address: Address) -> str:
value = cast(str, super().sanitize_raw_value(raw_value, address=address))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with using the same method for both types of Fields. I originally wanted to separate them to emphasize how the two APIs are consumed differently, but that's not worth it. It makes it confusing to navigate between both.

I'm not sure which name I prefer, though: compute_value vs. sanitize_raw_value. I'm conflicted because:

  1. I think sanitize_raw_value is more descriptive, but
  2. Several plugin authors are already using compute_value for their target definitions.

We want to minimize churn. But we should also do what's going to be best in the long term; better to have churn now before even more people write plugins.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, great point. I think .compute_value() seems more natural, I was more worried that people might infer it to mean "it's ok to do heavy computation in this method" -- the docstring clarifies that a lot, though. If people are already using it in plugins, we might consider adding @deprecated to .sanitize_raw_value()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with .compute_value(), then I'm not as worried about us needing to deprecate .sanitize_raw_value(). I will be really shocked if people subclassed AsyncField and overrode .sanitize_raw_value(). We don't document how to do that, for example. And it's rare to use an AsyncField.

We're also allowed to make breaking changes without deprecation, so long as we document at https://www.pantsbuild.org/docs/plugin-upgrade-guide, but in good faith, we want to minimize churn as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds great! Will do!

Comment on lines +302 to +334
# This is somewhat similar to scala's path-dependent types:
# https://stackoverflow.com/a/2694607/2518889.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this comment is worth memorializing in source code. While interesting, it might be distracting for people not already familiar with that Scala mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, but I think it is generally a good thing to refer to existing languages or frameworks that use the patterns we've developed from whole cloth here. Would it be acceptable to try to move this to the docsite?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more confusing on the docsite, as this is a really specific implementation detail that doesn't need to be mentioned in the docs. Note that this is a private field.

My recommendation is to remove the comment, or perhaps ask others what they think.

fix invalid type error method

fix fmt

remove repeated .compute_value() method
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very excited for this.

# override the type hints for `PrimitiveField.compute_value()` and
# `AsyncField.sanitize_raw_value()` to indicate that `None` is an invalid type.
# override the type hints for `Field.compute_value()` and
# `AsyncField.compute_value()` to indicate that `None` is an invalid type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave off AsyncField.compute_value(). In this new reorganization, AsyncField is simply a subclass.

@@ -107,7 +107,7 @@ def create(cls, field: Type[Field]) -> "FieldInfo":
# requiring the Field author to rewrite the docstring.
#
# However, if the original `Field` author did not define docstring, then this means we
# would typically fall back to the docstring for `AsyncField`, `PrimitiveField`, or a
# would typically fall back to the docstring for `AsyncField`, `Field`, or a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# would typically fall back to the docstring for `AsyncField`, `Field`, or a
# would typically fall back to the docstring for `Field` or a

type of `value`. These direct subclasses should also likely override `compute_value()` to
perform any custom hydration and/or validation, such as converting unhashable types to hashable
types or checking for banned values. The returned value must be hashable (and should be
immutable) so that this Field may be used by the V2 engine. This means, for example, using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call it the v2 engine in source code. There is now only one engine 😀

Comment on lines +74 to +75
to generate documentation, e.g. for `./pants target-types`. If the field is required, do not use
`Optional` for the type hint of `raw_value`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Optional sentence isn't true. In fact, you should always use Optional to reflect the reality that a user might leave off the field, i.e. to have MyPy ensure you consider this possibility. list_target_types.py will strip off the | None for you.

The raw_value is a bit of a lie overall, as the input source is unvalidated and untrusted. For example, StringField really could be Union[None, str, Any]; there's nothing stopping a user from inputting an int as the initial input, before our validation. But we try to strike a balance of pragmatism to say what we intend for the user to input, so that we can use it to generate meaningful ./pants target-types output.

With Optional, because ./pants target-types already knows how to handle it, we want to be closer to reflecting the actual reality by always using Optional.

Comment on lines +110 to +111
# TODO: enforce that `raw_value` and `value` are *not* Optional if `required = True`, possibly
# via mixin or decorator?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already enforced at runtime by the constructor below checking for required values.

Do you mean to enforce it with MyPy?

Comment on lines 121 to 125
# NB: We neither store the `address` or `raw_value` as attributes on this dataclass:
# * Don't store `raw_value` because it very often is mutable and/or unhashable, which means
# this Field could not be passed around in the engine.
# * Don't store `address` to avoid the cost in memory of storing `Address` on every single
# field encountered by Pants in a run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In retrospect, these comments are probably unnecessary. The docstring for compute_value() already explains the idea of needing to sanitize the raw value.

Maybe it's worth pointing out the address part, but I think it's fine without explanation. Where it's worth explaining is why we do store it for AsyncField.

class AsyncField(Field, metaclass=ABCMeta):
"""A field that needs the engine in order to be hydrated.
class AsyncField(Field[_DefaultBase], metaclass=ABCMeta):
"""A field that needs to extract information from an Address via the engine to be hydrated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite right. You might not need the address at all for hydration, only, it's often useful.

The original description was intentionally vague - why you need the engine for hydration depends on every field. What the hydration looks like also depends on the field. The only thing we can generalize is "you need the engine for some reason to fully get the value you care about".

Comment on lines +236 to +238
# We cheat here by ignoring the @final declaration *and* @frozen_after_init. We don't want to
# generalize this further, because we want to avoid having `Field` subclasses start adding
# arbitrary fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sg, I like this approach. Thanks for the great comment.

Comment on lines +302 to +334
# This is somewhat similar to scala's path-dependent types:
# https://stackoverflow.com/a/2694607/2518889.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more confusing on the docsite, as this is a really specific implementation detail that doesn't need to be mentioned in the docs. Note that this is a private field.

My recommendation is to remove the comment, or perhaps ask others what they think.

return value_or_default

class BoolField(TriBoolField, metaclass=ABCMeta):
"""A specialization of TriBoolField which has no None state."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

A field that can be either True or False.

You must set the class property default or set `required = True`. See `TriBoolField` to also allow `None`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the "You must set the class property default or set required = True."

I think that we need to add some code to compute_value() to validate the value is never None. It's not enough to subclass TriBoolField. It would be great to add a small test to target_test.py that tries explicitly setting my_bool_field=None and confirming it errors.

@Eric-Arellano
Copy link
Contributor

Hey Danny, I broke out the minimum set of changes needed to unblock some work on pex_binary and python_awslambda in #11231.

It doesn't include the type hints improvements nor TriBoolField. If you're still interested, I think those are both great changes to make.

Thank you for opening this PR - you are totally right about AsyncField being too complex.

Eric-Arellano added a commit that referenced this pull request Nov 24, 2020
Broken out of #10932.

The difference between `AsyncField` and `PrimitiveField` is fairly artificial and it creates more complexity than we need. Really, the only difference is a) storing the `Address` and b) an expression of intent.

This makes `Field` act like `PrimitiveField`, which is now deleted. `AsyncField` now uses the same naming as `PrimitiveField` used to, i.e.`value` instead of `sanitized_raw_value` and `compute_value()` instead of `sanitize_raw_value()`.

This also cleans up some of the relevant `target_test.py`. We were testing something that's no longer worth it, as it's outside the scope of the code and we have multiple places we test this pattern. We add a new test that `__eq__` and `__hash__` are correct.

[ci skip-rust]
[ci skip-build-wheels]
Base automatically changed from master to main March 19, 2021 19:20
Eric-Arellano added a commit that referenced this pull request May 3, 2021
Cherry-picked from #10932 (credit to @cosmicexplorer!). The vast majority of `BoolField`s are meant to be binary: `True` or `False`. But they were really trinary because they included `None`, which made the type hint harder to work with and made the `./pants help` message include `| None`.

[ci skip-rust]
[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor

This has now mostly been implemented (with a tweaked factoring) via #12004 and #11231. The type hint improvements are not yet implemented, and would be a great change if interested in a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants