-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make ImmutableValue Hashable, and add more type safety to fields #10932
Conversation
5771a6b
to
44fc0c4
Compare
aefb724
to
bb51afe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Danny! Lots of excellent improvements here.
I originally wanted a very strong separation between PrimitiveField
vs. AsyncField
, but it's become clear from the past few months that that separation was fairly artificial and is not worth maintaining.
@classmethod | ||
def compute_value(cls, raw_value: Optional[str], *, address: Address) -> str: | ||
value = cast(str, super().compute_value(raw_value, address=address)) | ||
def sanitize_raw_value(cls, raw_value: Optional[str], *, address: Address) -> str: | ||
value = cast(str, super().sanitize_raw_value(raw_value, address=address)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with using the same method for both types of Field
s. I originally wanted to separate them to emphasize how the two APIs are consumed differently, but that's not worth it. It makes it confusing to navigate between both.
I'm not sure which name I prefer, though: compute_value
vs. sanitize_raw_value
. I'm conflicted because:
- I think
sanitize_raw_value
is more descriptive, but - Several plugin authors are already using
compute_value
for their target definitions.
We want to minimize churn. But we should also do what's going to be best in the long term; better to have churn now before even more people write plugins.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, great point. I think .compute_value()
seems more natural, I was more worried that people might infer it to mean "it's ok to do heavy computation in this method" -- the docstring clarifies that a lot, though. If people are already using it in plugins, we might consider adding @deprecated
to .sanitize_raw_value()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with .compute_value()
, then I'm not as worried about us needing to deprecate .sanitize_raw_value()
. I will be really shocked if people subclassed AsyncField
and overrode .sanitize_raw_value()
. We don't document how to do that, for example. And it's rare to use an AsyncField
.
We're also allowed to make breaking changes without deprecation, so long as we document at https://www.pantsbuild.org/docs/plugin-upgrade-guide, but in good faith, we want to minimize churn as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds great! Will do!
# This is somewhat similar to scala's path-dependent types: | ||
# https://stackoverflow.com/a/2694607/2518889. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this comment is worth memorializing in source code. While interesting, it might be distracting for people not already familiar with that Scala mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this, but I think it is generally a good thing to refer to existing languages or frameworks that use the patterns we've developed from whole cloth here. Would it be acceptable to try to move this to the docsite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more confusing on the docsite, as this is a really specific implementation detail that doesn't need to be mentioned in the docs. Note that this is a private field.
My recommendation is to remove the comment, or perhaps ask others what they think.
bb51afe
to
6027ce2
Compare
fix invalid type error method fix fmt remove repeated .compute_value() method
6e753ba
to
0176f27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very excited for this.
# override the type hints for `PrimitiveField.compute_value()` and | ||
# `AsyncField.sanitize_raw_value()` to indicate that `None` is an invalid type. | ||
# override the type hints for `Field.compute_value()` and | ||
# `AsyncField.compute_value()` to indicate that `None` is an invalid type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave off AsyncField.compute_value()
. In this new reorganization, AsyncField
is simply a subclass.
@@ -107,7 +107,7 @@ def create(cls, field: Type[Field]) -> "FieldInfo": | |||
# requiring the Field author to rewrite the docstring. | |||
# | |||
# However, if the original `Field` author did not define docstring, then this means we | |||
# would typically fall back to the docstring for `AsyncField`, `PrimitiveField`, or a | |||
# would typically fall back to the docstring for `AsyncField`, `Field`, or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# would typically fall back to the docstring for `AsyncField`, `Field`, or a | |
# would typically fall back to the docstring for `Field` or a |
type of `value`. These direct subclasses should also likely override `compute_value()` to | ||
perform any custom hydration and/or validation, such as converting unhashable types to hashable | ||
types or checking for banned values. The returned value must be hashable (and should be | ||
immutable) so that this Field may be used by the V2 engine. This means, for example, using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call it the v2 engine in source code. There is now only one engine 😀
to generate documentation, e.g. for `./pants target-types`. If the field is required, do not use | ||
`Optional` for the type hint of `raw_value`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Optional
sentence isn't true. In fact, you should always use Optional
to reflect the reality that a user might leave off the field, i.e. to have MyPy ensure you consider this possibility. list_target_types.py
will strip off the | None
for you.
The raw_value
is a bit of a lie overall, as the input source is unvalidated and untrusted. For example, StringField
really could be Union[None, str, Any]
; there's nothing stopping a user from inputting an int
as the initial input, before our validation. But we try to strike a balance of pragmatism to say what we intend for the user to input, so that we can use it to generate meaningful ./pants target-types
output.
With Optional
, because ./pants target-types
already knows how to handle it, we want to be closer to reflecting the actual reality by always using Optional
.
# TODO: enforce that `raw_value` and `value` are *not* Optional if `required = True`, possibly | ||
# via mixin or decorator? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already enforced at runtime by the constructor below checking for required values.
Do you mean to enforce it with MyPy?
# NB: We neither store the `address` or `raw_value` as attributes on this dataclass: | ||
# * Don't store `raw_value` because it very often is mutable and/or unhashable, which means | ||
# this Field could not be passed around in the engine. | ||
# * Don't store `address` to avoid the cost in memory of storing `Address` on every single | ||
# field encountered by Pants in a run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In retrospect, these comments are probably unnecessary. The docstring for compute_value()
already explains the idea of needing to sanitize the raw value.
Maybe it's worth pointing out the address
part, but I think it's fine without explanation. Where it's worth explaining is why we do store it for AsyncField
.
class AsyncField(Field, metaclass=ABCMeta): | ||
"""A field that needs the engine in order to be hydrated. | ||
class AsyncField(Field[_DefaultBase], metaclass=ABCMeta): | ||
"""A field that needs to extract information from an Address via the engine to be hydrated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't quite right. You might not need the address at all for hydration, only, it's often useful.
The original description was intentionally vague - why you need the engine for hydration depends on every field. What the hydration looks like also depends on the field. The only thing we can generalize is "you need the engine for some reason to fully get the value you care about".
# We cheat here by ignoring the @final declaration *and* @frozen_after_init. We don't want to | ||
# generalize this further, because we want to avoid having `Field` subclasses start adding | ||
# arbitrary fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sg, I like this approach. Thanks for the great comment.
# This is somewhat similar to scala's path-dependent types: | ||
# https://stackoverflow.com/a/2694607/2518889. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more confusing on the docsite, as this is a really specific implementation detail that doesn't need to be mentioned in the docs. Note that this is a private field.
My recommendation is to remove the comment, or perhaps ask others what they think.
return value_or_default | ||
|
||
class BoolField(TriBoolField, metaclass=ABCMeta): | ||
"""A specialization of TriBoolField which has no None state.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
A field that can be either True or False.
You must set the class property default or set `required = True`. See `TriBoolField` to also allow `None`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the "You must set the class property default
or set required = True
."
I think that we need to add some code to compute_value()
to validate the value is never None
. It's not enough to subclass TriBoolField
. It would be great to add a small test to target_test.py
that tries explicitly setting my_bool_field=None
and confirming it errors.
Hey Danny, I broke out the minimum set of changes needed to unblock some work on It doesn't include the type hints improvements nor Thank you for opening this PR - you are totally right about |
Broken out of #10932. The difference between `AsyncField` and `PrimitiveField` is fairly artificial and it creates more complexity than we need. Really, the only difference is a) storing the `Address` and b) an expression of intent. This makes `Field` act like `PrimitiveField`, which is now deleted. `AsyncField` now uses the same naming as `PrimitiveField` used to, i.e.`value` instead of `sanitized_raw_value` and `compute_value()` instead of `sanitize_raw_value()`. This also cleans up some of the relevant `target_test.py`. We were testing something that's no longer worth it, as it's outside the scope of the code and we have multiple places we test this pattern. We add a new test that `__eq__` and `__hash__` are correct. [ci skip-rust] [ci skip-build-wheels]
Cherry-picked from #10932 (credit to @cosmicexplorer!). The vast majority of `BoolField`s are meant to be binary: `True` or `False`. But they were really trinary because they included `None`, which made the type hint harder to work with and made the `./pants help` message include `| None`. [ci skip-rust] [ci skip-build-wheels]
Problem
It's (arguably) not clear what the distinction is between
compute_value
andsanitize_raw_value
. After seeing that they do the same thing, and that ourImmutableValue
wasn't checking hashability, I thought there might be some room to simplify some of ourField
definitions.Solution
pants.engine.target.ImmutableValue = typing.Hashable
, and makeField
a generic type.default
andvalue
fields!default
andvalue
in subclasses are also now verified to be compatible with the type parameter forField
(i.e. it's not possible to override them with the wrong type in a subclass).default
is now type-checked to be compatible withvalue
.PrimitiveField.compute_value
tosanitize_raw_value
,AsyncField.sanitized_raw_value
tovalue
, and move both the field and the classmethod into the baseField
struct.type: ignore[attr-defined]
in theTarget
class where we deal with just theField
superclass type.TriBoolField
type for where theNone
state is desired.