Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Quantity types and update formulas to produce typed Samples #422

Merged
merged 15 commits into from
Jun 20, 2023

Conversation

shsms
Copy link
Contributor

@shsms shsms commented Jun 5, 2023

No description provided.

@github-actions github-actions bot added part:actor Affects an actor ot the actors utilities (decorator, etc.) part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests labels Jun 5, 2023
@shsms shsms added this to the v0.22.0 milestone Jun 6, 2023
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of comments but this is amazing! Great work! 🎉

src/frequenz/sdk/timeseries/_quantities.py Show resolved Hide resolved
src/frequenz/sdk/timeseries/_quantities.py Outdated Show resolved Hide resolved
src/frequenz/sdk/timeseries/_quantities.py Outdated Show resolved Hide resolved
src/frequenz/sdk/timeseries/_quantities.py Show resolved Hide resolved
src/frequenz/sdk/timeseries/_quantities.py Show resolved Hide resolved
@github-actions github-actions bot added the part:tooling Affects the development tooling (CI, deployment, dependency management, etc.) label Jun 7, 2023
@shsms shsms force-pushed the type-wrappers branch 3 times, most recently from 015b3ba to 15f4900 Compare June 12, 2023 09:00
@shsms shsms marked this pull request as ready for review June 12, 2023 12:40
@shsms shsms requested a review from a team as a code owner June 12, 2023 12:40
@shsms shsms requested review from llucax and leandro-lucarella-frequenz and removed request for leandro-lucarella-frequenz June 12, 2023 12:40
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again quite a few comments, but many are things to consider in the future, not something to address in this PR as it is already huge and there is much more value in having it merged soon than in discussing improvements for ages :)

I'm going to make a summary with the main points I see pending:

  • as_xxx() as properties: I looked at https://google.github.io/styleguide/pyguide.html#213-properties and even when it is vague, it always talks about attributes, and I agree as_xxx is not really an attribute, so I agree to leave it as a function.
  • default exponent_unit_map provided by Quantity: I still think this is the best option, but if it is controversial, we can leave it for a future PR, specially because it doesn't affect users API.
  • About moving operations (like __add__) to Quantity: Same as above.
  • About providing also __mul__ and __truediv__ with other: float: Same as above.
  • About making Quantity an ABC (and the different alternatives to do this): Also too big of a change for this PR and it doesn't affect the API, so we can do it separately if it makes sense.
  • Add comments about type: ignore: in this PR.
  • Disallow the default constructor for types with concrete units (Power(10)): I would address in this PR, as it shouldn't be a big change and affects the API. But could be left for another PR if it turns out not to be that trivial (or controversial)
  • Using hypothesis: optional but in my experience it helped catching some corner cases. It took me a non-trivial amount of effort to do it right, this is why I wouldn't block the PR on this and has the optional status.
  • Forbidding .base_value for types with concrete units: ideally in this PR but it might depend on what we decide for Quantity as ABC, so it is also fine to leave to address in the future. Maybe in this PR we can add a note to the doc saying that this should not be used for concrete types.
  • Exporting/importing TypeVars: we can create a discussion, but in this case in particular I think it shouldn't be exported in timeseries because is not something that users will need often, and if they need to write a generic function or class they can declare the TypeVar themselves.
  • Internal imports and use concrete unit types in examples: fix in this PR.
  • Moving window and feature extractor as generic types: fix in this PR (if appropriate).
  • Clarify __format__ behavior in the docstring: this PR.

For the stuff that you agree we should leave for another PR, can you either create new issues/discussions gathering the info in the relevant comments and then resolve them with a link to the new issue/discussion, or just add a comment saying you agree to leave for another PR and I'll do the issue/discussion creation and resolving.
*

assert f"{Fz2(1.024445, exponent=-12)}" == "0 Hz"


def test_comparison() -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: this might be a good candidate to do some property testing using hypothesis.

@@ -34,7 +35,7 @@ async def init_feature_extractor(period: int) -> PeriodicFeatureExtractor:
timedelta(seconds=1), lm_chan.new_receiver(), timedelta(seconds=1)
)

await lm_chan.new_sender().send(Sample(datetime.now(tz=timezone.utc), 0))
await lm_chan.new_sender().send(Sample(datetime.now(tz=timezone.utc), Quantity(0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this could be explored in a followup PR (if we believe it might work and be better, I don't think it is worth to address it now).

@@ -126,7 +127,9 @@ async def run(self) -> None:
if (datetime.now(timezone.utc) - time_data).total_seconds() > 30:
_logger.error("Active power data are stale")
continue
queue.put_nowait(active_power.value)
queue.put_nowait(
None if not active_power.value else active_power.value.base_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this, I think it might be better to leave base_value as private too, in code like this it is not clear at all what base_value is, is it Watt? Kilowatt? active_power.value.as_watts() is much more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base_value is really just for the generic cases described here: #422 (comment)

And for the dimensionless cases like reading from the resampler directly.

I guess the power_distribution example is no longer relevant anyway, because people should use the battery pool instead. I have an in-progress branch for that, so won't forget. But will do it after this one is merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I see about it is it can be used in sub-classes, completely by-passing the unit checks. This is why it would be nice (if we don't go with #450) to find some way to make it available only in the base unit-less Quantity.

pyproject.toml Show resolved Hide resolved
src/frequenz/sdk/timeseries/_quantities.py Show resolved Hide resolved
src/frequenz/sdk/timeseries/_quantities.py Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the moving window, shouldn't this be parametrized on a QuantityT instead of using a unit-less Quantity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out MovingWindow etc can't propagate types because the most likely output types for these classes is a numpy array. So it doesn't have to be generic, just needs to accept any thing that gets put in here. And using the base Quantity type achieves that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert power.as_megawatts() == 0.0012
assert power == Power.from_milliwatts(1200000.0)
assert power == Power.from_megawatts(0.0012)
assert power != Power.from_watts(1000.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring more about the case where the type system sees both types as Any, but I'm not sure if it's relevant anymore, because the type system is not rejecting Fz1(1.024445, exponent=0) < Fz2(1.024445, exponent=3) anyways.

But still testing that operations fails between different unit it is still probably worth testing.

src/frequenz/sdk/timeseries/_quantities.py Show resolved Hide resolved
shsms added 15 commits June 19, 2023 12:32
This check is redundant because it is a type-check that's already
taken care of by mypy.

Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
This makes the transition to a generic `Sample` type easier, which is
done in a subsequent commit.

This commit also introduces the `output_type` parameter to formula
builders and formula engines.  The new parameter is used to determine
which type of quantity instance a formula engine would produce.

Signed-off-by: Sahas Subramanian <[email protected]>
This commit also updates the generated formulas to use the right
generic types like Power and Current, for power and current formulas
respectively.

Signed-off-by: Sahas Subramanian <[email protected]>
The previous design used a single dictionary `_engines` to store
engines of different types.  This is fixed now.

Signed-off-by: Sahas Subramanian <[email protected]>
@shsms shsms requested a review from llucax June 19, 2023 14:09
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I created a couple of discussions:

Feel free to close all the discussions where I point to either of those 2 discussions.

The remaining points are all minor and non-blockers to me, so I will approve. If you decide not to go with the disabled __init__, please create a follow-up issue so we don't forget.

assert power.as_megawatts() == 0.0012
assert power == Power.from_milliwatts(1200000.0)
assert power == Power.from_megawatts(0.0012)
assert power != Power.from_watts(1000.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, OK. Can't you just use a function that takes Any as arguments instead to test this and circumvent the type system to check the exception is raised? I think it makes it more clear what are you testing for, because using adding random # type: ignore shouldn't be something we really care about. But having the type relaxed to be Any is a real world case. Maybe you can even leave your example as is and just type-hint variables with any:

 fz1: Any = Fz1(1.024445, exponent=0)
 fz2: Any = Fz2(1.024445, exponent=3)
 with pytest.raises(TypeError) as excinfo: 
     assert fz1 < fz2
 assert ( 
     excinfo.value.args[0] 
     == "'<' not supported between instances of 'Fz1' and 'Fz2'" 
 ) 

@@ -309,6 +310,55 @@ def __sub__(self, other: Self) -> Self:
"""
return self.from_watts(self.as_watts() - other.as_watts())

def __mul__(self, duration: timedelta) -> "Energy":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. This is all I mean:

    def __truediv__(self, other: Current | Voltage | float) -> Voltage | Current | Power:
        if isinstance(other, Current):
            return Voltage(self._base_value / other._base_value, exponent=0)
        if isinstance(other, Voltage):
            return Current(self._base_value / other._base_value, exponent=0)
        if isinstance(other, numbers.Real):  # Only 1 extra `isinstance()` for all reals (`int` and `float` and whatever user-defined type that inherits from `numbers.Real`)
            return type(self)(self._base_value / other, exponent=0)
        raise TypeError(
            f"unsupported operand type(s) for /: '{type(self)}' and '{type(other)}'"
        )
>>> import numbers
>>> isinstance(1, numbers.Real)
True
>>> isinstance(1.0, numbers.Real)
True

@@ -34,7 +35,7 @@ async def init_feature_extractor(period: int) -> PeriodicFeatureExtractor:
timedelta(seconds=1), lm_chan.new_receiver(), timedelta(seconds=1)
)

await lm_chan.new_sender().send(Sample(datetime.now(tz=timezone.utc), 0))
await lm_chan.new_sender().send(Sample(datetime.now(tz=timezone.utc), Quantity(0)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why you need the unit-less Quantity type, if you need it as a step to pass it to numpy or something like that, then you can decompose into a float already, right? But maybe we can discuss in a follow up discussion:

@@ -126,7 +127,9 @@ async def run(self) -> None:
if (datetime.now(timezone.utc) - time_data).total_seconds() > 30:
_logger.error("Active power data are stale")
continue
queue.put_nowait(active_power.value)
queue.put_nowait(
None if not active_power.value else active_power.value.base_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue I see about it is it can be used in sub-classes, completely by-passing the unit checks. This is why it would be nice (if we don't go with #450) to find some way to make it available only in the base unit-less Quantity.

@@ -132,7 +133,7 @@ async def run() -> None:
def __init__( # pylint: disable=too-many-arguments
self,
size: timedelta,
resampled_data_recv: Receiver[Sample],
resampled_data_recv: Receiver[Sample[Quantity]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link doesn't seem to be pointing to any file:line in particular, maybe you can mention the file:line in the comment so I can look for it?

Comment on lines +54 to +61
@property
def base_value(self) -> float:
"""Return the value of this quantity in the base unit.

Returns:
The value of this quantity in the base unit.
"""
return self._base_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about as_base_unit instead? Because unitless is not clear that there's no transformations going on.

Maybe, I'm not sure if it is as discouraging. I think the root problem for me is you can have a Quantity without a unit in general and this is just a small consequence of that, so maybe we don't focus on this detail until we have the major design topic sorted out. By now I'm pretty sure I'm missing some fundamental use case that needs this. Maybe you can clarify it in a call and continue the conversation in #450).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -309,6 +310,55 @@ def __sub__(self, other: Self) -> Self:
"""
return self.from_watts(self.as_watts() - other.as_watts())

def __mul__(self, duration: timedelta) -> "Energy":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shsms shsms added this pull request to the merge queue Jun 20, 2023
Merged via the queue into frequenz-floss:v0.x.x with commit 5c44da2 Jun 20, 2023
@shsms shsms deleted the type-wrappers branch June 20, 2023 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:actor Affects an actor ot the actors utilities (decorator, etc.) part:data-pipeline Affects the data pipeline part:docs Affects the documentation part:tests Affects the unit, integration and performance (benchmarks) tests part:tooling Affects the development tooling (CI, deployment, dependency management, etc.)
Projects
Development

Successfully merging this pull request may close these issues.

7 participants