Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aggregate_temporal -> labels clarification #19

Closed
jdries opened this issue Jan 21, 2019 · 11 comments
Closed

aggregate_temporal -> labels clarification #19

jdries opened this issue Jan 21, 2019 · 11 comments
Milestone

Comments

@jdries
Copy link
Contributor

jdries commented Jan 21, 2019

Currently the definition of the labels parameter in aggregate_temporal is:

Labels for the intervals. The number of labels and the number of groups need to be equal.

One of the most common cases IMO, is that the labels are again valid timestamps. This way, the time dimension is mapped unto a time dimension, and not something entirely new.

Not sure if this should be a hard constraint, but for me it's at least a recommended suggestion. Also an example can be provided.
For me, it's also an option to use the first timestamp of each interval as the default label. That way the labels parameter is no longer mandatory, and the function becomes easier to use.

@m-mohr m-mohr added this to the v0.4 milestone Jan 21, 2019
@m-mohr
Copy link
Member

m-mohr commented Jan 21, 2019

There are multiple potential changes in your text:

  1. Allow timestamps for the labels:
    Yes, that's a good idea. I added the corresponding string formats.
  2. Add an example to the process.
    I added an example.
  3. Use the first timestamp of each interval as default label.
    Sounds reasonable. As the current approach was discussed and decided on the VITO sprint, I'd like wait for more opinions on this one. Your proposal makes sense in some cases, in some it seems not so meaningful. As @mkadunc was a main driver of the current proporal, what's your opinion, Miha?

The changes will be pushed to the repository in the next hours.

@jdries
Copy link
Contributor Author

jdries commented Jan 21, 2019

Thanks, one more question: can intervals overlap?

@m-mohr
Copy link
Member

m-mohr commented Jan 21, 2019

Yes, I haven't got a reason why we should restrict it, so I added "Intervals can overlap." to the parameter description.

Changes are now available in the documentation: https://open-eo.github.io/openeo-api/v/0.4.0/processreference/#aggregate_temporal

@mkadunc
Copy link
Member

mkadunc commented Jan 22, 2019

I agree with using the first timestamp of the interval as the default label, but you'd need to resolve conflicts in case many intervals start at the same value (which is possible when we allow overlap).

Why do we need the labels anyway? As far as I know we haven't really defined the 'auxiliary data' about the axes and such, so no need to define labels at the moment?

As for overlaps, I don't mind having them, but it means that aggregate would need to be redefined — openEO glossary emphasizes unique assignment to groups, restricting aggregate operations to non-overlapping subdivisions of the cube.

P.S. (not saying we should do it, just a side-note): In my ideal interface, the aggregation operation might look like this: aggregate(cube, source_axis, target_axis). The target_axis would contain all the required information on the intervals and labels... This would be useful in e.g. in case you have two cubes and want to use aggregate to make one compatible with the other: aggregate(cube_b, "time", cube_a.get_axis["time"]);

@jdries
Copy link
Contributor Author

jdries commented Jan 23, 2019

For me the labels or target_axis are relevant in the sense that they allow the backend to derive the new discrete time steps that are available after the aggregation, without actually having to run it.
Only annoyance with the current definition is that aggregate_temporal allows one to change the temporal dimension into an entirely different dimension, even by accident. Wouldn't it be an option to have a separate function to convert dimension labels? (Even though I don't have a use case for it myself.)
That would allow us to restrict the output of aggregate_temporal to still have a time dimension.

Overlaps: some compositing functions use overlapping time windows.

@mkadunc
Copy link
Member

mkadunc commented Jan 23, 2019

they allow the backend to derive the new discrete time steps that are available after the aggregation, without actually having to run it.

It's hard to discuss this without a domain model of data cubes' meta-information...

But I agree with your suggestion - given that the time steps are already available from the intervals, additional labels parameter would only be for human consumption - programatically you'd do best to use whole intervals as labels for axis cells, and let the user do additional annotation or conversion on those (which tells us that we need some processes for manipulating data cube meta-information)

I also agree that dimension should be preserved across aggregation, but its axis should have different partitioning / sampling specification - we shouldn't automatically convert a regularly-sampled axis into interval-partitioned axis without changing some of its meta-attributes.

@m-mohr
Copy link
Member

m-mohr commented Feb 7, 2019

We discussed that for 0.4 we stick with the current proposal, but @jdries may just remove the ability to specify string-only labels as GepPySpark can't handle them. Moving the discussion to 0.5 or 1.0.

@m-mohr m-mohr modified the milestones: v0.4, v0.5 Feb 7, 2019
@m-mohr m-mohr added the help wanted Extra attention is needed label Dec 16, 2019
@m-mohr
Copy link
Member

m-mohr commented Dec 16, 2019

Let's revisit this issue: We now have rename_labels. I'm wondering whether we should just remove the labels parameter from aggregate_temporal completely and just handle it as in reduce: We just enumerate the labels by default based on the intervals (i.e. [ [ "2015-01-01", "2016-01-01" ], [ "2016-01-01", "2017-01-01" ], [ "2017-01-01", "2018-01-01" ] ] will have the labels [0, 1, 2]). Afterwards, a user can call rename_labels when required. @jdries @mkadunc

Note to myself: I would need to allow the temporal string types in the labels parameter of rename_labels.

@m-mohr
Copy link
Member

m-mohr commented Jan 15, 2020

Note: aggregation assignments to groups are not necessarily unique any longer, see Open-EO/openeo.org@4397fde and #107 (comment).

@m-mohr
Copy link
Member

m-mohr commented Jan 15, 2020

@jdries PR #122 makes the parameter labels optional.
Is there anything left to be done here? Otherwise I'd close after we have merged #122.

@m-mohr m-mohr added has PR and removed help wanted Extra attention is needed labels Jan 15, 2020
@jdries
Copy link
Contributor Author

jdries commented Jan 15, 2020

Agree to merge and close!

@m-mohr m-mohr closed this as completed Jan 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants