Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vector_to_points #313 #315

Merged
merged 19 commits into from
Mar 15, 2022
Merged

Add vector_to_points #313 #315

merged 19 commits into from
Mar 15, 2022

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Dec 20, 2021

Adds a new process for the requirements given in #313.

Some questions are still open, see the comments in #313. This is therefore more a basis for further discussions rather than a final proposal.

@m-mohr m-mohr requested a review from clausmichele December 20, 2021 15:02
@m-mohr m-mohr linked an issue Dec 20, 2021 that may be closed by this pull request
@m-mohr m-mohr added the vector label Dec 20, 2021
mattia6690
mattia6690 approved these changes Dec 20, 2021
@m-mohr m-mohr requested a review from mattia6690 December 20, 2021 15:24
proposals/raster_to_points.json Outdated Show resolved Hide resolved
@m-mohr m-mohr added this to the 1.3.0 milestone Dec 20, 2021
@m-mohr
Copy link
Member Author

m-mohr commented Dec 21, 2021

After the comment from @clausmichele in #313 (comment), I've added another process vector_to_points to support the use case you have described. So depending on what we want, we can choose one of the processes in this PR and discard the other one. Or if both processes are potentially useful, merge both.

@clausmichele Is the new process okay for you?

@m-mohr m-mohr requested a review from mattia6690 December 21, 2021 12:04
@m-mohr m-mohr changed the title Add raster_to_points #313 Add raster_to_points / vector_to_points #313 Dec 21, 2021
@m-mohr m-mohr requested a review from jdries December 21, 2021 12:05
@jdries
Copy link
Contributor

jdries commented Dec 21, 2021

I also understood/remember that we would do vector_to_points, and then use aggregate_spatial to sample over a raster if desired.
raster_to_points seems like a shortcut for the same thing, but don't feel the need to have a process that is also covered by a simple combination of other processes. Also doesn't feel like it would be more efficient.

@m-mohr
Copy link
Member Author

m-mohr commented Dec 21, 2021

Alright, I removed raster_to_points.

@m-mohr m-mohr changed the title Add raster_to_points / vector_to_points #313 Add vector_to_points #313 Dec 21, 2021
proposals/vector_to_points.json Outdated Show resolved Hide resolved
proposals/vector_to_points.json Outdated Show resolved Hide resolved
proposals/vector_to_points.json Outdated Show resolved Hide resolved
@clausmichele
Copy link
Member

clausmichele commented Jan 20, 2022

I'm currently trying to implement this process. However, I was thinking that maybe the "count" parameter should be more a maximum number of points to extract in total. Otherwise there could be luge polygons where we extract 100 points a tiny ones where we do the same. What do you think?
I have implemented a basic python code that performs this operation, you can have a look at it ere https://github.com/clausmichele/openeo_vector_to_points/blob/main/vector_to_points_test.ipynb

cc @soxofaan let me know if you have suggestions on how to improve the code!

@soxofaan
Copy link
Member

However, I was thinking that maybe the "count" parameter should be more a maximum number of points to extract in total. Otherwise there could be luge polygons where we extract 100 points a tiny ones where we do the same. What do you think?

yes, I'm also wondering whether some kind of target density wouldn't be more practical than a bare count. (Still having optional minimum and maximum count might still be useful.) See #315 (comment)

@m-mohr
Copy link
Member Author

m-mohr commented Mar 9, 2022

Open points:

@soxofaan
Copy link
Member

soxofaan commented Mar 9, 2022

  1. If no point is within the geometry, compute the centroid of it, if it is contained return it
  2. If the centroid is not in the geometry, return the first geometry coordinate

FYI, shapely/geopandas has this method representative_point:

Returns a cheaply computed point that is guaranteed to be within the geometric object.

I'm not completely sure how it works, but I think it's explained here: https://gis.stackexchange.com/questions/414260/how-does-geopandas-representative-point-work

@clausmichele
Copy link
Member

clausmichele commented Mar 10, 2022

Thanks, this is indeed interesting

@clausmichele
Copy link
Member

@m-mohr I've just noticed that:

  • We defined the vector_to_*_points to return the points sampled from each geometry as MultiPoint.
  • However, if we want to use this in combination with aggregate_spatial, it aggregates all the points within a MultiPoint feature and returns a single value. This is an issue since we would like to extract and use the values from all the points.

How do we solve this? Should we modify the definition and return just points instead of MultiPoint?

@soxofaan
Copy link
Member

Good point

and related: if you make each point a separate feature, you probably need a reference back to the original geometry in some way (because you can not just do that based on index anymore)

@m-mohr
Copy link
Member Author

m-mohr commented Mar 10, 2022

Would a new process such as split_multi_geometries work? But overall the question is what is generally useful and required? Do you need a reference back or group the original geometry? Or is the underlying issue aggregate_spatial and we should not (mis-)use it then?

@clausmichele
Copy link
Member

clausmichele commented Mar 10, 2022

It's getting every day more complicated!
From a back-end perspective I would rather work only on single geometries and not with their grouped version, it would be easier to define and handle.
From a user perspective I see that it could be useful to define MultiPolygon for a single class and also having a MultiPoint from the same class, avoiding to repeat the properties or target variable multiple times.

Possible solutions:

  1. Return single points instead of multi point
  2. Define another process to split multiple geometries
  3. Modify aggregate_spatial (the less viable solution)

split_multi_geometries would work imho, but we would need to define another process instead of modifying a draft of the current ones.

@m-mohr
Copy link
Member Author

m-mohr commented Mar 14, 2022

It seems like the actual issue here is aggregate_spatial though, right? We are somewhat misusing it to combine points with values although we are not aggregating, right? If that's the actual issue, then we need to check how we can accomplish that in a clean way.

@clausmichele
Copy link
Member

Yes, in my opinion using single points in aggregate_spatial it's an edge_case that can be handled as we are doing.

@m-mohr
Copy link
Member Author

m-mohr commented Mar 14, 2022

But then we have no solution. I don't really understand what you are heading at right now.
Another option is probably to let a user choose in vector to points processes whether they want the points to be grouped (returns MultiPoint for each geometry given) or independent (returns points without reference back to original geometries). Would that work? So that would be a new parameter then...

@clausmichele
Copy link
Member

This would be a good solution indeed! For me it would be totally fine to let the user choose based on a parameter.

@m-mohr
Copy link
Member Author

m-mohr commented Mar 14, 2022

Does the following work? 41b37ad

Also, we still need to update distance to not work in degree any longer, but instead adopt what has been discussed in #330

@soxofaan
Copy link
Member

(hmm for some reason my two comments above do no reference code diffs here, I hope it's clear what I refer to)

@m-mohr
Copy link
Member Author

m-mohr commented Mar 15, 2022

@soxofaan Likely because you commented on a specific commit...

@m-mohr
Copy link
Member Author

m-mohr commented Mar 15, 2022

I'm completely overwhelmed by this PR and can't really follow the status any longer. Thus, I think I'd merge this soon and then work on the remaining issues separately in issues and PRs.

@soxofaan
Copy link
Member

I'm completely overwhelmed by this PR and can't really follow the status any longer. Thus, I think I'd merge this soon and then work on the remaining issues separately in issues and PRs.

I even think we should do that more often: merge something under proposals knowingly it's incomplete. At least there is something implementers can start to play with and find additional issues and unclarities

@m-mohr
Copy link
Member Author

m-mohr commented Mar 15, 2022

Okay, I think I've opened issues for all remaining open discussions: #344, #345, #347, #343, #330
If I've missed something, please open a new issue.
Merging now.

@m-mohr m-mohr merged commit 536508c into draft Mar 15, 2022
@m-mohr m-mohr deleted the issue-313 branch March 15, 2022 14:04
@m-mohr m-mohr modified the milestones: 1.3.0, 2.0.0 Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed new process vector
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sampling
6 participants