Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minimal profile for backends #56

Closed
wants to merge 5 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 175 additions & 0 deletions federation/backends/minimal_profile.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
# Minimal profile for openEO 'raster' backends
This document describes the requirements for a backend that aims to provide minimal functionality for access and processing of gridded
earth observation data also referred to as rasters. Earth observation raster data access and processing at a large scale is the main use case that drove the design of openEO. The final API is quite generic on purpose, so that it can also serve other cases. This however makes it impossible to define certain restrictions in the main API, hence the need for this profile.

This document does not aim to contradict the openEO API, and when this is perceived to be the case, the definitions in the openEO API get precedence.
It is possible that requirements in this document do define additional constraints on top of the openEO API.

This profile is established because the openEO web service API is quite liberal if interpreted strictly. So a web service can be compliant to the API
specification, but still be mostly incompatible with other backends, and existing usage patterns. By following this profile, implementors have a better
view on what is important.

## Scope
This minimal profile aims to help initial openEO implementations, that seek a least effort approach to achieve minimal compatibility with other backends.
Requirements in this profile are not sufficient for backends in the openEO platform federation.

## Mandatory endpoints

In the openEO API, all endpoints are optional and discoverable, except for the basic capabilities and versioning endpoints:

https://api.openeo.org/#tag/Capabilities

https://api.openeo.org/#tag/Capabilities/operation/connect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m-mohr are these #tag/... URL stable enough to point to a specific API subset or are there better ways?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stable as redoc is... Should be good enough for now.


[Authentication](https://api.openeo.org/#section/Authentication) is also mandatory, but this profile does not define a specific method. We highlight that openID connect + PKCE is one of the only standardized and secure methods that are available.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PKCE is conceptually only a consideration between the client and identity provider, the back-end has little involvement (at most in specifying the grant_types in the default_clients for providers under /credentials/oidc). To keep the scope of this minimal profile minimal, it is maybe better to not mention PKCE?





Next to those elements, these are the endpoints that are required by this profile:


```
[
{
"methods": [
"GET"
],
"path": "/file_formats"
},
{
"methods": [
"GET"
],
"path": "/collections"
},
{
"methods": [
"GET"
],
"path": "/collections/{collection_id}"
},
{
"methods": [
"GET"
],
"path": "/processes"
},
{
"methods": [
"GET",
"POST"
],
"path": "/jobs"
},
{
"methods": [
"GET",
"DELETE"
],
"path": "/jobs/{job_id}"
},
{
"methods": [
"GET",
"POST",
"DELETE"
],
"path": "/jobs/{job_id}/results"
}
],
```

## Batch job support

Support for openEO batch jobs is mandatory, but certain specific elements can be ommitted. The previous section indicates the exact endpoints and HTTP requests that need to be supported.


## File formats
[GeoTiff](./fileformats.md#geotiff) is the only mandatory output format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have GeoJSON as mandatory input/output format too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does any of the processes listed actually require this as output format? I don't think so so am wondering why we would list it here as output format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as output format probably not, but filter_spatial requires it as input format I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! If we add input format here it should be included as the list filter_spatial below as minimal process.


## Collections
A minimal backend shall have at least one raster collection, allowing to build raster data cubes.

## Processes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but the problem is that ESA asked for something even more minimal, as the core profile for openEO platform is already quite demanding. (Minimal openEO backends would not be able to join the openEO platform federation, then they would indeed hace to support the core profile.)

Copy link
Collaborator

@m-mohr m-mohr Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... somewhat makes sense. A couple of questions arise though:

  • How was this list compiled? Looking at the "minimal" list is looks like it could be more minimal (e.g. remove normalized_difference? filters?).
  • What's the difference between a minimal openEO backend and a backend that wants to join the federation? Why do we specifiy something for a minimal openEO backend here? Thus, I understand this as "minimal openEO Platform backend".
  • What's the use of having two lists? One specified a minimal profile, the other one was until now a requirement to be implemented as part of the federation contract. That's contradicting.
  • Should we have this in one place and one table?
  • Do these lists actually make sense at all? Maybe more as a guideline? A "vector backend" would never be able to fullfill these...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo it makes no sense to provide and maintain two different lists as platform if we don't do anything with this one here.
As it's not that easy to actually fulfil the platform core profile (as discussed our backend is missing some of them) I would be in favour of a simpler list. Having all processes listed here would of course not mean that you automatically become a platform backend (there are other points to consider that should be clearer defined somewhere).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A profile is indeed mostly a guideline. Basically, anyone wanting to implement a backend will need guidance on which processes to implement first, and then a minimal list like this can be helpful.
If we don't have a 'minimal' list for raster backends, providers can implement anything at all and call themselves openEO compliant.
If we only have the openEO platform list, then the threshold is quite high.
Hence the two lists?
As to the contents of the list, I'm fine with reducing it or adapting in any way.
I specified it here because didn't have another good location, but openEO.org is also a good option?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't have a 'minimal' list for raster backends, providers can implement anything at all and call themselves openEO compliant. [...] but openEO.org is also a good option?

Indeed and that is intentional. You can choose what you need or want, openEO doesn't tell you what you have to implement. If you just implement load_collection and save_result you can be openEO compliant and I don't think why we should raise the bar. Guidance is okay, but it shouldn't be required to be openEO compliant. But that's unrelated to openEO Platform, where we can define a profile for becoming part of the federation or so. But we have two lists now and I still don't exactly understand why we need two of them.

If you actually want to define a "minimal" guideline for openEO itself, I'd say propose it in the PSC and/or community meetings. If this is meant for openEO Platform then it's a different story and should live here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, moved the discussion elsewhere so we can indeed have it in the PSC:
https://github.com/Open-EO/openeo-api/issues/490


### Data Cubes

- `apply`: Apply a process to each pixel
- `apply_dimension`: Apply a process to pixels along a dimension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't reduce_dimension a better candidate for a minimal profile than apply_dimension? (especially because there are no UDFs in the minimal profile)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, under Math and logic below there are currently no processes that take an array as input (like mean, min, max), so apply_dimension nor reduce_dimension can not be used anyway

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, you need reduce_dimension to be able to easily get results for the only supported file format, GeoTiff, in the minimal profile. So reduce_dimension should be included with at least a small number of reducers for composites (min/max/mean?), I'm not sure about apply_dimension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batch jobs do allow writing to geotiff without reduce_dimension, it just writes a geotiff per date in that case.
But we can indeed switch to reduce_dimension, not sure why I went for apply_dimension...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your implementation can do that and it's allowed, indeed. But that's not minimal for me. Minimal, for me, is providing a 3D cube to write a single GeoTiff.

- `dimension_labels`: Get the dimension labels
- `filter_bands`: Filter the bands by names
- `filter_bbox`: Spatial filter using a bounding box
- `filter_spatial`: Spatial filter using geometries
- `filter_temporal`: Temporal filter for temporal intervals
- `load_collection`: Load a collection
- `mask`: Apply a raster mask
- `resample_spatial`: Resample and warp the spatial dimensions
- `save_result`: Save processed data

For backends with more than one collection:
- `merge_cubes`: Merge two data cubes
- `resample_cube_spatial`: Resample the spatial dimensions to match a target data cube
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this one is not that minimal. It's from this list the only one we couldn't implemented in our backend. This is certainly due to our approach but maybe also means it's a more complicated to support than others.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resample_cube_spatial is now implicitly part of merge_cubes and mask as of v2.0.0 so if we have merge_cubes it seems we should also have resample_cube_spatial. But hearing that SH can't implement it(?), I fear that we may actually regret doing Open-EO/openeo-processes#405 - Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dthiex if you do support merge_cubes, but not resample_cube_spatial, is it possible that cubes on sentinelhub or somehow aligned already by default more easily? Or does merge_cubes result in an error in a lot of cases?

In any case, if you remove it from this list, then there's a lot of example process graphs that don't work on a minimal backend.



### Math
- `absolute`: Absolute value
- `add`: Addition of two numbers
- `arccos`: Inverse cosine
- `arcosh`: Inverse hyperbolic cosine
- `arcsin`: Inverse sine
- `arctan`: Inverse tangent
- `arctan2`: Inverse tangent of two numbers
- `arsinh`: Inverse hyperbolic sine
- `artanh`: Inverse hyperbolic tangent
- `ceil`: Round fractions up
- `clip`: Clip a value between a minimum and a maximum
- `constant`: Define a constant value -> pretty easy implementation
- `cos`: Cosine
- `cosh`: Hyperbolic cosine
- `divide`: Division of two numbers
- `e`: Euler's number
- `exp`: Exponentiation to the base e
- `floor`: Round fractions down
- `int`: Integer part of a number
- `linear_scale_range`: Linear transformation between two ranges
- `ln`: Natural logarithm
- `log`: Logarithm to a base
- `mod`: Modulo
- `multiply`: Multiplication of two numbers
- `pi`: Pi
- `power`: Exponentiation
- `round`: Round to a specified precision
- `sgn`: Signum
- `sin`: Sine
- `sinh`: Hyperbolic sine
- `sqrt`: Square root
- `subtract`: Subtraction of two numbers
- `tan`: Tangent
- `tanh`: Hyperbolic tangent

### Indices
- `normalized_difference`: Normalized difference

### Logic
- `and`: Logical AND
- `all`: Are all of the values true?
- `any`: Is at least one value true?
- `between`: Between comparison
- `eq`: Equal to comparison
- `gt`: Greater than comparison
- `gte`: Greater than or equal to comparison
- `if`: If-Then-Else conditional
- `is_infinite`: Value is an infinite number - **experimental**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not list experimental processes as required in a minimal_profile.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it is also not implemented yet by any backend afaik. I thought backends would implement it soon when we added it to the other profile as it's a relatively simple process, but that did not happen yet. So I think we can remove it for now (or start implementing it).

- `is_nan`: Value is not a number
- `is_nodata`: Value is a no-data value
- `is_valid`: Value is valid data
- `lt`: Less than comparison
- `lte`: Less than or equal to comparison
- `neq`: Not equal to comparison
- `not`: Inverting a boolean
- `or`: Logical OR
- `xor`: Logical XOR (exclusive or)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above about apply_dimension: there is probably need for some minimal array proceses too: array_element, mean, min, max?