
Protection from large Coverage API requests #54

Open
clynnes opened this issue Jan 10, 2020 · 21 comments
Labels: BeforeOABReview, Common: Should be addressed in Common
clynnes commented Jan 10, 2020

For large datasets, such as satellite products covering long periods of time, it is easy to submit a synchronous coverage request that cannot be fulfilled within typical timeout limits. Even if a large request CAN be fulfilled, the user may be impacted by receiving a much larger data volume than expected. There is likely no single solution that fits the diversity of users and servers. An assemblage of solutions might include the following capabilities:

    • page through the coverage response by spatial or temporal chunk or tile
    • obtain estimated size of a response before actually requesting the data (like a HEAD request)
    • allow servers to advertise maximum request limit
    • have a specific error response that indicates the request is too big to satisfy
    • support asynchronous response
nmtoken commented Jan 10, 2020

allow servers to advertise maximum request limit

How would servers be able to evaluate whether a request was too big?

nmtoken closed this as completed Jan 10, 2020
nmtoken reopened this Jan 10, 2020
clynnes (Author) commented Jan 10, 2020

For coverages that are regular grids, the response volume can often be computed from the bounding box size and the time period requested. For servers in the cloud, a response may also be too big for cost reasons, so the next step would be to estimate egress and processing charges. For multi-point coverages, this may be more difficult.
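For regular grids, that back-of-the-envelope computation could be sketched as follows (a minimal illustration; the function name, parameters, and figures are invented for this example and come from no specification):

```python
# Hypothetical sketch: estimating the uncompressed size of a regular-grid
# coverage response from the requested bounding box, resolution, time steps
# and cell size. All names and numbers here are illustrative only.

def estimate_response_bytes(bbox, x_res, y_res, n_times, n_bands, bytes_per_cell):
    """bbox = (min_x, min_y, max_x, max_y), in the units of the grid's CRS."""
    min_x, min_y, max_x, max_y = bbox
    n_cols = round((max_x - min_x) / x_res)
    n_rows = round((max_y - min_y) / y_res)
    return n_cols * n_rows * n_times * n_bands * bytes_per_cell

# A 10x10 degree box at 0.01 degree resolution, daily for a year,
# one band of 4-byte values:
size = estimate_response_bytes((0, 0, 10, 10), 0.01, 0.01, 365, 1, 4)
print(size)  # 1460000000, i.e. about 1.46 GB uncompressed
```

A server (or client) could compare such an estimate against a limit before committing to a full synchronous response; for cloud egress costs, the byte count could be multiplied by a per-byte price.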

cmheazel added the "Common: Should be addressed in Common" label Mar 3, 2021
cmheazel (Contributor) commented
API-Common Part 2 will allow a server to place a limit on the size of a response. If a response will exceed that limit, then the response is "paged", returning a partial response and the metadata necessary to retrieve the rest. With some adjustments, this existing capability can be applied to coverages.

chris-little commented
@cmheazel @clynnes I suggest that this is split into two issues: response too big, and asynchronous response. See the async text in the EDR API.

cmheazel (Contributor) commented
@chris-little So far this discussion has not touched on async. However, I have captured the async issue under API-Common issue 231.

jerstlouis (Member) commented
SWG 2021-05-26: Agreed that service metadata is the best place to describe API-wide limits, in terms of the maximum number of cells and/or uncompressed megabytes of data that the server will agree to return. Furthermore, we need a permission allowing the server to reject a client request (some standardized 400 code?) if the client is asking for too much data.

nmtoken commented May 26, 2021

413 Payload Too Large ~ https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.11

jerstlouis (Member) commented May 26, 2021

@nmtoken As we discussed in the call, I believe 413 as defined might apply to the payload provided by the client, as opposed to the payload being returned with the response, though that should be clarified.

nmtoken commented May 26, 2021

@jerstlouis Sorry, I wasn't on the call and missed the discussion.

Now that you say it, I agree: it could mean that too, or either may apply. My next pick would be 400 Bad Request ~ https://datatracker.ietf.org/doc/html/rfc7231#section-6.5.1

The 400 (Bad Request) status code indicates that the server cannot or
will not process the request due to something that is perceived to be
a client error (e.g., malformed request syntax, invalid request
message framing, or deceptive request routing).
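For illustration, a server-side guard of that kind might look as follows (purely a sketch: the limit value, function name, and error-body fields are assumptions, not defined by any OGC specification):

```python
# Hypothetical server-side check: reject a coverage request whose estimated
# cell count exceeds an advertised limit, with a 400 status and a
# problem-detail-style body. The limit and field names are invented.

MAX_CELLS = 100_000_000  # assumed advertised server limit

def check_coverage_request(requested_cells):
    """Return (HTTP status, error body or None) for a request-size check."""
    if requested_cells > MAX_CELLS:
        return 400, {
            "title": "Request too large",
            "detail": f"{requested_cells} cells requested; "
                      f"the server limit is {MAX_CELLS} cells.",
        }
    return 200, None

status, body = check_coverage_request(2_000_000_000)
print(status)  # 400
```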

jerstlouis (Member) commented May 28, 2021

As a related aside to this, I would like to suggest that, for servers supporting the scaling conformance classes, a default /coverage request could provide downsampled data if scaleFactor=1 is not explicitly provided. This makes it easy for a client to get an overview of the whole coverage regardless of its dimensions or resolution, and prevents the default request from returning an error.

e.g.

https://maps.ecere.com/ogcapi/collections/SRTM_ViewFinderPanorama/coverage?scaleFactor=1 returns an error, but
https://maps.ecere.com/ogcapi/collections/SRTM_ViewFinderPanorama/coverage does not.

pebau (Contributor) commented May 28, 2021

It is not a good idea to change the behavior of Core in a way that is not intuitive and not consistent, and that makes it more complicated to understand. Why not focus energy on a good set of examples, which would be extremely beneficial?

jerstlouis (Member) commented May 28, 2021

@pebau while I agree on the examples, that is a separate topic.

The reason I make this suggestion here is that the link to /coverage is provided from the collection resource, and for many coverages it is effectively a broken link that asks for too much data unless additional parameters are specified.

Principle 23 of the OGC API Web Guidelines mentions:

Good practice for defining default behavior is avoiding exception, error or empty pages as the response.

So I really like the idea of being able to return the downsampled coverage by default if the full resolution is too much for the server to process or for the client to receive, while specifying scaleFactor=1 would mean the client knows what it is doing and insists on retrieving the full-resolution data.

I also believe it is part of the solution to the problem raised in this issue by @clynnes.
Also, to address the original suggestions, all of these are good capabilities as well:

page through the coverage response by spatial or temporal chunk or tile

The specification now has conformance classes for subset and coverage tiles.

obtain estimated size of a response before actually requesting the data (like a HEAD request)

We should probably add this as a new requirement?

allow servers to advertise maximum request limit

We were just discussing adding this to service metadata at the API level.

have a specific error response that indicates the request is too big to satisfy

We were just discussing that 400 is likely the proper error response for this.

support asynchronous response

Discussed in #66 -- multiple OGC API specifications are going to use the Prefer header to indicate that an async response is desired, and this could be defined in Common.
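The downsample-by-default permission discussed above could be implemented roughly like this (a sketch under assumed names and an assumed limit; nothing here is normative):

```python
import math

# Hypothetical sketch: pick the smallest integer scale factor f so that the
# downsampled grid (n_cols/f) x (n_rows/f) fits within a cell limit. The
# function name and the limit value are illustrative, not from any spec.

def auto_scale_factor(n_cols, n_rows, max_cells):
    full = n_cols * n_rows
    if full <= max_cells:
        return 1  # full resolution already fits; behave as scaleFactor=1
    return math.ceil(math.sqrt(full / max_cells))

# A large global grid against an assumed 100-million-cell limit:
print(auto_scale_factor(432000, 216000, 100_000_000))  # 31
```

A server taking this approach would apply the computed factor only when the client did not explicitly request scaleFactor=1.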

pebau (Contributor) commented May 28, 2021

@jerstlouis you commonly "like the idea" - let me note, for the record, that others do not necessarily like it (in this case: at least me). In my world, more technically based arguments tend to be used.
In the case at hand, concerns get mixed: consistent service logic and resource-dependent limitations such as quota.

jerstlouis (Member) commented May 28, 2021

@pebau I included the reason why I like the idea: to avoid a default behavior returning an error or returning more data than the client (or webcrawler) intended to request. I understand your opinion and concerns about the service consistency, I think it's a question of balancing perceived consistent service logic vs. the implications of having a default behavior which is normally not what is intended by users/clients.

The webcrawler case is probably the best reason to go for this -- this was much less of a concern with WCS as the GetCoverage request link was not exposed in the same web-friendly way as it is with OGC API - Coverages. (and returning an error to indexing crawlers negatively affects SEO and thus Findability).

pebau (Contributor) commented May 29, 2021

@jerstlouis I do not share the idea that users would intend what is sketched (leaving aside that a complete specification is still missing), even for webcrawlers. If this is a use case, how is it solved by other tools and services, in the geo and other domains? Is any investigation available?

pomakis commented May 29, 2021

Admittedly we haven't actually had any real clients yet for our OGC API - Coverages implementation, but I can tell you that of all the people within our company that I got to test drive what's been implemented so far, they've all complained that our /coverage endpoint is broken because they get a "400 Bad Request" response. So in this respect it seems to me that having endpoints that actually work without having to append parameters to them would be a good thing.

If a client wants to explicitly request the full-scale coverage, it always can do so by adding scaleFactor=1. It doesn't even have to check to see if the server actually supports the scaling conformance class, since servers that don't support this conformance class would presumably just ignore this parameter (or throw an error if its value is something other than 1).

I don't feel strongly about this one way or another, and am happy to let the industry decide, but I just figured I'd share my own personal experience and opinion in this matter.

jerstlouis (Member) commented Nov 24, 2021

SWG 2021-11-25: We agreed to include permissions and requirements along these lines:

Permission: A server advertising conformance for the scaling conformance class MAY return a downsampled version of the coverage, if the client does not explicitly request scale-factor=1

Requirement: Even if the scaling conformance class is not supported, the server SHALL understand requests specifying scale-factor=1 as indicating that the client wishes to retrieve the data at full resolution.

If a client wants to page and the server supports the Coverage Tiles conformance class, that can be used and the client can negotiate e.g. its preferred TileMatrixSet to page/request in the most convenient way.
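As a rough illustration of paging via coverage tiles, a client could enumerate the tile requests for its area of interest like this (the URL template follows the usual OGC API - Tiles pattern of tile matrix, row, and column, but the collection URL and tile matrix set identifier here are made up):

```python
# Hypothetical client-side sketch: list the coverage-tile URLs needed to
# page through an area of interest at a given tile matrix (zoom) level.

def coverage_tile_urls(collection_url, tms_id, matrix, rows, cols):
    """rows and cols are (min, max) inclusive ranges of tile indices."""
    return [
        f"{collection_url}/coverage/tiles/{tms_id}/{matrix}/{row}/{col}"
        for row in range(rows[0], rows[1] + 1)
        for col in range(cols[0], cols[1] + 1)
    ]

urls = coverage_tile_urls("https://example.com/ogcapi/collections/demo",
                          "WebMercatorQuad", 3, (1, 2), (2, 3))
print(len(urls))  # 4 tile requests cover the area of interest
```

Each tile is then a small, bounded request, which sidesteps the oversized-response problem entirely.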

pebau (Contributor) commented Nov 30, 2021

Just for the record, I am strongly against this.

jerstlouis (Member) commented Jan 12, 2022

2022-01-12: Suggestion from @tomkralidis to have a boolean property in the metadata indicating that the default /coverage request will return downsampled data.

Motion to update the spec to include permissions to return downsampled data, introduce this boolean property, and requirement to accept scale-factor=1 even when scaling conformance class is not supported, and to allow advertising limits as per #110.
Moved: Jerome
Second: Stephan
Discussion:
Noting Peter's objection.
3 votes in favor in this meeting.

jerstlouis self-assigned this Feb 23, 2022
jerstlouis (Member) commented

@tomkralidis

Question regarding your suggestion

to have a boolean property in the metadata indicating that the default /coverage request will return downsampled data

There is a permission now in place.

In the case where a response to a request without a scale-factor, scale-axes or scale-size parameters would be larger than an advertised server limit, an implementation MAY automatically downsample the coverage to a suitable resolution instead of returning a 4xx client error.

and a Note:

A client retrieving a coverage from an implementation advertising support for this "Scaling" requirements class should explicitly use scale-factor=1 if it wants to ensure retrieving the coverage in its native resolution and prefers receiving an error instead of a downsampled version of the data.

I am wondering now how to implement this suggestion, and having second thoughts about whether it is really necessary / useful. The only place we could include this metadata (e.g., "defaultCoverageMayBeDownsampled": true) would be in /collections/{collectionId}.

My feeling, as per the note, is that any client requesting /coverage needs to explicitly use ?scale-factor=1 as a query parameter if it absolutely does not want downsampled data. These clients should then not care what this property says, and should simply assume this to be the case for all implementations.

Adding more things to the collection description complicates things.
Adding this could also have an effect opposite to the intent of preventing downsampled data from being used inadvertently.
It could create a situation where clients think they can rely on looking at defaultCoverageMayBeDownsampled, and that if it is absent or false they do not need to use ?scale-factor=1 to ensure full-resolution data; but a server implementation could then fail this requirement. It actually makes things more complicated for both client and server to look for this property and branch one way or the other, and it introduces more potential points of failure.

In OGC API - Maps we already have this expectation that /map is not necessarily the most detailed resolution.
In Features there is a paging mechanism, so an expectation that /items will not necessarily return all items.

Any thoughts on whether it is necessary to include something like "defaultCoverageMayBeDownsampled": true, or whether we may be better off without it?

tomkralidis (Contributor) commented

@jerstlouis so to summarize:

  • the server provides the data in whatever resolution it decides, by default/implicitly, and clients can explicitly specify scale-factor=1 if they so choose
  • no additional collection level metadata is then required

That works for me; we should have some informative text on this behaviour (and probably something similar in OGC API - Maps while we're at it).
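A client following this outcome would simply always pin the scale factor whenever it needs native resolution, e.g. (a hypothetical helper; only the scale-factor parameter name comes from the discussion above):

```python
from urllib.parse import urlencode

# Hypothetical client helper: a client that must have full-resolution data
# always appends scale-factor=1, instead of inspecting any metadata flag
# such as the defaultCoverageMayBeDownsampled property considered above.

def coverage_url(collection_url, full_resolution=True, **params):
    if full_resolution:
        params["scale-factor"] = 1
    query = urlencode(params)
    return f"{collection_url}/coverage" + (f"?{query}" if query else "")

print(coverage_url("https://example.com/collections/demo"))
# https://example.com/collections/demo/coverage?scale-factor=1
```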

8 participants