Validity of retrying PUT instance requests (and concurrent request behaviour) #301

wryun · 2017-08-30T02:37:12Z

I'm trying to understand the exact meaning of the spec around retrying/concurrency. This touches on two current PRs, #300 and #254.

The intention of the first PR seems to be that if anything at all happens, the marketplace SHOULD not retry but instead DELETE the resource. However, it seems like a reading of the spec implies that the broker MUST:

return 422 if it can't handle concurrency due to activity in progress (PR above changes to mutation, though, which is quite different?)
return 202 if provisioning is in progress (or 200 if it's prepared to wait for provisioning to finish)

This makes it seem like the marketplace could elect NOT to DELETE the resource, but instead retry the PUT on certain error responses (notably broker timeouts, but potentially on 5xx's and 408s). And in fact, it's not clear why you would want to bother deleting the resource even though you SHOULD, since it seems more efficient to retry and the broker MUST support that - though admittedly on a repeated broker 500 code it probably makes sense to DELETE and try a completely new request.

From discussing this with @nilebox, he has said that this is not the expected reading of the spec, and that brokers must be assumed to be less intelligent than this. Possible clarifications:

clearly identify if brokers that don't support concurrency are allowed to behave in an undefined way OR must respond with 422
move the additional handling of 422 described in 'Blocking Operations' up to the table of response codes

Alternatively, if retries are a desired behaviour, it may make sense to have a 500 code which indicates that the request is retryable (or conversely, a 500 which indicates it's a terminal failure).

nilebox · 2017-08-30T03:47:33Z

To be clear, the issue is about clarifying the blocking operations section (with related PRs mentioned above), and specifically two sentences:

Brokers do not have to support concurrent requests that act on the same set of resources.

Based on our prior discussions about support for stateless broker, I read this as: "broker is allowed to assume that no concurrent provisioning requests for the same instanceId will ever be executed, period". So, if the concurrent requests do occur, the behavior of the broker is unpredictable.

If a broker receives a request that it is not able to process due to other activity being done on that resource then the broker MUST reject the request with an HTTP 422 Unprocessable Entity.

And I am reading this as "but broker MAY detect concurrent requests and return specific errors". Even though there is a "MUST" requirement in the sentence, it is applied only "if not able to process" (very vague phrase), hence I am treating it as a "MAY" requirement. And this sounds again like an edge case trying to mitigate potential bugs in marketplace, as previously mentioned by @shalako in #291 (comment)

I agree with @wryun that this section is ambiguous, and needs to be clarified, specifically:

is marketplace allowed to execute concurrent requests to the broker?
does broker have to ("MUST") detect concurrent requests (and if it can't handle them, throw correct errors)?

duglin · 2017-08-30T15:41:05Z

My 2 cents:

is marketplace allowed to execute concurrent requests to the broker?

yes.

does broker have to ("MUST") detect concurrent requests (and if it can't handle them, throw correct errors)?

No it does not need to detect concurrent requests. However, if it does and doesn't like them, then it MUST use the 422 error code instead of something else. If it doesn't detect them AND it doesn't behave well if someone does do it, then IMO its a bug in the broker.

#300 is not about retries at all - its just about how to deal with concurrent requests.

wryun · 2017-08-30T21:48:21Z

@duglin the reason why I say #300 is connected to retries is that, AFAIK, the main thing that would prohibit retry behaviour in the event of timeout is that concurrent requests are not supported by the broker.

In this case, the change of 'act on' to 'mutate' could suggest that it's ok to make retries to any broker as they would not change the parameters (i.e. that should return a 20x on PUT, but PATCH is not supported). I'm probably taking an odd interpretation of mutate here, but to me 'act on' seems clearer if the intent is to allow the broker to support no concurrent requests.

EDIT: to give some context, our internal marketplace currently retries PUTs even though this is against the SHOULD behaviour of the spec. Also, this is related to an older issue from @nilebox #182 which I hadn't seen before. It's confusing us, at least!

nilebox · 2017-08-30T23:25:40Z

@duglin I do like your interpretation more. It's not the same as "Brokers do not have to support concurrent requests" though. According to your interpretation, brokers MUST properly handle concurrent requests (whether through idempotency, or by producing the same side effects as the original request, or by detecting and rejecting unprocessable concurrent requests).

If your interpretation is correct then IMO the text should be rewritten to something like:

In some situations, marketplace MAY issue concurrent requests (with identical parameters) to brokers, and brokers MUST correctly handle them. If a broker receives a request that it is not able to process due to other activity being done on that resource then the broker MUST reject the duplicate request with an HTTP 422 Unprocessable Entity.

mattmcneeney · 2019-06-24T15:04:10Z

@wryun I'm trying to work out if this is still a problem for anyone. I know we did some work to clarify some retry/concurrency behaviour.

wryun · 2019-06-27T09:48:23Z

I'm no longer working on this. @ash2k is probably the best person to comment.

mattmcneeney · 2019-07-25T09:54:51Z

@ash2k do you have any thoughts on this? We will close this on the next call if we don't hear anything. Thanks!

ash2k · 2019-07-26T01:21:58Z

It's not a problem for me/us anymore, but if the spec is not clear/explicit about a situation, why not explicitly describe the intended behavior? If everybody on the call agree about how it should work - great, put it into the spec :) If people have different opinions, then it's even more important to clarify the spec, isn't it? Just closing the issue is not helping anyone who wants to understand it better.

mattmcneeney · 2019-07-26T09:21:38Z

Absolutely - I totally agree that if there is undefined or unclear behaviour, we should make an effort to clarify it. The problem I have is that the text in the specification quoted above no longer exists, and I'm struggling to work out if this behaviour is actually still unclear.

The first comment in this thread states:

The intention of the first PR seems to be that if anything at all happens, the marketplace SHOULD not retry but instead DELETE the resource. However, it seems like a reading of the spec implies that the broker MUST:

return 422 if it can't handle concurrency due to activity in progress (PR above changes to mutation, though, which is quite different?)

return 202 if provisioning is in progress (or 200 if it's prepared to wait for provisioning to finish)

From looking at the spec today, it looks like on a re-sent PUT, a service broker MUST return a 202 Accepted if the Service Instance is not yet fully provisioned (see this section).

I'm not sure that the Blocking Operations or Orphan Mitigation sections actually apply here.

What do you think?

ash2k · 2019-07-29T00:39:34Z

I'm sorry, I don't have an opinion on the spec. If nobody is concerned with the current wording, feel free to close the issue - it can always be reopened.

fmui assigned mattmcneeney Jul 23, 2019

mattmcneeney closed this as completed Jul 29, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validity of retrying PUT instance requests (and concurrent request behaviour) #301

Validity of retrying PUT instance requests (and concurrent request behaviour) #301

wryun commented Aug 30, 2017

nilebox commented Aug 30, 2017

duglin commented Aug 30, 2017

wryun commented Aug 30, 2017 •

edited

Loading

nilebox commented Aug 30, 2017 •

edited

Loading

mattmcneeney commented Jun 24, 2019

wryun commented Jun 27, 2019

mattmcneeney commented Jul 25, 2019

ash2k commented Jul 26, 2019

mattmcneeney commented Jul 26, 2019

ash2k commented Jul 29, 2019

Validity of retrying PUT instance requests (and concurrent request behaviour) #301

Validity of retrying PUT instance requests (and concurrent request behaviour) #301

Comments

wryun commented Aug 30, 2017

nilebox commented Aug 30, 2017

duglin commented Aug 30, 2017

wryun commented Aug 30, 2017 • edited Loading

nilebox commented Aug 30, 2017 • edited Loading

mattmcneeney commented Jun 24, 2019

wryun commented Jun 27, 2019

mattmcneeney commented Jul 25, 2019

ash2k commented Jul 26, 2019

mattmcneeney commented Jul 26, 2019

ash2k commented Jul 29, 2019

wryun commented Aug 30, 2017 •

edited

Loading

nilebox commented Aug 30, 2017 •

edited

Loading