Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validity of retrying PUT instance requests (and concurrent request behaviour) #301

Closed
wryun opened this issue Aug 30, 2017 · 10 comments
Closed
Assignees

Comments

@wryun
Copy link

wryun commented Aug 30, 2017

I'm trying to understand the exact meaning of the spec around retrying/concurrency. This touches on two current PRs, #300 and #254.

The intention of the first PR seems to be that if anything at all happens, the marketplace SHOULD not retry but instead DELETE the resource. However, it seems like a reading of the spec implies that the broker MUST:

  • return 422 if it can't handle concurrency due to activity in progress (PR above changes to mutation, though, which is quite different?)
  • return 202 if provisioning is in progress (or 200 if it's prepared to wait for provisioning to finish)

This makes it seem like the marketplace could elect NOT to DELETE the resource, but instead retry the PUT on certain error responses (notably broker timeouts, but potentially on 5xx's and 408s). And in fact, it's not clear why you would want to bother deleting the resource even though you SHOULD, since it seems more efficient to retry and the broker MUST support that - though admittedly on a repeated broker 500 code it probably makes sense to DELETE and try a completely new request.

From discussing this with @nilebox, he has said that this is not the expected reading of the spec, and that brokers must be assumed to be less intelligent than this. Possible clarifications:

  • clearly identify if brokers that don't support concurrency are allowed to behave in an undefined way OR must respond with 422
  • move the additional handling of 422 described in 'Blocking Operations' up to the table of response codes

Alternatively, if retries are a desired behaviour, it may make sense to have a 500 code which indicates that the request is retryable (or conversely, a 500 which indicates it's a terminal failure).

@nilebox
Copy link

nilebox commented Aug 30, 2017

To be clear, the issue is about clarifying the blocking operations section (with related PRs mentioned above), and specifically two sentences:

Brokers do not have to support concurrent requests that act on the same set of resources.

Based on our prior discussions about support for stateless broker, I read this as: "broker is allowed to assume that no concurrent provisioning requests for the same instanceId will ever be executed, period". So, if the concurrent requests do occur, the behavior of the broker is unpredictable.

If a broker receives a request that it is not able to process due to other activity being done on that resource then the broker MUST reject the request with an HTTP 422 Unprocessable Entity.

And I am reading this as "but broker MAY detect concurrent requests and return specific errors". Even though there is a "MUST" requirement in the sentence, it is applied only "if not able to process" (very vague phrase), hence I am treating it as a "MAY" requirement. And this sounds again like an edge case trying to mitigate potential bugs in marketplace, as previously mentioned by @shalako in #291 (comment)

I agree with @wryun that this section is ambiguous, and needs to be clarified, specifically:

  • is marketplace allowed to execute concurrent requests to the broker?
  • does broker have to ("MUST") detect concurrent requests (and if it can't handle them, throw correct errors)?

@duglin
Copy link
Contributor

duglin commented Aug 30, 2017

My 2 cents:

is marketplace allowed to execute concurrent requests to the broker?

yes.

does broker have to ("MUST") detect concurrent requests (and if it can't handle them, throw correct errors)?

No it does not need to detect concurrent requests. However, if it does and doesn't like them, then it MUST use the 422 error code instead of something else. If it doesn't detect them AND it doesn't behave well if someone does do it, then IMO its a bug in the broker.

#300 is not about retries at all - its just about how to deal with concurrent requests.

@wryun
Copy link
Author

wryun commented Aug 30, 2017

@duglin the reason why I say #300 is connected to retries is that, AFAIK, the main thing that would prohibit retry behaviour in the event of timeout is that concurrent requests are not supported by the broker.

In this case, the change of 'act on' to 'mutate' could suggest that it's ok to make retries to any broker as they would not change the parameters (i.e. that should return a 20x on PUT, but PATCH is not supported). I'm probably taking an odd interpretation of mutate here, but to me 'act on' seems clearer if the intent is to allow the broker to support no concurrent requests.

EDIT: to give some context, our internal marketplace currently retries PUTs even though this is against the SHOULD behaviour of the spec. Also, this is related to an older issue from @nilebox #182 which I hadn't seen before. It's confusing us, at least!

@nilebox
Copy link

nilebox commented Aug 30, 2017

@duglin I do like your interpretation more. It's not the same as "Brokers do not have to support concurrent requests" though. According to your interpretation, brokers MUST properly handle concurrent requests (whether through idempotency, or by producing the same side effects as the original request, or by detecting and rejecting unprocessable concurrent requests).

If your interpretation is correct then IMO the text should be rewritten to something like:

In some situations, marketplace MAY issue concurrent requests (with identical parameters) to brokers, and brokers MUST correctly handle them. If a broker receives a request that it is not able to process due to other activity being done on that resource then the broker MUST reject the duplicate request with an HTTP 422 Unprocessable Entity.

@mattmcneeney
Copy link
Contributor

@wryun I'm trying to work out if this is still a problem for anyone. I know we did some work to clarify some retry/concurrency behaviour.

@wryun
Copy link
Author

wryun commented Jun 27, 2019

I'm no longer working on this. @ash2k is probably the best person to comment.

@mattmcneeney
Copy link
Contributor

@ash2k do you have any thoughts on this? We will close this on the next call if we don't hear anything. Thanks!

@ash2k
Copy link

ash2k commented Jul 26, 2019

It's not a problem for me/us anymore, but if the spec is not clear/explicit about a situation, why not explicitly describe the intended behavior? If everybody on the call agree about how it should work - great, put it into the spec :) If people have different opinions, then it's even more important to clarify the spec, isn't it? Just closing the issue is not helping anyone who wants to understand it better.

@mattmcneeney
Copy link
Contributor

Absolutely - I totally agree that if there is undefined or unclear behaviour, we should make an effort to clarify it. The problem I have is that the text in the specification quoted above no longer exists, and I'm struggling to work out if this behaviour is actually still unclear.

The first comment in this thread states:

The intention of the first PR seems to be that if anything at all happens, the marketplace SHOULD not retry but instead DELETE the resource. However, it seems like a reading of the spec implies that the broker MUST:

  • return 422 if it can't handle concurrency due to activity in progress (PR above changes to mutation, though, which is quite different?)
  • return 202 if provisioning is in progress (or 200 if it's prepared to wait for provisioning to finish)

From looking at the spec today, it looks like on a re-sent PUT, a service broker MUST return a 202 Accepted if the Service Instance is not yet fully provisioned (see this section).

I'm not sure that the Blocking Operations or Orphan Mitigation sections actually apply here.

What do you think?

@ash2k
Copy link

ash2k commented Jul 29, 2019

I'm sorry, I don't have an opinion on the spec. If nobody is concerned with the current wording, feel free to close the issue - it can always be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants