Add AzureSQL short term retention policies #1355
Mostly looks good but I think we should be calling Result on the policy futures coming back from CreateOrUpdate?
		return nil, err
	}

	return &future, err
I was trying to find the corresponding change in the call to AddLongTermRetention but couldn't see it in the diff - I'm guessing because the calling code just ignores the response/future. Is there any problem with not calling .Response on the future? I guess it's not waiting until the operation has finished without that. Shouldn't we be calling Result on them so we can see any error from the operation?
Calling Response doesn't wait until the operation has finished either, as the implementation for Response just does:
// Response returns the last HTTP response.
func (f Future) Response() *http.Response {
	if f.pt == nil {
		return nil
	}
	return f.pt.latestResponse()
}
and f.pt.latestResponse() just says: // returns the cached HTTP response after a call to pollForStatus(), can be nil
So if, for example, the operation took more than a single polling interval, or we called Response too soon after starting it, the result would be nil. I'm pretty sure we were doing the wrong thing before as well, and what I have here is effectively the same as what we had before. Doubly so because the result of Response() was always being ignored anyway.
I see a few paths towards fixing this...
1. Do the wait inline. This would work and it'd probably be fast as I don't think that these operations take a long time, but it has the disadvantage of breaking a "rule" of Kubernetes controllers, which seems to be that you don't loop inside of the Reconcile function; you set a variable and let reconcile call you again (respecting the backoff, etc. configured for the operator as a whole).
2. Set up a state machine infrastructure so that we can go through the required workflow for the DB. The workflow is something like: Create DB -> poll create DB LRO -> Set LongTermRetention -> wait for LongTermRetention LRO -> Set ShortTermRetention -> wait for ShortTermRetention LRO -> Set "complete".
3. Similar to 2 above, but rather than thinking of it as states (which I think Kubernetes doesn't really love), just do a delta comparison to each entity in Azure and set them one at a time. I think the workflow would be something like this:
a. Poll LRO if we have one - if not done just keep waiting, if done check result. Will need error handling for each type of LRO.
b. Does DB exist? If not, create and store LRO. If yes, compare with Spec. If different post and store LRO. If same continue.
c. Does LongTermRetention match spec? If no, post and store LRO. If yes continue.
d. Does ShortTermRetention match spec? If no, post and store LRO. If yes continue.
e. Set provisioned = true
I think the right thing to do is technically option 3, which also does away with the spec JSON hash checking in favor of an actual diff with Azure (which has the added benefit of allowing us to correct differences in Azure that Kubernetes didn't know about). The issue is that both 2 and 3 (that fix this issue the "right" way) are big undertakings that would effectively require full rewrites of the SQL DB reconciler. That introduces more risk and also is more duplicate effort given we're tracking towards a generic implementation that does exactly the above in the code generated path.
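Steps a to e of option 3 can be sketched roughly as follows. Everything here is hypothetical and exists only for illustration: Settings, fakeAzure, state and reconcileOnce are not types from the operator or from the Azure SDK, and the fake completes every operation on its first poll. The point is only the shape: each pass does at most one thing and then asks to be requeued instead of looping.

```go
package main

import "fmt"

// Settings is a hypothetical, flattened view of what the spec asks for.
type Settings struct {
	Edition       string
	LongTermWeeks int
	ShortTermDays int
}

// fakeAzure is a hypothetical in-memory stand-in for the Azure APIs.
type fakeAzure struct {
	exists  bool
	actual  Settings
	pending func() // effect of the in-flight operation, applied on completion
}

// start begins a fake long-running operation (LRO).
func (a *fakeAzure) start(effect func()) { a.pending = effect }

// poll completes the in-flight operation on the first poll (toy behaviour).
func (a *fakeAzure) poll() (bool, error) {
	if a.pending != nil {
		a.pending()
		a.pending = nil
	}
	return true, nil
}

// state is what the controller would persist between reconciles.
type state struct {
	pendingOp   string // name of the stored LRO, "" if none
	provisioned bool
}

// reconcileOnce performs one pass and reports whether to requeue.
func reconcileOnce(spec Settings, az *fakeAzure, st *state) (bool, error) {
	// a. Poll the stored LRO, if any.
	if st.pendingOp != "" {
		done, err := az.poll()
		if err != nil {
			return false, fmt.Errorf("%s failed: %w", st.pendingOp, err)
		}
		if !done {
			return true, nil
		}
		st.pendingOp = ""
	}
	// b. Create the DB, or update it if it differs from the spec.
	if !az.exists || az.actual.Edition != spec.Edition {
		az.start(func() { az.exists = true; az.actual.Edition = spec.Edition })
		st.pendingOp = "database"
		return true, nil
	}
	// c. Long term retention.
	if az.actual.LongTermWeeks != spec.LongTermWeeks {
		az.start(func() { az.actual.LongTermWeeks = spec.LongTermWeeks })
		st.pendingOp = "longTermRetention"
		return true, nil
	}
	// d. Short term retention.
	if az.actual.ShortTermDays != spec.ShortTermDays {
		az.start(func() { az.actual.ShortTermDays = spec.ShortTermDays })
		st.pendingOp = "shortTermRetention"
		return true, nil
	}
	// e. Everything matches the spec.
	st.provisioned = true
	return false, nil
}

func main() {
	spec := Settings{Edition: "Standard", LongTermWeeks: 4, ShortTermDays: 7}
	az, st := &fakeAzure{}, &state{}
	for pass := 1; ; pass++ {
		requeue, err := reconcileOnce(spec, az, st)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("pass %d: pending=%q provisioned=%v\n", pass, st.pendingOp, st.provisioned)
		if !requeue {
			break
		}
	}
}
```

Because each step compares against what the fake reports rather than against a stored hash, a change made outside Kubernetes would be picked up and corrected on the next pass.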
Effectively I think that this is a situation where yes things are not ideal, but this is far from the only place that's true in the operator currently and it's not clear to me that it's the right thing to build a bespoke infrastructure to solve this problem in ASO when we have a generic one coming, so it might just be best to live with it for now?
I see - yeah, if this is an existing issue then I think you're right that we can land this as is and fix it in the generic case. I think option 3 would be the right plan as well if we were doing that.
* Add AzureSQL short term retention policies
Closes #1302
What this PR does / why we need it:
Adds support for Azure SQL short term retention alongside existing support for long term retention on the AzureSQLDatabase object.
If applicable: