Add a kubernetes style resource async class #2716

Merged

Conversation

chrisst
Contributor

@chrisst chrisst commented Nov 19, 2019

- Refactor the existing async Operation to allow for multiple classes to implement the async feature set.
- Change resource.erb to be aware of multiple async types
- Add a handwritten operation for kubernetes style resources
- Using CloudRun as the initial resource that will support this

fixes hashicorp/terraform-provider-google#4091

Cloud Run Service and DomainMapping require this new style of polling in order to correctly register the success of create/update operations. For almost all resources of this shape, we can rely on the ready status becoming true. DomainMapping, however, has some funky edge cases where the resource can be stuck in ready status UNKNOWN while a manual domain verification step happens. After discussing with @rambleraptor whether to introduce a new pattern into the existing async logic, we came to the conclusion that until more than 2 resources have this pattern, it's not worth trying to make the existing async object work with it. So this is a minimal implementation that relies on a handwritten Operation until we determine it's worth pursuing a fully generated version.
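
As a rough illustration of the polling described above (not the code in this PR), here is a minimal Go sketch. The `Condition` type, the `fetchFunc` callback, and `waitForReady` are hypothetical names chosen for the example; it assumes the resource exposes a Kubernetes-style list of status conditions, that `Ready: True` means success, `Ready: False` means failure, and `Unknown` means "keep polling".

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Condition mirrors the shape of a Kubernetes-style status condition.
// (Hypothetical type for this sketch, not the provider's real struct.)
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Message string
}

// fetchFunc is a hypothetical callback that returns the resource's
// current conditions, e.g. by issuing a GET on the resource.
type fetchFunc func() ([]Condition, error)

var errTimeout = errors.New("timed out waiting for Ready condition")

// waitForReady polls fetch until the Ready condition is "True".
// "False" is treated as a terminal failure; "Unknown" or a missing
// Ready condition means the resource is still settling, so keep polling.
func waitForReady(fetch fetchFunc, interval time.Duration, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		conditions, err := fetch()
		if err != nil {
			return err
		}
		for _, c := range conditions {
			if c.Type != "Ready" {
				continue
			}
			switch c.Status {
			case "True":
				return nil
			case "False":
				return fmt.Errorf("resource failed to become ready: %s", c.Message)
			}
		}
		time.Sleep(interval)
	}
	return errTimeout
}

func main() {
	calls := 0
	// Simulated resource: Unknown for two polls, then Ready.
	fetch := func() ([]Condition, error) {
		calls++
		if calls < 3 {
			return []Condition{{Type: "Ready", Status: "Unknown"}}, nil
		}
		return []Condition{{Type: "Ready", Status: "True"}}, nil
	}
	err := waitForReady(fetch, time.Millisecond, 10)
	fmt.Println(calls, err)
}
```

Under these assumptions, a DomainMapping that stays in `Unknown` while the domain is verified manually would simply run until the attempt limit and return the timeout error.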

Release Note Template for Downstream PRs (will be copied)

`cloudrun`: Wait for the cloudrun resource to reach a ready state before returning success.

@chrisst
Contributor Author

chrisst commented Nov 19, 2019

Apologies for the OpAsync refactor not being pulled into its own PR.

@modular-magician
Collaborator

Hi! I'm the modular magician, I work on Magic Modules.
I see that this PR has already had some downstream PRs generated. Any open downstreams are already updated to your most recent commit, f3e9965.

Pull request statuses

No diff detected in terraform-google-conversion.

New Pull Requests

I built this PR into one or more new PRs on other repositories, and when those are closed, this PR will also be merged and closed.
depends: hashicorp/terraform-provider-google-beta#1409
depends: hashicorp/terraform-provider-google#4945
depends: ansible-collections/google.cloud#90
depends: modular-magician/inspec-gcp#253

@chrisst
Contributor Author

chrisst commented Nov 19, 2019

j/k - I've pulled the refactor out into a separate PR so we can keep things a bit more sane: #2718

Feel free to limit review/discussion of this PR to the k8sOperation and templating changes.

Member

@rileykarson rileykarson left a comment

While we've seen two resources with the same behaviour, they're both within a single product. I'd be a lot happier with some custom code to wait on Cloud Run's async pattern since it's a 1-off at the service level. Additionally, that'll map into MMv2's Traits system a lot more nicely. We'll effectively be able to reuse the custom code wholesale, I bet.

I'm also not sure the assumptions made here map to K8S (see my reasoning in the comments) rather than just to knative's implementation details. And even though MM calls the operation-handling code Async, it's really just our representation of aip.dev Operations.

third_party/terraform/utils/k8s_operation.go

    }

    for _, condition := range w.Op.Status.Conditions {
        if condition.Type == "Ready" && condition.Status == "False" {
Member

"Ready: false" is pretty unintuitive that it's an error state. Can you document why that's the case?

Contributor

I'd like to see this Ready/False block become configurable.

There could be a list of error() conditions and each error condition could refer to a block of code like this. That way, we make it clear that Ready/False is used for this resource specifically (this is for Cloud Run??) and we open this up to be used for slightly different variations down the road.
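
One way to picture the configurable version suggested here is the Go sketch below. It is an assumption-laden illustration, not part of the PR: `ErrorCondition` and `checkConditions` are hypothetical names, and the idea is only that each resource supplies its own list of Type/Status pairs that count as failures, instead of hard-coding Ready/False.

```go
package main

import "fmt"

// Condition is a Kubernetes-style status condition (hypothetical type).
type Condition struct {
	Type    string
	Status  string
	Message string
}

// ErrorCondition is a hypothetical config entry: any observed condition
// with this Type and Status is treated as a failure for the resource.
type ErrorCondition struct {
	Type   string
	Status string
}

// checkConditions returns an error if any observed condition matches one
// of the configured error conditions, and nil otherwise.
func checkConditions(observed []Condition, errorConditions []ErrorCondition) error {
	for _, c := range observed {
		for _, e := range errorConditions {
			if c.Type == e.Type && c.Status == e.Status {
				return fmt.Errorf("condition %s=%s: %s", c.Type, c.Status, c.Message)
			}
		}
	}
	return nil
}

func main() {
	// Assumed configuration for Cloud Run: Ready/False is an error.
	cloudRunErrors := []ErrorCondition{{Type: "Ready", Status: "False"}}
	observed := []Condition{{Type: "Ready", Status: "False", Message: "revision failed"}}
	fmt.Println(checkConditions(observed, cloudRunErrors))
}
```

A resource with a different convention would pass a different list, which keeps the Ready/False assumption visibly scoped to Cloud Run.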

@rambleraptor
Contributor

I just approved the refactor PR. Can you merge that in (which should clean up this PR) and then I'll look over this PR?

(why, oh why, doesn't GitHub support dependent PRs???)

@rileykarson
Member

@rambleraptor: https://github.com/GoogleCloudPlatform/magic-modules/compare/5b4bf68a417bc331ac6e70bdf16569c05c4d9268..f3e9965ee26b365b2a7e815f644783aab0b10199 will compare the two (since I asked in the other PR that we don't merge it quite yet)

Comments will need to be made here and not on that comparison, though.

@rambleraptor
Contributor

Riley is more of an expert on the TF code, so I'll let him continue review on that.

My tl;dr is that our async right now is very, very configurable and async operations aren't as standardized as we'd like them to be. Because these k8s operations will diverge over time, I'd like to maximize the amount of configuration we place into them to avoid having to touch this code as much as possible.

@rileykarson
Member

> My tl;dr is that our async right now is very, very configurable and async operations aren't as standardized as we'd like them to be.

LRO Operation objects are very standardised. Earlier revisions share nearly the same schema with a few exceptions (GCE, SQL, DNS- among our earliest services). Newer operations are all verbatim implementations of the aip.dev spec to my knowledge, and should require 0 configuration.

Nearly none of the Async fields ever get modified, and I'm not sure why many of them are there.

> I'd like to maximize the amount of configuration we place into them to avoid having to touch this code as much as possible.

My concern is that we're muddling two different implementations of asynchronicity. Async is a representation of standard LROs today, and these changes add resource-level async to it when they could be separate definitions. Not only that, but they add one specific implementation: I can think of a handful of other resources that this doesn't cover, like GKE and some of the LB-related resources.

GKE implements LROs and a (K8S-like) status block, and this would not cover that.

@chrisst
Contributor Author

chrisst commented Nov 21, 2019

Spoke with riley/alex about this in person and came to the conclusion that Operation polling and Resource polling are not mutually exclusive. It probably still makes sense to use Async class inheritance to represent the different polling operations, but divorcing the resource polling from the operation polling can simplify the implementation of the resource polling feature. Additionally, it will allow us to poll for the Operation to finish and then also poll for a field status. This could apparently come in handy for GKE, where the cluster is not always in a state to be used immediately after the Operation has succeeded.
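
To make the "not mutually exclusive" point concrete, here is a small Go sketch of waiting on an Operation first and then on a resource status. The `poller` type and the `pollUntil` and `waitForResource` functions are hypothetical names for this example, and the sketch omits sleeping and backoff between polls, which real code would need.

```go
package main

import (
	"errors"
	"fmt"
)

// poller reports whether the thing it watches has finished.
// (Hypothetical abstraction covering both an LRO and a status field.)
type poller func() (done bool, err error)

// pollUntil calls p until it reports done, fails, or runs out of attempts.
// A real implementation would sleep or back off between attempts.
func pollUntil(p poller, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		done, err := p()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
	}
	return errors.New("timed out")
}

// waitForResource waits for the operation first (if one is configured),
// then for the resource status (if one is configured). Either may be nil.
func waitForResource(operation, status poller, maxAttempts int) error {
	if operation != nil {
		if err := pollUntil(operation, maxAttempts); err != nil {
			return fmt.Errorf("waiting for operation: %w", err)
		}
	}
	if status != nil {
		if err := pollUntil(status, maxAttempts); err != nil {
			return fmt.Errorf("waiting for resource status: %w", err)
		}
	}
	return nil
}

func main() {
	opPolls, statusPolls := 0, 0
	operation := func() (bool, error) { opPolls++; return opPolls >= 2, nil }
	status := func() (bool, error) { statusPolls++; return statusPolls >= 3, nil }
	err := waitForResource(operation, status, 10)
	fmt.Println(opPolls, statusPolls, err)
}
```

In this shape, a Cloud Run style resource would pass only a status poller, a plain LRO resource only an operation poller, and a GKE-like resource both.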

@chrisst chrisst changed the title Add a kubernetes style resource async class [WIP] Add a kubernetes style resource async class Nov 21, 2019
@chrisst chrisst changed the title [WIP] Add a kubernetes style resource async class Add a kubernetes style resource async class Nov 26, 2019
@chrisst chrisst force-pushed the cloudrun-wait-operation branch from 7f802f3 to e5f8b6e Compare November 26, 2019 21:31
@modular-magician
Collaborator

Hi! I'm the modular magician, I work on Magic Modules.
I see that this PR has already had some downstream PRs generated. Any open downstreams are already updated to your most recent commit, e5f8b6e.

Pull request statuses

terraform-provider-google-beta already has an open PR.
No diff detected in terraform-google-conversion.
terraform-provider-google already has an open PR.
Ansible already has an open PR.
InSpec already has an open PR.

New Pull Requests

I didn't open any new pull requests because of this PR.

Member

@rileykarson rileykarson left a comment

LGTM. I'm still not in love with treating these as mutually exclusive given that we've seen as many examples of resources with both pollable states and LROs as we've seen resources with pollable states, but I guess this stops us from needing to change the bad design of timeouts being nested inside the operation block for now.

@chrisst
Contributor Author

chrisst commented Nov 27, 2019

fyi they aren't mutually exclusive as of 606a3dee44f34c03df66a87774ba2abffe35b10c. There isn't any code testing that path right now, though, so I suspect it won't work perfectly until there is.

@chrisst chrisst force-pushed the cloudrun-wait-operation branch from 606a3de to 18b3cbd Compare November 27, 2019 18:39
Successfully merging this pull request may close these issues.

Cloud Run Services should poll for ready state
5 participants