-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
deploy: fix the owner reference kind to be rc #13996
Conversation
[test] |
@@ -435,7 +436,9 @@ func (c *DeploymentConfigController) cleanupOldDeployments(existingDeployments [ | |||
continue | |||
} | |||
|
|||
err := c.rn.ReplicationControllers(deployment.Namespace).Delete(deployment.Name, nil) | |||
err := c.rn.ReplicationControllers(deployment.Namespace).Delete(deployment.Name, &metav1.DeleteOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kargakis check.
APIVersion: "v1", | ||
Kind: deployapi.Kind("DeploymentConfig").Kind, | ||
Kind: kapi.Kind("ReplicationController").Kind, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncdc any chance this is the source of your problem? The names wouldn't match, right?
Needs a test that verifies GC of deployers works. |
APIVersion: "v1", | ||
Kind: deployapi.Kind("DeploymentConfig").Kind, | ||
Kind: kapi.Kind("ReplicationController").Kind, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, @mfojtik this pod is more related to the DC than the RC and should be cleaned up when the DC is deleted, right? That would mean the kind was correct and the name and UID were wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be cleaned up because when you delete DC that should trigger deletion of RC and that should remove the deployer pod. Similarely when you set revisionHistory, this should automatically remove failed deployer pods (success deployer pods are removed automatically) as we delete old RC. correct @Kargakis ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfojtik correct, this pod is more related to the RC than the DC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to set finalizers in the RCs for enabling a cascading deletion from a DC to reach pods but that should be part of https://trello.com/c/B10xLQ6L/927-set-owner-refs-in-rcs-owned-by-dcs.
@mfojtik I think you can re-use the existing test for cleaning up older RCs. |
What I'd really like to see is a test that can reliably reproduce the transient deletion we saw in Jenkins that is then fixed by this PR (in whatever form it ends up in). |
@ncdc @Kargakis @deads2k does kube GC even check the "kind" ? it seems like the UUID is correct (it points to the RC). Also i'm not able to reproduce the failure Andy saw... I tried:
... unless the GC only checks the UUID and not the Kind. In that case the cause of the problem is somewhere else and this is just a cosmetic issue. EDIT: @Kargakis do we have logic in place when I delete the RC and the failed deployer pod is deleted automatically as well? (because that is what I see)... also reminds me that we should probably also set the owner reference for the hook pods... |
OK, so it seems like the GC honors the UID from the owner reference, which means this is not fixing #13995 but we should fix this anyway... i'll add owner reference to hook pods in this PR. |
Make sure that when we cleanup RCs based on revisionHistoryLimit, we delete with propagationPolicy set to Background. |
The failure that I originally found was from a CI run for #13886. It didn't look like that PR would cause the problem. Either way, I still haven't reasoned through how this could have happened. That's why I want a test case that can "slow down time" and try to be prescriptive about what happens and see if we can reproduce. |
@ncdc sure, I will work on that test, I think this PR is not related to that issue at this point. @deads2k @liggitt @enj will you guys have any objections granting the "deployer" SA ability to delete pods in the project? When I add ownerRef in a hook pod created by the deployer I'm hitting this error:
There are two options we discussed with @Kargakis:
I like 1) because it is cleaner, but less secure, but will take an advise from you :) EDIT: so i tried the controller approach and got
but I guess making the controller more privileged is better than making the SA more privileged? |
@@ -998,7 +998,7 @@ func (c *MasterConfig) DeploymentConfigInstantiateClients() (*osclient.Client, k | |||
|
|||
// DeploymentControllerClients returns the deployment controller client objects | |||
func (c *MasterConfig) DeploymentControllerClients() (*osclient.Client, kclientsetinternal.Interface, kclientsetexternal.Interface) { | |||
_, osClient, internalKubeClientset, externalKubeClientset, err := c.GetServiceAccountClients(bootstrappolicy.InfraDeploymentConfigControllerServiceAccountName) | |||
_, osClient, internalKubeClientset, externalKubeClientset, err := c.GetServiceAccountClients(bootstrappolicy.InfraDeploymentControllerServiceAccountName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k fyi
@@ -208,7 +208,7 @@ func init() { | |||
}, | |||
// DeploymentController.podClient | |||
{ | |||
Verbs: sets.NewString("get", "list", "create", "watch", "delete", "update"), | |||
Verbs: sets.NewString("get", "list", "watch"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k down here we have:
{
Verbs: sets.NewString("create", "update", "patch"),
Resources: sets.NewString("events"),
},
and this is copied in other policies... do we really need PATCH and UPDATE for events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe patching/updating events has to do with event compression?
@mfojtik I am missing some context on this PR but shouldn't you use a custom controller SA for this? @deads2k's #14033 is working towards that. Then you could grant that SA whatever permissions it needed. Maybe there are other issues I am missing in regards to this being inside a project... I cannot comment on if you actually have other viable solutions to the problem as a whole. |
We are already using a custom sa for the deployer, the question is "do we want to give pod deletion power to the deployer?" |
} | ||
|
||
encoder := kapi.Codecs.LegacyCodec(kapi.Registry.EnabledVersions()...) | ||
errors := []error{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var errors []error
} | ||
newPod, ok := objCopy.(*kapi.Pod) | ||
if !ok { | ||
errors = append(errors, fmt.Errorf("object %#+v is not a pod", objCopy)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unlike that this error will ever be hit but if you want it, that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kargakis for the sake of somebody will copy&paste this code somewhere else :-)
@Kargakis Ah OK. Not really sure on that one. |
oldPodBytes, err := runtime.Encode(encoder, pod) | ||
if err != nil { | ||
errors = append(errors, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, fixed.
@deads2k rolled back the changes to ICT controller, will open a separate PR for that once this merge. |
@bparees @coreydaley pls check, I made some tweaks to accomodate the cleanup policy (i think the build controller now needs ability to delete builds and create build configs?) |
Evaluated for origin test up to c3d4af7 |
Evaluated for origin merge up to c3d4af7 |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/744/) (Base Commit: a666b7c) (Image: devenv-rhel7_6247) |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/1605/) (Base Commit: a666b7c) |
@mfojtik looks like this merged before we had a chance to look at it... why would the build controller need the ability to create buildconfigs? |
If you delete something with an owner ref, you have to be able to create owners. I think. |
No, I confused this with another rule: adding owner refs to things means you can also delete them. Not sure about this one. |
sorry it was not create but delete build and get bc (https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/controller/build.go#L59) |
ok yeah those make sense. |
@@ -428,7 +429,10 @@ func (c *DeploymentConfigController) cleanupOldDeployments(existingDeployments [ | |||
continue | |||
} | |||
|
|||
err := c.rn.ReplicationControllers(deployment.Namespace).Delete(deployment.Name, nil) | |||
policy := metav1.DeletePropagationBackground | |||
err := c.rn.ReplicationControllers(deployment.Namespace).Delete(deployment.Name, &metav1.DeleteOptions{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tnozicka fyi this is related to DeploymentConfig GC
This PR will: