feat: Add dependency filter #555

karlkfi · 2022-02-25T22:45:14Z

Pass TaskContext into TaskBuilder.Build
Combine dependency graph for apply and prune objects.
This is required to catch dependencies that would have been deleted.
Replace graph.SortObjs into DependencyGraph + Sort + HydrateSetList
Replace graph.ReverseSortObjs with ReverseSetList to perform on the
combined (apply + prune) set list.
Add planned pending applies and prune to the InventoryManager
before executing the task queue.
This allows the DependencyFilter to validate against the planned
actuation strategy of objects that haven't been applied/pruned yet.
Add the dependency graph to the TaskContext, for the
DependencyFilter to use.
This can be removed in the future if the filters are managed by the
solver.
Make Graph.Sort non-destructive, so the graph can be re-used by the
DependencyFilter.
Add Graph.EdgesFrom and EdgesTo for the DependencyFilter to use.
This requires storing the reverse edge list.
Add an e2e test for the DependencyFilter
Add an e2e test for the LocalNamespaceFilter

Fixes #526
Fixes #528

Replaces #523

seans3

This is the wrong approach. The approach should be three separate efforts to address the three completely different issues.

#528 is the result of overzealous client-side validation. As I mentioned, we should not be doing client-side validation. The solution is to rollback the overzealous validation.

#554 should be a simple fix with a better test to prove the fix.

For #526, it is not clear this is a bug as much as an optional feature. If we are solving this, we are not going to do it a couple days before a release without tests.

/hold

mortent

One small comment, but I think this approach looks fine. As for the concerns raised by @seans3:

For #554 and #526: We need more tests and I think the PR also mentions that those will be added before this is merged. So I think we are all in agreement there.
I don't think the timing related to a release matters. We have to trust that the combination of code reviews and tests are sufficient to catch most bugs, and when they don't (which will happen from time to time), we have to add better tests. It is unclear to me if waiting would make it safer as we don't have any clients (to my knowledge) who are running unreleased code. We can leverage the tests in https://github.com/GoogleContainerTools/kpt to do an additional check before we make a release. I have done that in the past, but it not something we have automated.

For #526 specifically, I think that is a bug since it seems strange to handle dependencies, but also ignore them in the face of failures. Skipping them seems to align with the general approach of the library, where we continue with as much as possible when something fails, but also make it safe to retry to "make things right" after a failure.

I think we are all in agreement that we want to avoid as much client-side validation as possible. But in this case, we have always restricted dependencies to only include resources within the apply set. This just changes how we report the error.

mortent · 2022-02-25T23:02:21Z

pkg/apply/filter/dependency-filter.go

+// DependencyFilter implements ValidationFilter interface to determine if an
+// object can be applied or deleted based on the status of it's dependencies.
+type DependencyFilter struct {
+	TaskContext *taskrunner.TaskContext


Maybe not something we need to address in this PR, but it feels like the TaskContext should be passed in as a parameter to the Filter function rather than being a property on the struct.

I agree, but was trying to limit the amount of unrelated changes. That can be done in a separate PR.

karlkfi · 2022-02-26T01:12:37Z

For #526, it is not clear this is a bug as much as an optional feature. If we are solving this, we are not going to do it a couple days before a release without tests.

IMO we should not launch depends-on in Config Sync without this bug fixed, and it was probably a mistake to launch the feature in kpt with this bug in the first place. Prior to this PR, depends-on was only half implemented (sorted ordering, but no enforcement).

karlkfi · 2022-02-26T01:15:31Z

#528 is the result of overzealous client-side validation. As I mentioned, we should not be doing client-side validation. The solution is to rollback the overzealous validation.

For CRDs this is probably fine, because there is a service-side error that prevents deletion of CRDs with CRs in use.

For namespaces tho, this is catastrophic, because namespace deletion causes garbage collection of everything else in the namespace.

For all other explicit dependencies there is no server-side dependency enforcement. So the client-side enforcement cannot be rolled back without completely removing the dependency ordering feature.

karlkfi · 2022-02-26T01:17:11Z

#554 should be a simple fix with a better test to prove the fix.

This is fair. I can extract this to another PR.

karlkfi · 2022-02-26T02:12:34Z

Pulled out fix for #554: #556

karlkfi · 2022-02-26T23:02:04Z

#528 is the result of overzealous client-side validation. As I mentioned, we should not be doing client-side validation. The solution is to rollback the overzealous validation.

@seans3 Removing the validation for external dependencies won't fix this bug. It would stop the confusing validation error, but it wouldn't fix the behavior. It wouldn't fail server-side, because there's no server-side impl. And without the DependencyFilter, neither the apply nor the prune would be skipped. So the apply would go through and then its dependency would be deleted. And we would lose the validation error for otherwise legitimately external dependencies.

Combining the graph for both applies and prunes allows for detection of implicit dependencies (CRDs & Namespaces) which would have otherwise been missed. And for explicit dependencies (depends-on & apply-time-mutation), it detects them as "internal" dependencies instead of "external". Then both implicit and explicit deps can be checked by the DependencyFilter. And to keep the behavior otherwise the same, the graph only needs to be sorted once and then filtered down to include either the apply objects or prune objects.

I was planning to add more unit tests, but the e2e tests and applier/destroyer tests already show that the existing behavior is otherwise preserved, and I wanted to make sure the approach was otherwise acceptable before writing more tests.

pkg/apply/filter/dependency-filter.go

pkg/apply/solver/solver.go

pkg/object/graph/graph.go

karlkfi · 2022-02-28T20:40:08Z

/hold

- Pass TaskContext into TaskBuilder.Build - Combine dependency graph for apply and prune objects. This is required to catch dependencies that would have been deleted. - Replace graph.SortObjs into DependencyGraph + Sort + HydrateSetList - Replace graph.ReverseSortObjs with ReverseSetList to perform on the combined (apply + prune) set list. - Add planned pending applies and prune to the InventoryManager before executing the task queue. This allows the DependencyFilter to validate against the planned actuation strategy of objects that haven't been applied/pruned yet. - Add the dependency graph to the TaskContext, for the DependencyFilter to use. This can be removed in the future if the filters are managed by the solver. - Make Graph.Sort non-destructive, so the graph can be re-used by the DependencyFilter. - Add Graph.EdgesFrom and EdgesTo for the DependencyFilter to use. This requires storing the reverse edge list. - Add an e2e test for the DependencyFilter - Add an e2e test for the LocalNamespaceFilter Fixes kubernetes-sigs#526 Fixes kubernetes-sigs#528

karlkfi · 2022-03-01T01:49:44Z

/unhold

Added unit tests:

DependencyFilter.Filter
solver.Build (apply, prune, apply & prune, inventory status)
graph.ReverseSetList
graph.HydrateSetList
graph.DependencyGraph
Graph.Dependencies
Graph.Dependents
Graph.Sort (non-destructive)

Discussed approach with @seans3 offline. His primary complaint was that the dependency ordering was never intended to be a guarantee, just an expedient to produce fewer errors. However, while this may have been true when dependencies were first added, there are a number of use cases that have been discovered that are not solved by just ordering, which are better solved by an ordering guarantee. Most of these use cases come down to improving UX such that the error is obvious and once fixed in the local config, a re-apply is sufficient to resolve the problem.

One example worth calling out is that this helps expedite deployment of chains of dependencies that would otherwise be applied all up front and fall into retry-back-off, causing end-to-end reconciliation to take orders of magnitude longer to complete compared to waiting to apply resources until their dependencies are reconciled. This is especially true for infrastructure resources which take a notoriously long time to reconcile (like GKE clusters). Plus, some retry loops (like Jobs) have a finite number of retries configured before failing, after which a reapply of the initial spec is insufficient.

seans3

/approve

The tests look good. Thanks for all the quick work. I will leave the LGTM to Jingfang.

k8s-ci-robot · 2022-03-01T01:55:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: karlkfi, seans3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [seans3]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Liujingfang1 · 2022-03-01T03:27:24Z

Thanks for adding the unit tests.

/lgtm

karlkfi requested review from mortent, seans3 and Liujingfang1 February 25, 2022 22:45

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 25, 2022

karlkfi mentioned this pull request Feb 25, 2022

[WIP] fix: skip if dependencies not reconciled #523

Closed

k8s-ci-robot requested review from ash2k and haiyanmeng February 25, 2022 22:45

k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 25, 2022

seans3 reviewed Feb 25, 2022

View reviewed changes

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 25, 2022

mortent reviewed Feb 26, 2022

View reviewed changes

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2022

karlkfi force-pushed the karl-dep-filter2 branch from 308cf9b to 01bf108 Compare February 26, 2022 22:16

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2022

Liujingfang1 reviewed Feb 28, 2022

View reviewed changes

pkg/apply/filter/dependency-filter.go Outdated Show resolved Hide resolved

pkg/apply/filter/dependency-filter.go Outdated Show resolved Hide resolved

pkg/apply/solver/solver.go Show resolved Hide resolved

pkg/object/graph/graph.go Show resolved Hide resolved

karlkfi force-pushed the karl-dep-filter2 branch from 01bf108 to 84d8cdf Compare February 28, 2022 18:44

karlkfi force-pushed the karl-dep-filter2 branch from 84d8cdf to 4599e4f Compare March 1, 2022 01:31

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2022

karlkfi requested review from Liujingfang1 and seans3 March 1, 2022 01:52

seans3 approved these changes Mar 1, 2022

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 1, 2022

k8s-ci-robot assigned Liujingfang1 Mar 1, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 1, 2022

k8s-ci-robot merged commit 5c6134a into kubernetes-sigs:master Mar 1, 2022

karlkfi deleted the karl-dep-filter2 branch March 1, 2022 18:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add dependency filter #555

feat: Add dependency filter #555

karlkfi commented Feb 25, 2022 •

edited

Loading

seans3 left a comment

mortent left a comment

mortent Feb 25, 2022

karlkfi Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 28, 2022

karlkfi commented Mar 1, 2022

seans3 left a comment

k8s-ci-robot commented Mar 1, 2022

Liujingfang1 commented Mar 1, 2022

feat: Add dependency filter #555

feat: Add dependency filter #555

Conversation

karlkfi commented Feb 25, 2022 • edited Loading

seans3 left a comment

Choose a reason for hiding this comment

mortent left a comment

Choose a reason for hiding this comment

mortent Feb 25, 2022

Choose a reason for hiding this comment

karlkfi Feb 26, 2022

Choose a reason for hiding this comment

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 26, 2022

karlkfi commented Feb 28, 2022

karlkfi commented Mar 1, 2022

seans3 left a comment

Choose a reason for hiding this comment

k8s-ci-robot commented Mar 1, 2022

Liujingfang1 commented Mar 1, 2022

karlkfi commented Feb 25, 2022 •

edited

Loading