
CA: prepare for DRA integration #7529

Merged · 8 commits · Dec 9, 2024

Conversation

@towca (Collaborator) commented Nov 25, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler.

This PR implements the first bits of the DRA integration in CA:

  • A DRA flag guard is added to help preserve old behavior until we're confident in the new logic (a rough wiring sketch follows this list).
  • The scheduler framework integration is extended to cover the new SharedDRAManager interface introduced in DRA: allow Cluster Autoscaler to integrate with DRA scheduler plugin kubernetes#127904.
  • The ClusterSnapshot interface is extended to cover DRA objects.
  • SchedulerPluginRunner is extended so that it can run the Reserve phase of the framework.
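
For a concrete feel of the wiring, here is a minimal sketch based on the test helpers quoted in the review below (the function name is illustrative and imports are elided; the production constructors may differ). The trailing bool is the new DRA guard:

// Sketch only: mirrors the test-helper wiring quoted in the review below.
// When draEnabled is false, PredicateSnapshot keeps the pre-DRA behavior.
func newDeltaPredicateSnapshot(draEnabled bool) (clustersnapshot.ClusterSnapshot, error) {
	fwHandle, err := framework.NewTestFrameworkHandle()
	if err != nil {
		return nil, err
	}
	return NewPredicateSnapshot(base.NewDeltaSnapshotBase(), fwHandle, draEnabled), nil
}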

Which issue(s) this PR fixes:

The CA/DRA integration is tracked in kubernetes/kubernetes#118612; this is just part of the implementation.

Special notes for your reviewer:

This is mostly a refactor plus some preparation for the actual DRA logic in the next PR. All new logic is behind a DRA flag guard; the intention is that there is no change in behavior when the flag is disabled.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

https://github.com/kubernetes/enhancements/blob/9de7f62e16fc5c1ea3bd40689487c9edc7fa5057/keps/sig-node/4381-dra-structured-parameters/README.md

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 25, 2024
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: towca

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 25, 2024
@towca (Collaborator, Author) commented Nov 25, 2024

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 25, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 25, 2024
@towca towca mentioned this pull request Nov 25, 2024
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2024
towca added a commit to towca/autoscaler that referenced this pull request Nov 28, 2024
towca added a commit to towca/autoscaler that referenced this pull request Dec 4, 2024
This flag will be used to guard any behavior-changing logic needed for
DRA, to make it clear that existing behavior for non-DRA use-cases is
preserved.
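
As a minimal sketch, such a guard can be exposed as a boolean flag and checked before any DRA-specific code path. The flag name below is an assumption; only the pattern matters:

// Hypothetical flag; the actual name and wording in CA may differ.
var enableDRA = flag.Bool("enable-dynamic-resource-allocation", false,
	"Whether logic for handling Dynamic Resource Allocation objects is enabled.")

func maybeHandleDRA() {
	if !*enableDRA {
		return // non-DRA behavior stays exactly as before
	}
	// DRA-specific logic, added in later commits/PRs.
}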
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 4, 2024
towca added a commit to towca/autoscaler that referenced this pull request Dec 4, 2024
@BigDarkClown (Contributor) left a comment

Generally LGTM, with some naming/clarification/typo comments and some reservations, as I am not an expert on DRA and might miss some DRA-specific gaps. For a full LGTM it would be nice to get feedback from more DRA-knowledgeable folks.

},
"delta": func() (clustersnapshot.ClusterSnapshot, error) {
fwHandle, err := framework.NewTestFrameworkHandle()
if err != nil {
return nil, err
}
return NewPredicateSnapshot(base.NewDeltaSnapshotBase(), fwHandle), nil
return NewPredicateSnapshot(base.NewDeltaSnapshotBase(), fwHandle, false), nil
Contributor:

nit: any particular reason why some tests are initialized with true and some with false?

Collaborator (Author):

No, I missed these, thanks. Changed all to true; it seems better to test more code paths in unit tests.

schedulerframeworkruntime.WithSnapshotSharedLister(sharedLister),
}

if draEnabled {
Contributor:
nit: It would be a bit more readable to place schedProfile := before if draEnabled.

Collaborator (Author):
Done

clusterSnapshot := store.NewBasicSnapshotStore()
err := clusterSnapshot.AddNodeInfo(framework.NewTestNodeInfo(node1))
// with default predicate checker
defaultPluginnRunner, clusterSnapshot, err := newTestPluginRunnerAndSnapshot(nil)
Contributor:
typo: defaultPluginnRunner

Collaborator (Author):
Done

@@ -305,25 +302,29 @@ func TestDebugInfo(t *testing.T) {

customConfig, err := scheduler.ConfigFromPath(customConfigFile)
assert.NoError(t, err)
customPluginRunner, err := newTestPluginRunner(clusterSnapshot, customConfig)
customPluginnRunner, clusterSnapshot, err := newTestPluginRunnerAndSnapshot(customConfig)
Contributor:
typo: customPluginnRunner

Collaborator (Author):
Done

draEnabled: draEnabled,
}
snapshot.pluginRunner = NewSchedulerPluginRunner(fwHandle, snapshot)
Contributor:
That hurts, but okay.

Collaborator (Author):
Agreed, any better ideas 😅?

Contributor:
If golang is a flat earth, I feel like we've fallen off the edge

Collaborator (Author):
This one isn't actually a Golang issue, more of a code structure one. I added a comment explaining why it's there and left a TODO for improving it (I think a simple refactor would remove the PluginRunner dependency on PredicateSnapshot). In the interest of time and the 1.32 release I'd prefer not to tackle this one in this PR.

Contributor:
Right, I think our object composition has resulted in this internationally recognized metaphor:

[image attachment not captured]

)

// Snapshot contains a snapshot of all DRA objects taken at a ~single point in time.
type Snapshot struct {
Contributor:

I have some reservations about the naming: we will have ClusterSnapshot, which has its ClusterSnapshotStore, which in turn has its drasnapshot.Snapshot. Do we want to use Snapshot here? Maybe SnapshotStore would be better, to make it consistent with our agreed indication of storing data? Maybe even full DraSnapshotStore, although this name is ugly and I don't like it (it's probably also not consistent with best practices).

Collaborator (Author):
Hmm, on the other hand in the context of the dynamicresources package the name Snapshot seems the most accurate to me (since we don't have the 2 layers or any interfaces here - why "Store"?), and it's very simple. So I guess it's a tradeoff between readability in clustersnapshot and dynamicresources? I'm leaning more towards keeping the simple dynamicresources.Snapshot name, but it's not a strong opinion. @jackfrancis WDYT?

Contributor:

Just to note, I don't have a very hard preference here; I'm reasonably okay with merging it as is.

Collaborator (Author):

In that case I'd prefer to keep the current name, if only to reduce rebasing given the time constraints.

Contributor:

I'd lean towards Snapshot as well (with the namespacing of the dynamicresources package providing some context to disambiguate it from the other snapshot things in the source surface area).
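
For readers less familiar with DRA, a rough sketch of what such a snapshot could hold follows. The concrete fields and key types in the PR may differ; this only illustrates the kinds of objects involved (ResourceClaims, ResourceSlices, DeviceClasses from the resource.k8s.io API):

// Sketch only: field layout is an assumption, not the PR's exact definition.
package dynamicresources

import (
	resourceapi "k8s.io/api/resource/v1beta1"
)

// Snapshot contains a snapshot of all DRA objects taken at a ~single point in time.
type Snapshot struct {
	resourceClaims map[string]*resourceapi.ResourceClaim   // keyed by namespace/name
	resourceSlices map[string][]*resourceapi.ResourceSlice // keyed by node name
	deviceClasses  map[string]*resourceapi.DeviceClass     // keyed by class name
}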

@@ -30,6 +31,19 @@ import (
type ClusterSnapshot interface {
ClusterSnapshotStore

// AddNodeInfo adds the given NodeInfo to the snapshot without checking scheduler predicates. The Node and the Pods are added,
Contributor:
lol nevermind I didn't read far enough :) I see they have moved

Comment on lines 82 to 88
func profileWithDraPlugin(profile *schedulerconfig.KubeSchedulerProfile) *schedulerconfig.KubeSchedulerProfile {
result := profile.DeepCopy()
addPluginIfNotPresent(result.Plugins.PreFilter, draplugin.Name)
addPluginIfNotPresent(result.Plugins.Filter, draplugin.Name)
addPluginIfNotPresent(result.Plugins.Reserve, draplugin.Name)
return result
}
Contributor:

It occurred to me casually: "hmm, could we refactor this for more slice-y expressiveness here, or make addPluginIfNotPresent accept a slice of schedulerconfig.PluginSets?" And then it occurred to me that I wouldn't dare do that without a UT. So here's a basic suggestion:

Suggested change
func profileWithDraPlugin(profile *schedulerconfig.KubeSchedulerProfile) *schedulerconfig.KubeSchedulerProfile {
result := profile.DeepCopy()
addPluginIfNotPresent(result.Plugins.PreFilter, draplugin.Name)
addPluginIfNotPresent(result.Plugins.Filter, draplugin.Name)
addPluginIfNotPresent(result.Plugins.Reserve, draplugin.Name)
return result
}
func profileWithDraPlugin(profile *schedulerconfig.KubeSchedulerProfile) *schedulerconfig.KubeSchedulerProfile {
result := profile.DeepCopy()
if result != nil && result.Plugins != nil {
for _, pluginSet := range []schedulerconfig.PluginSet{
result.Plugins.PreFilter,
result.Plugins.Filter,
result.Plugins.Reserve,
} {
addPluginIfNotPresent(pluginSet, draplugin.Name)
}
}
return result
}

And here's a quick-and-dirty handle_test.go that protects the existing implementation (this includes my nil-handling additions), so that if someone thinks to refactor later, the UT will ensure the outcomes don't change in unexpected ways.

/*
Copyright 2024 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package framework

import (
	"testing"

	"github.com/google/go-cmp/cmp"

	draplugin "k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources"

	schedulerconfig "k8s.io/kubernetes/pkg/scheduler/apis/config"
	schedulerconfiglatest "k8s.io/kubernetes/pkg/scheduler/apis/config/latest"
)

// TestProfileWithDraPlugin tests that the profileWithDraPlugin function correctly adds the DRA plugin to the profile.
func TestProfileWithDraPlugin(t *testing.T) {
	schedConfig, err := schedulerconfiglatest.Default()
	if err != nil {
		t.Fatalf("getting default scheduler config: %v", err)
	}
	if len(schedConfig.Profiles) == 0 {
		t.Fatal("default scheduler config has no profiles")
	}
	schedProfile := &schedConfig.Profiles[0]
	tests := []struct {
		name               string
		profile            *schedulerconfig.KubeSchedulerProfile
		expectedDRAPlugins bool
	}{
		{
			name:               "* to struct literal input",
			profile:            &schedulerconfig.KubeSchedulerProfile{},
			expectedDRAPlugins: false,
		},
		{
			name:               "nil input",
			profile:            nil,
			expectedDRAPlugins: false,
		},
		{
			name:               "default",
			profile:            schedProfile,
			expectedDRAPlugins: true,
		},
	}

	for _, test := range tests {
		test := test
		t.Run(test.name, func(t *testing.T) {
			actual := profileWithDraPlugin(test.profile)
			found := map[string]bool{
				"PreFilter": false,
				"Filter":    false,
				"Reserve":   false,
			}
			if test.expectedDRAPlugins {
				for _, enabledPreFilterPlugins := range actual.Plugins.PreFilter.Enabled {
					if enabledPreFilterPlugins.Name == draplugin.Name {
						found["PreFilter"] = true
					}
				}
				for _, enabledFilterPlugins := range actual.Plugins.Filter.Enabled {
					if enabledFilterPlugins.Name == draplugin.Name {
						found["Filter"] = true
					}
				}
				for _, enabledReservePlugins := range actual.Plugins.Reserve.Enabled {
					if enabledReservePlugins.Name == draplugin.Name {
						found["Reserve"] = true
					}
				}
				if diff := cmp.Diff(map[string]bool{
					"PreFilter": true,
					"Filter":    true,
					"Reserve":   true,
				}, found); diff != "" {
					t.Errorf("DRA plugin not enabled at expected extension points (-want +got):\n%s", diff)
				}
			}
		})
	}
}

Collaborator (Author):
Oof, thanks for keeping me honest with the unit testing here. I tried your test and it turns out the implementation doesn't actually add the DRA plugin to the config at all 🤦 I assumed it does because the integration tests pass, but I checked and the DRA plugin is actually just present in the default config. So no modifications to the plugin sets should be needed here, I removed that part.

I'm pretty sure this was needed before 1.32, but maybe it was a red herring all along…

Contributor:
> DRA plugin is actually just present in the default config

this discovery isn't problematic, is it?

Collaborator (Author):

It shouldn't be; the plugin only does anything if the relevant DRA feature gate is enabled in the cluster. This is what I expected to happen all along: we shouldn't need to hack the plugin sets; the plugins should be enabled/disabled via feature gates without CA having to get involved.

Make SharedDRAManager a part of the ClusterSnapshotStore interface, and
implement dummy methods to satisfy the interface. Actual implementation
will come in later commits.

This is needed so that ClusterSnapshot can feed DRA objects to the DRA
scheduler plugin, and obtain ResourceClaim modifications back from it.

The integration is behind the DRA flag guard, this should be a no-op
if the flag is disabled.
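
A rough sketch of the dummy methods described above, assuming the SharedDRAManager method set from kubernetes#127904 (the receiver type here is illustrative; the import of schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework" is elided):

// Sketch only: dummy implementations that satisfy schedulerframework.SharedDRAManager.
// Real implementations come in later commits.
func (s *SnapshotStore) ResourceClaims() schedulerframework.ResourceClaimTracker {
	return nil
}

func (s *SnapshotStore) ResourceSlices() schedulerframework.ResourceSliceLister {
	return nil
}

func (s *SnapshotStore) DeviceClasses() schedulerframework.DeviceClassLister {
	return nil
}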
towca added 3 commits December 9, 2024 17:14
This should be a no-op, as no DRA objects are passed yet.
…fying DRA objects

A new DRA Snapshot type is introduced, for now with just dummy methods
to be implemented in later commits. The new type is intended to hold all
DRA objects in the cluster.

ClusterSnapshotStore.SetClusterState() is extended to take the new DRA Snapshot in
addition to the existing parameters.

ClusterSnapshotStore.DraSnapshot() is added to retrieve the DRA snapshot set by
SetClusterState(). This will be used by PredicateSnapshot to implement DRA
logic later.

This should be a no-op, as DraSnapshot() is never called, and no DRA
snapshot is passed to SetClusterState() yet.
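
In interface terms, the change described above amounts to roughly the following; the pre-existing parameters are assumptions, only the DRA additions are the point (imports of apiv1 "k8s.io/api/core/v1" and the drasnapshot package are elided):

// Sketch only: non-DRA parameters and embedding details are assumptions.
type ClusterSnapshotStore interface {
	schedulerframework.SharedDRAManager
	// ... existing methods ...

	// SetClusterState resets the snapshot to the given cluster state, now
	// including a point-in-time snapshot of all DRA objects.
	SetClusterState(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod, draSnapshot drasnapshot.Snapshot) error

	// DraSnapshot returns the DRA snapshot previously passed to SetClusterState.
	DraSnapshot() drasnapshot.Snapshot
}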
Store the DRA snapshot inside the current internal data in
SetClusterState().

Retrieve the DRA snapshot from the current internal data in
DraSnapshot().

Clone the DRA snapshot whenever the internal data is cloned
during Fork(). This matches the forking logic that BasicSnapshotStore
uses, ensuring that the DRA object state is correctly
forked/committed/reverted during the corresponding ClusterSnapshot
operations.

This should be a no-op, as DraSnapshot() isn't called anywhere yet,
and no DRA snapshot is passed to SetClusterState() yet.
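
Conceptually, the forking behavior described above looks something like this (the internal type, field, and Clone method names are illustrative, not the PR's exact code):

// Sketch only: clone the DRA snapshot alongside node/pod data on Fork().
func (data *internalDeltaSnapshotData) fork() *internalDeltaSnapshotData {
	forked := newInternalDeltaSnapshotData(data) // existing node/pod forking
	// Clone the DRA objects too, so Commit()/Revert() treat them
	// consistently with nodes and pods.
	forked.draSnapshot = data.draSnapshot.Clone()
	return forked
}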
towca added 2 commits December 9, 2024 17:38
All the NodeInfo methods have to take DRA into account, and the logic
for that will be the same for different ClusterSnapshotStore implementations.
Instead of duplicating the new logic in Basic and Delta, the methods
are moved to ClusterSnapshot and the logic will be implemented once in
PredicateSnapshot.

PredicateSnapshot will use the DRA Snapshot exposed by its ClusterSnapshotStore
to implement these methods. The DRA Snapshot has to be stored in the
ClusterSnapshotStore layer, as we need to be able to fork/commit/revert it.

Lower-level methods for adding/removing just the schedulerframework.NodeInfo
parts are added to ClusterSnapshotStore. PredicateSnapshot utilizes these methods
to implement AddNodeInfo and RemoveNodeInfo.

This should be a no-op, it's just a refactor.
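
As a sketch of that layering (the embedded store and its lower-level method name are hypothetical):

// Sketch only: PredicateSnapshot implements the DRA-aware AddNodeInfo on top of
// a lower-level, scheduler-framework-only store method.
func (s *PredicateSnapshot) AddNodeInfo(nodeInfo *framework.NodeInfo) error {
	if s.draEnabled {
		// Later commits/PRs also add the node's DRA objects (e.g. ResourceSlices)
		// to the DRA snapshot obtained via s.DraSnapshot() here.
	}
	// Delegate the schedulerframework.NodeInfo parts to the store layer.
	return s.ClusterSnapshotStore.AddSchedulerNodeInfo(nodeInfo)
}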
RunReserveOnNode runs the Reserve phase of schedulerframework,
which is necessary to obtain ResourceClaim allocations computed
by the DRA scheduler plugin.

RunReserveOnNode isn't used anywhere yet, so this should be a no-op.
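
For context, running the Reserve phase through the upstream scheduler framework looks roughly like this; the PluginRunner fields are simplified and imports (context, fmt, apiv1 "k8s.io/api/core/v1", schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework") are elided, but RunReservePluginsReserve is the upstream framework call:

// Sketch only: in practice the cycle state would be populated by PreFilter/Filter first.
func (p *SchedulerPluginRunner) RunReserveOnNode(pod *apiv1.Pod, nodeName string) error {
	state := schedulerframework.NewCycleState()
	status := p.framework.RunReservePluginsReserve(context.Background(), state, pod, nodeName)
	if !status.IsSuccess() {
		return fmt.Errorf("reserving pod %s/%s on node %q failed: %v",
			pod.Namespace, pod.Name, nodeName, status.Message())
	}
	return nil
}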
@jackfrancis (Contributor)
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 9, 2024
@towca (Collaborator, Author) commented Dec 9, 2024

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 9, 2024
@k8s-ci-robot k8s-ci-robot merged commit 37b3da4 into kubernetes:master Dec 9, 2024
6 checks passed