nrt: log: introduce and use "generation" for cache #798

ffromani · 2024-09-04T13:57:12Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Improves debuggability of the overreserve cache

We introduce the concept of "generation": an opaque, monotonically increasing integer, similar in spirit to the resourceVersion kube API field.
Every time the internal state of the cache is updated, which by design happens only in the resync loop, we increment the generation.

GetCachedNRTCopy now also returns the generation of the data being used, so we have a uniform way to correlate the readers and the writer of the cache, and we gain better visibility into the data being used.
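To make the idea concrete, here is a minimal sketch of a generation-stamped cache. It is not the code from this PR: `toyCache`, `resync`, `getCachedCopy` and the string payload are hypothetical stand-ins, and the fields of `CachedNRTInfo` are assumed for illustration. Only the overall shape (a single writer bumps the generation, readers get it back alongside the data) follows the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// CachedNRTInfo borrows the type name used in the PR; the fields are assumptions.
type CachedNRTInfo struct {
	Generation uint64 // generation of the cache state the data came from
	Fresh      bool   // hypothetical: true if the node was found in the cache
}

// toyCache is a hypothetical stand-in for the overreserve cache.
type toyCache struct {
	mu         sync.RWMutex
	generation uint64
	nodes      map[string]string // node name -> opaque payload (stand-in for NRT objects)
}

func newToyCache() *toyCache {
	return &toyCache{nodes: map[string]string{}}
}

// resync is the only writer: it applies the updates and, if anything was
// applied, increments the generation. It returns the resulting generation.
func (c *toyCache) resync(updates map[string]string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(updates) == 0 {
		return c.generation // nothing changed, generation stays put
	}
	for name, payload := range updates {
		c.nodes[name] = payload
	}
	c.generation++
	return c.generation
}

// getCachedCopy returns the payload together with the generation it was read at,
// so a reader can log which cache state it used.
func (c *toyCache) getCachedCopy(nodeName string) (string, CachedNRTInfo) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	payload, ok := c.nodes[nodeName]
	return payload, CachedNRTInfo{Generation: c.generation, Fresh: ok}
}

func main() {
	c := newToyCache()
	gen := c.resync(map[string]string{"node-a": "zones-v1"})
	fmt.Println("resync: generation", gen)

	payload, info := c.getCachedCopy("node-a")
	fmt.Println("filter: node-a", payload, "generation", info.Generation)
}
```

With both sides logging the same number, a filter or score log line can be matched to the resync pass that produced the data it read.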

Which issue(s) this PR fixes:

Fixes N/A

Special notes for your reviewer:

In a nutshell, this change enables better logging and makes it easier to correlate users of the cache (filter/score) with the reconciliation loop, so it's easier to infer which data was used and when it was updated.

NONE

k8s-ci-robot · 2024-09-04T13:57:15Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

k8s-ci-robot · 2024-09-04T13:57:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ffromani]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

netlify · 2024-09-04T13:57:29Z

✅ Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

Name	Link
🔨 Latest commit	`0dae3ec`
🔍 Latest deploy log	https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/66deee147dc40b0008909ce4

ffromani · 2024-09-04T13:57:41Z

/test all

ffromani · 2024-09-04T17:28:31Z

/test all

ffromani · 2024-09-04T17:52:44Z

/test all

ffromani · 2024-09-08T15:11:31Z

/hold

ffromani · 2024-09-08T15:12:29Z

Howdy @Tal-or @PiotrProkop could you PTAL? Changes to discardreserved are trivial and minimal, but still I'd like a review

ffromani · 2024-09-09T07:28:19Z

pkg/noderesourcetopology/logging/logging.go

@@ -33,6 +31,7 @@ const (
 	KeyFlow          string = "flow"
 	KeyContainer     string = "container"
 	KeyContainerKind string = "kind"
+	KeyGeneration    string = "gen"


or generation? size of log entries is a concern too

I would prefer readability over compactness

fair enough, "generation" it is

PiotrProkop

some small nits, overall looks good!

PiotrProkop · 2024-09-09T10:13:59Z

pkg/noderesourcetopology/cache/overreserve.go

@@ -97,30 +98,33 @@ func NewOverReserve(ctx context.Context, lh logr.Logger, cfg *apiconfig.NodeReso
 	return obj, nil
 }

-func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool) {
+func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, CachedNRTInfo) {


Suggested change

func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, CachedNRTInfo) {

func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (nrt *topologyv1alpha2.NodeResourceTopology, info CachedNRTInfo) {

and then just change all return to return nrt, info ?

I'm not a fan of named returns, admittedly for silly reasons, but this case seems interesting; I'll check. Thanks!

OK, tried it out. It looks very nice in the overreserve impl, but not so great in the discardreserved and passthrough impls, where IMO it leads to slightly more convoluted code than we have now. I value keeping the implementations as consistent as possible, so overall I prefer the current approach and would rather not use named returns just yet.
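For readers unfamiliar with the trade-off discussed here, this is a small sketch of the two styles side by side. The `info` type and both lookup functions are hypothetical and much simpler than the real cache implementations; they only show how the return statements differ.

```go
package main

import "fmt"

// info is a hypothetical, reduced stand-in for CachedNRTInfo.
type info struct {
	Generation uint64
	Fresh      bool
}

// lookupUnnamed builds the result at each return site (the style the PR kept).
func lookupUnnamed(store map[string]string, gen uint64, name string) (string, info) {
	data, ok := store[name]
	if !ok {
		return "", info{Generation: gen}
	}
	return data, info{Generation: gen, Fresh: true}
}

// lookupNamed declares the results up front, fills them in as it goes,
// and ends every path with the same `return data, inf`.
func lookupNamed(store map[string]string, gen uint64, name string) (data string, inf info) {
	inf.Generation = gen
	if v, ok := store[name]; ok {
		data = v
		inf.Fresh = true
	}
	return data, inf
}

func main() {
	store := map[string]string{"node-a": "zones-v1"}
	fmt.Println(lookupUnnamed(store, 3, "node-a"))
	fmt.Println(lookupNamed(store, 3, "node-a"))
}
```

Both functions return identical results; the difference is purely in how each return path is written, which is why consistency across the three cache implementations was the deciding factor.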

pkg/noderesourcetopology/cache/cache.go

In order to improve the debuggability of the overreserve cache, we would like to
1. correlate the cache state being used with
2. the actions the resync loop is doing
3. infer in an easier way the current state of the cache

This change aims to improve points 1 and 2, while also trying to make 3 easier in the future.

We introduce the concept of "generation", which is an opaque, monotonically increasing integer similar in spirit to the `resourceVersion` kube API field. Every time the internal state of the cache is updated, which by design happens only in the resync loop, we increment the generation. GetCachedNRTCopy will also return the generation of the data being used, so we now have a uniform way to correlate the readers and the writer of the cache, and we gain better visibility into the data being used.

With verbose enough logging, it is now easier (albeit admittedly still clunky) to use the generation to reconstruct the chain of changes which led to a given cache state, which was much harder previously. Similarly, there's now a clear way to learn which cache state was used to make a given scheduling decision, which was much harder before.

The changes involve mostly logging; to avoid proliferation of return values, however, a trivial refactoring is done in `GetCachedNRTCopy`. A beneficial side effect is much improved documentation of the return values.

Signed-off-by: Francesco Romani <[email protected]>

PiotrProkop · 2024-09-11T11:49:11Z

/lgtm

ffromani · 2024-09-11T14:24:39Z

@Tal-or please un-hold if you like the PR

Tal-or · 2024-09-11T14:25:55Z

/hold cancel
Thanks LGTM

k8s-ci-robot requested review from PiotrProkop and Tal-or September 4, 2024 13:57

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 4, 2024

ffromani force-pushed the nrt-log-generation branch from ab78a02 to 344741b Compare September 4, 2024 17:27

ffromani changed the title ~~WIP: nrt: log: introduce and use "generation" for cache~~ nrt: log: introduce and use "generation" for cache Sep 4, 2024

ffromani force-pushed the nrt-log-generation branch from 344741b to 6de85d7 Compare September 4, 2024 17:52

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 4, 2024

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 8, 2024

ffromani marked this pull request as ready for review September 8, 2024 15:11

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 8, 2024

k8s-ci-robot requested review from seanmalloy and swatisehgal September 8, 2024 15:11

ffromani commented Sep 9, 2024

PiotrProkop reviewed Sep 9, 2024

ffromani force-pushed the nrt-log-generation branch 2 times, most recently from 454246b to 5bd1762 Compare September 9, 2024 11:50

ffromani force-pushed the nrt-log-generation branch from 5bd1762 to 0dae3ec Compare September 9, 2024 12:46

k8s-ci-robot assigned PiotrProkop Sep 11, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 11, 2024

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2024

k8s-ci-robot merged commit 7a836bc into kubernetes-sigs:master Sep 11, 2024
10 checks passed

ffromani deleted the nrt-log-generation branch September 11, 2024 14:48
