Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: remove normalize score section in the scheduling framwork #1178

Merged

Conversation

draveness
Copy link
Contributor

@draveness draveness commented Jul 29, 2019

/sig scheduling
/priority important-soon
/kind feature

As the discussion on kubernetes/kubernetes#80383, we decided to merge the normalize score extension point into the score.

/assign @bsalamat @ahg-g

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Jul 29, 2019
@k8s-ci-robot k8s-ci-robot requested review from bgrant0607 and k82cn July 29, 2019 15:42
@draveness
Copy link
Contributor Author

/remove-kind feature
/cc @liu-cong

@k8s-ci-robot
Copy link
Contributor

@draveness: GitHub didn't allow me to request PR reviews from the following users: liu-cong.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/remove-kind feature
/cc @liu-cong

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the kind/feature Categorizes issue or PR as related to a new feature. label Jul 29, 2019
@draveness draveness force-pushed the feature/update-normalize-score branch from ec94458 to 6e5cb9b Compare July 30, 2019 02:12
@draveness
Copy link
Contributor Author

@bsalamat Comments have been addressed, PTAL.

@draveness
Copy link
Contributor Author

BTW: do we need to remove the normalize score in this image?

Copy link
Member

@ahg-g ahg-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Draven

1. The first phase is called "score" which is used to rank nodes that have passed
the filtering phase. The scheduler will call `Score` of each scoring plugin for
each node. There will be a well defined range of integers representing the minimum
and maximum scores.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsalamat we don't enforce this range currently, is this supposed to be enforced before or after the normalize (if exists)?

Copy link
Contributor Author

@draveness draveness Jul 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bsalamat we don't enforce this range currently, is this supposed to be enforced before or after the normalize (if exists)?

if we refactor InterPodAffinity with the scheduling framework and the score-normalize pattern, we can't enforce this in the "score" phase.

We could reword to "the scoring plugin will return a node to score map after the score and normalize score phases, which will be a well-defined range of integers..."

But it would be a little more complicated for the user to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can enforce the range at the time we apply the weights.

Copy link
Contributor Author

@draveness draveness Jul 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can enforce the range at the time we apply the weights.

Yes, we could do that, but the developers are supposed to decide how to enforce the range in this extension point, ex: shrink by the ratio ( 10*(current-min)/(max-min)), etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they can do that, but the framework should also either error out when the score returned by a plugin is not within the range, or apply a simple default flooring/ceiling logic.

Copy link
Contributor Author

@draveness draveness Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't call it by default

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set the range of scores to [0, 100], we should multiply the framework's score with 0.1 or priority functions' score with 10.

We can also keep the current range [0, 10] which don't need the annoying multiply operation. And I don't think [0, 10] and [0, 100] would make a difference here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a reason to change the range from [0, 10] to [0, 100], unless past experiences indicated that the [0, 10] range is too tight, is that the case @bsalamat?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have seen many instances that our scoring functions are unpredictable. While extending the range will not address this problem, it can be one step towards making the score functions more differentiating. One can imagine that in a large cluster with thousands of nodes, a [0, 10] range is not differentiating enough. Now that we are transitioning to the framework, we can consider expanding the range.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have seen many instances that our scoring functions are unpredictable. While extending the range will not address this problem, it can be one step towards making the score functions more differentiating. One can imagine that in a large cluster with thousands of nodes, a [0, 10] range is not differentiating enough. Now that we are transitioning to the framework, we can consider expanding the range.

OK, I'll multiply the original score with 10.

Copy link

@liu-cong liu-cong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm to me overall.

Thanks Draven!

@draveness draveness force-pushed the feature/update-normalize-score branch from 6e5cb9b to 5bfa707 Compare August 6, 2019 02:01
@ahg-g
Copy link
Member

ahg-g commented Aug 6, 2019

@draveness can you please add a new commit when making new changes to the PR instead of amending? this is useful for the reviewers to see what changes were made since the last review.

@draveness
Copy link
Contributor Author

@draveness can you please add a new commit when making new changes to the PR instead of amending? this is useful for the reviewers to see what changes were made since the last review.

Sure, I could do this the next time.

@draveness
Copy link
Contributor Author

draveness commented Aug 7, 2019

To summaries, what we reached consensus is as below:

  1. A Score function should return a value in the range [0, 100] unless a normalize score is provided
  2. A NormalizeScore should return a value in the range [0, 100] when provided
  3. A NormalizeScore is optional to implement, the framework could provide a default implementation, but it should explicitly be called by the plugin (we shouldn't call it by default). feat: remove normalize score section in the scheduling framwork #1178 (comment)
  4. Multiply the score returned by priority function by 10. feat: remove normalize score section in the scheduling framwork #1178 (comment)
  5. Use a named array instead of a score array in normalize score function. Per discussion in feat: use named array instead of array in normalizing score kubernetes#80901 (review)

Is there anything wrong or missing? If not, I'll update the KEP and related PRs to these changes.

/cc @bsalamat @ahg-g

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and bsalamat August 7, 2019 02:51
@bsalamat
Copy link
Member

bsalamat commented Aug 7, 2019

To summaries, what we reached consensus is as below:

  1. A Score function should return a value in the range [0, 100] unless a normalize score is provided
  2. A NormalizeScore should return a value in the range [0, 100] when provided

I would rephrase these two items as:

  1. The output of a score plugin must be an integer in range of [0, 100]. This is the output after running the optional NormalizeScore function of the plugin. If NormalizeScore is not provided, the output of Score must be in this range.

The reason that I propose this minor change is that right after running Score functions, we do not want to check whether a NormalizeScore is provided.

  1. A NormalizeScore is optional to implement, the framework could provide a default implementation, but it should explicitly be called by the plugin (we shouldn't call it by default). feat: remove normalize score section in the scheduling framwork #1178 (comment)

As @ahg-g pointed, the framework will not provide a default implementation. When the output of a score plugin without NormalizeScore is in the [0,100] range, there is no need to run any normalization.

  1. Multiply the score returned by priority function by 10. feat: remove normalize score section in the scheduling framwork #1178 (comment)

That's one option. However, these are our internal functions. We could change them to provide finer grain scores.

1.Use a named array instead of a score array in normalize score function. Per discussion in kubernetes/kubernetes#80901 (review)

Is there anything wrong or missing? If not, I'll update the KEP and related PRs to these changes.

@liu-cong
Copy link

  1. A NormalizeScore is optional to implement, the framework could provide a default implementation, but it should explicitly be called by the plugin (we shouldn't call it by default). feat: remove normalize score section in the scheduling framwork #1178 (comment)

As @ahg-g pointed, the framework will not provide a default implementation. When the output of a score plugin without NormalizeScore is in the [0,100] range, there is no need to run any normalization.

@bsalamat , It seems that the purpose of NormalizeScore is to make sure the result is in the [0, 100] range. It's the ScorePlugin developer's responsibility anyway to make sure the score results is in the range. They can choose whatever approach they want to do this, be it using a normalizeScore helper function, imbedding in the score calculation, etc. If that's the case, why do we provide NormalizeScore as a public interface? In my opinion this forces the developer to make one more decision which is not necessary.

Did I miss any thing?

@ahg-g
Copy link
Member

ahg-g commented Aug 12, 2019

The difference between Score and NormalizeScore is the interface, the former evaluates one node at a time, the latter evaluates all the nodes (normalization could require a global view). It is a pattern that I assume was observed frequently and warranted an explicit support from the framework.

@draveness draveness force-pushed the feature/update-normalize-score branch from 725b3ba to 3cbd7d5 Compare August 19, 2019 06:39
@draveness
Copy link
Contributor Author

I've updated the score section and kept the current range. Besides, I created an kubernetes/kubernetes#81281 to discuss the path to expand the range to [0, 100], we could update the KEP after the discussion.

@bsalamat @ahg-g Please take another look

@draveness draveness force-pushed the feature/update-normalize-score branch from 3cbd7d5 to e27703a Compare August 19, 2019 06:45
@draveness
Copy link
Contributor Author

draveness commented Aug 19, 2019

@ahg-g comments have been addressed, please take another look

@ahg-g
Copy link
Member

ahg-g commented Aug 19, 2019

@ahg-g comments have been addressed, please take another look

Thanks, there is still one instance of "NodeScoreMax"

@draveness
Copy link
Contributor Author

@ahg-g comments have been addressed, please take another look

Thanks, there is still one instance of "NodeScoreMax"

done

@draveness draveness force-pushed the feature/update-normalize-score branch from 3a154d3 to b2c8d03 Compare August 19, 2019 17:29
@ahg-g
Copy link
Member

ahg-g commented Aug 20, 2019

/lgtm

please squash the commits

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2019
@draveness draveness force-pushed the feature/update-normalize-score branch from 3fd04bc to 5ff5d20 Compare August 20, 2019 15:23
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2019
@draveness
Copy link
Contributor Author

/lgtm

please squash the commits

Done, friendly ping @bsalamat for final approval.

Copy link
Member

@bsalamat bsalamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Thanks, @draveness!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, bsalamat, draveness, liu-cong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2019
@k8s-ci-robot k8s-ci-robot merged commit aba897e into kubernetes:master Aug 20, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants