Fix T1 site cores ResourceControl logic #12176

Merged (1 commit) on Dec 2, 2024

Conversation

amaltaro
Contributor

Fixes #12121

Status

not-tested

Description

Given that the Tier0 configuration sets this value to <= 100% (e.g. 12.5), the integer division by 100 returns 0 for any value below 100, hence not changing any of the default thresholds.

With the current change, we can now properly calculate a percentage of the site slots (e.g. 1250 for CNAF).
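
As a minimal sketch of the arithmetic described above (not the actual WMCore code): the helper names `scale_slots` and `scale_slots_buggy` are hypothetical, and the 10000-slot figure is an assumption chosen so that 12.5% gives the 1250 quoted for CNAF.

```python
def scale_slots(slots, percent):
    """Return `percent` percent of `slots`, truncated to an integer.

    Hypothetical helper for illustration; not part of WMCore.
    """
    # Multiply first, then divide, so a fractional percentage such as 12.5
    # is not floored to zero before it is applied.
    return int(slots * percent / 100)


def scale_slots_buggy(slots, percent):
    """Reproduce the faulty behaviour: the percentage is floored first."""
    # percent // 100 is 0 for any percent below 100, so the product is 0.
    return int(slots * (percent // 100))


site_slots = 10000   # assumed slot count, picked to match the 1250 example
t1_percent = 12.5    # example Tier0 setting quoted in the description

print(scale_slots(site_slots, t1_percent))
print(scale_slots_buggy(site_slots, t1_percent))
```

With these assumed inputs the first call yields the expected fraction of the site slots, while the second collapses to zero.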

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

It has to go together with this T0 PR: dmwm/T0#5007

External dependencies / deployment changes

None

@dmwm-bot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 4 changes in unstable tests
  • Python3 Pylint check: succeeded
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/93/artifact/artifacts/PullRequestReport.html

@mapellidario
Member

Sorry Alan, I fail to catch the rationale for this change.

Could you provide some example values for infoSSB[site]['slotsCPU'] before it is updated, and for self.t1SitesCores? Is self.t1SitesCores set to an integer in the range [0, 100], and nobody noticed before that self.t1SitesCores // 100 always returns 0? Which value should be <= 100%?

@amaltaro
Contributor Author

To be on the safe side, I am now casting the result to an integer.

Dario, no problem! Yes, the problem is that the division always returns 0. I expected the multiplication to take precedence in that expression, but it looks like the division goes first, bringing the result to 0, see:

>>> slotsCPU * t1SitesCores // 100
1250.0
>>> slotsCPU * (t1SitesCores // 100)
0.0

So I could actually have kept the integer division and simply enforced the precedence (plus casting, to be on the safe side, given that resource slots are integers).
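
For reference, Python gives `*` and `//` the same precedence and evaluates them left to right, so the unparenthesized form multiplies first. The zero only appears when the division is grouped on its own, for example by parentheses or on the right-hand side of an augmented assignment. The PR diff is not shown in this conversation, so the `*=` line below is an assumption about how such a grouping could arise, and the input values are illustrative ones that reproduce the session above.

```python
slotsCPU = 10000      # assumed value; reproduces the 1250.0 shown above
t1SitesCores = 12.5   # example percentage from the PR description

# Same precedence, left to right: (slotsCPU * t1SitesCores) // 100
left_to_right = slotsCPU * t1SitesCores // 100

# Explicit grouping floors the percentage before it is applied
grouped = slotsCPU * (t1SitesCores // 100)

# Augmented assignment evaluates the whole right-hand side first,
# so it behaves like the grouped form
augmented = slotsCPU
augmented *= t1SitesCores // 100

# Casting keeps the slot count an integer, as mentioned in the comment
as_int = int(slotsCPU * t1SitesCores // 100)

print(left_to_right, grouped, augmented, as_int)
```

Floor division with a float operand returns a float, which is why the cast to `int` is still useful even when the grouping is right.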

@dmwm-bot

Jenkins results:

  • Python3 Unit tests: succeeded
    • 1 tests added
    • 1 changes in unstable tests
  • Python3 Pylint check: failed
    • 1 warnings and errors that must be fixed
    • 4 warnings
    • 8 comments to review
  • Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/WMCore-PR-Report/97/artifact/artifacts/PullRequestReport.html

Member

@mapellidario mapellidario left a comment

Thanks Alan for the comment, thanks for the changes!

@amaltaro
Contributor Author

amaltaro commented Dec 2, 2024

Antonio and I worked on vocms0502 resource control operational issues, and we ended up applying this patch to the agent.
Now the T1 thresholds are looking okay. With that, I am proceeding with the merge of this PR.

@amaltaro amaltaro merged commit 0df84ad into dmwm:master Dec 2, 2024
1 of 3 checks passed
Development

Successfully merging this pull request may close these issues.

Configure T0 WMAgent to use ResourceControlUpdater
3 participants