Include maintenance node in HTCondor playbook #1057
kysrpex commented on Dec 19, 2023
- Add maintenance node to the HTCondor playbook
- Remove usegalaxy_eu.htcondor role from the maintenance playbook
- Remove vars specific to usegalaxy_eu.htcondor role from maintenance group
Let the maintenance node play the submitter role in the HTCondor playbook. This is achieved by adding the maintenance node to the `htcondor-secondary-submit` group, a subgroup of the `htcondor` group, that the HTCondor playbook targets.
`usegalaxy_eu.htcondor` has now been replaced with `grycap.htcondor` (part of the HTCondor playbook).
… group The role `usegalaxy_eu.htcondor` no longer runs on the maintenance node, thus its role variables are no longer needed.
Do not run `usegalaxy-eu.htcondor_release`, `usegalaxy-eu.fix-stop-ITs`, or `usegalaxy-eu.remove-orphan-condor-jobs` on the maintenance node.
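The group membership described above could look roughly like this in an Ansible YAML inventory. This is only a sketch: the group names `htcondor` and `htcondor-secondary-submit` and the hostname `maintenance.galaxyproject.eu` come from this PR, but the inventory layout and format are assumptions and may differ from the repository's actual inventory file.

```yaml
# Hypothetical inventory layout (assumption), not the repository's actual file.
all:
  children:
    htcondor:                        # group targeted by the HTCondor playbook
      children:
        htcondor-secondary-submit:   # subgroup of htcondor
          hosts:
            maintenance.galaxyproject.eu:
```

Because `htcondor-secondary-submit` is a child of `htcondor`, any play with `hosts: htcondor` would also target the maintenance node.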
@sanjaysrikakulam Add whatever stuff may still be missing. I suggest to run this by calling
I created the htcondor Jenkins project as discussed. It runs and is enabled.
Cool, thank you!
@@ -294,11 +294,11 @@
       when: htcondor_role_submit
     - grycap.htcondor
     - name: usegalaxy-eu.htcondor_release
-      when: htcondor_role_submit
+      when: htcondor_role_submit and inventory_hostname != "maintenance.galaxyproject.eu"
I guess this will be removed again once the nspawn container stops doing that?
(just for my better understanding)
The problem here is that `usegalaxy-eu.htcondor_release` is designed to run on at most one submitter node (this is of course a bad solution and needs to be addressed later, but we want to have the monitoring working properly too). If we do not prevent `usegalaxy-eu.htcondor_release` from running on more than one submitter node, then #896 comes back.
So unfortunately no, this will not be removed again once the systemd-nspawn container is decommissioned. It will not be as simple as that :(
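To check which hosts the guarded role would still run on, a small throwaway playbook along these lines might help. It is a sketch under assumptions: the file is hypothetical and not part of this PR, and it assumes `htcondor_role_submit` is defined as a host or group variable (with a fallback to `false` where it is not). The condition itself mirrors the one in the diff.

```yaml
# Hypothetical check playbook (assumption), not part of this PR.
- hosts: htcondor
  gather_facts: false
  tasks:
    - name: Report whether usegalaxy-eu.htcondor_release would run on this host
      ansible.builtin.debug:
        # Same condition as in the diff; default(false) covers hosts that
        # do not define htcondor_role_submit.
        msg: >-
          {{ (htcondor_role_submit | default(false) | bool)
             and inventory_hostname != 'maintenance.galaxyproject.eu' }}
```

With the condition from this PR, the maintenance node should report a false value even though it is a submitter, so the role keeps running on only one submitter node.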
Manually deployed and tested. Discussed the various issues found during the deployment with @kysrpex; all of them were fixed via separate PRs in the respective repos.
The dashboard is back online (except for a couple of panels, need to look into it). So, merging this PR.