Include maintenance node in HTCondor playbook #1057

Let the maintenance node play the submitter role in the HTCondor playbook. This is achieved by adding the maintenance node to the `htcondor-secondary-submit` group, a subgroup of the `htcondor` group that the HTCondor playbook targets.
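
For illustration, the group nesting described above could look like the following in Ansible's YAML inventory format. This is a minimal sketch, not the repository's actual inventory: the inventory format and surrounding layout are assumptions, and only the group names and the hostname come from this PR.

```yaml
# Hypothetical inventory sketch (the real inventory may use another format).
all:
  children:
    htcondor:                        # group targeted by the HTCondor playbook
      children:
        htcondor-secondary-submit:   # subgroup for additional submitter nodes
          hosts:
            maintenance.galaxyproject.eu:
```

Because `htcondor-secondary-submit` is a child of `htcondor`, any play with `hosts: htcondor` would also apply to the maintenance node.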
`usegalaxy_eu.htcondor` has now been replaced with `grycap.htcondor` (part of the HTCondor playbook).
… group

The role `usegalaxy_eu.htcondor` no longer runs on the maintenance node, thus its role variables are no longer needed.
@kysrpex kysrpex self-assigned this Dec 19, 2023
Do not run `usegalaxy-eu.htcondor_release`, `usegalaxy-eu.fix-stop-ITs`, or `usegalaxy-eu.remove-orphan-condor-jobs` on the maintenance node.
@kysrpex
Contributor Author

kysrpex commented Dec 19, 2023

@sanjaysrikakulam Add whatever stuff may still be missing.

I suggest running this locally with `ansible-playbook` before merging (for example, with the `--limit` option so that it runs only on the maintenance node) to make sure everything works. This is low-risk, since the tasks done by the maintenance node are noncritical.

@kysrpex
Contributor Author

kysrpex commented Dec 19, 2023

I created the htcondor Jenkins project as discussed. It runs and is enabled.

@sanjaysrikakulam
Member

> I created the htcondor Jenkins project as discussed. It runs and is enabled.

Cool, thank you!

@@ -294,11 +294,11 @@
   when: htcondor_role_submit
 - grycap.htcondor
 - name: usegalaxy-eu.htcondor_release
-  when: htcondor_role_submit
+  when: htcondor_role_submit and inventory_hostname != "maintenance.galaxyproject.eu"
Contributor

I guess this will be removed again once the nspawn container stops doing that?
(just for my better understanding)

Contributor Author

@kysrpex kysrpex Dec 20, 2023

The problem here is that `usegalaxy-eu.htcondor_release` is designed to run on at most one submitter node (this is of course a bad solution that needs to be addressed later, but we also want the monitoring to work properly). If we do not prevent `usegalaxy-eu.htcondor_release` from running on more than one submitter node, then #896 comes back.

So unfortunately no, this will not be removed again once the systemd-nspawn container is decommissioned. It will not be as simple as that :(
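
To make the constraint concrete, here is a rough sketch of how the condition from the diff sits in a play. The play header and the origin of `htcondor_role_submit` (assumed to be a per-host boolean) are assumptions; only the role names and the `when` expression are taken from this PR.

```yaml
# Sketch only: play layout is assumed, not copied from the real playbook.
- hosts: htcondor
  roles:
    - grycap.htcondor
    - name: usegalaxy-eu.htcondor_release
      # Skip the maintenance node so the role runs on one submitter only.
      when: htcondor_role_submit and inventory_hostname != "maintenance.galaxyproject.eu"
```

A possible variant, not what this PR does, would be to test group membership instead of the hostname, for example `'htcondor-secondary-submit' not in group_names`, so the condition would not need to change if other secondary submitters were added.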

Member

@sanjaysrikakulam sanjaysrikakulam left a comment

Manually deployed and tested. I discussed the various issues found during the deployment with @kysrpex, and all of them were fixed via separate PRs in the respective repos.

The dashboard is back online (except for a couple of panels, which I still need to look into). So, merging this PR.

@sanjaysrikakulam sanjaysrikakulam merged commit 7bda04a into usegalaxy-eu:master Dec 21, 2023
2 checks passed
@kysrpex kysrpex deleted the include_maintenance_node_in_htcondor_playbook branch June 17, 2024 13:52
3 participants