TORQUE: Review configuration #215
Got the following configuration, which sets up multiple queues and distributes jobs to the correct queue:
## apply queue defaults
qmgr -c 'set server queue_centric_limits=True'
## create the routing queue
qmgr -c 'create queue default'
qmgr -c 'set queue default queue_type=Route'
qmgr -c 'set queue default route_destinations=serial'
qmgr -c 'set queue default route_destinations+=smp'
qmgr -c 'set queue default route_destinations+=mpi'
qmgr -c 'set queue default enabled=True'
qmgr -c 'set queue default started=True'
## create the serial queue
qmgr -c 'create queue serial'
qmgr -c 'set queue serial queue_type=Execution'
qmgr -c 'set queue serial Priority=100'
qmgr -c 'set queue serial resources_max.nodes=1'
qmgr -c 'set queue serial resources_max.ncpus=1'
qmgr -c 'set queue serial resources_max.procct=1'
qmgr -c 'set queue serial resources_default.nodes=1'
qmgr -c 'set queue serial resources_default.ncpus=1'
qmgr -c 'set queue serial resources_default.procct=1'
qmgr -c 'set queue serial enabled=True'
qmgr -c 'set queue serial started=True'
## create the smp queue
qmgr -c 'create queue smp'
qmgr -c 'set queue smp queue_type=Execution'
qmgr -c 'set queue smp Priority=95'
qmgr -c 'set queue smp resources_max.nodes=1'
qmgr -c 'set queue smp resources_default.nodes=1'
qmgr -c 'set queue smp enabled=True'
qmgr -c 'set queue smp started=True'
## create the mpi queue
qmgr -c 'create queue mpi'
qmgr -c 'set queue mpi queue_type=Execution'
qmgr -c 'set queue mpi Priority=90'
qmgr -c 'set queue mpi resources_default.nodes=2'
qmgr -c 'set queue mpi resources_default.ncpus=2'
qmgr -c 'set queue mpi resources_default.procct=2'
qmgr -c 'set queue mpi resources_min.ncpus=2'
qmgr -c 'set queue mpi resources_min.procct=2'
qmgr -c 'set queue mpi enabled=True'
qmgr -c 'set queue mpi started=True'
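To make the routing behaviour easier to reason about, here is a rough sketch of how the `default` queue might pick a destination for a job. It is a simplified model, not TORQUE's actual logic: `route_job` is a hypothetical helper, it assumes `procct` equals nodes × ppn, it ignores `ncpus`, and it assumes destinations are tried in the order they were added (`serial`, `smp`, `mpi`) with the job landing in the first queue whose limits accept it.

```shell
#!/bin/sh
# Hypothetical model of the routing queue above; not part of TORQUE.
# Usage: route_job NODES PPN
route_job() {
    nodes=$1
    ppn=$2
    # Assumption: procct is the total process count, nodes * ppn.
    procct=$(( nodes * ppn ))

    if [ "$nodes" -le 1 ] && [ "$procct" -le 1 ]; then
        # serial: resources_max.nodes=1 and resources_max.procct=1
        echo serial
    elif [ "$nodes" -le 1 ]; then
        # smp: resources_max.nodes=1, no limit on process count
        echo smp
    elif [ "$procct" -ge 2 ]; then
        # mpi: resources_min.procct=2, no upper limit on nodes
        echo mpi
    else
        echo rejected
    fi
}

route_job 1 1   # single core on one node
route_job 1 4   # four cores on one node
route_job 2 4   # two nodes, four cores each
```

Under these assumptions, a request like `nodes=1:ppn=4` is refused by `serial` because of its process-count cap and falls through to `smp`, while anything spanning more than one node ends up in `mpi`.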
Having trouble getting TORQUE to not allocate more job resources than an entire node can take (i.e. 4 cores' worth of jobs for a 2-core node).
With the above configuration, no more jobs will be started on a single node than that node's resources allow. On a cluster with 4-core compute hosts, 4 single-core jobs run on 1 host and the others move on to the next available host.
- Refs alces-software/clusterware#215
- Removed the `batch` queue configuration and creation
- Create `default` queue which routes any submitted job to the appropriate queue type (`serial`, `smp` or `mpi`)
- Create `serial`, `smp` and `mpi` queues with appropriate min, default and max values for each queue. Also enable and start each of these queues
- Set `queue_centric_limits` to `true`, enforcing any configured queue limits on submitted jobs, rather than server-wide configuration min, default or max values
The default TORQUE configuration we have does not seem to schedule jobs entirely correctly. For example, in testing it is possible to:
The autoscaler also assumes that multiple jobs cannot be run on a single compute host (this is usually the case for TORQUE), so it calculates the number of nodes required based on a single job per node, when in fact the scheduler currently allows multiple jobs to run on a single compute host.
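To illustrate the gap between the two assumptions, here is a small sketch of the arithmetic. `nodes_packed` is a hypothetical helper, not part of the autoscaler or TORQUE; it assumes identical single-node jobs and identical hosts, and simply packs as many jobs onto each host as its cores allow.

```shell
#!/bin/sh
# Hypothetical helper: nodes needed when jobs are packed onto hosts by core count.
# Usage: nodes_packed JOBS CORES_PER_JOB CORES_PER_NODE
nodes_packed() {
    jobs=$1
    cores_per_job=$2
    cores_per_node=$3

    # How many jobs fit on one host (integer division rounds down).
    per_node=$(( cores_per_node / cores_per_job ))
    if [ "$per_node" -lt 1 ]; then
        echo "job does not fit on a single node" >&2
        return 1
    fi

    # Ceiling division: hosts needed to hold all jobs.
    echo $(( (jobs + per_node - 1) / per_node ))
}

# 5 single-core jobs on 4-core hosts
nodes_packed 5 1 4
```

With one job per node, 5 queued jobs would call for 5 hosts; with packing on 4-core hosts, the same 5 single-core jobs fit on 2.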
There is probably some combination of configuration settings that allocates jobs appropriately.
Possibly useful (thanks @mjtko) - http://www.supercluster.org/pipermail/torqueusers/2012-May/014636.html