This page contains in-depth details on how to configure the pool configuration file for Batch Shipyard.
- Note that environment variable conventions used below are for Linux.
Windows environment variables should follow Windows conventions. For example,
the Azure Batch Shared Directory on compute nodes are referenced in Linux
as
$AZ_BATCH_NODE_SHARED_DIR
, while on Windows, it would be%AZ_BATCH_NODE_SHARED_DIR%
.
The pool schema is as follows:
pool_specification:
id: batch-shipyard-pool
vm_configuration:
platform_image:
publisher: Canonical
offer: UbuntuServer
sku: 16.04-LTS
version: latest
native: false
license_type: null
custom_image:
arm_image_id: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/galleries/<gallery_name>/images/<image_name>/versions/<version>
node_agent: <node agent sku id>
native: false
license_type: null
vm_size: STANDARD_D2_V2
vm_count:
dedicated: 4
low_priority: 8
max_tasks_per_node: 1
resize_timeout: 00:20:00
node_fill_type: pack
autoscale:
evaluation_interval: 00:15:00
scenario:
name: active_tasks
maximum_vm_count:
dedicated: 16
low_priority: 8
maximum_vm_increment_per_evaluation:
dedicated: 4
low_priority: -1
node_deallocation_option: taskcompletion
sample_lookback_interval: 00:10:00
required_sample_percentage: 70
bias_last_sample: true
bias_node_type: low_priority
rebalance_preemption_percentage: 50
time_ranges:
weekdays:
start: 1
end: 5
work_hours:
start: 8
end: 17
formula: null
inter_node_communication_enabled: false
per_job_auto_scratch: false
reboot_on_start_task_failed: false
attempt_recovery_on_unusable: false
upload_diagnostics_logs_on_unusable: true
block_until_all_global_resources_loaded: true
transfer_files_on_pool_creation: false
input_data:
azure_batch:
- destination: $AZ_BATCH_NODE_SHARED_DIR/jobonanotherpool
exclude:
- '*.txt'
include:
- wd/*.dat
job_id: jobonanotherpool
task_id: mytask
azure_storage:
- storage_account_settings: mystorageaccount
remote_path: poolcontainer/dir
local_path: $AZ_BATCH_NODE_SHARED_DIR/pooldata
is_file_share: false
exclude:
- '*.tmp'
include:
- pooldata*.bin
blobxfer_extra_options: null
resource_files:
- blob_source: https://some.url
file_mode: '0750'
file_path: path/in/wd/file.bin
ssh:
username: shipyard
expiry_days: 30
ssh_public_key: /path/to/rsa/publickey.pub
ssh_public_key_data: ssh-rsa ...
ssh_private_key: /path/to/rsa/privatekey
generate_docker_tunnel_script: true
generated_file_export_path:
hpn_server_swap: false
allow_docker_access: false
rdp:
username: shipyard
password: null
expiry_days: 30
remote_access_control:
starting_port: 49000
allow:
- 1.2.3.4
deny:
- '*'
virtual_network:
arm_subnet_id: /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/virtualNetworks/<virtual_network_name>/subnets/<subnet_name>
name: myvnet
resource_group: resource-group-of-vnet
create_nonexistant: false
address_space: 10.0.0.0/16
subnet:
name: subnet-for-batch-vms
address_prefix: 10.0.0.0/20
public_ips:
- /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/publicIPAddresses/<public_ip_name1>
- /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Network/publicIPAddresses/<public_ip_name2>
certificates:
sha1-thumbprint:
visibility:
- task
- start_task
- remote_user
additional_node_prep:
commands:
pre: []
post: []
environment_variables:
abc: xyz
environment_variables_keyvault_secret_id: https://myvault.vault.azure.net/secrets/nodeprepenv
gpu:
nvidia_driver:
source: https://some.url
ignore_warnings: false
batch_insights_enabled: false
prometheus:
node_exporter:
enabled: false
port: 9100
options: []
cadvisor:
enabled: false
port: 8080
options: []
container_runtimes:
install:
- kata_containers
- singularity
default: null
The pool_specification
property has the following members:
- (required)
id
is the compute pool ID. This value can be any combination of alphanumeric characters including hyphens and underscores up to 64 characters in length. If this pool specification is used for anauto_pool
then the maximum length of this value is 20 characters which becomes the prefix for the autopool id. - (required)
vm_configuration
specifies the image configuration for the VM. Eitherplatform_image
orcustom_image
must be specified. You cannot specify both. Please see the Batch Shipyard Platform Image support doc for more information on which Marketplace images are supported. If using a custom image, please see the Custom Image Guide first.- (required for platform image)
platform_image
defines the Marketplace platform image to use:- (required for platform image)
publisher
is the publisher name of the Marketplace VM image. - (required for platform image)
offer
is the offer name of the Marketplace VM image. - (required for platform image)
sku
is the sku name of the Marketplace VM image. - (optional)
version
is the image version to use. The default islatest
. - (optional)
native
will convert the platform image to use native Docker container support for Azure Batch, if possible. This can provide better task management (such as job and task termination while tasks are running) and potentially lead to faster compute node provisioning (although not guaranteed), in exchange for some features that are not available in this mode such Singularity containers, task-level data ingress or task-level data egress that is not bound for Azure Storage Blobs, among others. If there is nonative
conversion equivalent for the specifiedpublisher
,offer
,sku
then no conversion is performed and this option will be force disabled. Note thatnative
mode is not compatible with Singularity containers. The default isfalse
. Please see the FAQ for more information. - (optional)
license_type
specifies the type of on-premises license to be used when deploying the operating system. This activates the Azure Hybrid Use Benefit for qualifying license holders. This only applies to Windows OS types. You must comply with the terms set forth by this program; please consult the FAQ for further information. The only valid value iswindows_server
.
- (required for platform image)
- (required for custom image)
custom_image
defines the custom image to use. AADbatch
credentials are required to use custom images for both Batch service and User Subscription modes.- (required for custom image)
arm_image_id
defines the Azure Shared Image Gallery resource id to use as the OS image for the pool. The Shared Image resource must be replicated (and have completed replication) to the same region as the Batch account. - (required for custom image)
node_agent
is the node agent sku id to use with this custom image. You can view supported base images and their node agent sku ids with theaccount images
command. - (optional)
native
will opt to use native Docker container support if possible. This provides better task management (such as job and task termination while tasks are running), in exchange for some other features that are not available in this mode such as task-level data ingress or task-level data egress that is not bound for Azure Storage Blobs. The default isfalse
. - (optional)
license_type
specifies the type of on-premises license to be used when deploying the operating system. This activates the Azure Hybrid Use Benefit for qualifying license holders. This only applies to Windows OS types. You must comply with the terms set forth by this program; please consult the FAQ for further information. The only valid value iswindows_server
.
- (required for custom image)
- (required for platform image)
- (required)
vm_size
is the Azure Virtual Machine Instance Size. Please note that not all regions have every VM size available. - (required)
vm_count
is the number of compute nodes to allocate. You may specify a mixed number of compute nodes in the following properties:- (optional)
dedicated
is the number of dedicated compute nodes to allocate. These nodes cannot be pre-empted. The default value is0
. - (optional)
low_priority
is the number of low-priority compute nodes to allocate. These nodes may be pre-empted at any time. Workloads that are amenable tolow_priority
nodes are those that do not have strict deadlines for pickup and completion. Optimally, these types of jobs would checkpoint their progress and be able to recover when re-scheduled. The default value is0
.
- (optional)
- (optional)
resize_timeout
is the amount of time allowed for resize operations (note that creating a pool resizes from 0 to the specified number of nodes). The format for this property is a timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional, if specified. If not specified, the default is 15 minutes. This should not be specified (and is ignored) forautoscale
enabled pools. - (optional)
max_tasks_per_node
is the maximum number of concurrent tasks that can be running at any one time on a compute node. This defaults to a value of 1 if not specified. The maximum value for the property that Azure Batch will accept is4 x <# cores per compute node>
. For instance, for aSTANDARD_F2
instance, because the virtual machine has 2 cores, the maximum allowable value for this property would be8
. - (optional)
node_fill_type
is the task scheduling compute node fill type policy to apply.pack
, which is the default, attempts to pack the maximum number of tasks on a node (controlled throughmax_tasks_per_node
before scheduling tasks to another node).spread
will schedule tasks evenly across compute nodes before packing. - (optional)
autoscale
designates the autoscale settings for the pool. If specified, thevm_count
becomes the minimum number of virtual machines for each node type forscenario
based autoscale.- (optional)
evaluation_interval
is the time interval between autoscale evaluations performed by the service. The format for this property is a timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional, if specified. If not specified, the default is 15 minutes. The smallest value that can be specified is 5 minutes. Use caution when specifying a smallevaluation_interval
values which can cause pool resizing errors and instability with volatile target counts. - (optional)
scenario
is a pre-set autoscale scenario where a formula will be generated with the parameters specified within this property.- (required)
name
is the autoscale scenario name to apply. Valid values areactive_tasks
,pending_tasks
,workday
,workday_with_offpeak_max_low_priority
,weekday
,weekend
. Please see the autoscale guide for more information about these scenarios. - (required)
maximum_vm_count
is the maximum number of compute nodes that can be allocated from an autoscale evaluation. It is useful to have these limits in place as to control the top-end scale of the autoscale scenario. Specifying a negative value for either of the following properties will result in effectively no maximum limit.- (optional)
dedicated
is the maximum number of dedicated compute nodes that can be allocated. - (optional)
low_priority
is the maximum number of low priority compute nodes that can be allocated.
- (optional)
- (optional)
maximum_vm_increment_per_evaluation
is the maximum amount of VMs to increase per evaluation. Specifying a non-positive value (i.e., less than or equal to0
) for either of the following properties will result in effectively no increment limit.- (optional)
dedicated
is the maximum increase in VMs per evaluation. - (optional)
low_priority
is the maximum increase in VMs per evaluation.
- (optional)
- (optional)
node_deallocation_option
is the node deallocation option to apply. When a pool is resized down and a node is selected for removal, what action is performed for the running task is specified with this option. The valid values are:requeue
,terminate
,taskcompletion
, andretaineddata
. The default istaskcompletion
. Please see this doc for more information. - (optional)
sample_lookback_interval
is the time interval to lookback for past history for certain scenarios such as autoscale based on active and pending tasks. The format for this property is a timedelta with a string representation of "d.HH:mm:ss". "HH:mm:ss" is required, but "d" is optional, if specified. If not specified, the default is 10 minutes. - (optional)
required_sample_percentage
is the required percentage of samples that must be present during thesample_lookback_interval
. If not specified, the default is 70. - (optional)
bias_last_sample
will bias the autoscale scenario, if applicable, to use the last sample during history computation. This can be enabled to more quickly respond to changes in history with respect to averages. The default istrue
. - (optional)
bias_node_type
will bias the the autoscale scenario, if applicable, to favor one type of node over the other when making a decision on how many of each node to allocate. The default isauto
or equal weight to bothdedicated
andlow_priority
nodes. Valid values arenull
(or omitting the property),dedicated
, orlow_priority
. - (optional)
rebalance_preemption_percentage
will rebalance the compute nodes to bias for dedicated nodes when the pre-empted node count reaches the indicated threshold percentage of the total current dedicated and low priority nodes. The default isnull
or no rebalancing is performed. - (optional)
time_ranges
defines the time ranges for the day-of-week based scenarios.- (optional)
weekdays
defines the days of the week which should be considered weekdays, where1
= Monday.- (optional)
start
defines the inclusive start weekday day of the week as an integer. The default is1
. - (optional)
end
defines the inclusive end weekday day of the week as an integer. The default is5
.
- (optional)
- (optional)
work_hours
defines the hours of the day in the work day with a range from0
to23
, inclusive.- (optional)
start
defines the inclusive start hour of the work day as an integer. The default is8
. - (optional)
end
defines the inclusive end hour of the work day as an integer. The default is17
.
- (optional)
- (optional)
- (required)
- (optional)
formula
is a custom autoscale formula to apply to the pool. If bothformula
andscenario
are specified, thenformula
is used.
- (optional)
- (optional)
inter_node_communication_enabled
designates if this pool is set up for inter-node communication. This must be set totrue
for any containers that must communicate with each other such as MPI applications. This property cannot be enabled if there are positive values for bothdedicated and
low_priority` compute nodes specified above. This property will be force enabled if peer-to-peer replication is enabled. - (optional)
per_job_auto_scratch
will enable on-demand distributed scratch space creation across all dedicated or low priority nodes in a pool for a job. This scratch will be available at the location$AZ_BATCH_TASK_DIR/auto_scratch
within the container. The scratch drive is cleaned up automatically on job termination or deletion. This option requires setting the propertyinter_node_communication_enabled
totrue
. Note that SSH and BeeGFS communication must be allowed on the virtual network between nodes. Thus if specifying avirtual_network
and/orremote_access_control
rules, you must ensure that the internal network traffic is not blocked by NSG rules. This option is only available on a subset of supported Linux distributions. The default, if not specified, isfalse
. - (optional)
reboot_on_start_task_failed
allows Batch Shipyard to reboot the compute node in case there is a transient failure in node preparation (e.g., network timeout, resolution failure or download problem). This defaults tofalse
. - (optional)
attempt_recovery_on_unusable
allows Batch Shipyard to attempt to recover nodes that enterunusable
state automatically. Note that enabling this option can lead to infinite wait onpool add
orpool resize
with--wait
. This defaults tofalse
and is ignored forcustom_image
where the behavior is alwaysfalse
. - (optional)
upload_diagnostics_logs_on_unusable
allows Batch Shipyard to attempt upload of diagnostics logs for nodes that have entered unusable state during provisioning to the storage account designated under thebatch_shipyard
:storage_account_settings
global configuration property. Note that this typically will only result in one set of logs being uploaded even if multiple nodes eventually enter this state. These logs can be referenced in conjunction with a support request to provide additional insight into why a compute node failed to provision properly. This defaults totrue
. Note that by setting this property totrue
, these diagnostics logs are not automatically sent to Microsoft and must be included, either indirectly via the SAS URL generated or directly, with support requests. - (optional)
block_until_all_global_resources_loaded
will block the node from entering ready state until all Docker images are loaded. This defaults totrue
. This option has no effect onnative
container support pools (the behavior will effectively reflecttrue
for this property onnative
container support pools). - (optional)
transfer_files_on_pool_creation
will ingress allfiles
specified in theglobal_resources
section of the global configuration file when the pool is created. If files are to be ingressed to Azure Blob or File Storage, then data movement operations are overlapped with the creation of the pool. If files are to be ingressed to a shared file system on the compute nodes, then the files are ingressed after the pool is created and the shared file system is ready. Files can be ingressed to both Azure Blob Storage and a shared file system during the same pool creation invocation. If this property is set totrue
thenblock_until_all_global_resources_loaded
will be force disabled. If omitted, this property defaults tofalse
. - (optional)
input_data
is an object containing data that should be ingressed to all compute nodes as part of node preparation. It is important to note that if you are combining this action withfiles
and are ingressing data to Azure Blob or File storage as part of pool creation, that the blob containers or file shares defined here will be downloaded as soon as the compute node is ready to do so. This may result in the blob container/blobs or file share/files not being ready in time for theinput_data
transfer. It is up to you to ensure that these two operations do not overlap. If there is a possibility of overlap, then you should ingress data defined infiles
prior to pool creation and disable the option abovetransfer_files_on_pool_creation
. This object currently supportsazure_batch
andazure_storage
as members.azure_batch
contains the following members:- (required)
job_id
the job id of the task - (required)
task_id
the id of the task to fetch files from - (optional)
include
is an array of include filters - (optional)
exclude
is an array of exclude filters - (required)
destination
is the destination path to place the files
- (required)
azure_storage
contains the following members:- (required)
storage_account_settings
contains a storage account link as defined in the credentials config. - (required)
remote_path
is required when downloading from Azure Storage. This path on Azure includes either the container or file share path along with all virtual directories. - (required)
local_path
is required when downloading from Azure Storage. This specifies where the files should be downloaded to on the compute node. Please note that you should not specify a destination that is on a shared file system. If you require ingressing to a shared file system location like a GlusterFS volume, then use the global configurationfiles
property and thedata ingress
command. - (optional)
is_file_share
denotes if theremote_path
is on a file share. This defaults tofalse
. - (optional)
include
property defines optional include filters. - (optional)
exclude
property defines optional exclude filters. - (optional)
blobxfer_extra_options
are any extra options to pass toblobxfer
.
- (required)
- (optional)
resource_files
is an array of resource files that should be downloaded as part of the compute node's preparation. Each array entry contains the following information:file_path
is the path within the node prep task working directory to place the file on the compute node. This directory can be referenced by the$AZ_BATCH_NODE_STARTUP_DIR/wd
path.blob_source
is an accessible HTTP/HTTPS URL. This need not be an Azure Blob Storage URL.file_mode
if the file mode to set for the file on the compute node. This is optional.
- (optional)
virtual_network
is the property for specifying an ARM-based virtual network resource for the pool. AADbatch
credentials are required for both Batch service and User Subscription modes. Please see the Virtual Network guide for more information.- (required/optional)
arm_subnet_id
is the full ARM resource id to the subnet on the virtual network. This virtual network must already exist and must exist within the same region and subscription as the Batch account. If this value is specified, the other properties ofvirtual_network
are ignored. AADmanagement
credentials are not strictly required for this case but is recommended to be filled to allow address space validation checks. - (required/optional)
name
is the name of the virtual network. Ifarm_subnet_id
is not specified, this value is required. Note that this requires AADmanagement
credentials. - (optional)
resource_group
containing the virtual network. If the resource group name is not specified here, theresource_group
specified in thebatch
credentials will be used instead. - (optional)
create_nonexistant
specifies if the virtual network and subnet should be created if not found. If not specified, this defaults tofalse
. - (required if creating, optional otherwise)
address_space
is the allowed address space for the virtual network. - (required/optional)
subnet
specifies the subnet properties. This is required ifarm_subnet_id
is not specified, i.e., the virtual networkname
is specified instead.- (required)
name
is the subnet name. - (required)
address_prefix
is the subnet address prefix to use for allocation Batch compute nodes to. The maximum number of compute nodes a subnet can support is 4096 which maps roughly to a CIDR mask of 20-bits.
- (required)
- (required/optional)
- (optional)
ssh
is the property for creating a user to accomodate SSH sessions to compute nodes. If this property is absent, then an SSH user is not created with pool creation. If you are running Batch Shipyard on Windows, please refer to these instructions on how to generate an SSH keypair for use with Batch Shipyard. This property is ignored for Windows-based pools.- (required)
username
is the user to create on the compute nodes. - (optional)
expiry_days
is the number of days from now for the account on the compute nodes to expire. The default is 30 days from invocation time. - (optional)
ssh_public_key
is the path to an existing SSH public key to use. If not specified, an RSA public/private keypair will be automatically generated ifssh-keygen
orssh-keygen.exe
can be found on thePATH
. This option cannot be specified withssh_public_key_data
. - (optional)
ssh_public_key_data
is the raw RSA public key data in OpenSSH format, e.g., a string starting withssh-rsa ...
. Only one key may be specified. This option cannot be specified withssh_public_key
. - (optional)
ssh_private_key
is the path to an existing SSH private key to use against eitherssh_public_key
orssh_public_key_data
for connecting to compute nodes. This option should only be specified if eitherssh_public_key
orssh_public_key_data
are specified. - (optional)
generate_docker_tunnel_script
property directs script to generate an SSH tunnel script that can be used to connect to the remote Docker engine running on a compute node. This script can only be used on non-Windows systems. - (optional)
generated_file_export_path
is the path to export the generated RSA keypair and docker tunnel script to. If omitted, the current directory is used. - (experimental)
hpn_server_swap
property enables an OpenSSH server with HPN patches to be swapped with the standard distribution OpenSSH server. This is not supported on all Linux distributions and may be force disabled. - (optional)
allow_docker_access
allows this SSH user access to the Docker daemon. The default isfalse
.
- (required)
- (optional)
rdp
is the property for creating a user to accomodate RDP login sessions to compute nodes. If this property is absent, then an RDP user is not created with pool creation. This property is ignored for Linux-based pools.- (required)
username
is the user to create on the compute nodes. - (optional)
expiry_days
is the number of days from now for the account on the compute nodes to expire. The default is 30 days from invocation time. - (optional)
password
is the password to associate with the user. Passwords must meet the minimum complexity requirements as required by Azure Batch. If left omitted, unspecified or set tonull
, then a random password is generated and logged during anypool add
call with this section defined, orpool user add
.
- (required)
- (optional)
remote_access_control
is a property to control access to the remote access port (SSH or RDP). If this section is omitted, then the Batch service defaults are applied which do not apply any network security rules on these ports.- (optional)
starting_port
is the starting port for each SSH port on each node to map to the "front-end" load balancer. The default value is49000
if not specified. Ports from50000
to55000
are reserved by the Batch service. You must specify enough space for 1000 ports; e.g.,49500
would not be valid since the range would overlap into the reserved range. - (optional)
allow
is a list of allowable address prefixes in CIDR format. - (optional)
deny
is a list of address prefixes in CIDR format to deny.deny
rules have lower priority thanallow
rules. Therefore, you can specify a set of allowable address prefixes and then specify a single deny rule of*
to deny all other IP addresses from connecting to the remote access port. Take care when specifyingdeny
rules when your nodes must make use of SSH or RDP to perform actions between compute nodes.
- (optional)
- (optional)
public_ips
property defines a list of any pre-defined Azure-allocated Public IP addresses that are assigned to the pool. These must not already be bound and there must be a sufficient number of public IPs to cover the number of compute nodes in a pool (or any potential future resizes). These must be fully-qualified ARM Public IP resource ids. - (optional)
certificates
property defines any certificate references to add on this pool. These certificates must already be present on the Batch account and are only applied to new pool allocations.- (required)
sha1-thumbprint
is the actual SHA-1 thumbprint of the certificate to add to the pool.- (required)
visibility
is a list of visibility settings to apply to the certificate. Valid values arenode_prep
,remote_user
, andtask
.
- (required)
- (required)
- (optional)
additional_node_prep
defines any additional node preparation commands to execute on node start.- (optional)
commands
are the commands to execute- (optional)
pre
is an array of additional commands to execute on the compute node host as part of node preparation which occur prior to the Batch Shipyard node preparation steps. This is particularly useful for preparing platform images with software for custom Linux mounts. - (optional)
post
is an array of additional commands to execute on the compute node host as part of node preparation which occur after the Batch Shipyard node preparation steps.
- (optional)
- (optional)
environment_variables
that are set on the Azure Batch start task. Note that environment variables are not expanded and are passed as-is. - (optional)
environment_variables_keyvault_secret_id
are any additional environment variables that should be applied to the start task but are stored in KeyVault. The secret stored in KeyVault must be a valid YAML/JSON string, e.g.,{ "env_var_name": "env_var_value" }
.
- (optional)
- (optional)
gpu
property defines additional information for NVIDIA GPU-enabled VMs. If not specified, Batch Shipyard will automatically download the driver for thevm_size
specified.- (optional)
nvidia_driver
property contains the following members:- (required)
source
is the source url to download the driver. This should be the silent-installable driver package.
- (required)
- (optional)
ignore_warnings
property allows overriding the default beahvior to place the node in start task failed state if during node prep there are warnings of possible GPU issues such as infoROM corruption. It is recommended not to set this value totrue
. The default, if not specified, isfalse
.
- (optional)
- (optional)
batch_insights_enabled
property enables Batch Insights monitoring for the pool. This provides simple non-realtime, host-based monitoring through Batch Explorer. The default isfalse
. - (optional)
prometheus
properties are to control if collectors for metrics to export to Prometheus monitoring are enabled. Note that all exporters do not have their ports mapped (NAT) on the load balancer pool. This means that the Prometheus instance itself must reside on, or peered with, the virtual network that the compute nodes are in. This ensures that external parties cannot scrape exporter metrics from compute node instances.- (optional)
node_exporter
contains options for the Node Exporter metrics exporter.- (optional)
enabled
property enables or disables this exporter. Default isfalse
. - (optional)
port
is the port for Prometheus to connect to scrape. This is the internal port on the compute node. - (optional)
options
is a list of options to pass to the node exporter instance running on all nodes. The following collectors are force disabled, in addition to others disabled by default: textfile, mdadm, wifi, xfs, zfs. The infiniband collector is enabled if on an IB/RDMA instance, automatically. The nfs collector is enabled if mounting an NFS RemoteFS storage cluster, automatically.
- (optional)
- (optional)
cadvisor
contains options for the cAdvisor metrics exporter.- (optional)
enabled
property enables or disables this exporter. Default isfalse
. - (optional)
port
is the port for Prometheus to connect to scrape. This is the internal port on the compute node. - (optional)
options
is a list of options to pass to the cAdvisor instance running on all nodes.
- (optional)
- (optional)
- (optional)
container_runtimes
properties control container runtime behavior on the pool compute nodes.- (optional)
install
controls which optional container runtimes to install. A list of valid values for this option arekata_containers
andsingularity
. Note that therunc
container runtime is always installed. Thenvidia
container runtime is automatically installed when allocating a pool with GPUs.singularity
must be specified if running Singularity containers. - (optional)
default
is the default container runtime to use for running Docker containers. This option has no effect onsingularity
containers.
- (optional)
A full template of a credentials file can be found here. Note that these templates cannot be used as-is and must be modified to fit your scenario.