This page contains in-depth details on how to configure the global configuration file for Batch Shipyard.
The global config schema is as follows:
```yaml
batch_shipyard:
  storage_account_settings: mystorageaccount
  storage_entity_prefix: shipyard
  generated_sas_expiry_days: null
  autogenerated_task_id:
    prefix: task-
    zfill_width: 5
  encryption:
    enabled: true
    pfx:
      filename: encrypt.pfx
      passphrase: mysupersecretpassword
      sha1_thumbprint: 123456789...
    public_key_pem: encrypt.pem
  fallback_registry: myregistry.azurecr.io
  delay_docker_image_preload: false
data_replication:
  concurrent_source_downloads: null
global_resources:
  additional_registries:
    docker:
    - myruntimeserver.azurecr.io
  docker_images:
  - busybox
  singularity_images:
    unsigned:
    - image: shub://singularityhub/busybox
    - image: docker://busybox
    - image: oras://myazurecr.azurecr.io/repo/myunsignedimage:1.0.0
    - image: library://user/repo/image:1.0.0
    - image: library://user/repo/encryptedimage:1.0.0
      encryption:
        certificate:
          sha1_thumbprint: 123456789...
    signed:
    - image: library://sylabs/tests/signed:1.0.0
      signing_key:
        fingerprint: 8883491F4268F173C6E5DC49EDECE4F3F38D871E
    - image: oras://myazurecr.azurecr.io/repo/mysignedimage:1.0.0
      signing_key:
        fingerprint: 000123000123000123000123000123000123ABCD
        file: /path/to/key/file
    - image: library://user/repo/encryptedimage:1.0.0
      signing_key:
        fingerprint: 000123000123000123000123000123000123ABCD
        file: /path/to/key/file
      encryption:
        certificate:
          sha1_thumbprint: 123456789...
  volumes:
    data_volumes:
      contdatavol:
        container_path: /abc
        host_path: null
        bind_options: ro
      hosttempvol:
        container_path: /hosttmp
        host_path: /tmp
        bind_options: rw
    shared_data_volumes:
      azurefile_vol:
        volume_driver: azurefile
        storage_account_settings: mystorageaccount
        azure_file_share_name: myfileshare
        container_path: $AZ_BATCH_NODE_SHARED_DIR/azfile
        mount_options:
        - file_mode=0777
        - dir_mode=0777
        bind_options: rw
      azureblob_vol:
        volume_driver: azureblob
        storage_account_settings: mystorageaccount
        azure_blob_container_name: mycontainer
        container_path: $AZ_BATCH_NODE_SHARED_DIR/azblob
        mount_options:
        - --use-https=true
        bind_options: rw
      nfs_server:
        volume_driver: storage_cluster
        container_path: $AZ_BATCH_NODE_SHARED_DIR/nfs_server
        mount_options: []
        bind_options: ro
      glusterfs_cluster:
        volume_driver: storage_cluster
        container_path: $AZ_BATCH_NODE_SHARED_DIR/glusterfs_cluster
        mount_options: []
        bind_options: null
      glusterfs_on_compute_vol:
        volume_driver: glusterfs_on_compute
        container_path: $AZ_BATCH_NODE_SHARED_DIR/glusterfs_on_compute
        volume_type: replica
        volume_options: []
        bind_options: rw
      custom_vol:
        volume_driver: custom_linux_mount
        container_path: $AZ_BATCH_NODE_SHARED_DIR/lustre
        fstab_entry:
          fs_spec: 10.1.0.4@tcp0:10.1.0.5@tcp0:/lustre
          fs_vfstype: lustre
          fs_mntops: defaults,_netdev
          fs_freq: 0
          fs_passno: 0
        bind_options: null
  files:
  - destination:
      data_transfer:
        method: multinode_scp
        ssh_private_key: id_rsa_shipyard
        scp_ssh_extra_options: -c [email protected]
        rsync_extra_options: ''
        split_files_megabytes: 500
        max_parallel_transfers_per_node: 2
      relative_destination_path: myfiles
      shared_data_volume: glustervol
    source:
      exclude:
      - '*.bak'
      include:
      - '*.dat'
      path: /some/local/path/dir
  - destination:
      storage_account_settings: mystorageaccount
      data_transfer:
        remote_path: container/dir
        is_file_share: false
        blobxfer_extra_options: null
    source:
      exclude:
      - '*.tmp'
      include:
      - '*.bin'
      path: /some/local/path/bound/for/storage
  - destination:
      data_transfer:
        method: rsync+ssh
        ssh_private_key: id_rsa_shipyard
        scp_ssh_extra_options: -c [email protected]
        rsync_extra_options: -v
      relative_destination_path: relpath/on/host/test2
    source:
      exclude:
      - '*.tmp'
      include:
      - '*.bin'
      path: /another/local/path/dir
```
The `batch_shipyard` property is used to set settings for the tool:

- (required) `storage_account_settings` is a link to the alias of the storage account specified, in this case, it is `mystorageaccount`. Batch Shipyard requires a general purpose storage account for storing metadata in order to execute across a distributed environment. The general purpose restriction applies only to this account used for Batch Shipyard metadata. Additional storage accounts (of varying types) can be specified in the credentials configuration file and referenced where appropriate.
- (optional) `storage_entity_prefix` is used as a generic qualifier to prefix storage containers (blob containers, tables, queues). If not specified, this defaults to `shipyard`.
- (optional) `generated_sas_expiry_days` sets the number of days any non-resource file SAS key generated by Batch Shipyard is valid. The default is effectively unlimited. This is useful if you want generated SAS keys to be valid only for a preferred period of time.
- (optional) `autogenerated_task_id` controls how autogenerated task ids are named. Note that the total length of an autogenerated task id must not exceed 64 characters.
    - (optional) `prefix` is the prefix to use for the task id. This can be any combination of alphanumeric characters, hyphens, and underscores. An empty string is permitted for the `prefix`. The default is `task-`.
    - (optional) `zfill_width` is the number of zeros to left pad the integral task number with. This can be set to zero, which may be useful for task dependency range scenarios in combination with an empty string `prefix` above. The default is `5`.
- (optional) `encryption` object is used to define credential encryption and contains the following members:
    - (required) `enabled` enables or disables this feature.
    - (required) `pfx` object defines the PFX certificate:
        - (required) `filename` is the full path and name of the PFX certificate.
        - (required) `passphrase` is the passphrase for the PFX certificate. This cannot be empty.
        - (optional) `sha1_thumbprint` is the SHA1 thumbprint of the certificate. If the PFX file is created using the `cert create` command, then the SHA1 thumbprint is output. It is recommended to populate this property so that it does not have to be generated when needed for encryption.
    - (optional) `public_key_pem` is the full path and name of the RSA public key in PEM format. If the PFX file is created using the `cert create` command, then this file is generated along with the PFX file. It is recommended to populate this property with the PEM file path so that it does not have to be generated when needed for encryption.
- (optional) `fallback_registry` designates a Docker registry to use as a fallback for retrieving the Batch Shipyard system images required to bootstrap each compute node. This is useful to minimize the impact of Docker Hub outages or degradations. If this property is populated, then the associated login information for this registry server must be specified in the credentials configuration under `docker_registry`. Note that this registry must follow naming conventions exactly as if on Docker Hub, except that images are naturally prefixed with the server name. To easily replicate/mirror the requisite Batch Shipyard images, please see the `misc mirror-images` command. This command should be run for every Batch Shipyard version that you intend to use in conjunction with this option.
- (optional) `delay_docker_image_preload` controls when to perform Docker image preloading for `native` Linux pools only. If this property is set to `true` for `native` Linux pools, then Docker images are loaded during the node prep phase (i.e., the Azure Batch start task). The advantage of delaying preloading to this phase is to decouple potential image preload failures from other problems that can cause a node to enter an unusable state. Additionally, enabling this feature allows configuration of `data_replication` options (see below) for `native` Linux pools, including `concurrent_source_downloads` tuning and other peer-to-peer options. This option has no effect on non-`native` pools, as images are always "delay" preloaded there. Similarly, this option has no effect on Windows pools.
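For instance, a configuration (illustrative values only) that omits the task id prefix and uses a three-digit zero fill, which can be convenient for task dependency id ranges, might look like:

```yaml
batch_shipyard:
  storage_account_settings: mystorageaccount
  autogenerated_task_id:
    # empty prefix plus zfill_width of 3 yields task ids such as
    # 000, 001, 002, ... (total id length must stay within 64 chars)
    prefix: ''
    zfill_width: 3
```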
The `data_replication` property is an entirely optional section used to exert fine-grained control over the download and data replication behavior for container images between compute nodes within a compute pool.

- (optional) `concurrent_source_downloads` specifies the number of nodes that can concurrently download the source images in parallel. The default, if not specified, is 10.
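As a sketch, allowing more nodes to pull directly from the source registry at once could look like the following (the value `20` is illustrative, not a recommendation):

```yaml
data_replication:
  # allow up to 20 nodes to pull source images concurrently
  # (the default is 10 when unspecified)
  concurrent_source_downloads: 20
```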
The `global_resources` property contains information regarding required container images, volume configuration, and data ingress. This property is required.

- (optional) `additional_registries` specifies any additional registry login information to load on to the pool, as specified in the credentials configuration. Do not specify any registries here that are already part of either `docker_images` or `singularity_images` below. This option is mainly for accessing container registries that do not have associated images to preload on to the pool.
    - (optional) `docker` specifies a list of Docker registries to load. If these require login credentials, they must be specified in the credentials configuration file.
    - (optional) `singularity` specifies a list of Singularity registries to load.
- (required if using Docker) `docker_images` is an array of Docker images that should be installed on every compute node when this configuration file is supplied while creating a compute pool. Image tags are supported. Image names should be fully qualified, including any registry server name prefix (unless the image exists on Docker Hub, in which case the prefix can be omitted). If you are referencing a private registry that requires a login, then you must add the credential for the registry in the `docker_registry` property in the credentials file. If this property is empty or is not specified, no Docker images will be preloaded on to compute nodes, which will lead to increased task startup latency. It is highly recommended not to leave this property empty if possible. Note that if you do not specify Docker images to preload, you must specify `allow_run_on_missing_image` as `true` in your job specification for any tasks that reference images not listed in this property.
- (required if using Singularity) `singularity_images` contains all the Singularity images that should be installed on every compute node when this configuration file is supplied while creating a compute pool. Image tags are supported. Image names should be fully qualified, including any registry server name prefix. If you are referencing a private registry that requires a login, then you must add the credential for the registry in the `singularity_registry` property in the credentials file. If this property is empty or is not specified, no Singularity images will be preloaded on to compute nodes, which will lead to increased latency to begin task execution. It is highly recommended not to leave this property empty if possible. Due to Singularity limitations, if the image specified at a certain URI changes, the image will automatically be pulled again from the registry the next time it is used in a task; this can lead to increased latency to begin task execution if the image differs from a previous pull, and to potential inconsistencies between task executions. Note that `singularity_images` is incompatible with `native` container support enabled pools. For encrypted container support, please see the Singularity Encrypted Containers documentation for more details.
    - (optional) `unsigned` is a list of Singularity images that will not be verified when installed on every compute node. `shub://`, `docker://`, `library://`, and `oras://` URI prefixes are supported.
        - (required) `image` is the unsigned Singularity image.
        - (optional) `encryption` contains the image encryption properties. Only images encrypted with an asymmetric RSA key pair are currently supported in Batch Shipyard.
            - (required) `certificate` is the PFX decryption certificate with the appropriate private key that has been bound to the Batch account. This cannot be a CER certificate, as a private key is required for image decryption.
                - (required) `sha1_thumbprint` is the associated SHA-1 thumbprint of the certificate. This must be associated with the PFX with the private key.
    - (optional) `signed` is a list of objects, each containing a Singularity image that will be verified when installed on every compute node, along with the information needed to verify the image. `library://` and `oras://` URI prefixes are supported.
        - (required) `image` is the signed Singularity image.
        - (required) `signing_key` contains the signing key properties.
            - (required) `fingerprint` is the key fingerprint of the Singularity image to verify. If no key `file` is specified, this fingerprint is used to pull the key from the default key server, https://keys.sylabs.io.
            - (optional) `file` is a local path to a public key file. The key fingerprint of the key in `file` must match `fingerprint`.
        - (optional) `encryption` contains the image encryption properties. Only images encrypted with an asymmetric RSA key pair are currently supported in Batch Shipyard.
            - (required) `certificate` is the PFX decryption certificate with the appropriate private key that has been bound to the Batch account. This cannot be a CER certificate, as a private key is required for image decryption.
                - (required) `sha1_thumbprint` is the associated SHA-1 thumbprint of the certificate. This must be associated with the PFX with the private key.
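As an illustrative sketch (registry and image names are placeholders), a pool that preloads a public Docker Hub image plus an image from a private registry might use:

```yaml
global_resources:
  docker_images:
  # Docker Hub image: the registry server prefix may be omitted
  - busybox:latest
  # private registry image: fully qualified with the server name;
  # the matching login must exist under docker_registry in the
  # credentials configuration
  - myregistry.azurecr.io/myrepo/myimage:1.0.0
```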
- (optional) `files` specifies data that should be ingressed from a location accessible by the local machine (i.e., the machine invoking `shipyard.py`) to a shared file system location accessible by compute nodes in the pool, or to Azure Blob or File Storage. `files` is a list of objects, which allows multiple source-to-destination ingress operations during the same invocation. Note that no Azure Batch environment variables (i.e., `$AZ_BATCH_`-style environment variables) are available as path arguments, since ingress actions performed within `files` are done locally on the machine invoking `shipyard.py`. Each object within the `files` list contains the following members:
    - (required) `source` contains the following members:
        - (required) `path` is a local path. A single file or a directory can be specified. The filters below are ignored if `path` is a file and not a directory.
        - (optional) `include` is an array of Unix shell-style wildcard filters where only files matching a filter are included in the data transfer.
        - (optional) `exclude` is an array of Unix shell-style wildcard filters where files matching a filter are excluded from the data transfer. Filters specified in `exclude` have precedence over filters specified in `include`.
    - (required) `destination` contains the following members:
        - (required or optional) `shared_data_volume` or `storage_account_settings` for data ingress to a GlusterFS volume or to Azure Blob or File Storage. If you are ingressing to a pool with only one compute node, you may omit `shared_data_volume`. Otherwise, you may specify one or the other, but not both, in the same object. Please see `shared_data_volumes` below for information on how to set up a GlusterFS share.
        - (required or optional) `relative_destination_path` specifies a relative destination path to place the files, with respect to the target root. If transferring to a `shared_data_volume`, then this is relative to the GlusterFS volume root. If transferring to a pool with a single node in it, and thus no `shared_data_volume` is specified in the prior property, then this is relative to `$AZ_BATCH_NODE_ROOT_DIR`. To place files directly in `$AZ_BATCH_NODE_ROOT_DIR` (not recommended), you can specify this property as an empty string when not ingressing to a `shared_data_volume`. Note that if `scp` is selected while attempting to transfer directly to this aforementioned path, then `scp` will fail with an exit code of 1 but the transfer will have succeeded (this is due to some of the permission options). If this property is not specified for a `shared_data_volume`, then files will be placed directly in the GlusterFS volume root. This property cannot be specified for an Azure Storage destination (i.e., `storage_account_settings`).
        - (required) `data_transfer` specifies how the transfer should take place. The following list contains members for GlusterFS ingress when a GlusterFS volume is provided for `shared_data_volume` (see further below for ingressing to Azure Blob or File Storage):
            - (required) `method` specifies which method should be used to ingress data, which should be one of: `scp`, `multinode_scp`, `rsync+ssh`, or `multinode_rsync+ssh`. `scp` will use secure copy to copy a file or a directory (recursively) to the remote share path. `multinode_scp` will attempt to simultaneously transfer files to many compute nodes using `scp` at the same time to speed up data transfer. `rsync+ssh` will perform an rsync of files through SSH. `multinode_rsync+ssh` will attempt to simultaneously transfer files using `rsync` to many compute nodes at the same time to speed up data transfer. Note that you may specify the `multinode_*` methods even with only 1 compute node in a pool, which allows you to take advantage of `max_parallel_transfers_per_node` below.
            - (optional) `ssh_private_key` is the location of the SSH private key for the username specified in the `pool_specification`:`ssh` section when connecting to compute nodes. The default, if omitted, is `id_rsa_shipyard`, which is automatically generated if no SSH key is specified when an SSH user is added to a pool.
            - (optional) `scp_ssh_extra_options` are any extra options to pass to `scp` or `ssh` for the `scp`/`multinode_scp` or `rsync+ssh`/`multinode_rsync+ssh` methods, respectively. In the example above, `-c [email protected]` is passed to `scp`, which can potentially increase transfer speed by selecting the `[email protected]` cipher, which can exploit Intel AES-NI.
            - (optional) `rsync_extra_options` are any extra options to pass to `rsync` for the `rsync+ssh`/`multinode_rsync+ssh` transfer methods. This property is ignored for non-rsync transfer methods.
            - (optional) `split_files_megabytes` splits files into chunks with the specified size in MiB. This can potentially help with very large files. This option forces the transfer `method` to `multinode_scp`. Note that the destination file system must be able to accommodate up to 2x the size of the files which are split. Additionally, transfers involving split files incur reconstruction costs after the transfer is complete, which increases the total end-to-end ingress time. However, in certain scenarios, splitting files and transferring chunks in parallel, along with reconstruction, may end up being faster than transferring a large file without chunking.
            - (optional) `max_parallel_transfers_per_node` is the maximum number of parallel transfers to invoke per node with the `multinode_scp`/`multinode_rsync+ssh` methods. For example, if there are 3 compute nodes in the pool and `2` is given for this option, then there will be up to 2 scp sessions in parallel per compute node, for a maximum of 6 concurrent scp sessions to the pool. The default is 1 if not specified or omitted.
        - (required) `data_transfer` specifies how the transfer should take place when Azure Blob or File Storage is selected as the destination for data ingress; in this case, blobxfer is invoked. The following list contains members for Azure Blob or File Storage ingress when a storage account link is provided for `storage_account_settings`:
            - (required) `remote_path` is required when uploading to Azure Storage. This property is the full path to the storage resource, including the container or file share name and all virtual directories. The container or file share need not be created beforehand.
            - (optional) `is_file_share` specifies if the destination is an Azure File share rather than an Azure Blob container. The default is `false`.
            - (optional) `blobxfer_extra_options` are any extra options to pass to `blobxfer`. Please run `blobxfer -h` to see available extra options that may be pertinent to your scenario.
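For instance, a hypothetical `files` entry (all names are placeholders) that ingresses a local directory into an Azure File share via blobxfer might look like:

```yaml
global_resources:
  files:
  - source:
      path: /local/data
      include:
      - '*.csv'
    destination:
      storage_account_settings: mystorageaccount
      data_transfer:
        # upload into file share "myshare" under virtual dir "inputs"
        remote_path: myshare/inputs
        # target an Azure File share instead of a blob container
        is_file_share: true
        blobxfer_extra_options: null
```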
- (optional) `volumes` can consist of two different types of volumes: `data_volumes` and `shared_data_volumes`. `data_volumes` can be of two flavors depending upon whether `host_path` is set to null or not. In the former case, this is typically used with the `VOLUME` keyword in Dockerfiles to initialize a data volume with existing data inside the image. If `host_path` is set, then the path on the host is mounted in the container at the path specified with `container_path`.
    - (required) `host_path` is the host path to bind. Set this to `null` to create a data volume without a host bind.
    - (optional) `container_path` is the container path to map to the host path. If not specified, the same `host_path` is used in the container.
    - (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
- (optional) `shared_data_volumes` defines persistent shared storage volumes. The alias of the first shared volume in the example above is `azurefile_vol` (please see the following sections for information regarding the other supported `shared_data_volumes` types). It has the following properties:
    - (required) `volume_driver` specifies the Docker Volume Driver to use. Currently Batch Shipyard supports `azureblob`, `azurefile`, `glusterfs_on_compute`, `storage_cluster`, or `custom_linux_mount` as the `volume_driver`. For this volume (`azurefile_vol`), as this is an Azure File shared volume, the `volume_driver` should be set as `azurefile`.
    - (required) `storage_account_settings` is a link to the alias of the storage account specified that holds this Azure File share. Note that when using `azurefile` for a shared data volume, the storage account that holds the file share must reside within the same Azure region as the Azure Batch compute pool for certain Linux host operating systems. Attempting to mount an Azure File share that is cross-region for operating systems that do not support such functionality will result in failure, as those Linux Samba clients do not support share-level encryption at this time.
    - (required) `azure_file_share_name` is the name of the share on Azure Files. Note that the Azure File share must be created beforehand; the toolkit does not create Azure File shares, it only mounts them to the compute nodes.
    - (required) `container_path` is the path in the container to mount.
    - (optional) `mount_options` are the mount options to pass to the mount command. This option is ignored on Windows pools. It is recommended to use `0777` for both `file_mode` and `dir_mode` on Linux pools, as the `uid` and `gid` cannot be reliably determined before the compute pool is allocated and this volume will be mounted as the root user.
    - (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
Important note: specifying a `shared_data_volumes` property and any number of shared data volumes does not automatically bind these specified mounts to the container when a task is run. Binding of the mount to the container when the task is run is specified in the jobs configuration on a per-job or per-task basis.
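As a sketch only (ids and the image name are placeholders; see the jobs configuration documentation for the authoritative schema), binding a shared data volume defined here to a task's container is expressed in the jobs configuration roughly as:

```yaml
job_specifications:
- id: myjob
  tasks:
  - docker_image: busybox
    # reference the alias of a shared data volume defined in the
    # global configuration to bind it into this task's container
    shared_data_volumes:
    - azurefile_vol
    command: ls $AZ_BATCH_NODE_SHARED_DIR/azfile
```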
The second shared volume, `azureblob_vol`, is an Azure Blob storage container mounted via blobfuse. Please carefully review the limitations of blobfuse, as it may not necessarily be the best fit for your workload. If not, consider ingressing and/or egressing your data from/to blobs using the data movement capabilities of Batch Shipyard. These volumes have the following properties:

- (required) `volume_driver` should be set as `azureblob`.
- (required) `storage_account_settings` is a link to the alias of the storage account specified that holds this Azure Blob container.
- (required) `azure_blob_container_name` is the name of the container on Azure Blob storage. If the Azure Blob container does not exist, it is created.
- (required) `container_path` is the path in the container to mount.
- (optional) `mount_options` are the mount and FUSE options to pass to the blobfuse mount command. Please see the blobfuse documentation for available options.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
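As an illustrative variant of the schema above (alias and container name are placeholders), a blobfuse volume intended for read-only input data could pin the bind to `ro`:

```yaml
global_resources:
  volumes:
    shared_data_volumes:
      inputdata_vol:
        volume_driver: azureblob
        storage_account_settings: mystorageaccount
        azure_blob_container_name: inputdata
        container_path: $AZ_BATCH_NODE_SHARED_DIR/inputdata
        mount_options:
        - --use-https=true
        # expose the mount read-only inside task containers
        bind_options: ro
```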
The third shared volume, `nfs_server`, is an NFS server that is to be mounted on to compute node hosts. The name `nfs_server` should match the `remote_fs`:`storage_cluster`:`id` specified as your NFS server. These NFS servers can be configured using the `fs` command in Batch Shipyard. These volumes have the following properties:

- (required) `volume_driver` should be set as `storage_cluster`.
- (required) `container_path` is the path in the container to mount.
- (optional) `mount_options` defines additional mount options to pass when mounting this file system to the compute node.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
The fourth shared volume, `glusterfs_cluster`, is a GlusterFS cluster that is mounted on to compute node hosts. The name `glusterfs_cluster` should match the `remote_fs`:`storage_cluster`:`id` specified as your GlusterFS cluster. These GlusterFS clusters can be configured using the `fs` command in Batch Shipyard. These volumes have the following properties:

- (required) `volume_driver` should be set as `storage_cluster`.
- (required) `container_path` is the path in the container to mount.
- (optional) `mount_options` defines additional mount options to pass when mounting this file system to the compute node.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
The fifth shared volume, `glusterfs_on_compute_vol`, is a GlusterFS network file system provisioned directly on the compute nodes. Please note that `glusterfs_on_compute` volumes are GlusterFS volumes co-located on the VM's temporary local disk space, which is a shared resource. Sizes of the local temp disk for each VM size can be found here. If specifying a `glusterfs_on_compute` volume, you must enable internode communication in the pool configuration file. These volumes have the following properties:

- (required) `volume_driver` should be set as `glusterfs_on_compute`.
- (required) `container_path` is the path in the container to mount.
- (optional) `volume_type` defines the GlusterFS volume type. Currently, `replica` is the only supported type.
- (optional) `volume_options` defines additional GlusterFS volume options to set.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.

`glusterfs_on_compute` volumes are mounted on the host at `$AZ_BATCH_NODE_ROOT_DIR/mounts/gluster_on_compute/gv0`. Batch Shipyard will automatically replace container path references in direct and storage-based data ingress/egress with their host path equivalents.

Note that when resizing a pool with a `glusterfs_on_compute` shared file system, you must resize with the `pool resize` command in `shipyard.py` and not with the Azure Portal, Batch Labs, or any other tool.
The sixth shared volume, `custom_vol`, is a custom Linux mount volume. This can be used to specify a custom filesystem mount that joins the Batch compute nodes to an existing filesystem accessible within the virtual network or publicly. Note that if the required software and userland utilities do not exist by default on the host, mounting of these custom volumes will fail. Ensure that you have either populated the pool `additional_node_prep_commands`:`pre` with the proper commands to install the software or have prepared a custom image with the appropriate software. These volumes have the following properties:

- (required) `volume_driver` should be set as `custom_linux_mount`.
- (required) `container_path` is the path in the container to mount.
- (required) `fstab_entry` contains the required fstab components:
    - (required) `fs_spec` is the first field, which is the block special device or the remote filesystem to be mounted.
    - (required) `fs_vfstype` is the third field, which is the filesystem type.
    - (optional) `fs_mntops` is the fourth field, which is the mount options associated with the filesystem. If this is omitted, `defaults` is supplied. Note that the `mount_options` property used in other shared data volumes is not used here.
    - (optional) `fs_freq` is the fifth field, which is used by dump.
    - (optional) `fs_passno` is the sixth field, which is used by fsck.
- (optional) `bind_options` are the bind options to use, typically one of `ro` for read-only or `rw` for read-write. If unspecified or `null`, this defaults to `rw`.
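As a hedged sketch (the server address, export path, and alias are placeholders), an analogous `custom_linux_mount` entry for an ordinary NFS export might look like:

```yaml
global_resources:
  volumes:
    shared_data_volumes:
      mynfs_vol:
        volume_driver: custom_linux_mount
        container_path: $AZ_BATCH_NODE_SHARED_DIR/mynfs
        fstab_entry:
          # first fstab field: the remote filesystem to mount
          fs_spec: 10.0.0.4:/exports/data
          # third fstab field: the filesystem type
          fs_vfstype: nfs
          # fourth fstab field: mount options for the filesystem
          fs_mntops: defaults,_netdev
          fs_freq: 0
          fs_passno: 0
        bind_options: rw
```

The NFS client utilities must already be present on the host (or installed via `additional_node_prep_commands`:`pre`) for this mount to succeed.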
Finally, note that `volumes` can be omitted completely, along with one or both of `data_volumes` and `shared_data_volumes`, if you do not require this functionality.

A full template of a global configuration file can be found here. Note that these templates cannot be used as-is and must be modified to fit your scenario.