This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Enable CUDA 11.0 on nightly development builds (#19295)
Remove CUDA 9.2 and CUDA 10.0
waytrue17 authored and Rohit Kumar Srivastava committed Feb 19, 2021
1 parent df60158 commit e7d5935
Showing 18 changed files with 273 additions and 106 deletions.
2 changes: 1 addition & 1 deletion cd/Jenkinsfile_cd_pipeline
@@ -36,7 +36,7 @@ pipeline {

parameters {
// Release parameters
string(defaultValue: "cpu,native,cu100,cu101,cu102,cu110", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
string(defaultValue: "cpu,native,cu101,cu102,cu110", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
booleanParam(defaultValue: false, description: 'Whether this is a release build or not', name: "RELEASE_BUILD")
}

2 changes: 1 addition & 1 deletion cd/Jenkinsfile_release_job
@@ -43,7 +43,7 @@ pipeline {
// any disruption caused by different COMMIT_ID values changing the job parameter configuration on
// Jenkins.
string(defaultValue: "mxnet_lib/static", description: "Pipeline to build", name: "RELEASE_JOB_TYPE")
string(defaultValue: "cpu,native,cu100,cu101,cu102,cu110", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
string(defaultValue: "cpu,native,cu101,cu102,cu110", description: "Comma separated list of variants", name: "MXNET_VARIANTS")
booleanParam(defaultValue: false, description: 'Whether this is a release build or not', name: "RELEASE_BUILD")
}

3 changes: 1 addition & 2 deletions cd/README.md
@@ -25,13 +25,12 @@ MXNet aims to support a variety of frontends, e.g. Python, Java, Perl, R, etc. a

The CD process is driven by the [CD pipeline job](Jenkinsfile_cd_pipeline), which orchestrates the order in which the artifacts are delivered. For instance, first publish the libmxnet library before publishing the pip package. It does this by triggering the [release job](Jenkinsfile_release_job) with a specific set of parameters for each delivery channel. The release job executes the specific release pipeline for a delivery channel across all MXNet *variants*.

A variant is a specific environment or features for which MXNet is compiled. For instance CPU, GPU with CUDA v10.0, CUDA v9.0 with MKL-DNN support, etc.
A variant is a specific environment or features for which MXNet is compiled. For instance CPU, GPU with CUDA v10.1, CUDA v10.2 with MKL-DNN support, etc.

Currently, below variants are supported. All of these variants except native have MKL-DNN backend enabled.

* *cpu*: CPU
* *native*: CPU without MKL-DNN
* *cu100*: CUDA 10
* *cu101*: CUDA 10.1
* *cu102*: CUDA 10.2
* *cu110*: CUDA 11.0
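
The variant list above is what the `MXNET_VARIANTS` job parameter carries as a comma-separated string. As a minimal sketch of how such a string could be split and checked, the Python below uses a hypothetical helper `parse_variants` and a hypothetical constant `SUPPORTED_VARIANTS`; neither is part of the MXNet code base, and the Jenkins jobs may handle the parameter differently.

```python
from typing import List

# Variants listed in the README after this change (assumed to be the full set).
SUPPORTED_VARIANTS = ("cpu", "native", "cu101", "cu102", "cu110")


def parse_variants(value: str) -> List[str]:
    """Split a comma-separated variant string and reject unknown entries."""
    variants = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [v for v in variants if v not in SUPPORTED_VARIANTS]
    if unknown:
        raise ValueError(f"Unsupported variants: {', '.join(unknown)}")
    return variants


print(parse_variants("cpu,native,cu101,cu102,cu110"))
```

With this sketch, a removed variant such as `cu100` would be rejected with a `ValueError` instead of being passed on to later stages.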
2 changes: 1 addition & 1 deletion cd/python/pypi/pypi_package.sh
@@ -18,7 +18,7 @@

set -ex

# variant = cpu, native, cu80, cu100, etc.
# variant = cpu, native, cu101, cu102, etc.
export mxnet_variant=${1:?"Please specify the mxnet variant"}

# Due to this PR: https://github.com/apache/incubator-mxnet/pull/14899
6 changes: 3 additions & 3 deletions cd/utils/artifact_repository.md
@@ -17,7 +17,7 @@

# Artifact Repository - Pushing and Pulling libmxnet

The artifact repository is an S3 bucket accessible only to restricted Jenkins nodes. It is used to store compiled MXNet artifacts that can be used by downstream CD pipelines to package the compiled libraries for different delivery channels (e.g. DockerHub, PyPI, Maven, etc.). The S3 object keys for the files being posted will be prefixed with the following distinguishing characteristics of the binary: branch, commit id, operating system, variant and dependency linking strategy (static or dynamic). For instance, s3://bucket/73b29fa90d3eac0b1fae403b7583fdd1529942dc/ubuntu16.04/cu92mkl/static/libmxnet.so
The artifact repository is an S3 bucket accessible only to restricted Jenkins nodes. It is used to store compiled MXNet artifacts that can be used by downstream CD pipelines to package the compiled libraries for different delivery channels (e.g. DockerHub, PyPI, Maven, etc.). The S3 object keys for the files being posted will be prefixed with the following distinguishing characteristics of the binary: branch, commit id, operating system, variant and dependency linking strategy (static or dynamic). For instance, s3://bucket/73b29fa90d3eac0b1fae403b7583fdd1529942dc/ubuntu16.04/cu102mkl/static/libmxnet.so
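
To make the key layout concrete, here is a rough Python sketch that assembles an object key from the characteristics named above. The function `build_s3_key` and its parameters are hypothetical and are not the API of `artifact_repository.py`. Because the example key in the text contains no branch component, the branch is treated as optional here; that is an assumption.

```python
from typing import Optional


def build_s3_key(commit_id: str, os_name: str, variant: str, linking: str,
                 filename: str = "libmxnet.so",
                 branch: Optional[str] = None) -> str:
    """Join the distinguishing characteristics into a '/'-separated key."""
    parts = [branch, commit_id, os_name, variant, linking, filename]
    # Skip components that were not provided (e.g. no branch).
    return "/".join(part for part in parts if part)


key = build_s3_key("73b29fa90d3eac0b1fae403b7583fdd1529942dc",
                   "ubuntu16.04", "cu102mkl", "static")
print(f"s3://bucket/{key}")
```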

An MXNet artifact is defined as the following set of files:

@@ -53,13 +53,13 @@ If not set, derived through the value of sys.platform (https://docs.python.org/3

**Variant**

Manually configured through the --variant argument. The current variants are: cpu, native, cu92, cu100, cu101, cu102 and cu110.
Manually configured through the --variant argument. The current variants are: cpu, native, cu101, cu102, cu110.

As long as the tool is being run from the MXNet code base, the runtime feature detection tool (https://github.com/larroy/mxnet/blob/dd432b7f241c9da2c96bcb877c2dc84e6a1f74d4/docs/api/python/libinfo/libinfo.md) can be used to detect whether the library has been compiled with MKL (library has MKL-DNN feature enabled) and/or CUDA support (compiled with CUDA feature enabled).

If it has been compiled with CUDA support, the output of /usr/local/cuda/bin/nvcc --version can be mined for the exact CUDA version (eg. 8.0, 9.0, etc.).
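
As an illustration of mining the `nvcc --version` output, the sketch below extracts the release number with a regular expression and returns it in the compact form used by the variant names (for example `102` for release 10.2). The function name `parse_cuda_version` is hypothetical; the actual `get_cuda_version` in `artifact_repository.py` may be implemented differently.

```python
import re
from typing import Optional


def parse_cuda_version(nvcc_output: str) -> Optional[str]:
    """Return e.g. '102' for 'release 10.2', or None if no release is found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    # Concatenate major and minor without the dot: 10.2 -> "102".
    return match.group(1) + match.group(2)


print(parse_cuda_version("Cuda compilation tools, release 11.0, V11.0.148"))
```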

By knowing which features are enabled on the binary, and if necessary, which CUDA version is installed on the machine, the value for the variant argument can be calculated. Eg. if CUDA features are enabled, and nvcc reports cuda version 10, then the variant would be cu100. If neither MKL-DNN nor CUDA features are enabled, the variant would be native.
By knowing which features are enabled on the binary, and if necessary, which CUDA version is installed on the machine, the value for the variant argument can be calculated. Eg. if CUDA features are enabled, and nvcc reports cuda version 10.2, then the variant would be cu102. If neither MKL-DNN nor CUDA features are enabled, the variant would be native.
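
The rule described here can be sketched in a few lines of Python. The helper `derive_variant` is hypothetical and only mirrors the prose: CUDA enabled gives `cu<version>`, MKL-DNN without CUDA gives `cpu`, and neither gives `native`. Returning `None` when the feature set or the CUDA version is unknown is an assumption of this sketch, not confirmed behavior of `probe_mxnet_variant`.

```python
from typing import Dict, Optional


def derive_variant(features: Optional[Dict[str, bool]],
                   cuda_version: Optional[str] = None) -> Optional[str]:
    """Map enabled library features (and a CUDA version) to a variant name."""
    if features is None:
        return None  # assumption: nothing can be derived without features
    if features.get("CUDA"):
        # A CUDA build needs the toolkit version, e.g. "102" -> "cu102".
        return f"cu{cuda_version}" if cuda_version else None
    if features.get("MKLDNN"):
        return "cpu"
    return "native"


print(derive_variant({"MKLDNN": True, "CUDA": True}, "102"))
```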

**Dependency Linking**

12 changes: 0 additions & 12 deletions cd/utils/mxnet_base_image.sh
@@ -21,18 +21,6 @@
mxnet_variant=${1:?"Please specify the mxnet variant as the first parameter"}

case ${mxnet_variant} in
cu80*)
echo "nvidia/cuda:8.0-cudnn7-runtime-ubuntu16.04"
;;
cu90*)
echo "nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04"
;;
cu92*)
echo "nvidia/cuda:9.2-cudnn7-runtime-ubuntu16.04"
;;
cu100*)
echo "nvidia/cuda:10.0-cudnn7-runtime-ubuntu16.04"
;;
cu101*)
echo "nvidia/cuda:10.1-cudnn7-runtime-ubuntu16.04"
;;
14 changes: 7 additions & 7 deletions cd/utils/test_artifact_repository.py
@@ -140,13 +140,13 @@ def test_get_cuda_version(self, mock):
Tests correct cuda version with the right format is returned
:return:
"""
mock.return_value = b'Cuda compilation tools, release 10.0, V10.0.130'
mock.return_value = b'Cuda compilation tools, release 10.2, V10.2.130'
cuda_version = get_cuda_version()
self.assertEqual(cuda_version, '100')
self.assertEqual(cuda_version, '102')

mock.return_value = b'Cuda compilation tools, release 9.2, V9.2.148'
mock.return_value = b'Cuda compilation tools, release 11.0, V11.0.148'
cuda_version = get_cuda_version()
self.assertEqual(cuda_version, '92')
self.assertEqual(cuda_version, '110')

@patch('artifact_repository.check_output')
def test_get_cuda_version_not_found(self, mock):
@@ -178,11 +178,11 @@ def test_probe_variant_cpu(self, mock_features):
@patch('artifact_repository.get_cuda_version')
def test_probe_variant_cuda(self, mock_cuda_version, mock_features):
"""
Tests 'cu100' is returned if MKLDNN is OFF and CUDA is ON and CUDA version is 10.0
Tests 'cu102' is returned if MKLDNN is ON and CUDA is ON and CUDA version is 10.2
"""
mock_features.return_value = {'MKLDNN': True, 'CUDA': True}
mock_cuda_version.return_value = '100'
self.assertEqual(probe_mxnet_variant('libmxnet.so'), 'cu100')
mock_cuda_version.return_value = '102'
self.assertEqual(probe_mxnet_variant('libmxnet.so'), 'cu102')

@patch('artifact_repository.get_libmxnet_features')
def test_probe_variant_cuda_returns_none_on_no_features(self, mock_features):
242 changes: 242 additions & 0 deletions ci/docker/docker-compose.yml
@@ -0,0 +1,242 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# We use the cache_from feature introduced in file format version 3.4 (released 2017-11-01)
version: "3.4"

# For simplicity, only the centos7_cpu is commented. But the comments apply to
# all other services as well.
services:
  ###################################################################################################
  # Dockerfile.build.centos7 based images used for building on CentOS7. On
  # CentOS7, we respectively test the oldest supported toolchain and dependency
  # versions
  ###################################################################################################
  centos7_cpu:
    # The resulting image will be named build.centos7_cpu:latest and will be
    # pushed to the dockerhub user specified in the environment variable
    # ${DOCKER_CACHE_REGISTRY} (typically "mxnetci") under this name
    image: ${DOCKER_CACHE_REGISTRY}/build.centos7_cpu:latest
    build:
      context: .
      dockerfile: Dockerfile.build.centos7
      # Use "base" target declared in Dockerfile.build.centos7 as "build.centos7_cpu:latest"
      target: base
      args:
        # BASE_IMAGE is used to dynamically specify the FROM image in Dockerfile.build.centos7
        BASE_IMAGE: centos:7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.centos7_cpu:latest
  centos7_gpu_cu101:
    image: ${DOCKER_CACHE_REGISTRY}/build.centos7_gpu_cu101:latest
    build:
      context: .
      dockerfile: Dockerfile.build.centos7
      target: base
      args:
        BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-centos7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.centos7_gpu_cu101:latest
  centos7_gpu_cu102:
    image: ${DOCKER_CACHE_REGISTRY}/build.centos7_gpu_cu102:latest
    build:
      context: .
      dockerfile: Dockerfile.build.centos7
      target: base
      args:
        BASE_IMAGE: nvidia/cuda:10.2-cudnn7-devel-centos7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.centos7_gpu_cu102:latest
  centos7_gpu_cu110:
    image: ${DOCKER_CACHE_REGISTRY}/build.centos7_gpu_cu110:latest
    build:
      context: .
      dockerfile: Dockerfile.build.centos7
      target: base
      args:
        BASE_IMAGE: nvidia/cuda:11.0-cudnn8-devel-centos7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.centos7_gpu_cu110:latest
  ###################################################################################################
  # Dockerfile.build.ubuntu based images. On Ubuntu we test more recent
  # toolchain and dependency versions compared to CentOS7. We attempt to update
  # the Ubuntu base image every 6 months, following the Ubuntu release cycle,
  # and testing the dependencies in their version provided by the respective
  # Ubuntu release.
  ###################################################################################################
  ubuntu_cpu:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_cpu:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu
      target: base
      args:
        BASE_IMAGE: ubuntu:18.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_cpu:latest
  ubuntu_gpu_cu101:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu101:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu
      target: gpu
      args:
        BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu101:latest
  ubuntu_gpu_cu102:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu102:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu
      target: gpu
      args:
        BASE_IMAGE: nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu102:latest
  ubuntu_gpu_cu110:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu110:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu
      target: gpu
      args:
        BASE_IMAGE: nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_gpu_cu110:latest
  ubuntu_build_cuda:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_build_cuda:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu
      target: gpu
      args:
        BASE_IMAGE: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_build_cuda:latest
  ###################################################################################################
  # Dockerfile.build.arm based images used for testing cross-compilation for plain ARM
  ###################################################################################################
  armv6:
    image: ${DOCKER_CACHE_REGISTRY}/build.armv6:latest
    build:
      context: .
      dockerfile: Dockerfile.build.arm
      target: armv6
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.armv6:latest
  armv7:
    image: ${DOCKER_CACHE_REGISTRY}/build.armv7:latest
    build:
      context: .
      dockerfile: Dockerfile.build.arm
      target: armv7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.armv7:latest
  armv8:
    image: ${DOCKER_CACHE_REGISTRY}/build.armv8:latest
    build:
      context: .
      dockerfile: Dockerfile.build.arm
      target: armv8
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.armv8:latest
  ###################################################################################################
  # Dockerfile.test.arm based images for testing ARM artefacts via QEMU
  ###################################################################################################
  test.armv7:
    image: ${DOCKER_CACHE_REGISTRY}/test.armv7:latest
    build:
      context: .
      dockerfile: Dockerfile.test.arm
      args:
        BASE_IMAGE: arm32v7/ubuntu:20.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/test.armv7:latest
  test.armv8:
    image: ${DOCKER_CACHE_REGISTRY}/test.armv8:latest
    build:
      context: .
      dockerfile: Dockerfile.test.arm
      args:
        BASE_IMAGE: arm64v8/ubuntu:20.04
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/test.armv8:latest
  ###################################################################################################
  # Dockerfile.build.android based images used for testing cross-compilation for Android
  ###################################################################################################
  android_armv7:
    image: ${DOCKER_CACHE_REGISTRY}/build.android_armv7:latest
    build:
      context: .
      dockerfile: Dockerfile.build.android
      target: armv7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.android_armv7:latest
  android_armv8:
    image: ${DOCKER_CACHE_REGISTRY}/build.android_armv8:latest
    build:
      context: .
      dockerfile: Dockerfile.build.android
      target: armv8
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.android_armv8:latest
  ###################################################################################################
  # Dockerfile.publish.test based images used for testing binary artifacts on minimal systems.
  ###################################################################################################
  publish.test.centos7_cpu:
    image: ${DOCKER_CACHE_REGISTRY}/publish.test.centos7_cpu:latest
    build:
      context: .
      dockerfile: Dockerfile.publish.test.centos7
      args:
        BASE_IMAGE: centos:7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/publish.test.centos7_cpu:latest
  publish.test.centos7_gpu:
    image: ${DOCKER_CACHE_REGISTRY}/publish.test.centos7_gpu:latest
    build:
      context: .
      dockerfile: Dockerfile.publish.test.centos7
      args:
        BASE_IMAGE: nvidia/cuda:9.2-cudnn7-devel-centos7
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/publish.test.centos7_gpu:latest
  ###################################################################################################
  # Miscellaneous containers
  ###################################################################################################
  jetson:
    image: ${DOCKER_CACHE_REGISTRY}/build.jetson:latest
    build:
      context: .
      dockerfile: Dockerfile.build.jetson
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.jetson:latest
  ubuntu_cpu_jekyll:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_cpu_jekyll:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu_cpu_jekyll
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_cpu_jekyll:latest
  ubuntu_blc:
    image: ${DOCKER_CACHE_REGISTRY}/build.ubuntu_blc:latest
    build:
      context: .
      dockerfile: Dockerfile.build.ubuntu_blc
      cache_from:
        - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_blc:latest
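
The image names in this compose file appear to follow a simple pattern: service names that already contain a dot (such as `test.armv7` or `publish.test.centos7_cpu`) are used as the repository name unchanged, and all others get a `build.` prefix. The Python sketch below reproduces that pattern for illustration. The helper `image_name` is hypothetical, the rule is inferred from the entries above rather than documented, and the `mxnetci` fallback is taken from the comment on `centos7_cpu`.

```python
import os
from typing import Optional


def image_name(service: str, registry: Optional[str] = None) -> str:
    """Return the '<registry>/<repository>:latest' tag for a compose service."""
    if registry is None:
        # Fall back to the environment, then to the default named in the comments.
        registry = os.environ.get("DOCKER_CACHE_REGISTRY", "mxnetci")
    # Inferred rule: dotted service names are kept as-is, others get "build.".
    repository = service if "." in service else f"build.{service}"
    return f"{registry}/{repository}:latest"


print(image_name("centos7_gpu_cu110", "mxnetci"))
print(image_name("test.armv7", "mxnetci"))
```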
2 changes: 1 addition & 1 deletion ci/docker/runtime_functions.sh
@@ -224,7 +224,7 @@ build_ubuntu_gpu_mkldnn_release() {

# Compiles the dynamic mxnet library
# Parameters:
# $1 -> mxnet_variant: the mxnet variant to build, e.g. cpu, cu100, cu92mkl, etc.
# $1 -> mxnet_variant: the mxnet variant to build, e.g. cpu, native, cu101, cu102, etc.
build_dynamic_libmxnet() {
set -ex

8 changes: 4 additions & 4 deletions ci/jenkins/Jenkins_steps.groovy
@@ -356,8 +356,8 @@ def compile_centos7_gpu() {
ws('workspace/build-centos7-gpu') {
timeout(time: max_time, unit: 'MINUTES') {
utils.init_git()
utils.docker_run('centos7_gpu', 'build_centos7_gpu', false)
utils.pack_lib('centos7_gpu', mx_lib)
utils.docker_run('centos7_gpu_cu102', 'build_centos7_gpu', false)
utils.pack_lib(lib_name, mx_lib)
}
}
}
@@ -1231,8 +1231,8 @@ def test_centos7_python3_cpu() {
ws('workspace/build-centos7-cpu') {
timeout(time: max_time, unit: 'MINUTES') {
try {
utils.unpack_and_init('centos7_cpu', mx_lib)
utils.docker_run('centos7_cpu', 'unittest_centos7_cpu', false)
utils.unpack_and_init(lib_name, mx_lib)
utils.docker_run('centos7_gpu_cu102', 'unittest_centos7_gpu', true)
utils.publish_test_coverage()
} finally {
utils.collect_test_results_unix('nosetests_unittest.xml', 'nosetests_python3_centos7_cpu_unittest.xml')