From 0e09e7bf03c152e23606e39ac847caffd8bd3056 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Sat, 7 Dec 2024 22:15:34 -0800
Subject: [PATCH 01/27] Update doc

---
 doc/contrib/ci.rst | 424 ++++++++++++++++++++++++++------------------
 1 file changed, 246 insertions(+), 178 deletions(-)

diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index d6effa0b09d4..0006002bf8c4 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -11,78 +11,78 @@ project.
    :backlinks: none
    :local:
 
-**************
-GitHub Actions
-**************
-We make the extensive use of `GitHub Actions <https://github.com/features/actions>`_ to host our
-CI pipelines. Most of the tests listed in the configuration files run automatically for every
-incoming pull requests and every update to branches. A few tests however require manual activation:
-
-* R tests with ``noLD`` option: Run R tests using a custom-built R with compilation flag
-  ``--disable-long-double``. See `this page `_ for more
-  details about noLD. This is a requirement for keeping XGBoost on CRAN (the R package index).
-  To invoke this test suite for a particular pull request, simply add a review comment
-  ``/gha run r-nold-test``. (Ordinary comment won't work. It needs to be a review comment.)
-
-*******************************
-Self-Hosted Runners with RunsOn
-*******************************
-
-`RunsOn <https://runs-on.com/>`_ is a SaaS (Software as a Service) app that lets us to easily create
-self-hosted runners to use with GitHub Actions pipelines. RunsOn uses
-`Amazon Web Services (AWS) <https://aws.amazon.com/>`_ under the hood to provision runners with
-access to various amount of CPUs, memory, and NVIDIA GPUs. Thanks to this app, we are able to test
-GPU-accelerated and distributed algorithms of XGBoost while using the familar interface of
-GitHub Actions.
-
-In GitHub Actions, jobs run on Microsoft-hosted runners by default.
-To opt into self-hosted runners (enabled by RunsOn), we use the following special syntax:
-
-.. code-block:: yaml
+****************
+Tips for testing
+****************
+
+====================================
+Running R tests with ``noLD`` option
+====================================
+You can run R tests using a custom-built R with compilation flag
+``--disable-long-double``. See `this page `_ for more
+details about noLD. This is a requirement for keeping XGBoost on CRAN (the R package index).
+Unlike other tests, this test must be invoked manually. Simply add a review comment
+``/gha run r-nold-test`` to a pull request to kick off the test.
+(Ordinary comment won't work. It needs to be a review comment.)
+
+===============================
+Making changes to CI containers
+===============================
+Many of the CI pipelines use Docker containers to ensure a consistent testing environment
+with a variety of software packages. We have a separate repo,
+`dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_, to host the logic for
+building and publishing CI containers.
+
+To make changes to the CI container, carry out the following steps:
+
+1. Identify which container needs updating. Example:
+   ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main``
+2. Clone `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_ and make changes to the
+   corresponding Dockerfile. Example: ``containers/dockerfile/Dockerfile.gpu``.
+3. Build the container locally, to ensure that it builds successfully.
+   Consult :ref:`build_run_docker_locally` for this step.
+4. Submit a pull request to `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_ with
+   the proposed changes to the Dockerfile. Make note of the pull request number. Example: ``#204``
+5. Clone `dmlc/xgboost <https://github.com/dmlc/xgboost>`_ and update all references to the
+   old container to point to the new container. More specifically, all Docker tags of format
+   ``492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main`` should have the last
+   component replaced with ``PR-#``, where ``#`` is the pull request number. For the example above,
+   we'd replace ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main`` with
+   ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:PR-204``.
+   (A shell one-liner for this replacement is sketched after this list.)
+6. Now submit a pull request to `dmlc/xgboost <https://github.com/dmlc/xgboost>`_. The CI will
+   run tests using the new container. Verify that all tests pass.
+7. Merge the pull request in ``dmlc/xgboost-devops``.
+8. Merge the pull request in ``dmlc/xgboost``.
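+
+As a convenience, the tag replacement in step 5 can be scripted. Here is a minimal
+sketch (not part of the CI tooling; it assumes the container ID and pull request
+number from the example above, so adjust both to your case):
+
+.. code-block:: bash
+
+   # Point every reference to xgb-ci.gpu at the PR-204 tag.
+   # Scans the workflow definitions and pipeline scripts for the ECR tag.
+   grep -rl 'xgb-ci\.gpu:main' .github/ ops/ \
+     | xargs sed -i 's/xgb-ci\.gpu:main/xgb-ci.gpu:PR-204/g'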
-
-   runs-on:
-     - runs-on
-     - runner=runner-name
-     - run-id=${{ github.run_id }}
-     - tag=[unique tag that uniquely identifies the job in the GH Action workflow]
+.. _build_run_docker_locally:
 
-where the runner is defined in ``.github/runs-on.yml``.
+===========================================
+Reproducing CI testing environments locally
+===========================================
+You can reproduce the same testing environment as the CI pipelines by building and running Docker
+containers locally.
 
-*********************************************************
-Reproduce CI testing environments using Docker containers
-*********************************************************
-In our CI pipelines, we use Docker containers extensively to package many software packages together.
-You can reproduce the same testing environment as the CI pipelines by running Docker locally.
+**Prerequisites**
 
-=============
-Prerequisites
-=============
 1. Install Docker: https://docs.docker.com/engine/install/ubuntu/
 2. Install NVIDIA Docker runtime:
    https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html.
    The runtime lets you access NVIDIA GPUs inside a Docker container.
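+
+To verify the installation, you can try running a CUDA base image. This is a quick
+smoke test, not part of the CI scripts; the image tag below is just an example:
+
+.. code-block:: bash
+
+   # Should print the GPU inventory if the NVIDIA runtime is set up correctly
+   docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi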
-.. _build_run_docker_locally:
-
-==============================================
-Building and Running Docker containers locally
-==============================================
-For your convenience, we provide three wrapper scripts:
-
-* ``ops/docker_build.py``: Build a Docker container
-* ``ops/docker_build.sh``: Wrapper for ``ops/docker_build.py`` with a more concise interface
-* ``ops/docker_run.py``: Run a command inside a Docker container
-
-**To build a Docker container**, invoke ``docker_build.sh`` as follows:
+**To build a Docker container**, clone the repository `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_
+and invoke ``containers/docker_build.sh`` as follows:
 
 .. code-block:: bash
 
-   export BRANCH_NAME="master"  # Relevant for CI, for local testing, use "master"
-   bash ops/docker_build.sh CONTAINER_ID
+   # The following env vars are only relevant for CI
+   # For local testing, set them to "main"
+   export GITHUB_SHA="main"
+   export BRANCH_NAME="main"
+   bash containers/docker_build.sh CONTAINER_ID
 
 where ``CONTAINER_ID`` identifies the container. The wrapper script will look up the YAML file
-``ops/docker/ci_container.yml``. For example, when ``CONTAINER_ID`` is set to ``xgb-ci.gpu``,
-the script will use the corresponding entry from ``ci_container.yml``:
+``containers/ci_container.yml``. For example, when ``CONTAINER_ID`` is set to ``xgb-ci.gpu``,
+the script will use the corresponding entry from ``containers/ci_container.yml``:
 
 .. code-block:: yaml
 
@@ -94,9 +94,9 @@ the script will use the corresponding entry from ``ci_container.yml``:
       RAPIDS_VERSION_ARG: "24.10"
 
 The ``container_def`` entry indicates where the Dockerfile is located. The container
-definition will be fetched from ``ops/docker/dockerfile/Dockerfile.CONTAINER_DEF`` where
+definition will be fetched from ``containers/dockerfile/Dockerfile.CONTAINER_DEF`` where
 ``CONTAINER_DEF`` is the value of the ``container_def`` entry. In this example, the Dockerfile
-is ``ops/docker/dockerfile/Dockerfile.gpu``.
+is ``containers/dockerfile/Dockerfile.gpu``.
 
 The ``build_args`` entry lists all the build arguments for the Docker build. In this example,
 the build arguments are:
 
@@ -108,38 +108,19 @@ the build arguments are:
 
 The build arguments provide inputs to the ``ARG`` instructions in the Dockerfile.
 
-.. note:: Inspect the logs from the CI pipeline to find what's going on under the hood
-
-   When invoked, ``ops/docker_build.sh`` logs the precise commands that it runs under the hood.
-   Using the example above:
-
-   .. code-block:: bash
+When ``containers/docker_build.sh`` completes, you will have access to the container with tag
+``492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main``. The prefix
+``492475357299.dkr.ecr.us-west-2.amazonaws.com/`` was added so that the container could
+later be uploaded to AWS Elastic Container Registry (ECR), a private Docker registry.
 
-      # docker_build.sh calls docker_build.py...
-      python3 ops/docker_build.py --container-def gpu --container-id xgb-ci.gpu \
-        --build-arg CUDA_VERSION_ARG=12.4.1 --build-arg NCCL_VERSION_ARG=2.23.4-1 \
-        --build-arg RAPIDS_VERSION_ARG=24.10
-
-      ...
-
-      # .. and docker_build.py in turn calls "docker build"...
-      docker build --build-arg CUDA_VERSION_ARG=12.4.1 \
-        --build-arg NCCL_VERSION_ARG=2.23.4-1 \
-        --build-arg RAPIDS_VERSION_ARG=24.10 \
-        --load --progress=plain \
-        --ulimit nofile=1024000:1024000 \
-        -t xgb-ci.gpu \
-        -f ops/docker/dockerfile/Dockerfile.gpu \
-        ops/
-
-   The logs come in handy when debugging the container builds. In addition, you can change
-   the build arguments to make changes to the container.
-
-**To run commands within a Docker container**, invoke ``docker_run.py`` as follows:
+**To run commands within a Docker container**, invoke ``ops/docker_run.py`` from
+the main ``dmlc/xgboost`` repo as follows:
 
 .. code-block:: bash
 
-   python3 ops/docker_run.py --container-id "ID of the container" [--use-gpus] \
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/[container_id]:main \
+     [--use-gpus] \
      -- "command to run inside the container"
 
 where ``--use-gpus`` should be specified to expose NVIDIA GPUs to the Docker container.
 
@@ -149,83 +130,82 @@ For example:
 
 .. code-block:: bash
 
    # Run without GPU
-   python3 ops/docker_run.py --container-id xgb-ci.cpu \
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
     -- bash ops/script/build_via_cmake.sh
 
    # Run with NVIDIA GPU
-   python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
+     --use-gpus \
     -- bash ops/pipeline/test-python-wheel-impl.sh gpu
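+
+Beyond one-off commands, you can get an interactive shell inside the container by
+passing ``bash`` as the command. (Assuming you run this from an interactive
+terminal; this is a convenience for ad-hoc debugging, not a CI pipeline step.)
+
+.. code-block:: bash
+
+   # Open an interactive shell inside the CPU container
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
+     -- bash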
-The ``docker_run.py`` script will convert these commands to the following invocations
-of ``docker run``:
-
-.. code-block:: bash
-
-   docker run --rm --pid=host \
-     -w /workspace -v /path/to/xgboost:/workspace \
-     -e CI_BUILD_UID=<uid> -e CI_BUILD_USER=<user_name> \
-     -e CI_BUILD_GID=<gid> -e CI_BUILD_GROUP=<group_name> \
-     xgb-ci.cpu \
-     bash ops/script/build_via_cmake.sh
-
-   docker run --rm --pid=host --gpus all \
-     -w /workspace -v /path/to/xgboost:/workspace \
-     -e CI_BUILD_UID=<uid> -e CI_BUILD_USER=<user_name> \
-     -e CI_BUILD_GID=<gid> -e CI_BUILD_GROUP=<group_name> \
-     xgb-ci.gpu \
-     bash ops/pipeline/test-python-wheel-impl.sh gpu
-
 Optionally, you can specify ``--run-args`` to pass extra arguments to ``docker run``:
 
 .. code-block:: bash
 
    # Allocate extra space in /dev/shm to enable NCCL
    # Also run the container with elevated privileges
-   python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
+     --use-gpus \
      --run-args='--shm-size=4g --privileged' \
      -- bash ops/pipeline/test-python-wheel-impl.sh gpu
 
-which translates to
+See :ref:`ci_container_infra` to read about how containers are built and managed in the CI pipelines.
 
-.. code-block:: bash
+*****************************
+Tour of the CI infrastructure
+*****************************
 
-   docker run --rm --pid=host --gpus all \
-     -w /workspace -v /path/to/xgboost:/workspace \
-     -e CI_BUILD_UID=<uid> -e CI_BUILD_USER=<user_name> \
-     -e CI_BUILD_GID=<gid> -e CI_BUILD_GROUP=<group_name> \
-     --shm-size=4g --privileged \
-     xgb-ci.gpu \
-     bash ops/pipeline/test-python-wheel-impl.sh gpu
+==============
+GitHub Actions
+==============
+We make extensive use of `GitHub Actions <https://github.com/features/actions>`_ to host our
+CI pipelines. Most of the tests listed in the configuration files run automatically for every
+incoming pull request and every update to branches.
+
+===============================
+Self-Hosted Runners with RunsOn
+===============================
+`RunsOn <https://runs-on.com/>`_ is a SaaS (Software as a Service) app that lets us easily create
+self-hosted runners to use with GitHub Actions pipelines. RunsOn uses
+`Amazon Web Services (AWS) <https://aws.amazon.com/>`_ under the hood to provision runners with
+access to varying amounts of CPU, memory, and NVIDIA GPUs. Thanks to this app, we are able to test
+GPU-accelerated and distributed algorithms of XGBoost while using the familiar interface of
+GitHub Actions.
+
+In GitHub Actions, jobs run on Microsoft-hosted runners by default.
+To opt into self-hosted runners (enabled by RunsOn), we use the following special syntax:
 
-*******************************************************************
+.. code-block:: yaml
+
+   runs-on:
+     - runs-on
+     - runner=runner-name
+     - run-id=${{ github.run_id }}
+     - tag=[unique tag that identifies the job in the GH Action workflow]
+
+where the runner is defined in ``.github/runs-on.yml``.
+
+===================================================================
 The Lay of the Land: how CI pipelines are organized in the codebase
-*******************************************************************
+===================================================================
 The XGBoost project stores the configuration for its CI pipelines as part of the codebase.
 The git repository therefore stores not only the change history for its source code but also
 the change history for the CI pipelines.
 
-=================
-File Organization
-=================
-
 The CI pipelines are organized into the following directories and files:
 
 * ``.github/workflows/``: Definition of CI pipelines, using the GitHub Actions syntax
 * ``.github/runs-on.yml``: Configuration for the RunsOn service.
  Specifies the spec for the self-hosted CI runners.
 * ``ops/conda_env/``: Definitions for Conda environments
-* ``ops/packer/``: Packer scripts to build VM images for Amazon EC2
 * ``ops/patch/``: Patch files
 * ``ops/pipeline/``: Shell scripts defining CI/CD pipelines. Most of these scripts can be run
   locally (to assist with development and debugging); a few must run in the CI.
 * ``ops/script/``: Various utility scripts useful for testing
-* ``ops/docker/dockerfile/``: Dockerfiles to define containers
-* ``ops/docker/ci_container.yml``: Defines the mapping between Dockerfiles and containers.
-  Also specifies the build arguments to be used with each container. See
-  :ref:`build_run_docker_locally` to learn how this YAML file is used in the context of
-  a container build.
-* ``ops/docker_build.*``: Wrapper scripts to build and test CI containers. See
-  :ref:`build_run_docker_locally` for the detailed description.
+* ``ops/docker_run.py``: Wrapper script to run commands inside a container
 
 To inspect a given CI pipeline, inspect files in the following order:
 
@@ -255,41 +235,37 @@ To inspect a given CI pipeline, inspect files in the following order:
    :align: center
    :figwidth: 80 %
 
-===================================
-Primitives used in the CI pipelines
-===================================
+Many of the CI pipelines use Docker containers to ensure a consistent testing environment
+with a variety of software packages. We have a separate repo,
+`dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_, that
+hosts the code for building the CI containers. The repository is organized as follows:
 
------------------------
-Build and run containers
------------------------
+* ``actions/``: Custom actions to be used with GitHub Actions. See :ref:`custom_actions`
+  for more details.
+* ``containers/dockerfile/``: Dockerfiles to define containers
+* ``containers/ci_container.yml``: Defines the mapping between Dockerfiles and containers.
+  Also specifies the build arguments to be used with each container.
+* ``containers/docker_build.{py,sh}``: Wrapper scripts to build and test CI containers.
 
 See :ref:`build_run_docker_locally` to learn about the utility scripts for building and using containers.
 
-**What's the relationship between the VM image (for Amazon EC2) and the container image?**
-In ``ops/packer/`` directory, we define Packer scripts to build VM images for Amazon EC2.
-The VM image contains the minimal set of drivers and system software that are needed to
-run the containers.
-
-We update container images much more often than VM images. Whereas VM images are
-updated sparingly (once in a few months), container images are updated each time a branch
-or a pull request is updated. This way, developers can make changes to containers and
-see the results of the changes immediately in the CI run.
+===========================================
+Artifact sharing between jobs via Amazon S3
+===========================================
 
------------------------------------------
-Stash artifacts, to move them between jobs
------------------------------------------
+We make artifacts from one workflow job available to another job by uploading the
+artifacts to `Amazon S3 <https://aws.amazon.com/s3/>`_. In the CI, we utilize the
+script ``ops/pipeline/stash-artifacts.sh`` to coordinate artifact sharing.
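+
+The typical pattern is a producer job that stashes an artifact and a consumer job
+that unstashes it. A minimal sketch of two such jobs follows (``needs:`` is ordinary
+GitHub Actions syntax; the job names and file paths are illustrative, and the jobs
+are abridged — runner configuration and checkout steps are omitted):
+
+.. code-block:: yaml
+
+   build-cuda:
+     steps:
+       # ... build steps, then stash the wheel for downstream jobs
+       - run: bash ops/pipeline/stash-artifacts.sh stash build-cuda python-package/dist/*.whl
+   test-python:
+     needs: [build-cuda]
+     steps:
+       - run: bash ops/pipeline/stash-artifacts.sh unstash build-cuda python-package/dist/*.whl
+       # ... test steps that consume the wheel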
-This primitive is useful when one pipeline job needs to consume the output
-from another job.
-We use `Amazon S3 <https://aws.amazon.com/s3/>`_ to store the stashed files.
 
-**To stash a file**:
+**To stash a file**: In the workflow YAML, add the following lines:
 
-.. code-block:: bash
+.. code-block:: yaml
 
-
+   - name: Stash files
+     run: |
+       REMOTE_PREFIX="remote directory to place the artifact(s)"
+       bash ops/pipeline/stash-artifacts.sh stash "${REMOTE_PREFIX}" path/to/file
 
 The ``REMOTE_PREFIX`` argument, which is the second command-line argument
 for ``stash-artifacts.sh``, specifies the remote directory in which the artifact(s)
 should be placed. More precisely, the artifact(s) will be placed in
 ``s3://{RUNS_ON_S3_BUCKET_CACHE}/cache/{GITHUB_REPOSITORY}/stash/{GITHUB_RUN_ID}/{REMOTE_PREFIX}/``
 where ``RUNS_ON_S3_BUCKET_CACHE``, ``GITHUB_REPOSITORY``, and ``GITHUB_RUN_ID`` are set by
 the CI. (RunsOn provisions an S3 bucket to stage cache, and its name is stored in the environment
 variable ``RUNS_ON_S3_BUCKET_CACHE``.)
 
 You can upload multiple files, possibly with wildcard globbing:
 
-.. code-block:: bash
+.. code-block:: yaml
 
-   REMOTE_PREFIX="build-cuda"
-   bash ops/pipeline/stash-artifacts.sh stash "${REMOTE_PREFIX}" \
-     build/testxgboost python-package/dist/*.whl
+   - name: Stash files
+     run: |
+       bash ops/pipeline/stash-artifacts.sh stash build-cuda \
+         build/testxgboost python-package/dist/*.whl
 
 **To unstash a file**:
 
-.. code-block:: bash
+.. code-block:: yaml
 
-   REMOTE_PREFIX="remote directory to place the artifact(s)"
-   bash ops/pipeline/stash-artifacts.sh unstash "${REMOTE_PREFIX}" path/to/file
+   - name: Unstash files
+     run: |
+       REMOTE_PREFIX="remote directory to place the artifact(s)"
+       bash ops/pipeline/stash-artifacts.sh unstash "${REMOTE_PREFIX}" path/to/file
 
 You can also use wildcard globbing. The script will download the matching artifacts
 from the remote directory.
 
-.. code-block:: bash
+.. code-block:: yaml
 
-   REMOTE_PREFIX="build-cuda"
-   # Download all files whose path matches the wildcard pattern python-package/dist/*.whl
-   bash ops/pipeline/stash-artifacts.sh unstash "${REMOTE_PREFIX}" \
-     python-package/dist/*.whl
+   - name: Unstash files
+     run: |
+       # Download all files whose path matches the wildcard pattern python-package/dist/*.whl
+       bash ops/pipeline/stash-artifacts.sh unstash build-cuda \
+         python-package/dist/*.whl
 
-----------------------------------------
-Custom actions in ``dmlc/xgboost-devops``
-----------------------------------------
+.. _custom_actions:
+
+=================================
+Custom actions for GitHub Actions
+=================================
 
 XGBoost implements a few custom
 `composite actions `_
 to reduce duplicated code within workflow YAML files. The custom actions are hosted in
 `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_, to make it easy to test
 changes to the custom actions in a pull request or a fork.
 
-In a workflow file, we'd refer to ``dmlc/xgboost-devops/{custom-action}@main``. For example:
+In a workflow file, we'd refer to ``dmlc/xgboost-devops/actions/{custom-action}@main``. For example:
 
 .. code-block:: yaml
 
-   - uses: dmlc/xgboost-devops/miniforge-setup@main
+   - uses: dmlc/xgboost-devops/actions/miniforge-setup@main
     with:
       environment-name: cpp_test
       environment-file: ops/conda_env/cpp_test.yml
 
 Each custom action consists of two components:
 
-* Main script (``dmlc/xgboost-devops/{custom-action}/action.yml``): dispatches to a specific version
+* Main script (``dmlc/xgboost-devops/actions/{custom-action}/action.yml``): dispatches to a specific version
   of the implementation script (see the next item). The main script clones ``xgboost-devops`` from
   a specified fork at a particular ref, allowing us to easily test changes to the custom action.
-* Implementation script (``dmlc/xgboost-devops/impls/{custom-action}/action.yml``): Implements the
+* Implementation script (``dmlc/xgboost-devops/actions/impls/{custom-action}/action.yml``): Implements the
  custom script.
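+
+For example, to exercise a modified action from your own fork before it is merged,
+you can point the ``uses:`` line at the fork and branch. (A hypothetical fork
+``your-username/xgboost-devops`` with branch ``my-fix``; this relies on the standard
+GitHub Actions semantics for ``uses: owner/repo/path@ref``.)
+
+.. code-block:: yaml
+
+   # Run the patched action from a fork instead of dmlc/xgboost-devops@main
+   - uses: your-username/xgboost-devops/actions/miniforge-setup@my-fix
+     with:
+       environment-name: cpp_test
+       environment-file: ops/conda_env/cpp_test.yml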
 This design was inspired by Mike Sarahan's work in
 `rapidsai/shared-actions <https://github.com/rapidsai/shared-actions>`_.
+
+
+.. _ci_container_infra:
+
+===============================================
+Infra for building and publishing CI containers
+===============================================
+
+--------------------------
+CI pipeline for containers
+--------------------------
+The `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_ repo hosts a CI pipeline to build new
+containers at a regular schedule. New containers are built on the following occasions:
+
+* New commits are added to the ``main`` branch of ``dmlc/xgboost-devops``.
+* New pull requests are submitted to ``dmlc/xgboost-devops``.
+* Every week, at a set day and hour.
+
+This setup ensures that the CI containers remain up-to-date.
+
+------------------------
+How wrapper scripts work
+------------------------
+
+The wrapper scripts ``docker_build.sh``, ``docker_build.py`` (in ``dmlc/xgboost-devops``) and ``docker_run.py``
+(in ``dmlc/xgboost``) are designed to transparently log the commands they carry out under the hood.
+For example, when you run ``bash containers/docker_build.sh xgb-ci.gpu``, the logs will show the following:
+
+.. code-block:: bash
+
+   # docker_build.sh calls docker_build.py...
+   python3 containers/docker_build.py --container-def gpu \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
+     --build-arg CUDA_VERSION_ARG=12.4.1 --build-arg NCCL_VERSION_ARG=2.23.4-1 \
+     --build-arg RAPIDS_VERSION_ARG=24.10
+
+   ...
+
+   # .. and docker_build.py in turn calls "docker build"...
+   docker build --build-arg CUDA_VERSION_ARG=12.4.1 \
+     --build-arg NCCL_VERSION_ARG=2.23.4-1 \
+     --build-arg RAPIDS_VERSION_ARG=24.10 \
+     --load --progress=plain \
+     --ulimit nofile=1024000:1024000 \
+     -t 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
+     -f containers/dockerfile/Dockerfile.gpu \
+     containers/
+
+The logs come in handy when debugging the container builds.
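+
+The logged commands also show how to customize a build by hand: you can invoke
+``containers/docker_build.py`` directly with modified ``--build-arg`` values.
+For instance (an illustrative invocation; the CUDA version and local tag shown
+here are arbitrary):
+
+.. code-block:: bash
+
+   # Rebuild the GPU container against a different CUDA version, under a local tag
+   python3 containers/docker_build.py --container-def gpu \
+     --container-tag xgb-ci.gpu:local-cuda \
+     --build-arg CUDA_VERSION_ARG=12.5.1 --build-arg NCCL_VERSION_ARG=2.23.4-1 \
+     --build-arg RAPIDS_VERSION_ARG=24.10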
+
+Here is an example with ``docker_run.py``:
+
+.. code-block:: bash
+
+   # Run without GPU
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
+     -- bash ops/script/build_via_cmake.sh
+
+   # Run with NVIDIA GPU
+   # Allocate extra space in /dev/shm to enable NCCL
+   # Also run the container with elevated privileges
+   python3 ops/docker_run.py \
+     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
+     --use-gpus \
+     --run-args='--shm-size=4g --privileged' \
+     -- bash ops/pipeline/test-python-wheel-impl.sh gpu
+
+which are translated to the following ``docker run`` invocations:
+
+.. code-block:: bash
+
+   docker run --rm --pid=host \
+     -w /workspace -v /path/to/xgboost:/workspace \
+     -e CI_BUILD_UID=<uid> -e CI_BUILD_USER=<user_name> \
+     -e CI_BUILD_GID=<gid> -e CI_BUILD_GROUP=<group_name> \
+     492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
+     bash ops/script/build_via_cmake.sh
+
+   docker run --rm --pid=host --gpus all \
+     -w /workspace -v /path/to/xgboost:/workspace \
+     -e CI_BUILD_UID=<uid> -e CI_BUILD_USER=<user_name> \
+     -e CI_BUILD_GID=<gid> -e CI_BUILD_GROUP=<group_name> \
+     --shm-size=4g --privileged \
+     492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
+     bash ops/pipeline/test-python-wheel-impl.sh gpu

From 8d518f45f6e3547c7c0236714c7475954466339b Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 10:47:13 -0800
Subject: [PATCH 02/27] Add more examples for local testing with Docker

---
 doc/contrib/ci.rst | 82 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 4 deletions(-)

diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index 0006002bf8c4..a87e2d8cca40 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -69,7 +69,10 @@ containers locally.
    https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html.
    The runtime lets you access NVIDIA GPUs inside a Docker container.
 
-**To build a Docker container**, clone the repository `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_
+---------------------------
+To build a Docker container
+---------------------------
+Clone the repository `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_
 and invoke ``containers/docker_build.sh`` as follows:
 
 .. code-block:: bash
@@ -113,8 +116,10 @@ When ``containers/docker_build.sh`` completes, you will have access to the conta
 ``492475357299.dkr.ecr.us-west-2.amazonaws.com/`` was added so that the container could
 later be uploaded to AWS Elastic Container Registry (ECR), a private Docker registry.
 
-**To run commands within a Docker container**, invoke ``ops/docker_run.py`` from
-the main ``dmlc/xgboost`` repo as follows:
+-----------------------------------------
+To run commands within a Docker container
+-----------------------------------------
+Invoke ``ops/docker_run.py`` from the main ``dmlc/xgboost`` repo as follows:
 
 .. code-block:: bash
 
@@ -132,7 +137,7 @@ For example:
 
    # Run without GPU
    python3 ops/docker_run.py \
     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-     -- bash ops/script/build_via_cmake.sh
+     -- bash ops/pipeline/build-cpu-impl.sh
 
    # Run with NVIDIA GPU
    python3 ops/docker_run.py \
     --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
     --use-gpus \
     -- bash ops/pipeline/test-python-wheel-impl.sh gpu
@@ -154,6 +159,75 @@ Optionally, you can specify ``--run-args`` to pass extra arguments to ``docker r
 
 See :ref:`ci_container_infra` to read about how containers are built and managed in the CI pipelines.
 
+--------------------------------------------
+Examples: useful tasks for local development
+--------------------------------------------
+
+* Build XGBoost with GPU support + package it as a Python wheel
+
+  .. code-block:: bash
+
+    export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
+    python3 ops/docker_run.py \
+      --container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu_build_rockylinux8:main \
+      -- ops/pipeline/build-cuda-impl.sh
+
+* Run Python tests
+
+  .. code-block:: bash
+
+    export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com
+    python3 ops/docker_run.py \
+      --container-tag ${DOCKER_REGISTRY}/xgb-ci.cpu:main \
+      -- ops/pipeline/test-python-wheel-impl.sh cpu
+
+* Run Python tests with GPU algorithm
+
+  .. 
code-block:: bash + + export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com + python3 ops/docker_run.py \ + --container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu:main \ + --use-gpus \ + -- ops/pipeline/test-python-wheel-impl.sh gpu + +* Run Python tests with GPU algorithm, with multiple GPUs + + .. code-block:: bash + + export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com + python3 ops/docker_run.py \ + --container-tag ${DOCKER_REGISTRY}/xgb-ci.gpu:main \ + --use-gpus \ + --run-args='--shm-size=4g' \ + -- ops/pipeline/test-python-wheel-impl.sh mgpu + # --shm-size=4g is needed for multi-GPU algorithms to function + +* Build and test JVM packages + + .. code-block:: bash + + export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com + export SCALA_VERSION=2.12 # Specify Scala version (2.12 or 2.13) + python3 ops/docker_run.py \ + --container-tag ${DOCKER_REGISTRY}/xgb-ci.jvm:main \ + --run-args "-e SCALA_VERSION" \ + -- ops/pipeline/build-test-jvm-packages-impl.sh + +* Build and test JVM packages, with GPU support + + .. code-block:: bash + + export DOCKER_REGISTRY=492475357299.dkr.ecr.us-west-2.amazonaws.com + export SCALA_VERSION=2.12 # Specify Scala version (2.12 or 2.13) + export USE_CUDA=1 + python3 ops/docker_run.py \ + --container-tag ${DOCKER_REGISTRY}/xgb-ci.jvm_gpu_build:main \ + --use-gpus \ + --run-args "-e SCALA_VERSION -e USE_CUDA --shm-size=4g" \ + -- ops/pipeline/build-test-jvm-packages-impl.sh + # --shm-size=4g is needed for multi-GPU algorithms to function + ***************************** Tour of the CI infrastructure ***************************** From b5a413b77368b4c79d416f25e0656f0454c304b4 Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 11:06:19 -0800 Subject: [PATCH 03/27] Add note about VM images --- doc/contrib/ci.rst | 47 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 39 insertions(+), 8 deletions(-) diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst index a87e2d8cca40..3467a8159c5f 100644 --- a/doc/contrib/ci.rst +++ b/doc/contrib/ci.rst @@ -320,6 +320,8 @@ hosts the code for building the CI containers. The repository is organized as fo * ``containers/ci_container.yml``: Defines the mapping between Dockerfiles and containers. Also specifies the build arguments to be used with each container. * ``containers/docker_build.{py,sh}``: Wrapper scripts to build and test CI containers. +* ``vm_images/``: Defines bootstrap scripts to build VM images for Amazon EC2. See + :ref:`vm_images` to learn about how VM images relate to container images. See :ref:`build_run_docker_locally` to learn about the utility scripts for building and using containers. @@ -413,15 +415,17 @@ This design was inspired by Mike Sarahan's work in .. _ci_container_infra: -=============================================== -Infra for building and publishing CI containers -=============================================== +============================================================= +Infra for building and publishing CI containers and VM images +============================================================= -------------------------- -CI pipeline for containers +Notes on Docker containers -------------------------- +**CI pipeline for containers** + The `dmlc/xgboost-devops `_ repo hosts a CI pipeline to build new -containers at a regular schedule. New containers are built in the following occasions: +Docker containers at a regular schedule. 
New containers are built on the following occasions:
 
 * New commits are added to the ``main`` branch of ``dmlc/xgboost-devops``.
 * New pull requests are submitted to ``dmlc/xgboost-devops``.
 * Every week, at a set day and hour.
 
 This setup ensures that the CI containers remain up-to-date.
 
-------------------------
-How wrapper scripts work
-------------------------
+**How wrapper scripts work**
 
 The wrapper scripts ``docker_build.sh``, ``docker_build.py`` (in ``dmlc/xgboost-devops``) and ``docker_run.py``
 (in ``dmlc/xgboost``) are designed to transparently log the commands they carry out under the hood.
@@ -495,3 +497,32 @@ which are translated to the following ``docker run`` invocations:
      --shm-size=4g --privileged \
      492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main \
      bash ops/pipeline/test-python-wheel-impl.sh gpu
+
+
+.. _vm_images:
+
+------------------
+Notes on VM images
+------------------
+In the ``vm_images/`` directory of `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_,
+we define Packer scripts to build images for Virtual Machines (VM) on
+`Amazon EC2 <https://aws.amazon.com/ec2/>`_.
+The VM image contains the minimal set of drivers and system software that are needed to
+run the containers.
+
+We update container images much more often than VM images. Whereas it takes only 10 minutes to
+build a new container image, it takes 1-2 hours to build a new VM image.
+
+To enable a quick development iteration cycle, we place most of
+the development environment in containers and keep VM images small.
+Packages needed for testing should be baked into containers, not VM images.
+Developers can make changes to containers and see the results of the changes quickly.
+
+.. note:: Special note for the Windows platform
+
+   We do not use containers when testing XGBoost on Windows. All software must be baked into
+   the VM image. Containers are not used because
+   `NVIDIA Container Toolkit `_
+   does not yet support Windows natively.
+
+The `dmlc/xgboost-devops <https://github.com/dmlc/xgboost-devops>`_ repo hosts a CI pipeline to build new
+VM images at a regular schedule (currently monthly).

From edca254bab035244de7df4a8dbd9c5566cec5ce1 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 14:33:42 -0800
Subject: [PATCH 04/27] Update doc for stashing files

---
 doc/contrib/ci.rst | 57 +++++++++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index 3467a8159c5f..456f8ce1ae0d 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -332,53 +332,64 @@ Artifact sharing between jobs via Amazon S3
 
 We make artifacts from one workflow job available to another job by uploading the
 artifacts to `Amazon S3 <https://aws.amazon.com/s3/>`_. In the CI, we utilize the
-script ``ops/pipeline/stash-artifacts.sh`` to coordinate artifact sharing.
+script ``ops/pipeline/manage-artifacts.py`` to coordinate artifact sharing.
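+
+Assuming your AWS credentials can read the bucket, you can inspect what a given
+workflow run has staged with the AWS CLI (the run ID and prefix below are
+placeholders):
+
+.. code-block:: bash
+
+   # List artifacts staged by workflow run 1234567890 under prefix build-cuda/
+   aws s3 ls "s3://${RUNS_ON_S3_BUCKET_CACHE}/cache/1234567890/build-cuda/"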
-**To stash a file**: In the workflow YAML, add the following lines:
+**To upload files to S3**: In the workflow YAML, add the following lines:
 
 .. code-block:: yaml
 
-   - name: Stash files
+   - name: Upload files to S3
      run: |
-       REMOTE_PREFIX="remote directory to place the artifact(s)"
-       bash ops/pipeline/stash-artifacts.sh stash "${REMOTE_PREFIX}" path/to/file
+       REMOTE_PREFIX="remote directory to place the artifact(s)"
+       python3 ops/pipeline/manage-artifacts.py upload \
+         --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \
+         --prefix cache/${{ github.run_id }}/${REMOTE_PREFIX} \
+         path/to/file
 
-The ``REMOTE_PREFIX`` argument, which is the second command-line argument
-for ``stash-artifacts.sh``, specifies the remote directory in which the artifact(s)
-should be placed. More precisely, the artifact(s) will be placed in
-``s3://{RUNS_ON_S3_BUCKET_CACHE}/cache/{GITHUB_REPOSITORY}/stash/{GITHUB_RUN_ID}/{REMOTE_PREFIX}/``
-where ``RUNS_ON_S3_BUCKET_CACHE``, ``GITHUB_REPOSITORY``, and ``GITHUB_RUN_ID`` are set by
-the CI. (RunsOn provisions an S3 bucket to stage cache, and its name is stored in the environment
-variable ``RUNS_ON_S3_BUCKET_CACHE``.)
+The ``--prefix`` argument specifies the remote directory in which the artifact(s)
+should be placed. The artifact(s) will be placed in
+``s3://{RUNS_ON_S3_BUCKET_CACHE}/cache/{GITHUB_RUN_ID}/{REMOTE_PREFIX}/``
+where ``RUNS_ON_S3_BUCKET_CACHE`` and ``GITHUB_RUN_ID`` are set by the CI.
 
 You can upload multiple files, possibly with wildcard globbing:
 
 .. code-block:: yaml
 
-   - name: Stash files
+   - name: Upload files to S3
      run: |
-       bash ops/pipeline/stash-artifacts.sh stash build-cuda \
-         build/testxgboost python-package/dist/*.whl
+       python3 ops/pipeline/manage-artifacts.py upload \
+         --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \
+         --prefix cache/${{ github.run_id }}/build-cuda \
+         build/testxgboost python-package/dist/*.whl
 
-**To unstash a file**:
+**To download files from S3**: In the workflow YAML, add the following lines:
 
 .. code-block:: yaml
 
-   - name: Unstash files
+   - name: Download files from S3
      run: |
-       REMOTE_PREFIX="remote directory to place the artifact(s)"
-       bash ops/pipeline/stash-artifacts.sh unstash "${REMOTE_PREFIX}" path/to/file
+       REMOTE_PREFIX="remote directory where the artifact(s) were placed"
+       python3 ops/pipeline/manage-artifacts.py download \
+         --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \
+         --prefix cache/${{ github.run_id }}/${REMOTE_PREFIX} \
+         --dest-dir path/to/destination_directory \
+         artifacts
 
-You can also use wildcard globbing. The script will download the matching artifacts
-from the remote directory.
+You can also use wildcard globbing. The script will locate all artifacts
+under the given prefix that match the wildcard pattern.
 
 .. code-block:: yaml
 
-   - name: Unstash files
+   - name: Download files from S3
      run: |
-       # Download all files whose path matches the wildcard pattern python-package/dist/*.whl
-       bash ops/pipeline/stash-artifacts.sh unstash build-cuda \
-         python-package/dist/*.whl
+       # Locate all artifacts with name *.whl under prefix
+       # cache/${GITHUB_RUN_ID}/${REMOTE_PREFIX} and
+       # download them to wheelhouse/.
+       python3 ops/pipeline/manage-artifacts.py download \
+         --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \
+         --prefix cache/${{ github.run_id }}/${REMOTE_PREFIX} \
+         --dest-dir wheelhouse/ \
+         *.whl
 
.. 
_custom_actions: From 62b1acadf3c108ef30be1008c7a26c7998239f0a Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 14:35:05 -0800 Subject: [PATCH 05/27] GITHUB_ACTION -> GITHUB_ACTIONS --- ops/pipeline/enforce-ci.ps1 | 2 +- ops/pipeline/enforce-ci.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/ops/pipeline/enforce-ci.ps1 b/ops/pipeline/enforce-ci.ps1 index 0528472be6cb..e2eb734d1229 100644 --- a/ops/pipeline/enforce-ci.ps1 +++ b/ops/pipeline/enforce-ci.ps1 @@ -1,7 +1,7 @@ ## Ensure that a script is running inside the CI. ## Usage: . ops/pipeline/enforce-ci.ps1 -if ( -Not $Env:GITHUB_ACTION ) { +if ( -Not $Env:GITHUB_ACTIONS ) { $script_name = (Split-Path -Path $PSCommandPath -Leaf) Write-Host "$script_name is not meant to run locally; it should run inside GitHub Actions." Write-Host "Please inspect the content of $script_name and locate the desired command manually." diff --git a/ops/pipeline/enforce-ci.sh b/ops/pipeline/enforce-ci.sh index 1e853a5ea266..292d6baec079 100755 --- a/ops/pipeline/enforce-ci.sh +++ b/ops/pipeline/enforce-ci.sh @@ -5,7 +5,7 @@ set -euo pipefail -if [[ -z ${GITHUB_ACTION:-} ]] +if [[ -z ${GITHUB_ACTIONS:-} ]] then echo "$0 is not meant to run locally; it should run inside GitHub Actions." echo "Please inspect the content of $0 and locate the desired command manually." From 6feb0b7a29b4170fdf078971f06adeaa91856b8a Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 14:46:39 -0800 Subject: [PATCH 06/27] Move container build to xgboost-devops --- .github/workflows/jvm_tests.yml | 54 ++----- .github/workflows/lint.yml | 22 +-- .github/workflows/main.yml | 82 +++------- ops/docker/ci_container.yml | 72 --------- ops/docker/docker_cache_ecr.yml | 4 - ops/docker/dockerfile/Dockerfile.aarch64 | 38 ----- ops/docker/dockerfile/Dockerfile.clang_tidy | 50 ------ ops/docker/dockerfile/Dockerfile.cpu | 57 ------- ops/docker/dockerfile/Dockerfile.gpu | 54 ------- .../Dockerfile.gpu_build_r_rockylinux8 | 58 ------- .../Dockerfile.gpu_build_rockylinux8 | 82 ---------- ops/docker/dockerfile/Dockerfile.i386 | 8 - ops/docker/dockerfile/Dockerfile.jvm | 43 ----- .../dockerfile/Dockerfile.jvm_gpu_build | 54 ------- .../Dockerfile.manylinux2014_aarch64 | 17 -- .../Dockerfile.manylinux2014_x86_64 | 17 -- .../Dockerfile.manylinux_2_28_x86_64 | 15 -- ops/docker/entrypoint.sh | 45 ------ ops/docker/extract_build_args.jq | 12 -- ops/docker/extract_build_args.sh | 26 --- ops/docker_build.py | 137 ---------------- ops/docker_build.sh | 149 ------------------ ops/docker_run.py | 31 ++-- ops/pipeline/build-cpu-arm64.sh | 25 +-- ops/pipeline/build-cpu.sh | 16 +- ops/pipeline/build-cuda-with-rmm.sh | 22 +-- ops/pipeline/build-cuda.sh | 27 ++-- ops/pipeline/build-gpu-rpkg.sh | 9 +- ops/pipeline/build-jvm-doc.sh | 12 +- ops/pipeline/build-jvm-gpu.sh | 5 +- ops/pipeline/build-jvm-manylinux2014.sh | 9 +- ops/pipeline/build-manylinux2014.sh | 17 +- ops/pipeline/build-test-jvm-packages.sh | 6 +- ops/pipeline/deploy-jvm-packages.sh | 9 +- ops/pipeline/get-docker-registry-details.sh | 5 + ops/pipeline/login-docker-registry.sh | 11 ++ ops/pipeline/run-clang-tidy.sh | 10 +- ops/pipeline/test-cpp-gpu.sh | 22 ++- ops/pipeline/test-jvm-gpu.sh | 8 +- ops/pipeline/test-python-wheel.sh | 5 +- 40 files changed, 189 insertions(+), 1156 deletions(-) delete mode 100644 ops/docker/ci_container.yml delete mode 100644 ops/docker/docker_cache_ecr.yml delete mode 100644 ops/docker/dockerfile/Dockerfile.aarch64 delete mode 100644 ops/docker/dockerfile/Dockerfile.clang_tidy 
delete mode 100644 ops/docker/dockerfile/Dockerfile.cpu delete mode 100644 ops/docker/dockerfile/Dockerfile.gpu delete mode 100644 ops/docker/dockerfile/Dockerfile.gpu_build_r_rockylinux8 delete mode 100644 ops/docker/dockerfile/Dockerfile.gpu_build_rockylinux8 delete mode 100644 ops/docker/dockerfile/Dockerfile.i386 delete mode 100644 ops/docker/dockerfile/Dockerfile.jvm delete mode 100644 ops/docker/dockerfile/Dockerfile.jvm_gpu_build delete mode 100644 ops/docker/dockerfile/Dockerfile.manylinux2014_aarch64 delete mode 100644 ops/docker/dockerfile/Dockerfile.manylinux2014_x86_64 delete mode 100644 ops/docker/dockerfile/Dockerfile.manylinux_2_28_x86_64 delete mode 100755 ops/docker/entrypoint.sh delete mode 100644 ops/docker/extract_build_args.jq delete mode 100755 ops/docker/extract_build_args.sh delete mode 100644 ops/docker_build.py delete mode 100755 ops/docker_build.sh create mode 100755 ops/pipeline/get-docker-registry-details.sh create mode 100755 ops/pipeline/login-docker-registry.sh diff --git a/.github/workflows/jvm_tests.yml b/.github/workflows/jvm_tests.yml index 10e9b32bcf70..965ea49ccad7 100644 --- a/.github/workflows/jvm_tests.yml +++ b/.github/workflows/jvm_tests.yml @@ -12,40 +12,12 @@ concurrency: env: BRANCH_NAME: >- ${{ github.event.pull_request.number && 'PR-' }}${{ github.event.pull_request.number || github.ref_name }} - USE_DOCKER_CACHE: 1 jobs: - build-containers: - name: Build CI containers (${{ matrix.container_id }}) - runs-on: - - runs-on - - runner=${{ matrix.runner }} - - run-id=${{ github.run_id }} - - tag=jvm-tests-build-containers-${{ matrix.container_id }} - strategy: - matrix: - container_id: - - xgb-ci.manylinux2014_x86_64 - - xgb-ci.jvm - - xgb-ci.jvm_gpu_build - runner: [linux-amd64-cpu] - include: - - container_id: xgb-ci.manylinux2014_aarch64 - runner: linux-arm64-cpu - steps: - # Restart Docker daemon so that it recognizes the ephemeral disks - - run: sudo systemctl restart docker - - uses: actions/checkout@v4 - with: - submodules: "true" - - name: Build ${{ matrix.container_id }} - run: bash ops/docker_build.sh ${{ matrix.container_id }} - build-jvm-manylinux2014: name: >- Build libxgboost4j.so targeting glibc 2.17 (arch ${{ matrix.arch }}, runner ${{ matrix.runner }}) - needs: build-containers runs-on: - runs-on - runner=${{ matrix.runner }} @@ -65,8 +37,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.manylinux2014_${{ matrix.arch }} + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-jvm-manylinux2014.sh ${{ matrix.arch }} - name: Upload libxgboost4j.so run: | @@ -77,7 +49,6 @@ jobs: build-jvm-gpu: name: Build libxgboost4j.so with CUDA - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -88,8 +59,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.jvm_gpu_build + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-jvm-gpu.sh - name: Stash files run: | @@ -137,8 +108,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.jvm_gpu_build + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - name: Unstash files run: | bash ops/pipeline/stash-artifacts.sh unstash 
build-jvm-gpu lib/libxgboost4j.so @@ -151,7 +122,6 @@ jobs: build-test-jvm-packages: name: Build and test JVM packages (Linux, Scala ${{ matrix.scala_version }}) - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -166,8 +136,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.jvm + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - name: Build and test JVM packages (Scala ${{ matrix.scala_version }}) run: bash ops/pipeline/build-test-jvm-packages.sh env: @@ -239,8 +209,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.jvm_gpu_build + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - name: Unstash files run: | bash ops/pipeline/stash-artifacts.sh unstash build-jvm-gpu lib/libxgboost4j.so @@ -273,8 +243,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh ${{ matrix.variant.container_id }} + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - name: Unstash files run: | bash ops/pipeline/stash-artifacts.sh \ diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index 0bd12a81fd05..73636e7ce66d 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -14,26 +14,8 @@ env: ${{ github.event.pull_request.number && 'PR-' }}${{ github.event.pull_request.number || github.ref_name }} jobs: - build-containers: - name: Build CI containers - env: - CONTAINER_ID: xgb-ci.clang_tidy - runs-on: - - runs-on=${{ github.run_id }} - - runner=linux-amd64-cpu - - tag=lint-build-containers - steps: - # Restart Docker daemon so that it recognizes the ephemeral disks - - run: sudo systemctl restart docker - - uses: actions/checkout@v4 - with: - submodules: "true" - - name: Build ${{ env.CONTAINER_ID }} - run: bash ops/docker_build.sh ${{ env.CONTAINER_ID }} - clang-tidy: name: Run clang-tidy - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -44,8 +26,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.clang_tidy + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/run-clang-tidy.sh python-mypy-lint: diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index cbed730405fa..e62cc3f35e59 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -12,44 +12,10 @@ concurrency: env: BRANCH_NAME: >- ${{ github.event.pull_request.number && 'PR-' }}${{ github.event.pull_request.number || github.ref_name }} - USE_DOCKER_CACHE: 1 jobs: - build-containers: - name: Build CI containers (${{ matrix.container_id }}) - runs-on: - - runs-on - - runner=${{ matrix.runner }} - - run-id=${{ github.run_id }} - - tag=main-build-containers-${{ matrix.container_id }} - strategy: - matrix: - container_id: - - xgb-ci.gpu_build_rockylinux8 - - xgb-ci.gpu_build_rockylinux8_dev_ver - - xgb-ci.gpu_build_r_rockylinux8 - - xgb-ci.gpu - - xgb-ci.cpu - - xgb-ci.manylinux_2_28_x86_64 - - xgb-ci.manylinux2014_x86_64 - runner: [linux-amd64-cpu] - include: - - container_id: xgb-ci.manylinux2014_aarch64 - runner: linux-arm64-cpu - - container_id: xgb-ci.aarch64 - runner: 
linux-arm64-cpu - steps: - # Restart Docker daemon so that it recognizes the ephemeral disks - - run: sudo systemctl restart docker - - uses: actions/checkout@v4 - with: - submodules: "true" - - name: Build ${{ matrix.container_id }} - run: bash ops/docker_build.sh ${{ matrix.container_id }} - build-cpu: name: Build CPU - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -60,15 +26,14 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.cpu + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-cpu.sh - name: Stash CLI executable run: bash ops/pipeline/stash-artifacts.sh stash build-cpu ./xgboost build-cpu-arm64: name: Build CPU ARM64 + manylinux_2_28_aarch64 wheel - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-arm64-cpu @@ -79,8 +44,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.aarch64 + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-cpu-arm64.sh - name: Stash files run: | @@ -93,7 +58,6 @@ jobs: build-cuda: name: Build CUDA + manylinux_2_28_x86_64 wheel - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -104,10 +68,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.gpu_build_rockylinux8 - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.manylinux_2_28_x86_64 + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-cuda.sh - name: Stash files run: | @@ -123,7 +85,6 @@ jobs: build-cuda-with-rmm: name: Build CUDA with RMM - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -134,10 +95,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.gpu_build_rockylinux8 - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.manylinux_2_28_x86_64 + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: | bash ops/pipeline/build-cuda-with-rmm.sh xgb-ci.gpu_build_rockylinux8 - name: Stash files @@ -151,7 +110,6 @@ jobs: build-cuda-with-rmm-dev: name: Build CUDA with RMM (dev) - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -162,16 +120,13 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.gpu_build_rockylinux8_dev_ver - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.manylinux_2_28_x86_64 + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: | bash ops/pipeline/build-cuda-with-rmm.sh xgb-ci.gpu_build_rockylinux8_dev_ver build-manylinux2014: name: Build manylinux2014_${{ matrix.arch }} wheel - needs: build-containers runs-on: - runs-on - runner=${{ matrix.runner }} @@ -191,8 +146,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.manylinux2014_${{ matrix.arch }} + - name: Log into Docker registry (AWS ECR) + 
run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-manylinux2014.sh ${{ matrix.arch }} - name: Upload Python wheel run: | @@ -204,7 +159,6 @@ jobs: build-gpu-rpkg: name: Build GPU-enabled R package - needs: build-containers runs-on: - runs-on=${{ github.run_id }} - runner=linux-amd64-cpu @@ -215,8 +169,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.gpu_build_r_rockylinux8 + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-gpu-rpkg.sh - name: Upload R tarball run: | @@ -253,8 +207,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh xgb-ci.gpu + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - name: Unstash gtest run: | bash ops/pipeline/stash-artifacts.sh unstash ${{ matrix.artifact_from }} \ @@ -300,8 +254,8 @@ jobs: - uses: actions/checkout@v4 with: submodules: "true" - - name: Fetch container from cache - run: bash ops/docker_build.sh ${{ matrix.container }} + - name: Log into Docker registry (AWS ECR) + run: bash ops/pipeline/login-docker-registry.sh - name: Unstash Python wheel run: | bash ops/pipeline/stash-artifacts.sh unstash ${{ matrix.artifact_from }} \ diff --git a/ops/docker/ci_container.yml b/ops/docker/ci_container.yml deleted file mode 100644 index 348bf90f8a1f..000000000000 --- a/ops/docker/ci_container.yml +++ /dev/null @@ -1,72 +0,0 @@ -## List of CI containers with definitions and build arguments - -# Each container will be built using the definition from -# ops/docker/dockerfile/Dockerfile.CONTAINER_DEF - -rapids_versions: - stable: &rapids_version "24.10" - dev: &dev_rapids_version "24.12" - -xgb-ci.gpu_build_rockylinux8: - container_def: gpu_build_rockylinux8 - build_args: - CUDA_VERSION_ARG: "12.4.1" - NCCL_VERSION_ARG: "2.23.4-1" - RAPIDS_VERSION_ARG: *rapids_version - -xgb-ci.gpu_build_rockylinux8_dev_ver: - container_def: gpu_build_rockylinux8 - build_args: - CUDA_VERSION_ARG: "12.4.1" - NCCL_VERSION_ARG: "2.23.4-1" - RAPIDS_VERSION_ARG: *dev_rapids_version - -xgb-ci.gpu_build_r_rockylinux8: - container_def: gpu_build_r_rockylinux8 - build_args: - CUDA_VERSION_ARG: "12.4.1" - R_VERSION_ARG: "4.3.2" - -xgb-ci.gpu: - container_def: gpu - build_args: - CUDA_VERSION_ARG: "12.4.1" - NCCL_VERSION_ARG: "2.23.4-1" - RAPIDS_VERSION_ARG: *rapids_version - -xgb-ci.gpu_dev_ver: - container_def: gpu - build_args: - CUDA_VERSION_ARG: "12.4.1" - NCCL_VERSION_ARG: "2.23.4-1" - RAPIDS_VERSION_ARG: *dev_rapids_version - RAPIDSAI_CONDA_CHANNEL_ARG: "rapidsai-nightly" - -xgb-ci.clang_tidy: - container_def: clang_tidy - build_args: - CUDA_VERSION_ARG: "12.4.1" - -xgb-ci.cpu: - container_def: cpu - -xgb-ci.aarch64: - container_def: aarch64 - -xgb-ci.manylinux_2_28_x86_64: - container_def: manylinux_2_28_x86_64 - -xgb-ci.manylinux2014_x86_64: - container_def: manylinux2014_x86_64 - -xgb-ci.manylinux2014_aarch64: - container_def: manylinux2014_aarch64 - -xgb-ci.jvm: - container_def: jvm - -xgb-ci.jvm_gpu_build: - container_def: jvm_gpu_build - build_args: - CUDA_VERSION_ARG: "12.4.1" - NCCL_VERSION_ARG: "2.23.4-1" diff --git a/ops/docker/docker_cache_ecr.yml b/ops/docker/docker_cache_ecr.yml deleted file mode 100644 index e20f35fc8020..000000000000 --- a/ops/docker/docker_cache_ecr.yml +++ /dev/null @@ -1,4 +0,0 @@ -## Constants for AWS ECR (Elastic Container Registry), 
used for the Docker cache - -DOCKER_CACHE_ECR_ID: "492475357299" -DOCKER_CACHE_ECR_REGION: "us-west-2" diff --git a/ops/docker/dockerfile/Dockerfile.aarch64 b/ops/docker/dockerfile/Dockerfile.aarch64 deleted file mode 100644 index 9dff2a05230b..000000000000 --- a/ops/docker/dockerfile/Dockerfile.aarch64 +++ /dev/null @@ -1,38 +0,0 @@ -FROM quay.io/pypa/manylinux_2_28_aarch64 - -SHELL ["/bin/bash", "-c"] # Use Bash as shell - -# Install all basic requirements -RUN \ - dnf -y update && \ - dnf -y install dnf-plugins-core && \ - dnf config-manager --set-enabled powertools && \ - dnf install -y tar unzip wget xz git which ninja-build gcc-toolset-10-gcc gcc-toolset-10-binutils gcc-toolset-10-gcc-c++ && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-aarch64.sh && \ - bash conda.sh -b -p /opt/miniforge - -ENV PATH=/opt/miniforge/bin:$PATH -ENV CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc -ENV CXX=/opt/rh/gcc-toolset-10/root/usr/bin/c++ -ENV CPP=/opt/rh/gcc-toolset-10/root/usr/bin/cpp -ENV GOSU_VERSION=1.10 - -# Create new Conda environment -COPY conda_env/aarch64_test.yml /scripts/ -RUN mamba create -n aarch64_test && \ - mamba env update -n aarch64_test --file=/scripts/aarch64_test.yml && \ - mamba clean --all --yes - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-arm64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.clang_tidy b/ops/docker/dockerfile/Dockerfile.clang_tidy deleted file mode 100644 index de7d9bd3f254..000000000000 --- a/ops/docker/dockerfile/Dockerfile.clang_tidy +++ /dev/null @@ -1,50 +0,0 @@ -ARG CUDA_VERSION_ARG=notset -FROM nvidia/cuda:$CUDA_VERSION_ARG-devel-ubuntu22.04 -ARG CUDA_VERSION_ARG - -# Environment -ENV DEBIAN_FRONTEND=noninteractive - -# Install all basic requirements -RUN \ - apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub && \ - apt-get update && \ - apt-get install -y wget git python3 python3-pip software-properties-common \ - apt-transport-https ca-certificates gnupg-agent && \ - apt-get install -y ninja-build - -# Install clang-tidy: https://apt.llvm.org/ -RUN \ - apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main" && \ - wget -O llvm-snapshot.gpg.key https://apt.llvm.org/llvm-snapshot.gpg.key && \ - apt-key add ./llvm-snapshot.gpg.key && \ - rm llvm-snapshot.gpg.key && \ - apt-get update && \ - apt-get install -y clang-tidy-19 clang-19 libomp-19-dev - -# Set default clang-tidy version -RUN \ - update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-19 100 && \ - update-alternatives --install /usr/bin/clang clang /usr/bin/clang-19 100 - -RUN \ - apt-get install libgtest-dev libgmock-dev -y - -# Install Python packages -RUN \ - pip3 install cmake - -ENV GOSU_VERSION=1.10 - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created 
files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.cpu b/ops/docker/dockerfile/Dockerfile.cpu deleted file mode 100644 index a426ce5da30c..000000000000 --- a/ops/docker/dockerfile/Dockerfile.cpu +++ /dev/null @@ -1,57 +0,0 @@ -FROM ubuntu:22.04 - -# Environment -ENV DEBIAN_FRONTEND=noninteractive -SHELL ["/bin/bash", "-c"] - -# Install all basic requirements -RUN \ - apt-get update && \ - apt-get install -y software-properties-common && \ - add-apt-repository ppa:ubuntu-toolchain-r/test && \ - apt-get update && \ - apt-get install -y tar unzip wget git build-essential doxygen graphviz llvm libidn12 cmake ninja-build gcc-10 g++-10 openjdk-8-jdk-headless && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/miniforge - -ENV PATH=/opt/miniforge/bin:$PATH -ENV CC=gcc-10 -ENV CXX=g++-10 -ENV CPP=cpp-10 - -ENV GOSU_VERSION=1.10 -ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ - -# Install gRPC -# Patch Abseil to apply https://github.com/abseil/abseil-cpp/issues/1629 -RUN git clone -b v1.65.4 https://github.com/grpc/grpc.git \ - --recurse-submodules --depth 1 && \ - pushd grpc && \ - pushd third_party/abseil-cpp && \ - git fetch origin master && \ - git cherry-pick -n cfde5f74e276049727f9556f13473a59fe77d9eb && \ - popd && \ - cmake -S . -B build -GNinja -DCMAKE_INSTALL_PREFIX=/opt/grpc -DCMAKE_CXX_VISIBILITY_PRESET=hidden && \ - cmake --build build --target install && \ - popd && \ - rm -rf grpc - -# Create new Conda environment -COPY conda_env/linux_cpu_test.yml /scripts/ -RUN mamba create -n linux_cpu_test && \ - mamba env update -n linux_cpu_test --file=/scripts/linux_cpu_test.yml && \ - mamba clean --all --yes - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.gpu b/ops/docker/dockerfile/Dockerfile.gpu deleted file mode 100644 index 96a532fc2ff1..000000000000 --- a/ops/docker/dockerfile/Dockerfile.gpu +++ /dev/null @@ -1,54 +0,0 @@ -ARG CUDA_VERSION_ARG=notset -FROM nvidia/cuda:$CUDA_VERSION_ARG-runtime-ubuntu22.04 -ARG CUDA_VERSION_ARG -ARG RAPIDS_VERSION_ARG - # Should be first 4 digits (e.g. 
24.06) -ARG NCCL_VERSION_ARG -ARG RAPIDSAI_CONDA_CHANNEL_ARG="rapidsai" - -# Environment -ENV DEBIAN_FRONTEND=noninteractive -SHELL ["/bin/bash", "-c"] - -# Install all basic requirements -RUN \ - apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub && \ - apt-get update && \ - apt-get install -y wget unzip bzip2 libgomp1 build-essential openjdk-8-jdk-headless && \ - apt-get install libnccl2 libnccl-dev -y --allow-change-held-packages && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/miniforge - -ENV PATH=/opt/miniforge/bin:$PATH - -# Create new Conda environment with cuDF, Dask, and cuPy -RUN \ - export NCCL_SHORT_VER=$(echo "$NCCL_VERSION_ARG" | cut -d "-" -f 1) && \ - export CUDA_SHORT_VER=$(echo "$CUDA_VERSION_ARG" | grep -o -E '[0-9]+\.[0-9]') && \ - mamba create -y -n gpu_test -c ${RAPIDSAI_CONDA_CHANNEL_ARG} -c conda-forge -c nvidia \ - python=3.10 "cudf=$RAPIDS_VERSION_ARG.*" "rmm=$RAPIDS_VERSION_ARG.*" cuda-version=$CUDA_SHORT_VER \ - "nccl>=${NCCL_SHORT_VER}" \ - "dask<=2024.10.0" \ - "distributed<=2024.10.0" \ - "dask-cuda=$RAPIDS_VERSION_ARG.*" "dask-cudf=$RAPIDS_VERSION_ARG.*" cupy \ - numpy pytest pytest-timeout scipy scikit-learn pandas matplotlib wheel \ - python-kubernetes urllib3 graphviz hypothesis loky \ - "pyspark>=3.4.0" cloudpickle cuda-python && \ - mamba clean --all --yes - -ENV GOSU_VERSION=1.10 -ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.gpu_build_r_rockylinux8 b/ops/docker/dockerfile/Dockerfile.gpu_build_r_rockylinux8 deleted file mode 100644 index 2d18b1eeb315..000000000000 --- a/ops/docker/dockerfile/Dockerfile.gpu_build_r_rockylinux8 +++ /dev/null @@ -1,58 +0,0 @@ -ARG CUDA_VERSION_ARG=notset -FROM nvcr.io/nvidia/cuda:$CUDA_VERSION_ARG-devel-rockylinux8 -ARG CUDA_VERSION_ARG -ARG R_VERSION_ARG - -# Install all basic requirements -RUN \ - curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub | sed '/^Version/d' \ - > /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA && \ - dnf -y update && \ - dnf -y install dnf-plugins-core && \ - dnf config-manager --set-enabled powertools && \ - dnf install -y tar unzip wget xz git which ninja-build readline-devel libX11-devel libXt-devel \ - xorg-x11-server-devel openssl-devel zlib-devel bzip2-devel xz-devel \ - pcre2-devel libcurl-devel texlive-* \ - gcc-toolset-10-gcc gcc-toolset-10-binutils gcc-toolset-10-gcc-c++ \ - gcc-toolset-10-gcc-gfortran gcc-toolset-10-libquadmath-devel \ - gcc-toolset-10-runtime gcc-toolset-10-libstdc++-devel - -ENV PATH=/opt/miniforge/bin:/usr/local/ninja:/opt/software/packages/bin:/opt/R/$R_VERSION_ARG/bin:$PATH -ENV LD_LIBRARY_PATH=/opt/software/packages/lib:/opt/R/$R_VERSION_ARG/lib64:$LD_LIBRARY_PATH -ENV CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc -ENV CXX=/opt/rh/gcc-toolset-10/root/usr/bin/c++ -ENV CPP=/opt/rh/gcc-toolset-10/root/usr/bin/cpp -ENV F77=/opt/rh/gcc-toolset-10/root/usr/bin/gfortran -ENV 
FC=/opt/rh/gcc-toolset-10/root/usr/bin/gfortran - -RUN \ - wget -nv -nc https://cran.r-project.org/src/base/R-4/R-$R_VERSION_ARG.tar.gz && \ - tar xf R-$R_VERSION_ARG.tar.gz && \ - cd R-$R_VERSION_ARG && \ - ./configure --prefix=/opt/R/$R_VERSION_ARG --enable-R-shlib --with-pcrel && \ - make -j$(nproc) && \ - make install - -run \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/miniforge && \ - /opt/miniforge/bin/python -m pip install auditwheel awscli && \ - # CMake - wget -nv -nc https://cmake.org/files/v3.29/cmake-3.29.5-linux-x86_64.sh --no-check-certificate && \ - bash cmake-3.29.5-linux-x86_64.sh --skip-license --prefix=/usr - -ENV GOSU_VERSION=1.10 - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -nc -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.gpu_build_rockylinux8 b/ops/docker/dockerfile/Dockerfile.gpu_build_rockylinux8 deleted file mode 100644 index b686bfbb2b0d..000000000000 --- a/ops/docker/dockerfile/Dockerfile.gpu_build_rockylinux8 +++ /dev/null @@ -1,82 +0,0 @@ -ARG CUDA_VERSION_ARG=notset -FROM nvcr.io/nvidia/cuda:$CUDA_VERSION_ARG-devel-rockylinux8 -ARG CUDA_VERSION_ARG -ARG NCCL_VERSION_ARG -ARG RAPIDS_VERSION_ARG - -# Install all basic requirements -RUN \ - curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub | sed '/^Version/d' \ - > /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA && \ - dnf -y update && \ - dnf -y install dnf-plugins-core && \ - dnf config-manager --set-enabled powertools && \ - dnf install -y tar unzip wget xz git which ninja-build gcc-toolset-10-gcc gcc-toolset-10-binutils gcc-toolset-10-gcc-c++ && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/miniforge && \ - /opt/miniforge/bin/python -m pip install awscli && \ - # CMake - wget -nv -nc https://cmake.org/files/v3.29/cmake-3.29.5-linux-x86_64.sh --no-check-certificate && \ - bash cmake-3.29.5-linux-x86_64.sh --skip-license --prefix=/usr - -# NCCL2 (License: https://docs.nvidia.com/deeplearning/sdk/nccl-sla/index.html) -RUN \ - export CUDA_SHORT=`echo $CUDA_VERSION_ARG | grep -o -E '[0-9]+\.[0-9]'` && \ - export NCCL_VERSION=$NCCL_VERSION_ARG && \ - dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \ - dnf -y update && \ - dnf install -y libnccl-${NCCL_VERSION}+cuda${CUDA_SHORT} libnccl-devel-${NCCL_VERSION}+cuda${CUDA_SHORT} - -ENV PATH=/opt/miniforge/bin:/usr/local/ninja:$PATH -ENV CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc -ENV CXX=/opt/rh/gcc-toolset-10/root/usr/bin/c++ -ENV CPP=/opt/rh/gcc-toolset-10/root/usr/bin/cpp -ENV CUDAHOSTCXX=/opt/rh/gcc-toolset-10/root/usr/bin/c++ - -ENV GOSU_VERSION=1.10 - -# Install gRPC -# Patch Abseil to apply https://github.com/abseil/abseil-cpp/issues/1629 -RUN git clone -b v1.65.4 https://github.com/grpc/grpc.git \ - --recurse-submodules --depth 1 && \ - pushd grpc && \ - pushd third_party/abseil-cpp && \ - git fetch origin master && \ - git 
cherry-pick -n cfde5f74e276049727f9556f13473a59fe77d9eb && \ - popd && \ - cmake -S . -B build -GNinja -DCMAKE_INSTALL_PREFIX=/opt/grpc -DCMAKE_CXX_VISIBILITY_PRESET=hidden && \ - cmake --build build --target install && \ - popd && \ - rm -rf grpc - -# Install RMM -# Patch out -Werror -# Patch CCCL 2.5.0 to apply https://github.com/NVIDIA/cccl/pull/1957 -RUN git clone -b branch-${RAPIDS_VERSION_ARG} https://github.com/rapidsai/rmm.git --recurse-submodules --depth 1 && \ - pushd rmm && \ - find . -name CMakeLists.txt -print0 | xargs -0 sed -i 's/-Werror//g' && \ - mkdir build && \ - pushd build && \ - cmake .. -GNinja -DCMAKE_INSTALL_PREFIX=/opt/rmm -DCUDA_STATIC_RUNTIME=ON && \ - pushd _deps/cccl-src/ && \ - git fetch origin main && \ - git cherry-pick -n 9fcb32c228865f21f2b002b29d38a06b4c6fbd73 && \ - popd && \ - cmake --build . --target install && \ - popd && \ - popd && \ - rm -rf rmm - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -nc -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.i386 b/ops/docker/dockerfile/Dockerfile.i386 deleted file mode 100644 index f128a008fa6c..000000000000 --- a/ops/docker/dockerfile/Dockerfile.i386 +++ /dev/null @@ -1,8 +0,0 @@ -FROM i386/debian:sid - -ENV DEBIAN_FRONTEND=noninteractive -SHELL ["/bin/bash", "-c"] - -RUN \ - apt-get update && \ - apt-get install -y tar unzip wget git build-essential ninja-build cmake diff --git a/ops/docker/dockerfile/Dockerfile.jvm b/ops/docker/dockerfile/Dockerfile.jvm deleted file mode 100644 index 9fd62e52de93..000000000000 --- a/ops/docker/dockerfile/Dockerfile.jvm +++ /dev/null @@ -1,43 +0,0 @@ -FROM rockylinux:8 - -# Install all basic requirements -RUN \ - dnf -y update && \ - dnf -y install dnf-plugins-core && \ - dnf config-manager --set-enabled powertools && \ - dnf install -y tar unzip make bzip2 wget xz git which ninja-build java-1.8.0-openjdk-devel \ - gcc-toolset-10-gcc gcc-toolset-10-binutils gcc-toolset-10-gcc-c++ \ - gcc-toolset-10-runtime gcc-toolset-10-libstdc++-devel && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/miniforge && \ - # CMake - wget -nv -nc https://cmake.org/files/v3.29/cmake-3.29.5-linux-x86_64.sh --no-check-certificate && \ - bash cmake-3.29.5-linux-x86_64.sh --skip-license --prefix=/usr && \ - # Maven - wget -nv -nc https://archive.apache.org/dist/maven/maven-3/3.9.7/binaries/apache-maven-3.9.7-bin.tar.gz && \ - tar xvf apache-maven-3.9.7-bin.tar.gz -C /opt && \ - ln -s /opt/apache-maven-3.9.7/ /opt/maven - -ENV PATH=/opt/miniforge/bin:/opt/maven/bin:$PATH -ENV CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc -ENV CXX=/opt/rh/gcc-toolset-10/root/usr/bin/c++ -ENV CPP=/opt/rh/gcc-toolset-10/root/usr/bin/cpp - -# Install Python packages -RUN pip install numpy pytest scipy scikit-learn wheel kubernetes awscli - -ENV GOSU_VERSION=1.10 - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -nc -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use 
if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.jvm_gpu_build b/ops/docker/dockerfile/Dockerfile.jvm_gpu_build deleted file mode 100644 index 4983493a6878..000000000000 --- a/ops/docker/dockerfile/Dockerfile.jvm_gpu_build +++ /dev/null @@ -1,54 +0,0 @@ -ARG CUDA_VERSION_ARG=notset -FROM nvcr.io/nvidia/cuda:$CUDA_VERSION_ARG-devel-rockylinux8 -ARG CUDA_VERSION_ARG -ARG NCCL_VERSION_ARG - -# Install all basic requirements -RUN \ - curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/D42D0685.pub | sed '/^Version/d' \ - > /etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA && \ - dnf -y update && \ - dnf -y install dnf-plugins-core && \ - dnf config-manager --set-enabled powertools && \ - dnf install -y tar unzip wget xz git which ninja-build java-1.8.0-openjdk-devel gcc-toolset-10-gcc gcc-toolset-10-binutils gcc-toolset-10-gcc-c++ && \ - # Python - wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/24.3.0-0/Miniforge3-24.3.0-0-Linux-x86_64.sh && \ - bash conda.sh -b -p /opt/miniforge && \ - # CMake - wget -nv -nc https://cmake.org/files/v3.29/cmake-3.29.5-linux-x86_64.sh --no-check-certificate && \ - bash cmake-3.29.5-linux-x86_64.sh --skip-license --prefix=/usr && \ - # Maven - wget -nv -nc https://archive.apache.org/dist/maven/maven-3/3.9.7/binaries/apache-maven-3.9.7-bin.tar.gz && \ - tar xvf apache-maven-3.9.7-bin.tar.gz -C /opt && \ - ln -s /opt/apache-maven-3.9.7/ /opt/maven - -# NCCL2 (License: https://docs.nvidia.com/deeplearning/sdk/nccl-sla/index.html) -RUN \ - export CUDA_SHORT=`echo $CUDA_VERSION_ARG | grep -o -E '[0-9]+\.[0-9]'` && \ - export NCCL_VERSION=$NCCL_VERSION_ARG && \ - dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \ - dnf -y update && \ - dnf install -y libnccl-${NCCL_VERSION}+cuda${CUDA_SHORT} libnccl-devel-${NCCL_VERSION}+cuda${CUDA_SHORT} libnccl-static-${NCCL_VERSION}+cuda${CUDA_SHORT} - -ENV PATH=/opt/miniforge/bin:/opt/maven/bin:$PATH -ENV CC=/opt/rh/gcc-toolset-10/root/usr/bin/gcc -ENV CXX=/opt/rh/gcc-toolset-10/root/usr/bin/c++ -ENV CPP=/opt/rh/gcc-toolset-10/root/usr/bin/cpp - -# Install Python packages -RUN pip install numpy pytest scipy scikit-learn wheel kubernetes awscli - -ENV GOSU_VERSION=1.10 - -# Install lightweight sudo (not bound to TTY) -RUN set -ex; \ - wget -nv -nc -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.manylinux2014_aarch64 b/ops/docker/dockerfile/Dockerfile.manylinux2014_aarch64 deleted file mode 100644 index 7800033f552d..000000000000 --- a/ops/docker/dockerfile/Dockerfile.manylinux2014_aarch64 +++ /dev/null @@ -1,17 +0,0 @@ -FROM quay.io/pypa/manylinux2014_aarch64 - -RUN yum update -y && yum install -y java-1.8.0-openjdk-devel - -# Install lightweight sudo (not bound to TTY) -ENV GOSU_VERSION=1.10 -RUN set -ex; \ - curl -o /usr/local/bin/gosu -L "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-arm64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if 
running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.manylinux2014_x86_64 b/ops/docker/dockerfile/Dockerfile.manylinux2014_x86_64 deleted file mode 100644 index 8214b598d8d4..000000000000 --- a/ops/docker/dockerfile/Dockerfile.manylinux2014_x86_64 +++ /dev/null @@ -1,17 +0,0 @@ -FROM quay.io/pypa/manylinux2014_x86_64 - -RUN yum update -y && yum install -y java-1.8.0-openjdk-devel - -# Install lightweight sudo (not bound to TTY) -ENV GOSU_VERSION=1.10 -RUN set -ex; \ - curl -o /usr/local/bin/gosu -L "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/dockerfile/Dockerfile.manylinux_2_28_x86_64 b/ops/docker/dockerfile/Dockerfile.manylinux_2_28_x86_64 deleted file mode 100644 index f5dac54b9b8f..000000000000 --- a/ops/docker/dockerfile/Dockerfile.manylinux_2_28_x86_64 +++ /dev/null @@ -1,15 +0,0 @@ -FROM quay.io/pypa/manylinux_2_28_x86_64 - -# Install lightweight sudo (not bound to TTY) -ENV GOSU_VERSION=1.10 -RUN set -ex; \ - curl -o /usr/local/bin/gosu -L "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \ - chmod +x /usr/local/bin/gosu && \ - gosu nobody true - -# Default entry-point to use if running locally -# It will preserve attributes of created files -COPY docker/entrypoint.sh /scripts/ - -WORKDIR /workspace -ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/ops/docker/entrypoint.sh b/ops/docker/entrypoint.sh deleted file mode 100755 index 40135c197c73..000000000000 --- a/ops/docker/entrypoint.sh +++ /dev/null @@ -1,45 +0,0 @@ -#!/usr/bin/env bash - -# This wrapper script propagates the user information from the host -# to the container. This way, any files generated by processes running -# in the container will be accessible in the host. - -set -euo pipefail - -COMMAND=("$@") - -if ! touch /this_is_writable_file_system; then - echo "You can't write to your filesystem!" - echo "If you are in Docker you should check you do not have too many images" \ - "with too many files in them. Docker has some issue with it." 
- exit 1 -else - rm /this_is_writable_file_system -fi - -## Assumption: the host passes correct user information via environment variables -## CI_BUILD_UID, CI_BUILD_GID, CI_BUILD_USER, CI_BUILD_GROUP - -if [[ -n ${CI_BUILD_UID:-} ]] && [[ -n ${CI_BUILD_GID:-} ]] -then - groupadd -o -g "${CI_BUILD_GID}" "${CI_BUILD_GROUP}" || true - useradd -o -m -g "${CI_BUILD_GID}" -u "${CI_BUILD_UID}" \ - "${CI_BUILD_USER}" || true - export HOME="/home/${CI_BUILD_USER}" - shopt -s dotglob - cp -r /root/* "$HOME/" - chown -R "${CI_BUILD_UID}:${CI_BUILD_GID}" "$HOME" - - # Allows project-specific customization - if [[ -e "/workspace/.pre_entry.sh" ]]; then - gosu "${CI_BUILD_UID}:${CI_BUILD_GID}" /workspace/.pre_entry.sh - fi - - # Enable passwordless sudo capabilities for the user - chown root:"${CI_BUILD_GID}" "$(which gosu)" - chmod +s "$(which gosu)"; sync - - exec gosu "${CI_BUILD_UID}:${CI_BUILD_GID}" "${COMMAND[@]}" -else - exec "${COMMAND[@]}" -fi diff --git a/ops/docker/extract_build_args.jq b/ops/docker/extract_build_args.jq deleted file mode 100644 index b35240edb626..000000000000 --- a/ops/docker/extract_build_args.jq +++ /dev/null @@ -1,12 +0,0 @@ -## Example input: -## xgb-ci.gpu_build_r_rockylinux8 -## Example output: -## --build-arg CUDA_VERSION_ARG=12.4.1 --build-arg R_VERSION_ARG=4.3.2 -def compute_build_args($input; $container_id): - $input | - .[$container_id] | - select(.build_args != null) | - .build_args | - to_entries | - map("--build-arg " + .key + "=" + .value) | - join(" "); diff --git a/ops/docker/extract_build_args.sh b/ops/docker/extract_build_args.sh deleted file mode 100755 index 42a83047742c..000000000000 --- a/ops/docker/extract_build_args.sh +++ /dev/null @@ -1,26 +0,0 @@ -#!/bin/bash -## Extract container definition and build args from ops/docker/ci_container.yml, -## given the container ID. -## -## Example input: -## xgb-ci.clang_tidy -## Example output: -## CONTAINER_DEF='clang_tidy' BUILD_ARGS='--build-arg CUDA_VERSION_ARG=12.4.1' - -if [ "$#" -ne 1 ]; then - echo "Usage: $0 [container_id]" - exit 1 -fi - -CONTAINER_ID="$1" -CONTAINER_DEF=$( - yq -o json ops/docker/ci_container.yml | - jq -r --arg container_id "${CONTAINER_ID}" '.[$container_id].container_def' -) -BUILD_ARGS=$( - yq -o json ops/docker/ci_container.yml | - jq -r --arg container_id "${CONTAINER_ID}" \ - 'include "ops/docker/extract_build_args"; - compute_build_args(.; $container_id)' -) -echo "CONTAINER_DEF='${CONTAINER_DEF}' BUILD_ARGS='${BUILD_ARGS}'" diff --git a/ops/docker_build.py b/ops/docker_build.py deleted file mode 100644 index 1fed975ce223..000000000000 --- a/ops/docker_build.py +++ /dev/null @@ -1,137 +0,0 @@ -""" -Wrapper script to build a Docker container with layer caching -""" - -import argparse -import itertools -import pathlib -import subprocess -import sys -from typing import Optional - -from docker_run import OPS_DIR, fancy_print_cli_args - - -def parse_build_args(raw_build_args: list[str]) -> dict[str, str]: - parsed_build_args = dict() - for arg in raw_build_args: - try: - key, value = arg.split("=", maxsplit=1) - except ValueError as e: - raise ValueError( - f"Build argument must be of form KEY=VALUE. 
Got: {arg}" - ) from e - parsed_build_args[key] = value - return parsed_build_args - - -def docker_build( - container_id: str, - *, - build_args: dict[str, str], - dockerfile_path: pathlib.Path, - docker_context_path: pathlib.Path, - cache_from: Optional[str], - cache_to: Optional[str], -) -> None: - ## Set up command-line arguments to be passed to `docker build` - # Build args - docker_build_cli_args = list( - itertools.chain.from_iterable( - [["--build-arg", f"{k}={v}"] for k, v in build_args.items()] - ) - ) - # When building an image using a non-default driver, we need to specify - # `--load` to load it to the image store. - # See https://docs.docker.com/build/builders/drivers/ - docker_build_cli_args.append("--load") - # Layer caching - if cache_from: - docker_build_cli_args.extend(["--cache-from", cache_from]) - if cache_to: - docker_build_cli_args.extend(["--cache-to", cache_to]) - # Remaining CLI args - docker_build_cli_args.extend( - [ - "--progress=plain", - "--ulimit", - "nofile=1024000:1024000", - "-t", - container_id, - "-f", - str(dockerfile_path), - str(docker_context_path), - ] - ) - cli_args = ["docker", "build"] + docker_build_cli_args - fancy_print_cli_args(cli_args) - subprocess.run(cli_args, check=True, encoding="utf-8") - - -def main(args: argparse.Namespace) -> None: - # Dockerfile to be used in docker build - dockerfile_path = ( - OPS_DIR / "docker" / "dockerfile" / f"Dockerfile.{args.container_def}" - ) - docker_context_path = OPS_DIR - - build_args = parse_build_args(args.build_arg) - - docker_build( - args.container_id, - build_args=build_args, - dockerfile_path=dockerfile_path, - docker_context_path=docker_context_path, - cache_from=args.cache_from, - cache_to=args.cache_to, - ) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Build a Docker container") - parser.add_argument( - "--container-def", - type=str, - required=True, - help=( - "String uniquely identifying the container definition. The container " - "definition will be fetched from " - "docker/dockerfile/Dockerfile.CONTAINER_DEF." - ), - ) - parser.add_argument( - "--container-id", - type=str, - required=True, - help="String ID to assign to the newly built container", - ) - parser.add_argument( - "--build-arg", - type=str, - default=[], - action="append", - help=( - "Build-time variable(s) to be passed to `docker build`. Each variable " - "should be specified as a key-value pair in the form KEY=VALUE. " - "The variables should match the ARG instructions in the Dockerfile. " - "When passing multiple variables, specify --build-arg multiple times. " - "Example: --build-arg CUDA_VERSION_ARG=12.5 --build-arg RAPIDS_VERSION_ARG=24.10'" - ), - ) - parser.add_argument( - "--cache-from", - type=str, - help="Use an external cache source for the Docker build", - ) - parser.add_argument( - "--cache-to", - type=str, - help="Export layers from the container to an external cache destination", - ) - - if len(sys.argv) == 1: - parser.print_help() - sys.exit(1) - - parsed_args = parser.parse_args() - main(parsed_args) diff --git a/ops/docker_build.sh b/ops/docker_build.sh deleted file mode 100755 index 7d83daec9574..000000000000 --- a/ops/docker_build.sh +++ /dev/null @@ -1,149 +0,0 @@ -#!/bin/bash -## Build a CI container and cache the layers in AWS ECR (Elastic Container Registry). -## This script provides a convenient wrapper for ops/docker_build.py. -## Build-time variables (--build-arg) and container defintion are fetched from -## ops/docker/ci_container.yml. -## -## Note. 
This script takes in some inputs via environment variables. - -USAGE_DOC=$( -cat <<-EOF -Usage: ops/docker_build.sh [container_id] - -In addition, the following environment variables should be set. - - BRANCH_NAME: Name of the current git branch or pull request (Required) - - USE_DOCKER_CACHE: If set to 1, enable caching -EOF -) - -ECR_LIFECYCLE_RULE=$( -cat <<-EOF -{ - "rules": [ - { - "rulePriority": 1, - "selection": { - "tagStatus": "any", - "countType": "sinceImagePushed", - "countUnit": "days", - "countNumber": 30 - }, - "action": { - "type": "expire" - } - } - ] -} -EOF -) - -set -euo pipefail - -for arg in "BRANCH_NAME" -do - if [[ -z "${!arg:-}" ]] - then - echo -e "Error: $arg must be set.\n\n${USAGE_DOC}" - exit 1 - fi -done - -if [[ "$#" -lt 1 ]] -then - echo "${USAGE_DOC}" - exit 2 -fi -CONTAINER_ID="$1" - -# Fetch CONTAINER_DEF and BUILD_ARGS -source <(ops/docker/extract_build_args.sh ${CONTAINER_ID} | tee /dev/stderr) 2>&1 - -if [[ "${USE_DOCKER_CACHE:-}" != "1" ]] # Any value other than 1 is considered false -then - USE_DOCKER_CACHE=0 -fi - -if [[ ${USE_DOCKER_CACHE} -eq 0 ]] -then - echo "USE_DOCKER_CACHE not set; caching disabled" -else - DOCKER_CACHE_ECR_ID=$(yq ".DOCKER_CACHE_ECR_ID" ops/docker/docker_cache_ecr.yml) - DOCKER_CACHE_ECR_REGION=$(yq ".DOCKER_CACHE_ECR_REGION" ops/docker/docker_cache_ecr.yml) - DOCKER_CACHE_REPO="${DOCKER_CACHE_ECR_ID}.dkr.ecr.${DOCKER_CACHE_ECR_REGION}.amazonaws.com" - echo "Using AWS ECR; repo URL = ${DOCKER_CACHE_REPO}" - # Login for Docker registry - echo "aws ecr get-login-password --region ${DOCKER_CACHE_ECR_REGION} |" \ - "docker login --username AWS --password-stdin ${DOCKER_CACHE_REPO}" - aws ecr get-login-password --region ${DOCKER_CACHE_ECR_REGION} \ - | docker login --username AWS --password-stdin ${DOCKER_CACHE_REPO} -fi - -# Pull pre-built container from the cache -# First try locating one for the particular branch or pull request -CACHE_FROM_CMD="" -IS_CACHED=0 -if [[ ${USE_DOCKER_CACHE} -eq 1 ]] -then - DOCKER_TAG="${BRANCH_NAME//\//-}" # Slashes are not allowed in Docker tag - DOCKER_URL="${DOCKER_CACHE_REPO}/${CONTAINER_ID}:${DOCKER_TAG}" - echo "docker pull --quiet ${DOCKER_URL}" - if time docker pull --quiet "${DOCKER_URL}" - then - echo "Found a cached container for the branch ${BRANCH_NAME}: ${DOCKER_URL}" - IS_CACHED=1 - else - # If there's no pre-built container from the cache, - # use the pre-built container from the master branch. - DOCKER_URL="${DOCKER_CACHE_REPO}/${CONTAINER_ID}:master" - echo "Could not find a cached container for the branch ${BRANCH_NAME}." \ - "Using a cached container from the master branch: ${DOCKER_URL}" - echo "docker pull --quiet ${DOCKER_URL}" - if time docker pull --quiet "${DOCKER_URL}" - then - IS_CACHED=1 - else - echo "Could not find a cached container for the master branch either." 
-      IS_CACHED=0
-    fi
-  fi
-  if [[ $IS_CACHED -eq 1 ]]
-  then
-    CACHE_FROM_CMD="--cache-from type=registry,ref=${DOCKER_URL}"
-  fi
-fi
-
-# Run Docker build
-set -x
-python3 ops/docker_build.py \
-  --container-def ${CONTAINER_DEF} \
-  --container-id ${CONTAINER_ID} \
-  ${BUILD_ARGS} \
-  --cache-to type=inline \
-  ${CACHE_FROM_CMD}
-set +x
-
-# Now cache the new container
-if [[ ${USE_DOCKER_CACHE} -eq 1 ]]
-then
-  DOCKER_URL="${DOCKER_CACHE_REPO}/${CONTAINER_ID}:${DOCKER_TAG}"
-  echo "docker tag ${CONTAINER_ID} ${DOCKER_URL}"
-  docker tag "${CONTAINER_ID}" "${DOCKER_URL}"
-
-  # Attempt to create Docker repository; it will fail if the repository already exists
-  echo "aws ecr create-repository --repository-name ${CONTAINER_ID} --region ${DOCKER_CACHE_ECR_REGION}"
-  if aws ecr create-repository --repository-name ${CONTAINER_ID} --region ${DOCKER_CACHE_ECR_REGION}
-  then
-    # Repository was created. Now set expiration policy
-    echo "aws ecr put-lifecycle-policy --repository-name ${CONTAINER_ID}" \
-      "--region ${DOCKER_CACHE_ECR_REGION} --lifecycle-policy-text file:///dev/stdin"
-    echo "${ECR_LIFECYCLE_RULE}" | aws ecr put-lifecycle-policy --repository-name ${CONTAINER_ID} \
-      --region ${DOCKER_CACHE_ECR_REGION} --lifecycle-policy-text file:///dev/stdin
-  fi
-
-  echo "docker push --quiet ${DOCKER_URL}"
-  if ! time docker push --quiet "${DOCKER_URL}"
-  then
-    echo "ERROR: could not update Docker cache ${DOCKER_URL}"
-    exit 1
-  fi
-fi
diff --git a/ops/docker_run.py b/ops/docker_run.py
index 7e61c5a14f39..06f9d6cc8dc8 100644
--- a/ops/docker_run.py
+++ b/ops/docker_run.py
@@ -24,7 +24,7 @@
 )
 
 
-def parse_run_args(raw_run_args: str) -> list[str]:
+def parse_run_args(*, raw_run_args: str) -> list[str]:
     return [x for x in raw_run_args.split() if x]
 
 
@@ -39,7 +39,7 @@ def get_user_ids() -> dict[str, str]:
     }
 
 
-def fancy_print_cli_args(cli_args: list[str]) -> None:
+def fancy_print_cli_args(*, cli_args: list[str]) -> None:
     print(
         "=" * LINEWIDTH
         + "\n"
@@ -52,9 +52,9 @@ def fancy_print_cli_args(cli_args: list[str]) -> None:
 
 
 def docker_run(
-    container_id: str,
-    command_args: list[str],
     *,
+    container_tag: str,
+    command_args: list[str],
     use_gpus: bool,
     workdir: pathlib.Path,
     user_ids: dict[str, str],
@@ -71,16 +71,16 @@ def docker_run(
         itertools.chain.from_iterable([["-e", f"{k}={v}"] for k, v in user_ids.items()])
     )
     docker_run_cli_args.extend(extra_args)
-    docker_run_cli_args.append(container_id)
+    docker_run_cli_args.append(container_tag)
     docker_run_cli_args.extend(command_args)
 
     cli_args = ["docker", "run"] + docker_run_cli_args
-    fancy_print_cli_args(cli_args)
+    fancy_print_cli_args(cli_args=cli_args)
     subprocess.run(cli_args, check=True, encoding="utf-8")
 
 
-def main(args: argparse.Namespace) -> None:
-    run_args = parse_run_args(args.run_args)
+def main(*, args: argparse.Namespace) -> None:
+    run_args = parse_run_args(raw_run_args=args.run_args)
     user_ids = get_user_ids()
 
     if args.use_gpus:
@@ -90,8 +90,8 @@ def main(args: argparse.Namespace) -> None:
         run_args.append("-it")
 
     docker_run(
-        args.container_id,
-        args.command_args,
+        container_tag=args.container_tag,
+        command_args=args.command_args,
         use_gpus=args.use_gpus,
         workdir=args.workdir,
         user_ids=user_ids,
@@ -102,17 +102,20 @@ def main(args: argparse.Namespace) -> None:
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(
         usage=(
-            f"{sys.argv[0]} --container-id CONTAINER_ID [--use-gpus] [--interactive] "
+            f"{sys.argv[0]} --container-tag CONTAINER_TAG [--use-gpus] [--interactive] "
             "[--workdir WORKDIR] [--run-args RUN_ARGS] -- COMMAND_ARG "
             "[COMMAND_ARG ...]"
         ),
         description="Run tasks inside a Docker container",
     )
     parser.add_argument(
-        "--container-id",
+        "--container-tag",
         type=str,
         required=True,
-        help="String ID of the container to run.",
+        help=(
+            "Container tag to identify the container, e.g. "
+            "492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:main"
+        ),
     )
     parser.add_argument(
         "--use-gpus",
@@ -165,4 +168,4 @@ def main(args: argparse.Namespace) -> None:
         sys.exit(1)
 
     parsed_args = parser.parse_args()
-    main(parsed_args)
+    main(args=parsed_args)
diff --git a/ops/pipeline/build-cpu-arm64.sh b/ops/pipeline/build-cpu-arm64.sh
index ff948ca0c77a..fad473e58d06 100755
--- a/ops/pipeline/build-cpu-arm64.sh
+++ b/ops/pipeline/build-cpu-arm64.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 
-set -euox pipefail
+set -euo pipefail
 
 if [[ -z "${GITHUB_SHA:-}" ]]
 then
@@ -8,13 +8,17 @@ then
   exit 1
 fi
 
+source ops/pipeline/classify-git-branch.sh
+source ops/pipeline/get-docker-registry-details.sh
+
 WHEEL_TAG=manylinux_2_28_aarch64
+BUILD_CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main
 
 echo "--- Build CPU code targeting ARM64"
-
+set -x
 echo "--- Build libxgboost from the source"
 python3 ops/docker_run.py \
-  --container-id xgb-ci.aarch64 \
+  --container-tag ${BUILD_CONTAINER_TAG} \
   -- ops/script/build_via_cmake.sh \
     --conda-env=aarch64_test \
     -DUSE_OPENMP=ON \
@@ -22,12 +26,12 @@ python3 ops/docker_run.py \
 
 echo "--- Run Google Test"
 python3 ops/docker_run.py \
-  --container-id xgb-ci.aarch64 \
+  --container-tag ${BUILD_CONTAINER_TAG} \
   -- bash -c "cd build && ctest --extra-verbose"
 
 echo "--- Build binary wheel"
 python3 ops/docker_run.py \
-  --container-id xgb-ci.aarch64 \
+  --container-tag ${BUILD_CONTAINER_TAG} \
   -- bash -c \
   "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . --wheel-dir dist/"
 python3 ops/script/rename_whl.py \
@@ -37,7 +41,7 @@ python3 ops/script/rename_whl.py \
 
 echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard"
 python3 ops/docker_run.py \
-  --container-id xgb-ci.aarch64 \
+  --container-tag ${BUILD_CONTAINER_TAG} \
   -- auditwheel repair --plat ${WHEEL_TAG} python-package/dist/*.whl
 python3 ops/script/rename_whl.py \
   --wheel-path wheelhouse/*.whl \
@@ -45,8 +49,7 @@ python3 ops/script/rename_whl.py \
   --platform-tag ${WHEEL_TAG}
 mv -v wheelhouse/*.whl python-package/dist/
 
-# Make sure that libgomp.so is vendored in the wheel
-python3 ops/docker_run.py \
-  --container-id xgb-ci.aarch64 \
-  -- bash -c \
-  "unzip -l python-package/dist/*.whl | grep libgomp || exit -1"
+if ! unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then
+  echo "error: libgomp.so was not vendored in the wheel"
+  exit -1
+fi
diff --git a/ops/pipeline/build-cpu.sh b/ops/pipeline/build-cpu.sh
index dc0572f0ca4d..edcfd43d56ed 100755
--- a/ops/pipeline/build-cpu.sh
+++ b/ops/pipeline/build-cpu.sh
@@ -1,8 +1,14 @@
 #!/bin/bash
 
-set -euox pipefail
+set -euo pipefail
+
+source ops/pipeline/classify-git-branch.sh
+source ops/pipeline/get-docker-registry-details.sh
+
+CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.cpu:main
 
 echo "--- Build CPU code"
+set -x
 
 # This step is not necessary, but here we include it, to ensure that
 # DMLC_CORE_USE_CMAKE flag is correctly propagated. 
We want to make sure that we use @@ -15,14 +21,14 @@ echo "--- Run Google Test with sanitizer enabled" # Work around https://github.com/google/sanitizers/issues/1614 sudo sysctl vm.mmap_rnd_bits=28 python3 ops/docker_run.py \ - --container-id xgb-ci.cpu \ + --container-tag ${CONTAINER_TAG} \ -- ops/script/build_via_cmake.sh \ -DUSE_SANITIZER=ON \ -DENABLED_SANITIZERS="address;leak;undefined" \ -DCMAKE_BUILD_TYPE=Debug \ -DSANITIZER_PATH=/usr/lib/x86_64-linux-gnu/ python3 ops/docker_run.py \ - --container-id xgb-ci.cpu \ + --container-tag ${CONTAINER_TAG} \ --run-args '-e ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer -e ASAN_OPTIONS=symbolize=1 -e UBSAN_OPTIONS=print_stacktrace=1:log_path=ubsan_error.log @@ -32,10 +38,10 @@ python3 ops/docker_run.py \ echo "--- Run Google Test" python3 ops/docker_run.py \ - --container-id xgb-ci.cpu \ + --container-tag ${CONTAINER_TAG} \ -- ops/script/build_via_cmake.sh \ -DCMAKE_PREFIX_PATH=/opt/grpc \ -DPLUGIN_FEDERATED=ON python3 ops/docker_run.py \ - --container-id xgb-ci.cpu \ + --container-tag ${CONTAINER_TAG} \ -- bash -c "cd build && ctest --extra-verbose" diff --git a/ops/pipeline/build-cuda-with-rmm.sh b/ops/pipeline/build-cuda-with-rmm.sh index 479c9a1b1a28..024c9f351d1f 100755 --- a/ops/pipeline/build-cuda-with-rmm.sh +++ b/ops/pipeline/build-cuda-with-rmm.sh @@ -17,10 +17,13 @@ fi container_id="$1" source ops/pipeline/classify-git-branch.sh - -set -x +source ops/pipeline/get-docker-registry-details.sh WHEEL_TAG=manylinux_2_28_x86_64 +BUILD_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" +MANYLINUX_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main" + +set -x echo "--- Build with CUDA with RMM" @@ -33,7 +36,7 @@ fi echo "--- Build libxgboost from the source" python3 ops/docker_run.py \ - --container-id "${container_id}" \ + --container-tag "${BUILD_CONTAINER_TAG}" \ -- ops/script/build_via_cmake.sh \ -DCMAKE_PREFIX_PATH="/opt/grpc;/opt/rmm;/opt/rmm/lib64/rapids/cmake" \ -DUSE_CUDA=ON \ @@ -49,7 +52,7 @@ python3 ops/docker_run.py \ echo "--- Build binary wheel" python3 ops/docker_run.py \ - --container-id "${container_id}" \ + --container-tag "${BUILD_CONTAINER_TAG}" \ -- bash -c \ "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . --wheel-dir dist/" python3 ops/script/rename_whl.py \ @@ -59,7 +62,7 @@ python3 ops/script/rename_whl.py \ echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard" python3 ops/docker_run.py \ - --container-id xgb-ci.${WHEEL_TAG} \ + --container-tag "${MANYLINUX_CONTAINER_TAG}" \ -- auditwheel repair \ --plat ${WHEEL_TAG} python-package/dist/*.whl python3 ops/script/rename_whl.py \ @@ -67,8 +70,7 @@ python3 ops/script/rename_whl.py \ --commit-hash ${GITHUB_SHA} \ --platform-tag ${WHEEL_TAG} mv -v wheelhouse/*.whl python-package/dist/ -# Make sure that libgomp.so is vendored in the wheel -python3 ops/docker_run.py \ - --container-id xgb-ci.${WHEEL_TAG} \ - -- bash -c \ - "unzip -l python-package/dist/*.whl | grep libgomp || exit -1" +if ! 
unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then + echo "error: libgomp.so was not vendored in the wheel" + exit -1 +fi diff --git a/ops/pipeline/build-cuda.sh b/ops/pipeline/build-cuda.sh index 49475c01c69e..2170b8a681ac 100755 --- a/ops/pipeline/build-cuda.sh +++ b/ops/pipeline/build-cuda.sh @@ -1,7 +1,7 @@ #!/bin/bash ## Build XGBoost with CUDA -set -euox pipefail +set -euo pipefail if [[ -z "${GITHUB_SHA:-}" ]] then @@ -9,9 +9,12 @@ then exit 1 fi -WHEEL_TAG=manylinux_2_28_x86_64 - source ops/pipeline/classify-git-branch.sh +source ops/pipeline/get-docker-registry-details.sh + +WHEEL_TAG=manylinux_2_28_x86_64 +BUILD_CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_rockylinux8:main +MANYLINUX_CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main echo "--- Build with CUDA" @@ -28,7 +31,7 @@ set -x # TODO(hcho3): Remove this once new CUDA version ships with CCCL 2.6.0+ git clone https://github.com/NVIDIA/cccl.git -b v2.6.1 --quiet python3 ops/docker_run.py \ - --container-id xgb-ci.gpu_build_rockylinux8 \ + --container-tag ${BUILD_CONTAINER_TAG} \ -- ops/script/build_via_cmake.sh \ -DCMAKE_PREFIX_PATH="/opt/grpc;/workspace/cccl" \ -DUSE_CUDA=ON \ @@ -43,7 +46,7 @@ python3 ops/docker_run.py \ echo "--- Build binary wheel" python3 ops/docker_run.py \ - --container-id xgb-ci.gpu_build_rockylinux8 \ + --container-tag ${BUILD_CONTAINER_TAG} \ -- bash -c \ "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . --wheel-dir dist/" python3 ops/script/rename_whl.py \ @@ -53,7 +56,7 @@ python3 ops/script/rename_whl.py \ echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard" python3 ops/docker_run.py \ - --container-id xgb-ci.manylinux_2_28_x86_64 \ + --container-tag ${MANYLINUX_CONTAINER_TAG} \ -- auditwheel repair \ --plat ${WHEEL_TAG} python-package/dist/*.whl python3 ops/script/rename_whl.py \ @@ -61,15 +64,13 @@ python3 ops/script/rename_whl.py \ --commit-hash ${GITHUB_SHA} \ --platform-tag ${WHEEL_TAG} mv -v wheelhouse/*.whl python-package/dist/ -# Make sure that libgomp.so is vendored in the wheel -python3 ops/docker_run.py \ - --container-id xgb-ci.manylinux_2_28_x86_64 \ - -- bash -c "unzip -l python-package/dist/*.whl | grep libgomp || exit -1" +if ! 
unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then + echo "error: libgomp.so was not vendored in the wheel" + exit -1 +fi # Generate the meta info which includes xgboost version and the commit info -python3 ops/docker_run.py \ ---container-id xgb-ci.gpu_build_rockylinux8 \ --- python ops/script/format_wheel_meta.py \ +python3 ops/script/format_wheel_meta.py \ --wheel-path python-package/dist/*.whl \ --commit-hash ${GITHUB_SHA} \ --platform-tag ${WHEEL_TAG} \ diff --git a/ops/pipeline/build-gpu-rpkg.sh b/ops/pipeline/build-gpu-rpkg.sh index d1384ef766a6..a96a2a4a0247 100755 --- a/ops/pipeline/build-gpu-rpkg.sh +++ b/ops/pipeline/build-gpu-rpkg.sh @@ -1,6 +1,6 @@ #!/bin/bash -set -euox pipefail +set -euo pipefail if [[ -z "${GITHUB_SHA:-}" ]] then @@ -8,8 +8,13 @@ then exit 1 fi +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_r_rockylinux8:main + echo "--- Build XGBoost R package with CUDA" +set -x python3 ops/docker_run.py \ - --container-id xgb-ci.gpu_build_r_rockylinux8 \ + --container-tag ${CONTAINER_TAG} \ -- ops/pipeline/build-gpu-rpkg-impl.sh \ ${GITHUB_SHA} diff --git a/ops/pipeline/build-jvm-doc.sh b/ops/pipeline/build-jvm-doc.sh index 00fdac7a1353..a61f903cb5b9 100755 --- a/ops/pipeline/build-jvm-doc.sh +++ b/ops/pipeline/build-jvm-doc.sh @@ -3,9 +3,7 @@ ## Note: this script assumes that the user has already built libxgboost4j.so ## and place it in the lib/ directory. -set -euox pipefail - -echo "--- Build JVM packages doc" +set -euo pipefail if [[ -z ${BRANCH_NAME:-} ]] then @@ -19,6 +17,12 @@ then exit 2 fi +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main + +echo "--- Build JVM packages doc" +set -x python3 ops/docker_run.py \ - --container-id xgb-ci.jvm_gpu_build \ + --container-tag ${CONTAINER_TAG} \ -- ops/pipeline/build-jvm-doc-impl.sh ${BRANCH_NAME} diff --git a/ops/pipeline/build-jvm-gpu.sh b/ops/pipeline/build-jvm-gpu.sh index 7656a3d2f188..3d6f446eb462 100755 --- a/ops/pipeline/build-jvm-gpu.sh +++ b/ops/pipeline/build-jvm-gpu.sh @@ -4,6 +4,9 @@ set -euo pipefail source ops/pipeline/classify-git-branch.sh +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main echo "--- Build libxgboost4j.so with CUDA" @@ -29,5 +32,5 @@ mkdir -p build-gpu/ # TODO(hcho3): Remove this once new CUDA version ships with CCCL 2.6.0+ git clone https://github.com/NVIDIA/cccl.git -b v2.6.1 --quiet --depth 1 python3 ops/docker_run.py \ - --container-id xgb-ci.jvm_gpu_build \ + --container-tag ${CONTAINER_TAG} \ -- bash -c "${COMMAND}" diff --git a/ops/pipeline/build-jvm-manylinux2014.sh b/ops/pipeline/build-jvm-manylinux2014.sh index e69dd3682b90..4eaae23bf7bc 100755 --- a/ops/pipeline/build-jvm-manylinux2014.sh +++ b/ops/pipeline/build-jvm-manylinux2014.sh @@ -1,7 +1,7 @@ #!/bin/bash ## Build libxgboost4j.so targeting glibc 2.17 systems -set -euox pipefail +set -euo pipefail if [[ $# -ne 1 ]] then @@ -10,15 +10,18 @@ then fi arch=$1 +container_id="xgb-ci.manylinux2014_${arch}" -image="xgb-ci.manylinux2014_${arch}" +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" # Build XGBoost4J binary echo "--- Build libxgboost4j.so (targeting glibc 2.17)" set -x mkdir build python3 ops/docker_run.py \ - --container-id ${image} \ + --container-tag "${CONTAINER_TAG}" \ -- bash -c \ "cd build && cmake .. 
-DJVM_BINDINGS=ON -DUSE_OPENMP=ON && make -j$(nproc)" ldd lib/libxgboost4j.so diff --git a/ops/pipeline/build-manylinux2014.sh b/ops/pipeline/build-manylinux2014.sh index a8f5af8bc3cd..b572fed0186a 100755 --- a/ops/pipeline/build-manylinux2014.sh +++ b/ops/pipeline/build-manylinux2014.sh @@ -1,6 +1,6 @@ #!/bin/bash -set -euox pipefail +set -euo pipefail if [[ -z "${GITHUB_SHA:-}" ]] then @@ -16,24 +16,27 @@ fi arch="$1" -WHEEL_TAG="manylinux2014_${arch}" -image="xgb-ci.${WHEEL_TAG}" +source ops/pipeline/get-docker-registry-details.sh +WHEEL_TAG="manylinux2014_${arch}" +container_id="xgb-ci.${WHEEL_TAG}" python_bin="/opt/python/cp310-cp310/bin/python" +CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" echo "--- Build binary wheel for ${WHEEL_TAG}" +set -x # Patch to add warning about manylinux2014 variant patch -p0 < ops/patch/remove_nccl_dep.patch patch -p0 < ops/patch/manylinux2014_warning.patch python3 ops/docker_run.py \ - --container-id ${image} \ + --container-tag "${CONTAINER_TAG}" \ -- bash -c \ "cd python-package && ${python_bin} -m pip wheel --no-deps -v . --wheel-dir dist/" git checkout python-package/pyproject.toml python-package/xgboost/core.py # discard the patch python3 ops/docker_run.py \ - --container-id ${image} \ + --container-tag "${CONTAINER_TAG}" \ -- auditwheel repair --plat ${WHEEL_TAG} python-package/dist/*.whl python3 ops/script/rename_whl.py \ --wheel-path wheelhouse/*.whl \ @@ -48,13 +51,13 @@ echo "--- Build binary wheel for ${WHEEL_TAG} (CPU only)" patch -p0 < ops/patch/remove_nccl_dep.patch patch -p0 < ops/patch/cpu_only_pypkg.patch python3 ops/docker_run.py \ - --container-id ${image} \ + --container-tag "${CONTAINER_TAG}" \ -- bash -c \ "cd python-package && ${python_bin} -m pip wheel --no-deps -v . --wheel-dir dist/" git checkout python-package/pyproject.toml # discard the patch python3 ops/docker_run.py \ - --container-id ${image} \ + --container-tag "${CONTAINER_TAG}" \ -- auditwheel repair --plat ${WHEEL_TAG} python-package/dist/xgboost_cpu-*.whl python3 ops/script/rename_whl.py \ --wheel-path wheelhouse/xgboost_cpu-*.whl \ diff --git a/ops/pipeline/build-test-jvm-packages.sh b/ops/pipeline/build-test-jvm-packages.sh index d04cc3510de5..aea905e00294 100755 --- a/ops/pipeline/build-test-jvm-packages.sh +++ b/ops/pipeline/build-test-jvm-packages.sh @@ -12,6 +12,8 @@ EOF set -euo pipefail +source ops/pipeline/get-docker-registry-details.sh + for arg in "SCALA_VERSION" do if [[ -z "${!arg:-}" ]] @@ -21,8 +23,10 @@ do fi done +CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm:main + set -x -python3 ops/docker_run.py --container-id xgb-ci.jvm \ +python3 ops/docker_run.py --container-tag ${CONTAINER_TAG} \ --run-args "-e SCALA_VERSION=${SCALA_VERSION}" \ -- ops/pipeline/build-test-jvm-packages-impl.sh diff --git a/ops/pipeline/deploy-jvm-packages.sh b/ops/pipeline/deploy-jvm-packages.sh index e821f334b9d2..f76724a702cb 100755 --- a/ops/pipeline/deploy-jvm-packages.sh +++ b/ops/pipeline/deploy-jvm-packages.sh @@ -1,9 +1,10 @@ #!/bin/bash ## Deploy JVM packages to S3 bucket -set -euox pipefail +set -euo pipefail source ops/pipeline/enforce-ci.sh +source ops/pipeline/get-docker-registry-details.sh if [[ "$#" -lt 3 ]] then @@ -15,9 +16,13 @@ variant="$1" container_id="$2" scala_version="$3" +CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" + +set -x + if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] then echo "--- Deploy JVM packages to xgboost-maven-repo S3 repo" - python3 ops/docker_run.py --container-id "${container_id}" \ + 
python3 ops/docker_run.py --container-tag "${CONTAINER_TAG}" \ -- ops/pipeline/deploy-jvm-packages-impl.sh "${variant}" "${scala_version}" fi diff --git a/ops/pipeline/get-docker-registry-details.sh b/ops/pipeline/get-docker-registry-details.sh new file mode 100755 index 000000000000..000db9a2655a --- /dev/null +++ b/ops/pipeline/get-docker-registry-details.sh @@ -0,0 +1,5 @@ +## Get details for AWS ECR (Elastic Container Registry) in environment variables + +ECR_AWS_ACCOUNT_ID="492475357299" +ECR_AWS_REGION="us-west-2" +DOCKER_REGISTRY_URL="${ECR_AWS_ACCOUNT_ID}.dkr.ecr.${ECR_AWS_REGION}.amazonaws.com" diff --git a/ops/pipeline/login-docker-registry.sh b/ops/pipeline/login-docker-registry.sh new file mode 100755 index 000000000000..a03987f484b8 --- /dev/null +++ b/ops/pipeline/login-docker-registry.sh @@ -0,0 +1,11 @@ +## Log into AWS ECR (Elastic Container Registry) to be able to pull containers from it +## Note. Requires valid AWS credentials + +set -euo pipefail + +source ops/pipeline/get-docker-registry-details.sh + +echo "aws ecr get-login-password --region ${ECR_AWS_REGION} |" \ + "docker login --username AWS --password-stdin ${DOCKER_REGISTRY_URL}" +aws ecr get-login-password --region ${ECR_AWS_REGION} \ + | docker login --username AWS --password-stdin ${DOCKER_REGISTRY_URL} diff --git a/ops/pipeline/run-clang-tidy.sh b/ops/pipeline/run-clang-tidy.sh index 676f302009ce..3f2019f3a330 100755 --- a/ops/pipeline/run-clang-tidy.sh +++ b/ops/pipeline/run-clang-tidy.sh @@ -1,9 +1,13 @@ #!/bin/bash -set -euox pipefail +set -euo pipefail -echo "--- Run clang-tidy" +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.clang_tidy:main +echo "--- Run clang-tidy" +set -x python3 ops/docker_run.py \ - --container-id xgb-ci.clang_tidy \ + --container-tag ${CONTAINER_TAG} \ -- python3 ops/script/run_clang_tidy.py --cuda-archs 75 diff --git a/ops/pipeline/test-cpp-gpu.sh b/ops/pipeline/test-cpp-gpu.sh index 9a0cd4743c18..9fdcd314264d 100755 --- a/ops/pipeline/test-cpp-gpu.sh +++ b/ops/pipeline/test-cpp-gpu.sh @@ -7,36 +7,34 @@ then echo "Usage: $0 {gpu,gpu-rmm,mgpu}" exit 1 fi -arg=$1 +suite=$1 -case "${arg}" in +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu:main + +case "${suite}" in gpu) echo "--- Run Google Tests, using a single GPU" - python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \ - -- nvidia-smi - python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \ + python3 ops/docker_run.py --container-tag ${CONTAINER_TAG} --use-gpus \ -- build/testxgboost ;; gpu-rmm) echo "--- Run Google Tests, using a single GPU, RMM enabled" - python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \ - -- nvidia-smi - python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \ + python3 ops/docker_run.py --container-tag ${CONTAINER_TAG} --use-gpus \ -- build/testxgboost --use-rmm-pool ;; mgpu) echo "--- Run Google Tests, using multiple GPUs" - python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \ - -- nvidia-smi - python3 ops/docker_run.py --container-id xgb-ci.gpu --use-gpus \ + python3 ops/docker_run.py --container-tag ${CONTAINER_TAG} --use-gpus \ --run-args='--shm-size=4g' \ -- build/testxgboost --gtest_filter=*MGPU* ;; *) - echo "Unrecognized arg: ${arg}" + echo "Unrecognized suite: ${suite}" exit 2 ;; esac diff --git a/ops/pipeline/test-jvm-gpu.sh b/ops/pipeline/test-jvm-gpu.sh index 380db97c787c..0f517832113f 100755 --- a/ops/pipeline/test-jvm-gpu.sh +++ 
b/ops/pipeline/test-jvm-gpu.sh
@@ -23,10 +23,12 @@ do
   fi
 done
 
+source ops/pipeline/get-docker-registry-details.sh
+
+CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.jvm_gpu_build:main
+
 set -x
 
-python3 ops/docker_run.py --container-id xgb-ci.jvm_gpu_build --use-gpus \
-  -- nvidia-smi
-python3 ops/docker_run.py --container-id xgb-ci.jvm_gpu_build --use-gpus \
+python3 ops/docker_run.py --container-tag ${CONTAINER_TAG} --use-gpus \
   --run-args "-e SCALA_VERSION=${SCALA_VERSION} -e USE_CUDA=1 -e SKIP_NATIVE_BUILD=1 --shm-size=4g --privileged" \
   -- ops/pipeline/build-test-jvm-packages-impl.sh
diff --git a/ops/pipeline/test-python-wheel.sh b/ops/pipeline/test-python-wheel.sh
index b4dd59b7cb0e..56d54fd65d02 100755
--- a/ops/pipeline/test-python-wheel.sh
+++ b/ops/pipeline/test-python-wheel.sh
@@ -19,7 +19,10 @@ else
   gpu_option=""
 fi
 
+source ops/pipeline/get-docker-registry-details.sh
+CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main"
+
 set -x
-python3 ops/docker_run.py --container-id "${container_id}" ${gpu_option} \
+python3 ops/docker_run.py --container-tag "${CONTAINER_TAG}" ${gpu_option} \
   --run-args='--shm-size=4g --privileged' \
   -- bash ops/pipeline/test-python-wheel-impl.sh "${suite}"

From c0eefb52c7aafa85ba657bd9e6fe58ff3e120f07 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 15:19:34 -0800
Subject: [PATCH 07/27] Remove build_via_cmake.sh

Also combine build-cuda.sh / build-cuda-with-rmm.sh
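The consolidated build-cuda.sh takes the CI container ID plus an RMM flag
(enable-rmm / disable-rmm), as exercised by the updated main.yml below.
A sketch of the two invocations, copied from that workflow:

    # Former build-cuda.sh behavior (RMM plugin off)
    bash ops/pipeline/build-cuda.sh xgb-ci.gpu_build_rockylinux8 disable-rmm

    # Former build-cuda-with-rmm.sh behavior (RMM plugin on)
    bash ops/pipeline/build-cuda.sh xgb-ci.gpu_build_rockylinux8 enable-rmm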
---
 .github/workflows/i386.yml                   | 38 +++------
 .github/workflows/main.yml                   |  8 +-
 .github/workflows/misc.yml                   |  8 +-
 doc/contrib/ci.rst                           |  6 +-
 ops/pipeline/build-cpu-arm64-impl.sh         | 32 +++++++
 ops/pipeline/build-cpu-arm64.sh              | 18 +---
 ops/pipeline/build-cpu-impl.sh               | 57 +++++++++++++
 ops/pipeline/build-cpu.sh                    | 33 +++-----
 ops/pipeline/build-cuda-impl.sh              | 51 +++++++++++
 ops/pipeline/build-cuda-with-rmm.sh          | 76 -----------------
 ops/pipeline/build-cuda.sh                   | 89 ++++++++++++--------
 ops/pipeline/build-test-cpu-nonomp.sh        | 19 +++++
 ops/pipeline/build-test-jvm-packages-impl.sh |  2 +-
 ops/pipeline/test-cpp-i386-impl.sh           | 22 +++++
 ops/pipeline/test-cpp-i386.sh                | 13 +++
 ops/script/build_via_cmake.sh                | 54 ------------
 16 files changed, 281 insertions(+), 245 deletions(-)
 create mode 100755 ops/pipeline/build-cpu-arm64-impl.sh
 create mode 100755 ops/pipeline/build-cpu-impl.sh
 create mode 100755 ops/pipeline/build-cuda-impl.sh
 delete mode 100755 ops/pipeline/build-cuda-with-rmm.sh
 create mode 100755 ops/pipeline/build-test-cpu-nonomp.sh
 create mode 100755 ops/pipeline/test-cpp-i386-impl.sh
 create mode 100755 ops/pipeline/test-cpp-i386.sh
 delete mode 100755 ops/script/build_via_cmake.sh

diff --git a/.github/workflows/i386.yml b/.github/workflows/i386.yml
index 8b7c71a82bf8..26ceaf758f3a 100644
--- a/.github/workflows/i386.yml
+++ b/.github/workflows/i386.yml
@@ -3,7 +3,7 @@ name: XGBoost-i386-test
 on: [push, pull_request]
 
 permissions:
-  contents: read # to fetch code (actions/checkout)
+  contents: read  # to fetch code (actions/checkout)
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -12,32 +12,16 @@ concurrency:
 jobs:
   build-32bit:
     name: Build 32-bit
-    runs-on: ubuntu-latest
-    services:
-      registry:
-        image: registry:2
-        ports:
-          - 5000:5000
+    runs-on:
+      - runs-on=${{ github.run_id }}
+      - runner=linux-amd64-cpu
+      - tag=i386-build-32bit
     steps:
+      # Restart Docker daemon so that it recognizes the ephemeral disks
+      - run: sudo systemctl restart docker
       - uses: actions/checkout@v4
         with:
-          submodules: 'true'
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-        with:
-          driver-opts: network=host
-      - name: Build and push container
-        uses: docker/build-push-action@v6
-        with:
-          context: .
-          file: ops/docker/dockerfile/Dockerfile.i386
-          push: true
-          tags: localhost:5000/xgboost/build-32bit:latest
-          cache-from: type=gha
-          cache-to: type=gha,mode=max
-      - name: Build XGBoost
-        run: |
-          docker run --rm -v $PWD:/workspace -w /workspace \
-            -e CXXFLAGS='-Wno-error=overloaded-virtual -Wno-error=maybe-uninitialized -Wno-error=redundant-move' \
-            localhost:5000/xgboost/build-32bit:latest \
-            bash ops/script/build_via_cmake.sh
+          submodules: "true"
+      - name: Log into Docker registry (AWS ECR)
+        run: bash ops/pipeline/login-docker-registry.sh
+      - run: bash ops/pipeline/test-cpp-i386.sh
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index e62cc3f35e59..d259105ce877 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -70,7 +70,8 @@ jobs:
           submodules: "true"
       - name: Log into Docker registry (AWS ECR)
         run: bash ops/pipeline/login-docker-registry.sh
-      - run: bash ops/pipeline/build-cuda.sh
+      - run: |
+          bash ops/pipeline/build-cuda.sh xgb-ci.gpu_build_rockylinux8 disable-rmm
       - name: Stash files
         run: |
           bash ops/pipeline/stash-artifacts.sh stash build-cuda \
@@ -98,7 +99,7 @@ jobs:
       - name: Log into Docker registry (AWS ECR)
         run: bash ops/pipeline/login-docker-registry.sh
       - run: |
-          bash ops/pipeline/build-cuda-with-rmm.sh xgb-ci.gpu_build_rockylinux8
+          bash ops/pipeline/build-cuda.sh xgb-ci.gpu_build_rockylinux8 enable-rmm
       - name: Stash files
         run: |
           bash ops/pipeline/stash-artifacts.sh \
@@ -123,7 +124,8 @@ jobs:
       - name: Log into Docker registry (AWS ECR)
         run: bash ops/pipeline/login-docker-registry.sh
       - run: |
-          bash ops/pipeline/build-cuda-with-rmm.sh xgb-ci.gpu_build_rockylinux8_dev_ver
+          bash ops/pipeline/build-cuda.sh \
+            xgb-ci.gpu_build_rockylinux8_dev_ver enable-rmm
 
   build-manylinux2014:
     name: Build manylinux2014_${{ matrix.arch }} wheel
diff --git a/.github/workflows/misc.yml b/.github/workflows/misc.yml
index 0ced355d7bff..54d0078a6164 100644
--- a/.github/workflows/misc.yml
+++ b/.github/workflows/misc.yml
@@ -24,12 +24,8 @@ jobs:
       - name: Install system packages
         run: |
           sudo apt-get install -y --no-install-recommends ninja-build
-      - name: Build and install XGBoost
-        run: bash ops/script/build_via_cmake.sh -DUSE_OPENMP=OFF
-      - name: Run gtest binary
-        run: |
-          cd build
-          ctest --extra-verbose
+      - name: Build and test XGBoost
+        run: bash ops/pipeline/build-test-cpu-nonomp.sh
 
   c-api-demo:
     name: Test installing XGBoost lib + building the C API demo
diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index 456f8ce1ae0d..d2636037b8a8 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -137,7 +137,7 @@ For example:
    # Run without GPU
    python3 ops/docker_run.py \
      --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-     -- bash ops/pipeline/build-cpu-impl.sh
+     -- bash ops/pipeline/build-cpu-impl.sh cpu
 
    # Run with NVIDIA GPU
    python3 ops/docker_run.py \
@@ -479,7 +479,7 @@ Here is an example with ``docker_run.py``:
    # Run without GPU
    python3 ops/docker_run.py \
      --container-tag 492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \
-     -- bash ops/script/build_via_cmake.sh
+     -- bash ops/pipeline/build-cpu-impl.sh cpu
 
    # Run with NVIDIA GPU
    # Allocate extra space in /dev/shm to enable NCCL
@@ -499,7 +499,7 @@ which are translated to the following ``docker run`` invocations:
      -e CI_BUILD_UID=<uid> -e CI_BUILD_USER=<user_name> \
      -e CI_BUILD_GID=<gid> -e CI_BUILD_GROUP=<group_name> \
492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.cpu:main \ - bash ops/script/build_via_cmake.sh + bash ops/pipeline/build-cpu-impl.sh cpu docker run --rm --pid=host --gpus all \ -w /workspace -v /path/to/xgboost:/workspace \ diff --git a/ops/pipeline/build-cpu-arm64-impl.sh b/ops/pipeline/build-cpu-arm64-impl.sh new file mode 100755 index 000000000000..ae0aa7d5b4ce --- /dev/null +++ b/ops/pipeline/build-cpu-arm64-impl.sh @@ -0,0 +1,32 @@ +#!/bin/bash +## Build and test XGBoost with ARM64 CPU +## Companion script for ops/pipeline/build-cpu-arm64.sh + +set -euox pipefail + +source activate aarch64_test + +echo "--- Build libxgboost from the source" +mkdir -p build +pushd build +cmake .. \ + -GNinja \ + -DCMAKE_PREFIX_PATH="${CONDA_PREFIX}" \ + -DUSE_OPENMP=ON \ + -DHIDE_CXX_SYMBOLS=ON \ + -DGOOGLE_TEST=ON \ + -DUSE_DMLC_GTEST=ON \ + -DENABLE_ALL_WARNINGS=ON \ + -DCMAKE_COMPILE_WARNING_AS_ERROR=OFF \ + -DBUILD_DEPRECATED_CLI=ON +time ninja -v + +echo "--- Run Google Test" +ctest --extra-verbose +popd + +echo "--- Build binary wheel" +pushd python-package +rm -rfv dist/* +pip wheel --no-deps -v . --wheel-dir dist/ +popd diff --git a/ops/pipeline/build-cpu-arm64.sh b/ops/pipeline/build-cpu-arm64.sh index fad473e58d06..2e0f0ea9ef4d 100755 --- a/ops/pipeline/build-cpu-arm64.sh +++ b/ops/pipeline/build-cpu-arm64.sh @@ -1,4 +1,5 @@ #!/bin/bash +## Build and test XGBoost with ARM64 CPU set -euo pipefail @@ -16,24 +17,9 @@ CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main echo "--- Build CPU code targeting ARM64" set -x -echo "--- Build libxgboost from the source" python3 ops/docker_run.py \ --container-tag ${BUILD_CONTAINER_TAG} \ - -- ops/script/build_via_cmake.sh \ - --conda-env=aarch64_test \ - -DUSE_OPENMP=ON \ - -DHIDE_CXX_SYMBOL=ON - -echo "--- Run Google Test" -python3 ops/docker_run.py \ - --container-tag ${BUILD_CONTAINER_TAG} \ - -- bash -c "cd build && ctest --extra-verbose" - -echo "--- Build binary wheel" -python3 ops/docker_run.py \ - --container-tag ${BUILD_CONTAINER_TAG} \ - -- bash -c \ - "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . --wheel-dir dist/" + -- ops/pipeline/build-cpu-arm64-impl.sh python3 ops/script/rename_whl.py \ --wheel-path python-package/dist/*.whl \ --commit-hash ${GITHUB_SHA} \ diff --git a/ops/pipeline/build-cpu-impl.sh b/ops/pipeline/build-cpu-impl.sh new file mode 100755 index 000000000000..55e205d3edfa --- /dev/null +++ b/ops/pipeline/build-cpu-impl.sh @@ -0,0 +1,57 @@ +#!/bin/bash +## Build and test XGBoost with AMD64 CPU +## Companion script for ops/pipeline/build-cpu.sh + +set -euox pipefail + +if [[ "$#" -lt 1 ]] +then + echo "Usage: $0 {cpu,cpu-sanitizer}" + exit 1 +fi +suite="$1" + +mkdir -p build +pushd build + +case "${suite}" in + cpu) + echo "--- Build libxgboost from the source" + cmake .. \ + -GNinja \ + -DHIDE_CXX_SYMBOLS=ON \ + -DGOOGLE_TEST=ON \ + -DUSE_DMLC_GTEST=ON \ + -DENABLE_ALL_WARNINGS=ON \ + -DCMAKE_COMPILE_WARNING_AS_ERROR=OFF \ + -DBUILD_DEPRECATED_CLI=ON \ + -DCMAKE_PREFIX_PATH='/opt/grpc' \ + -DPLUGIN_FEDERATED=ON + time ninja -v + echo "--- Run Google Test" + ctest --extra-verbose + ;; + cpu-sanitizer) + echo "--- Run Google Test with sanitizer" + cmake .. 
\ + -GNinja \ + -DHIDE_CXX_SYMBOLS=ON \ + -DGOOGLE_TEST=ON \ + -DUSE_DMLC_GTEST=ON \ + -DENABLE_ALL_WARNINGS=ON \ + -DCMAKE_COMPILE_WARNING_AS_ERROR=OFF \ + -DBUILD_DEPRECATED_CLI=ON \ + -DUSE_SANITIZER=ON \ + -DENABLED_SANITIZERS="address;leak;undefined" \ + -DCMAKE_BUILD_TYPE=Debug \ + -DSANITIZER_PATH=/usr/lib/x86_64-linux-gnu/ + time ninja -v + ./testxgboost --gtest_filter=-*DeathTest* + ;; + *) + echo "Unrecognized argument: $suite" + exit 1 + ;; +esac + +popd diff --git a/ops/pipeline/build-cpu.sh b/ops/pipeline/build-cpu.sh index edcfd43d56ed..04fd4944eae7 100755 --- a/ops/pipeline/build-cpu.sh +++ b/ops/pipeline/build-cpu.sh @@ -1,4 +1,5 @@ #!/bin/bash +## Build and test XGBoost with AMD64 CPU set -euo pipefail @@ -16,32 +17,20 @@ set -x # include/dmlc/build_config_default.h. rm -fv dmlc-core/include/dmlc/build_config_default.h -# Sanitizer tests -echo "--- Run Google Test with sanitizer enabled" +# Test with sanitizer +export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer +export ASAN_OPTIONS='symbolize=1' +export UBSAN_OPTIONS='print_stacktrace=1:log_path=ubsan_error.log' # Work around https://github.com/google/sanitizers/issues/1614 sudo sysctl vm.mmap_rnd_bits=28 python3 ops/docker_run.py \ --container-tag ${CONTAINER_TAG} \ - -- ops/script/build_via_cmake.sh \ - -DUSE_SANITIZER=ON \ - -DENABLED_SANITIZERS="address;leak;undefined" \ - -DCMAKE_BUILD_TYPE=Debug \ - -DSANITIZER_PATH=/usr/lib/x86_64-linux-gnu/ -python3 ops/docker_run.py \ - --container-tag ${CONTAINER_TAG} \ - --run-args '-e ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer - -e ASAN_OPTIONS=symbolize=1 - -e UBSAN_OPTIONS=print_stacktrace=1:log_path=ubsan_error.log - --cap-add SYS_PTRACE' \ - -- bash -c \ - "cd build && ./testxgboost --gtest_filter=-*DeathTest*" + --run-args '-e ASAN_SYMBOLIZER_PATH -e ASAN_OPTIONS -e UBSAN_OPTIONS + --cap-add SYS_PTRACE' \ + -- bash ops/pipeline/build-cpu-impl.sh cpu-sanitizer -echo "--- Run Google Test" -python3 ops/docker_run.py \ - --container-tag ${CONTAINER_TAG} \ - -- ops/script/build_via_cmake.sh \ - -DCMAKE_PREFIX_PATH=/opt/grpc \ - -DPLUGIN_FEDERATED=ON +# Test without sanitizer +rm -rf build/ python3 ops/docker_run.py \ --container-tag ${CONTAINER_TAG} \ - -- bash -c "cd build && ctest --extra-verbose" + -- bash ops/pipeline/build-cpu-impl.sh cpu diff --git a/ops/pipeline/build-cuda-impl.sh b/ops/pipeline/build-cuda-impl.sh new file mode 100755 index 000000000000..198936852948 --- /dev/null +++ b/ops/pipeline/build-cuda-impl.sh @@ -0,0 +1,51 @@ +#!/bin/bash +## Build XGBoost with CUDA +## Companion script for ops/pipeline/build-cuda.sh + +set -euox pipefail + +if [[ "${BUILD_ONLY_SM75:-}" == 1 ]] +then + cmake_args='-DGPU_COMPUTE_VER=75' +else + cmake_args='' +fi + +if [[ "${USE_RMM:-}" == 1 ]] +then + cmake_prefix_path='/opt/grpc;/opt/rmm;/opt/rmm/lib64/rapids/cmake' + cmake_args="${cmake_args} -DPLUGIN_RMM=ON" +else + cmake_prefix_path='/opt/grpc;/workspace/cccl' +fi + +# Disable CMAKE_COMPILE_WARNING_AS_ERROR option temporarily until +# https://github.com/dmlc/xgboost/issues/10400 is fixed +echo "--- Build libxgboost from the source" +mkdir -p build +pushd build +cmake .. 
\ + -GNinja \ + -DCMAKE_PREFIX_PATH="${cmake_prefix_path}" \ + -DUSE_CUDA=ON \ + -DUSE_OPENMP=ON \ + -DHIDE_CXX_SYMBOLS=ON \ + -DPLUGIN_FEDERATED=ON \ + -DUSE_NCCL=ON \ + -DUSE_NCCL_LIB_PATH=ON \ + -DNCCL_INCLUDE_DIR=/usr/include \ + -DUSE_DLOPEN_NCCL=ON \ + -DGOOGLE_TEST=ON \ + -DUSE_DMLC_GTEST=ON \ + -DENABLE_ALL_WARNINGS=ON \ + -DCMAKE_COMPILE_WARNING_AS_ERROR=OFF \ + -DBUILD_DEPRECATED_CLI=ON \ + ${cmake_args} +time ninja -v +popd + +echo "--- Build binary wheel" +pushd python-package +rm -rfv dist/* +pip wheel --no-deps -v . --wheel-dir dist/ +popd diff --git a/ops/pipeline/build-cuda-with-rmm.sh b/ops/pipeline/build-cuda-with-rmm.sh deleted file mode 100755 index 024c9f351d1f..000000000000 --- a/ops/pipeline/build-cuda-with-rmm.sh +++ /dev/null @@ -1,76 +0,0 @@ -#!/bin/bash -## Build XGBoost with CUDA + RMM support - -set -euo pipefail - -if [[ -z "${GITHUB_SHA:-}" ]] -then - echo "Make sure to set environment variable GITHUB_SHA" - exit 1 -fi - -if [[ "$#" -lt 1 ]] -then - echo "Usage: $0 [container_id]" - exit 1 -fi -container_id="$1" - -source ops/pipeline/classify-git-branch.sh -source ops/pipeline/get-docker-registry-details.sh - -WHEEL_TAG=manylinux_2_28_x86_64 -BUILD_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" -MANYLINUX_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main" - -set -x - -echo "--- Build with CUDA with RMM" - -if [[ ($is_pull_request == 1) || ($is_release_branch == 0) ]] -then - arch_flag="-DGPU_COMPUTE_VER=75" -else - arch_flag="" -fi - -echo "--- Build libxgboost from the source" -python3 ops/docker_run.py \ - --container-tag "${BUILD_CONTAINER_TAG}" \ - -- ops/script/build_via_cmake.sh \ - -DCMAKE_PREFIX_PATH="/opt/grpc;/opt/rmm;/opt/rmm/lib64/rapids/cmake" \ - -DUSE_CUDA=ON \ - -DUSE_OPENMP=ON \ - -DHIDE_CXX_SYMBOLS=ON \ - -DPLUGIN_FEDERATED=ON \ - -DPLUGIN_RMM=ON \ - -DUSE_NCCL=ON \ - -DUSE_NCCL_LIB_PATH=ON \ - -DNCCL_INCLUDE_DIR=/usr/include \ - -DUSE_DLOPEN_NCCL=ON \ - ${arch_flag} - -echo "--- Build binary wheel" -python3 ops/docker_run.py \ - --container-tag "${BUILD_CONTAINER_TAG}" \ - -- bash -c \ - "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . --wheel-dir dist/" -python3 ops/script/rename_whl.py \ - --wheel-path python-package/dist/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} - -echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard" -python3 ops/docker_run.py \ - --container-tag "${MANYLINUX_CONTAINER_TAG}" \ - -- auditwheel repair \ - --plat ${WHEEL_TAG} python-package/dist/*.whl -python3 ops/script/rename_whl.py \ - --wheel-path wheelhouse/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} -mv -v wheelhouse/*.whl python-package/dist/ -if ! 
unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then - echo "error: libgomp.so was not vendored in the wheel" - exit -1 -fi diff --git a/ops/pipeline/build-cuda.sh b/ops/pipeline/build-cuda.sh index 2170b8a681ac..1965c50563ed 100755 --- a/ops/pipeline/build-cuda.sh +++ b/ops/pipeline/build-cuda.sh @@ -9,46 +9,57 @@ then exit 1 fi +if [[ "$#" -lt 2 ]] +then + echo "Usage: $0 [container_id] {enable-rmm,disable-rmm}" + exit 2 +fi +container_id="$1" +rmm_flag="$2" + +# Validate RMM flag +case "${rmm_flag}" in + enable-rmm) + export USE_RMM=1 + ;; + disable-rmm) + export USE_RMM=0 + ;; + *) + echo "Unrecognized argument: $rmm_flag" + exit 3 + ;; +esac + source ops/pipeline/classify-git-branch.sh source ops/pipeline/get-docker-registry-details.sh WHEEL_TAG=manylinux_2_28_x86_64 -BUILD_CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_rockylinux8:main -MANYLINUX_CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main +BUILD_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" +MANYLINUX_CONTAINER_TAG="${DOCKER_REGISTRY_URL}/xgb-ci.${WHEEL_TAG}:main" echo "--- Build with CUDA" if [[ ($is_pull_request == 1) || ($is_release_branch == 0) ]] then - arch_flag="-DGPU_COMPUTE_VER=75" + export BUILD_ONLY_SM75=1 else - arch_flag="" + export BUILD_ONLY_SM75=0 +fi + +if [[ ${USE_RMM} == 0 ]] +then + # Work around https://github.com/NVIDIA/cccl/issues/1956 + # TODO(hcho3): Remove this once new CUDA version ships with CCCL 2.6.0+ + git clone https://github.com/NVIDIA/cccl.git -b v2.6.1 --quiet fi -echo "--- Build libxgboost from the source" set -x -# Work around https://github.com/NVIDIA/cccl/issues/1956 -# TODO(hcho3): Remove this once new CUDA version ships with CCCL 2.6.0+ -git clone https://github.com/NVIDIA/cccl.git -b v2.6.1 --quiet -python3 ops/docker_run.py \ - --container-tag ${BUILD_CONTAINER_TAG} \ - -- ops/script/build_via_cmake.sh \ - -DCMAKE_PREFIX_PATH="/opt/grpc;/workspace/cccl" \ - -DUSE_CUDA=ON \ - -DUSE_OPENMP=ON \ - -DHIDE_CXX_SYMBOLS=ON \ - -DPLUGIN_FEDERATED=ON \ - -DUSE_NCCL=ON \ - -DUSE_NCCL_LIB_PATH=ON \ - -DNCCL_INCLUDE_DIR=/usr/include \ - -DUSE_DLOPEN_NCCL=ON \ - ${arch_flag} -echo "--- Build binary wheel" python3 ops/docker_run.py \ --container-tag ${BUILD_CONTAINER_TAG} \ - -- bash -c \ - "cd python-package && rm -rf dist/* && pip wheel --no-deps -v . --wheel-dir dist/" + --run-args='-e BUILD_ONLY_SM75 -e USE_RMM' \ + -- ops/pipeline/build-cuda-impl.sh python3 ops/script/rename_whl.py \ --wheel-path python-package/dist/*.whl \ --commit-hash ${GITHUB_SHA} \ @@ -69,18 +80,22 @@ if ! 
unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then exit -1 fi -# Generate the meta info which includes xgboost version and the commit info -python3 ops/script/format_wheel_meta.py \ - --wheel-path python-package/dist/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} \ - --meta-path python-package/dist/ - -echo "--- Upload Python wheel" -if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] +if [[ $USE_RMM == 0 ]] then - aws s3 cp python-package/dist/*.whl s3://xgboost-nightly-builds/${BRANCH_NAME}/ \ - --acl public-read --no-progress - aws s3 cp python-package/dist/meta.json s3://xgboost-nightly-builds/${BRANCH_NAME}/ \ - --acl public-read --no-progress + # Generate the meta info which includes xgboost version and the commit info + echo "--- Generate meta info" + python3 ops/script/format_wheel_meta.py \ + --wheel-path python-package/dist/*.whl \ + --commit-hash ${GITHUB_SHA} \ + --platform-tag ${WHEEL_TAG} \ + --meta-path python-package/dist/ + + echo "--- Upload Python wheel" + if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] + then + aws s3 cp python-package/dist/*.whl s3://xgboost-nightly-builds/${BRANCH_NAME}/ \ + --acl public-read --no-progress + aws s3 cp python-package/dist/meta.json s3://xgboost-nightly-builds/${BRANCH_NAME}/ \ + --acl public-read --no-progress + fi fi diff --git a/ops/pipeline/build-test-cpu-nonomp.sh b/ops/pipeline/build-test-cpu-nonomp.sh new file mode 100755 index 000000000000..5bd6fa7f9d32 --- /dev/null +++ b/ops/pipeline/build-test-cpu-nonomp.sh @@ -0,0 +1,19 @@ +#!/bin/bash +## Ensure that XGBoost can function with OpenMP disabled + +set -euox pipefail + +mkdir -p build +pushd build +cmake .. \ + -GNinja \ + -DUSE_OPENMP=OFF \ + -DHIDE_CXX_SYMBOLS=ON \ + -DGOOGLE_TEST=ON \ + -DUSE_DMLC_GTEST=ON \ + -DENABLE_ALL_WARNINGS=ON \ + -DCMAKE_COMPILE_WARNING_AS_ERROR=OFF \ + -DBUILD_DEPRECATED_CLI=ON +time ninja -v +ctest --extra-verbose +popd diff --git a/ops/pipeline/build-test-jvm-packages-impl.sh b/ops/pipeline/build-test-jvm-packages-impl.sh index ed95ba3368ab..61550d61bbae 100755 --- a/ops/pipeline/build-test-jvm-packages-impl.sh +++ b/ops/pipeline/build-test-jvm-packages-impl.sh @@ -1,6 +1,6 @@ #!/bin/bash ## Build and test JVM packages. -## Companion script for build-test-jvm-packages.sh. +## Companion script for ops/pipeline/build-test-jvm-packages.sh. ## ## Note. This script takes in all inputs via environment variables. diff --git a/ops/pipeline/test-cpp-i386-impl.sh b/ops/pipeline/test-cpp-i386-impl.sh new file mode 100755 index 000000000000..1f7653fd5e1e --- /dev/null +++ b/ops/pipeline/test-cpp-i386-impl.sh @@ -0,0 +1,22 @@ +#!/bin/bash +## Run C++ tests for i386 +## Companion script for ops/pipeline/test-cpp-i386.sh + +set -euox pipefail + +export CXXFLAGS='-Wno-error=overloaded-virtual -Wno-error=maybe-uninitialized -Wno-error=redundant-move -Wno-narrowing' + +mkdir -p build +pushd build + +cmake .. 
\ + -GNinja \ + -DGOOGLE_TEST=ON \ + -DUSE_DMLC_GTEST=ON \ + -DENABLE_ALL_WARNINGS=ON \ + -DCMAKE_COMPILE_WARNING_AS_ERROR=ON +time ninja -v +# TODO(hcho3): Run gtest for i386 +# ./testxgboost + +popd diff --git a/ops/pipeline/test-cpp-i386.sh b/ops/pipeline/test-cpp-i386.sh new file mode 100755 index 000000000000..19223041c3fb --- /dev/null +++ b/ops/pipeline/test-cpp-i386.sh @@ -0,0 +1,13 @@ +#!/bin/bash +## Run C++ tests for i386 + +set -euo pipefail + +source ops/pipeline/get-docker-registry-details.sh + +CONTAINER_TAG="${DOCKER_REGISTRY_URL}/xgb-ci.i386:main" + +set -x +python3 ops/docker_run.py \ + --container-tag ${CONTAINER_TAG} \ + -- bash ops/pipeline/test-cpp-i386-impl.sh diff --git a/ops/script/build_via_cmake.sh b/ops/script/build_via_cmake.sh deleted file mode 100755 index 00a571584ea4..000000000000 --- a/ops/script/build_via_cmake.sh +++ /dev/null @@ -1,54 +0,0 @@ -#!/bin/bash - -set -euo pipefail - -if [[ "$#" -lt 1 ]] -then - conda_env="" -else - conda_env="$1" -fi - -if [[ "${conda_env}" == --conda-env=* ]] -then - conda_env=$(echo "${conda_env}" | sed 's/^--conda-env=//g' -) - echo "Activating Conda environment ${conda_env}" - shift 1 - cmake_args="$@" - - # Workaround for file permission error - if [[ -n ${CI_BUILD_UID:-} ]] - then - gosu root chown -R "${CI_BUILD_UID}:${CI_BUILD_GID}" /opt/miniforge/envs - fi - - # Don't activate Conda env if it's already activated - if [[ -z ${CONDA_PREFIX:-} ]] - then - source activate ${conda_env} - fi - cmake_prefix_flag="-DCMAKE_PREFIX_PATH=$CONDA_PREFIX" -else - cmake_args="$@" - cmake_prefix_flag='' -fi - -rm -rf build -mkdir build -cd build -# Disable CMAKE_COMPILE_WARNING_AS_ERROR option temporarily until -# https://github.com/dmlc/xgboost/issues/10400 is fixed -set -x -cmake .. ${cmake_args} \ - -DGOOGLE_TEST=ON \ - -DUSE_DMLC_GTEST=ON \ - -DENABLE_ALL_WARNINGS=ON \ - -DCMAKE_COMPILE_WARNING_AS_ERROR=OFF \ - -GNinja \ - ${cmake_prefix_flag} \ - -DHIDE_CXX_SYMBOLS=ON \ - -DBUILD_DEPRECATED_CLI=ON -ninja clean -time ninja -v -cd .. 
-set +x From cf8bda423c0bf0d3a7826f76555cfb098594e0aa Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 17:11:58 -0800 Subject: [PATCH 08/27] Replace stash-artifacts.{sh,py} -> manage-artifacts.py Also: * Remove publish-artifacts.sh * Upload artifacts to /{commit_id}/ prefix --- .github/workflows/jvm_tests.yml | 54 +++++--- .github/workflows/main.yml | 63 ++++----- .github/workflows/windows.yml | 16 ++- ops/pipeline/build-cpu-arm64.sh | 9 ++ ops/pipeline/build-cuda.sh | 8 +- ops/pipeline/build-gpu-rpkg-impl.sh | 2 +- ops/pipeline/build-gpu-rpkg.sh | 9 ++ ops/pipeline/build-jvm-manylinux2014.sh | 11 ++ ops/pipeline/build-manylinux2014.sh | 9 ++ ops/pipeline/build-win64-gpu.ps1 | 6 +- ops/pipeline/manage-artifacts.py | 163 ++++++++++++++++++++++++ ops/pipeline/publish-artifact.sh | 23 ---- ops/pipeline/stash-artifacts.ps1 | 49 ------- ops/pipeline/stash-artifacts.py | 144 --------------------- ops/pipeline/stash-artifacts.sh | 36 ------ ops/pipeline/test-python-wheel-impl.sh | 2 +- 16 files changed, 283 insertions(+), 321 deletions(-) create mode 100644 ops/pipeline/manage-artifacts.py delete mode 100755 ops/pipeline/publish-artifact.sh delete mode 100644 ops/pipeline/stash-artifacts.ps1 delete mode 100644 ops/pipeline/stash-artifacts.py delete mode 100755 ops/pipeline/stash-artifacts.sh diff --git a/.github/workflows/jvm_tests.yml b/.github/workflows/jvm_tests.yml index 965ea49ccad7..b059c530b01a 100644 --- a/.github/workflows/jvm_tests.yml +++ b/.github/workflows/jvm_tests.yml @@ -40,12 +40,6 @@ jobs: - name: Log into Docker registry (AWS ECR) run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-jvm-manylinux2014.sh ${{ matrix.arch }} - - name: Upload libxgboost4j.so - run: | - libname=lib/libxgboost4j_linux_${{ matrix.arch }}_${{ github.sha }}.so - mv -v lib/libxgboost4j.so ${libname} - bash ops/pipeline/publish-artifact.sh ${libname} \ - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/libxgboost4j/ build-jvm-gpu: name: Build libxgboost4j.so with CUDA @@ -64,7 +58,10 @@ jobs: - run: bash ops/pipeline/build-jvm-gpu.sh - name: Stash files run: | - bash ops/pipeline/stash-artifacts.sh stash build-jvm-gpu lib/libxgboost4j.so + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-jvm-gpu \ + lib/libxgboost4j.so build-jvm-mac: name: "Build libxgboost4j.dylib for ${{ matrix.description }}" @@ -75,11 +72,11 @@ jobs: include: - description: "MacOS (Apple Silicon)" script: ops/pipeline/build-jvm-macos-apple-silicon.sh - libname: libxgboost4j_m1_${{ github.sha }}.dylib + libname: libxgboost4j_m1.dylib runner: macos-14 - description: "MacOS (Intel)" script: ops/pipeline/build-jvm-macos-intel.sh - libname: libxgboost4j_intel_${{ github.sha }}.dylib + libname: libxgboost4j_intel.dylib runner: macos-13 steps: - uses: actions/checkout@v4 @@ -89,8 +86,10 @@ jobs: - name: Upload libxgboost4j.dylib run: | mv -v lib/libxgboost4j.dylib ${{ matrix.libname }} - bash ops/pipeline/publish-artifact.sh ${{ matrix.libname }} \ - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/libxgboost4j/ + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${{ env.BRANCH_NAME }}/${{ github.sha }} --make-public \ + ${{ matrix.libname }} env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_IAM_S3_UPLOADER }} AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_IAM_S3_UPLOADER }} @@ -112,13 +111,18 @@ jobs: run: bash 
ops/pipeline/login-docker-registry.sh - name: Unstash files run: | - bash ops/pipeline/stash-artifacts.sh unstash build-jvm-gpu lib/libxgboost4j.so + python3 ops/pipeline/manage-artifacts.py download \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-jvm-gpu \ + --dest-dir lib \ + libxgboost4j.so - run: bash ops/pipeline/build-jvm-doc.sh - name: Upload JVM doc run: | - bash ops/pipeline/publish-artifact.sh \ - jvm-packages/${{ env.BRANCH_NAME }}.tar.bz2 \ - s3://xgboost-docs/ + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-docs \ + --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \ + jvm-packages/${{ env.BRANCH_NAME }}.tar.bz2 build-test-jvm-packages: name: Build and test JVM packages (Linux, Scala ${{ matrix.scala_version }}) @@ -144,8 +148,10 @@ jobs: SCALA_VERSION: ${{ matrix.scala_version }} - name: Stash files run: | - bash ops/pipeline/stash-artifacts.sh stash \ - build-test-jvm-packages lib/libxgboost4j.so + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-test-jvm-packages \ + lib/libxgboost4j.so if: matrix.scala_version == '2.13' build-test-jvm-packages-other-os: @@ -213,7 +219,11 @@ jobs: run: bash ops/pipeline/login-docker-registry.sh - name: Unstash files run: | - bash ops/pipeline/stash-artifacts.sh unstash build-jvm-gpu lib/libxgboost4j.so + python3 ops/pipeline/manage-artifacts.py download \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-jvm-gpu \ + --dest-dir lib \ + libxgboost4j.so - run: bash ops/pipeline/test-jvm-gpu.sh env: SCALA_VERSION: ${{ matrix.scala_version }} @@ -247,9 +257,11 @@ jobs: run: bash ops/pipeline/login-docker-registry.sh - name: Unstash files run: | - bash ops/pipeline/stash-artifacts.sh \ - unstash ${{ matrix.variant.artifact_from }} \ - lib/libxgboost4j.so + python3 ops/pipeline/manage-artifacts.py download \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/${{ matrix.variant.artifact_from }} \ + --dest-dir lib \ + libxgboost4j.so ls -lh lib/libxgboost4j.so - name: Deploy JVM packages to S3 run: | diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index d259105ce877..fd1b94c7af4c 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -30,7 +30,11 @@ jobs: run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-cpu.sh - name: Stash CLI executable - run: bash ops/pipeline/stash-artifacts.sh stash build-cpu ./xgboost + run: | + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-cpu \ + ./xgboost build-cpu-arm64: name: Build CPU ARM64 + manylinux_2_28_aarch64 wheel @@ -49,12 +53,10 @@ jobs: - run: bash ops/pipeline/build-cpu-arm64.sh - name: Stash files run: | - bash ops/pipeline/stash-artifacts.sh stash build-cpu-arm64 \ - ./xgboost python-package/dist/*.whl - - name: Upload Python wheel - run: | - bash ops/pipeline/publish-artifact.sh python-package/dist/*.whl \ - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/ + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-cpu-arm64 \ + ./xgboost python-package/dist/*.whl build-cuda: name: Build CUDA + manylinux_2_28_x86_64 wheel @@ -74,15 +76,10 @@ jobs: bash ops/pipeline/build-cuda.sh 
xgb-ci.gpu_build_rockylinux8 disable-rmm - name: Stash files run: | - bash ops/pipeline/stash-artifacts.sh stash build-cuda \ + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-cuda \ build/testxgboost ./xgboost python-package/dist/*.whl - - name: Upload Python wheel - run: | - for file in python-package/dist/*.whl python-package/dist/meta.json - do - bash ops/pipeline/publish-artifact.sh "${file}" \ - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/ - done build-cuda-with-rmm: name: Build CUDA with RMM @@ -102,12 +99,10 @@ jobs: bash ops/pipeline/build-cuda.sh xgb-ci.gpu_build_rockylinux8 enable-rmm - name: Stash files run: | - bash ops/pipeline/stash-artifacts.sh \ - stash build-cuda-with-rmm build/testxgboost - - name: Upload Python wheel - run: | - bash ops/pipeline/publish-artifact.sh python-package/dist/*.whl \ - s3://xgboost-nightly-builds/experimental_build_with_rmm/ + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-cuda-with-rmm \ + build/testxgboost build-cuda-with-rmm-dev: name: Build CUDA with RMM (dev) @@ -151,13 +146,6 @@ jobs: - name: Log into Docker registry (AWS ECR) run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-manylinux2014.sh ${{ matrix.arch }} - - name: Upload Python wheel - run: | - for wheel in python-package/dist/*.whl - do - bash ops/pipeline/publish-artifact.sh "${wheel}" \ - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/ - done build-gpu-rpkg: name: Build GPU-enabled R package @@ -174,10 +162,6 @@ jobs: - name: Log into Docker registry (AWS ECR) run: bash ops/pipeline/login-docker-registry.sh - run: bash ops/pipeline/build-gpu-rpkg.sh - - name: Upload R tarball - run: | - bash ops/pipeline/publish-artifact.sh xgboost_r_gpu_linux_*.tar.gz \ - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/ test-cpp-gpu: @@ -213,8 +197,11 @@ jobs: run: bash ops/pipeline/login-docker-registry.sh - name: Unstash gtest run: | - bash ops/pipeline/stash-artifacts.sh unstash ${{ matrix.artifact_from }} \ - build/testxgboost + python3 ops/pipeline/manage-artifacts.py download \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/${{ matrix.artifact_from }} \ + --dest-dir build \ + testxgboost chmod +x build/testxgboost - run: bash ops/pipeline/test-cpp-gpu.sh ${{ matrix.suite }} @@ -260,8 +247,12 @@ jobs: run: bash ops/pipeline/login-docker-registry.sh - name: Unstash Python wheel run: | - bash ops/pipeline/stash-artifacts.sh unstash ${{ matrix.artifact_from }} \ - python-package/dist/*.whl ./xgboost + python3 ops/pipeline/manage-artifacts.py download \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/${{ matrix.artifact_from }} \ + --dest-dir wheelhouse \ + *.whl xgboost + mv -v wheelhouse/xgboost . 
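+          # Note: file modes are not preserved through the S3 round-trip,
+          # hence the explicit chmod below.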
chmod +x ./xgboost - name: Run Python tests, ${{ matrix.description }} run: bash ops/pipeline/test-python-wheel.sh ${{ matrix.suite }} ${{ matrix.container }} diff --git a/.github/workflows/windows.yml b/.github/workflows/windows.yml index f97daf761abf..41f3d5be53f7 100644 --- a/.github/workflows/windows.yml +++ b/.github/workflows/windows.yml @@ -30,9 +30,12 @@ jobs: submodules: "true" - run: powershell ops/pipeline/build-win64-gpu.ps1 - name: Stash files + shell: powershell run: | - powershell ops/pipeline/stash-artifacts.ps1 stash build-win64-gpu ` - build/testxgboost.exe xgboost.exe ` + conda activate + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-win64-gpu \ (Get-ChildItem python-package/dist/*.whl | Select-Object -Expand FullName) test-win64-gpu: @@ -47,7 +50,12 @@ jobs: with: submodules: "true" - name: Unstash files + shell: powershell run: | - powershell ops/pipeline/stash-artifacts.ps1 unstash build-win64-gpu ` - build/testxgboost.exe xgboost.exe python-package/dist/*.whl + conda activate + python3 ops/pipeline/manage-artifacts.py download \ + --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \ + --prefix cache/${{ github.run_id }}/build-win64-gpu \ + --dest-dir python-package/dist \ + *.whl - run: powershell ops/pipeline/test-win64-gpu.ps1 diff --git a/ops/pipeline/build-cpu-arm64.sh b/ops/pipeline/build-cpu-arm64.sh index 2e0f0ea9ef4d..1c23d4dfe348 100755 --- a/ops/pipeline/build-cpu-arm64.sh +++ b/ops/pipeline/build-cpu-arm64.sh @@ -39,3 +39,12 @@ if ! unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then echo "error: libgomp.so was not vendored in the wheel" exit -1 fi + +echo "--- Upload Python wheel" +if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] +then + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \ + python-package/dist/*.whl +fi diff --git a/ops/pipeline/build-cuda.sh b/ops/pipeline/build-cuda.sh index 1965c50563ed..172fa9f85f16 100755 --- a/ops/pipeline/build-cuda.sh +++ b/ops/pipeline/build-cuda.sh @@ -93,9 +93,9 @@ then echo "--- Upload Python wheel" if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] then - aws s3 cp python-package/dist/*.whl s3://xgboost-nightly-builds/${BRANCH_NAME}/ \ - --acl public-read --no-progress - aws s3 cp python-package/dist/meta.json s3://xgboost-nightly-builds/${BRANCH_NAME}/ \ - --acl public-read --no-progress + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \ + python-package/dist/*.whl python-package/dist/meta.json fi fi diff --git a/ops/pipeline/build-gpu-rpkg-impl.sh b/ops/pipeline/build-gpu-rpkg-impl.sh index 2815b8f448f1..2b803b926271 100755 --- a/ops/pipeline/build-gpu-rpkg-impl.sh +++ b/ops/pipeline/build-gpu-rpkg-impl.sh @@ -33,4 +33,4 @@ cp -v lib/xgboost.so xgboost_rpack/src/ echo 'all:' > xgboost_rpack/src/Makefile echo 'all:' > xgboost_rpack/src/Makefile.win mv xgboost_rpack/ xgboost/ -tar cvzf xgboost_r_gpu_linux_${commit_hash}.tar.gz xgboost/ +tar cvzf xgboost_r_gpu_linux.tar.gz xgboost/ diff --git a/ops/pipeline/build-gpu-rpkg.sh b/ops/pipeline/build-gpu-rpkg.sh index a96a2a4a0247..07a08ff15385 100755 --- a/ops/pipeline/build-gpu-rpkg.sh +++ b/ops/pipeline/build-gpu-rpkg.sh @@ -8,6 +8,7 @@ then exit 1 fi +source ops/pipeline/classify-git-branch.sh source 
ops/pipeline/get-docker-registry-details.sh CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.gpu_build_r_rockylinux8:main @@ -18,3 +19,11 @@ python3 ops/docker_run.py \ --container-tag ${CONTAINER_TAG} \ -- ops/pipeline/build-gpu-rpkg-impl.sh \ ${GITHUB_SHA} + +if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] +then + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \ + xgboost_r_gpu_linux.tar.gz +fi diff --git a/ops/pipeline/build-jvm-manylinux2014.sh b/ops/pipeline/build-jvm-manylinux2014.sh index 4eaae23bf7bc..068fb5fb0c44 100755 --- a/ops/pipeline/build-jvm-manylinux2014.sh +++ b/ops/pipeline/build-jvm-manylinux2014.sh @@ -12,6 +12,7 @@ fi arch=$1 container_id="xgb-ci.manylinux2014_${arch}" +source ops/pipeline/classify-git-branch.sh source ops/pipeline/get-docker-registry-details.sh CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main" @@ -26,3 +27,13 @@ python3 ops/docker_run.py \ "cd build && cmake .. -DJVM_BINDINGS=ON -DUSE_OPENMP=ON && make -j$(nproc)" ldd lib/libxgboost4j.so objdump -T lib/libxgboost4j.so | grep GLIBC_ | sed 's/.*GLIBC_\([.0-9]*\).*/\1/g' | sort -Vu + +if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] +then + libname=lib/libxgboost4j_linux_${arch}.so + mv -v lib/libxgboost4j.so ${libname} + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \ + ${libname} +fi diff --git a/ops/pipeline/build-manylinux2014.sh b/ops/pipeline/build-manylinux2014.sh index b572fed0186a..ae2b7598bf8b 100755 --- a/ops/pipeline/build-manylinux2014.sh +++ b/ops/pipeline/build-manylinux2014.sh @@ -16,6 +16,7 @@ fi arch="$1" +source ops/pipeline/classify-git-branch.sh source ops/pipeline/get-docker-registry-details.sh WHEEL_TAG="manylinux2014_${arch}" @@ -65,3 +66,11 @@ python3 ops/script/rename_whl.py \ --platform-tag ${WHEEL_TAG} rm -v python-package/dist/xgboost_cpu-*.whl mv -v wheelhouse/xgboost_cpu-*.whl python-package/dist/ + +if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] +then + python3 ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \ + python-package/dist/*.whl +fi diff --git a/ops/pipeline/build-win64-gpu.ps1 b/ops/pipeline/build-win64-gpu.ps1 index 76cc955059b8..26c9c0cfcbd1 100644 --- a/ops/pipeline/build-win64-gpu.ps1 +++ b/ops/pipeline/build-win64-gpu.ps1 @@ -40,7 +40,9 @@ if ($LASTEXITCODE -ne 0) { throw "Last command failed" } Write-Host "--- Upload Python wheel" cd .. if ( $is_release_branch -eq 1 ) { - aws s3 cp (Get-ChildItem python-package/dist/*.whl | Select-Object -Expand FullName) ` - s3://xgboost-nightly-builds/$Env:BRANCH_NAME/ --acl public-read --no-progress + python ops/pipeline/manage-artifacts.py upload ` + --s3-bucket 'xgboost-nightly-builds' ` + --prefix "$Env:BRANCH_NAME/$Env:GITHUB_SHA" --make-public ` + (Get-ChildItem python-package/dist/*.whl | Select-Object -Expand FullName) if ($LASTEXITCODE -ne 0) { throw "Last command failed" } } diff --git a/ops/pipeline/manage-artifacts.py b/ops/pipeline/manage-artifacts.py new file mode 100644 index 000000000000..e847fd8c8824 --- /dev/null +++ b/ops/pipeline/manage-artifacts.py @@ -0,0 +1,163 @@ +""" +Upload an artifact to an S3 bucket for later use +Note. This script takes in all inputs via environment variables + except the path to the artifact(s). 
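+
+Example usage (bucket name and prefix are illustrative; run from the repo root):
+
+    # Stash build artifacts under s3://my-bucket/cache/12345/build-cuda/
+    python3 ops/pipeline/manage-artifacts.py upload --s3-bucket my-bucket --prefix cache/12345/build-cuda build/testxgboost
+
+    # Fetch them back; wildcard patterns are supported when downloading
+    python3 ops/pipeline/manage-artifacts.py download --s3-bucket my-bucket --prefix cache/12345/build-cuda --dest-dir dist '*.whl'
+
+Pass --make-public (upload only) to make the uploaded artifact(s) publicly readable.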
+""" + +import argparse +import os +import subprocess +import sys +from pathlib import Path +from urllib.parse import SplitResult, urlsplit, urlunsplit + + +def resolve(x: Path) -> Path: + return x.expanduser().resolve() + + +def path_equals(a: Path, b: Path) -> bool: + return resolve(a) == resolve(b) + + +def compute_s3_url(*, s3_bucket: str, prefix: str, artifact: str) -> str: + if prefix == "": + return f"s3://{s3_bucket}/{artifact}" + return f"s3://{s3_bucket}/{prefix}/{artifact}" + + +def aws_s3_upload(*, src: Path, dest: str, make_public=bool) -> None: + cli_args = ["aws", "s3", "cp", "--no-progress", str(src), dest] + if make_public: + cli_args.extend(["--acl", "public-read"]) + print(" ".join(cli_args)) + subprocess.run( + cli_args, + check=True, + encoding="utf-8", + ) + + +def aws_s3_download(*, src: str, dest_dir: Path) -> None: + cli_args = ["aws", "s3", "cp", "--no-progress", src, str(dest_dir)] + print(" ".join(cli_args)) + subprocess.run( + cli_args, + check=True, + encoding="utf-8", + ) + + +def aws_s3_download_with_wildcard(*, src: str, dest_dir: Path) -> None: + parsed_src = urlsplit(src) + src_dir = urlunsplit( + SplitResult( + scheme="s3", + netloc=parsed_src.netloc, + path=os.path.dirname(parsed_src.path), + query="", + fragment="", + ) + ) + src_glob = os.path.basename(parsed_src.path) + cli_args = [ + "aws", + "s3", + "cp", + "--recursive", + "--no-progress", + "--exclude", + "'*'", + "--include", + src_glob, + src_dir, + str(dest_dir), + ] + print(" ".join(cli_args)) + subprocess.run( + cli_args, + check=True, + encoding="utf-8", + ) + + +def upload(*, args: argparse.Namespace) -> None: + print(f"Uploading artifacts to prefix {args.prefix}...") + for artifact in args.artifacts: + artifact_path = Path(artifact) + s3_url = compute_s3_url( + s3_bucket=args.s3_bucket, prefix=args.prefix, artifact=artifact_path.name + ) + aws_s3_upload(src=artifact_path, dest=s3_url, make_public=args.make_public) + + +def download(*, args: argparse.Namespace) -> None: + print(f"Downloading artifacts from prefix {args.prefix}...") + dest_dir = Path(args.dest_dir) + print(f"mkdir -p {str(dest_dir)}") + dest_dir.mkdir(parents=True, exist_ok=True) + for artifact in args.artifacts: + s3_url = compute_s3_url( + s3_bucket=args.s3_bucket, prefix=args.prefix, artifact=artifact + ) + if "*" in artifact: + aws_s3_download_with_wildcard(src=s3_url, dest_dir=dest_dir) + else: + aws_s3_download(src=s3_url, dest_dir=dest_dir) + + +if __name__ == "__main__": + # Ensure that the current working directory is the project root + if not (Path.cwd() / "ops").is_dir() or not path_equals( + Path(__file__).parent.parent, Path.cwd() / "ops" + ): + x = Path(__file__).name + raise RuntimeError(f"Script {x} must be run at the project's root directory") + + root_parser = argparse.ArgumentParser() + subparser_factory = root_parser.add_subparsers(required=True, dest="command") + parsers = {} + for command in ["upload", "download"]: + parsers[command] = subparser_factory.add_parser(command) + parsers[command].add_argument( + "--s3-bucket", + type=str, + required=True, + help="Name of the S3 bucket to store the artifact", + ) + parsers[command].add_argument( + "--prefix", + type=str, + required=True, + help=( + "Where the artifact(s) would be stored. The artifact(s) will be stored at " + "s3://[s3-bucket]/[prefix]/[filename]." 
+ ), + ) + parsers[command].add_argument( + "artifacts", + type=str, + nargs="+", + metavar="artifact", + help=f"Artifact(s) to {command}", + ) + + parsers["upload"].add_argument( + "--make-public", action="store_true", help="Make artifact publicly accessible" + ) + parsers["download"].add_argument( + "--dest-dir", type=str, required=True, help="Where to download artifact(s)" + ) + + if len(sys.argv) == 1: + print("1. Upload artifact(s)") + parsers["upload"].print_help() + print("\n2. Download artifact(s)") + parsers["download"].print_help() + sys.exit(1) + + parsed_args = root_parser.parse_args() + if parsed_args.command == "upload": + upload(args=parsed_args) + elif parsed_args.command == "download": + download(args=parsed_args) diff --git a/ops/pipeline/publish-artifact.sh b/ops/pipeline/publish-artifact.sh deleted file mode 100755 index adcb3c521d2a..000000000000 --- a/ops/pipeline/publish-artifact.sh +++ /dev/null @@ -1,23 +0,0 @@ -#!/bin/bash - -## Publish artifacts in an S3 bucket -## Meant to be used inside GitHub Actions - -set -euo pipefail - -source ops/pipeline/enforce-ci.sh - -if [[ $# -ne 2 ]] -then - echo "Usage: $0 [artifact] [s3_url]" - exit 1 -fi - -artifact="$1" -s3_url="$2" - -if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] -then - echo "aws s3 cp ${artifact} ${s3_url} --acl public-read --no-progress" - aws s3 cp "${artifact}" "${s3_url}" --acl public-read --no-progress -fi diff --git a/ops/pipeline/stash-artifacts.ps1 b/ops/pipeline/stash-artifacts.ps1 deleted file mode 100644 index 9b9989bf376d..000000000000 --- a/ops/pipeline/stash-artifacts.ps1 +++ /dev/null @@ -1,49 +0,0 @@ -[CmdletBinding()] -Param( - [Parameter( - Mandatory=$true, - Position=0 - )][string]$command, - [Parameter( - Mandatory=$true, - Position=1 - )][string]$remote_prefix, - [Parameter( - Mandatory=$true, - Position=2, - ValueFromRemainingArguments=$true - )][string[]]$artifacts -) - -## Convenience wrapper for ops/pipeline/stash-artifacts.py -## Meant to be used inside GitHub Actions - -$ErrorActionPreference = "Stop" - -. ops/pipeline/enforce-ci.ps1 - -foreach ($env in "GITHUB_REPOSITORY", "GITHUB_RUN_ID", "RUNS_ON_S3_BUCKET_CACHE") { - $val = [Environment]::GetEnvironmentVariable($env) - if ($val -eq $null) { - Write-Host "Error: $env must be set." - exit 1 - } -} - -$artifact_stash_prefix = "cache/${Env:GITHUB_REPOSITORY}/stash/${Env:GITHUB_RUN_ID}" - -conda activate - -Write-Host @" -python ops/pipeline/stash-artifacts.py ` - --command "${command}" ` - --s3-bucket "${Env:RUNS_ON_S3_BUCKET_CACHE}" ` - --prefix "${artifact_stash_prefix}/${remote_prefix}" ` - -- $artifacts -"@ -python ops/pipeline/stash-artifacts.py ` - --command "${command}" ` - --s3-bucket "${Env:RUNS_ON_S3_BUCKET_CACHE}" ` - --prefix "${artifact_stash_prefix}/${remote_prefix}" ` - -- $artifacts -if ($LASTEXITCODE -ne 0) { throw "Last command failed" } diff --git a/ops/pipeline/stash-artifacts.py b/ops/pipeline/stash-artifacts.py deleted file mode 100644 index 151e187513da..000000000000 --- a/ops/pipeline/stash-artifacts.py +++ /dev/null @@ -1,144 +0,0 @@ -""" -Stash an artifact in an S3 bucket for later use - -Note. This script takes in all inputs via environment variables - except the path to the artifact(s). 
-""" - -import argparse -import os -import subprocess -from pathlib import Path -from urllib.parse import SplitResult, urlsplit, urlunsplit - - -def resolve(x: Path) -> Path: - return x.expanduser().resolve() - - -def path_equals(a: Path, b: Path) -> bool: - return resolve(a) == resolve(b) - - -def compute_s3_url(s3_bucket: str, prefix: str, artifact: Path) -> str: - filename = artifact.name - relative_path = resolve(artifact).relative_to(Path.cwd()) - if resolve(artifact.parent) == resolve(Path.cwd()): - full_prefix = prefix - else: - full_prefix = f"{prefix}/{str(relative_path.parent)}" - return f"s3://{s3_bucket}/{full_prefix}/{filename}" - - -def aws_s3_upload(src: Path, dest: str) -> None: - cli_args = ["aws", "s3", "cp", "--no-progress", str(src), dest] - print(" ".join(cli_args)) - subprocess.run( - cli_args, - check=True, - encoding="utf-8", - ) - - -def aws_s3_download(src: str, dest: Path) -> None: - cli_args = ["aws", "s3", "cp", "--no-progress", src, str(dest)] - print(" ".join(cli_args)) - subprocess.run( - cli_args, - check=True, - encoding="utf-8", - ) - - -def aws_s3_download_with_wildcard(src: str, dest: Path) -> None: - parsed_src = urlsplit(src) - src_dir = urlunsplit( - SplitResult( - scheme="s3", - netloc=parsed_src.netloc, - path=os.path.dirname(parsed_src.path), - query="", - fragment="", - ) - ) - dest_dir = dest.parent - src_glob = os.path.basename(parsed_src.path) - cli_args = [ - "aws", - "s3", - "cp", - "--recursive", - "--no-progress", - "--exclude", - "'*'", - "--include", - src_glob, - src_dir, - str(dest_dir), - ] - print(" ".join(cli_args)) - subprocess.run( - cli_args, - check=True, - encoding="utf-8", - ) - - -def upload(args: argparse.Namespace) -> None: - print(f"Stashing artifacts to prefix {args.prefix}...") - for artifact in args.artifacts: - artifact_path = Path(artifact) - s3_url = compute_s3_url(args.s3_bucket, args.prefix, artifact_path) - aws_s3_upload(artifact_path, s3_url) - - -def download(args: argparse.Namespace) -> None: - print(f"Unstashing artifacts from prefix {args.prefix}...") - for artifact in args.artifacts: - artifact_path = Path(artifact) - print(f"mkdir -p {str(artifact_path.parent)}") - artifact_path.parent.mkdir(parents=True, exist_ok=True) - s3_url = compute_s3_url(args.s3_bucket, args.prefix, artifact_path) - if "*" in artifact: - aws_s3_download_with_wildcard(s3_url, artifact_path) - else: - aws_s3_download(s3_url, artifact_path) - - -if __name__ == "__main__": - # Ensure that the current working directory is the project root - if not (Path.cwd() / "ops").is_dir() or not path_equals( - Path(__file__).parent.parent, Path.cwd() / "ops" - ): - x = Path(__file__).name - raise RuntimeError(f"Script {x} must be run at the project's root directory") - - parser = argparse.ArgumentParser() - parser.add_argument( - "--command", - type=str, - choices=["stash", "unstash"], - required=True, - help="Whether to stash or unstash the artifact", - ) - parser.add_argument( - "--s3-bucket", - type=str, - required=True, - help="Name of the S3 bucket to store the artifact", - ) - parser.add_argument( - "--prefix", - type=str, - required=True, - help=( - "Where the artifact would be stored. The artifact will be stored in " - "s3://[s3-bucket]/[prefix]." 
- ), - ) - parser.add_argument("artifacts", type=str, nargs="+", metavar="artifact") - parsed_args = parser.parse_args() - if parsed_args.command == "stash": - upload(parsed_args) - elif parsed_args.command == "unstash": - download(parsed_args) diff --git a/ops/pipeline/stash-artifacts.sh b/ops/pipeline/stash-artifacts.sh deleted file mode 100755 index 98c9695c4227..000000000000 --- a/ops/pipeline/stash-artifacts.sh +++ /dev/null @@ -1,36 +0,0 @@ -#!/bin/bash - -## Convenience wrapper for ops/pipeline/stash-artifacts.py -## Meant to be used inside GitHub Actions - -set -euo pipefail - -source ops/pipeline/enforce-ci.sh - -if [[ "$#" -lt 3 ]] -then - echo "Usage: $0 {stash,unstash} [remote_prefix] [artifact] [artifact ...]" - exit 1 -fi - -command="$1" -remote_prefix="$2" -shift 2 - -for arg in "GITHUB_REPOSITORY" "GITHUB_RUN_ID" "RUNS_ON_S3_BUCKET_CACHE" -do - if [[ -z "${!arg:-}" ]] - then - echo "Error: $arg must be set." - exit 2 - fi -done - -artifact_stash_prefix="cache/${GITHUB_REPOSITORY}/stash/${GITHUB_RUN_ID}" - -set -x -python3 ops/pipeline/stash-artifacts.py \ - --command "${command}" \ - --s3-bucket "${RUNS_ON_S3_BUCKET_CACHE}" \ - --prefix "${artifact_stash_prefix}/${remote_prefix}" \ - -- "$@" diff --git a/ops/pipeline/test-python-wheel-impl.sh b/ops/pipeline/test-python-wheel-impl.sh index 75bfa5fbaffb..837ff03b24d7 100755 --- a/ops/pipeline/test-python-wheel-impl.sh +++ b/ops/pipeline/test-python-wheel-impl.sh @@ -34,7 +34,7 @@ export PYSPARK_DRIVER_PYTHON=$(which python) export PYSPARK_PYTHON=$(which python) export SPARK_TESTING=1 -pip install -v ./python-package/dist/*.whl +pip install -v ./wheelhouse/*.whl case "$suite" in gpu) From 80b212cbc2c2f23a4bf6f9a8730036846a6da6ce Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 17:57:14 -0800 Subject: [PATCH 09/27] Remove rename_whl.py --- ops/pipeline/build-cpu-arm64.sh | 16 +++--- ops/pipeline/build-cuda.sh | 15 +++--- ops/pipeline/build-manylinux2014.sh | 18 +++---- ops/pipeline/build-python-wheels-macos.sh | 10 ++-- ops/pipeline/build-win64-gpu.ps1 | 7 ++- ops/script/format_wheel_meta.py | 1 + ops/script/rename_whl.py | 62 ----------------------- 7 files changed, 29 insertions(+), 100 deletions(-) delete mode 100644 ops/script/rename_whl.py diff --git a/ops/pipeline/build-cpu-arm64.sh b/ops/pipeline/build-cpu-arm64.sh index 1c23d4dfe348..248119445e17 100755 --- a/ops/pipeline/build-cpu-arm64.sh +++ b/ops/pipeline/build-cpu-arm64.sh @@ -20,19 +20,14 @@ set -x python3 ops/docker_run.py \ --container-tag ${BUILD_CONTAINER_TAG} \ -- ops/pipeline/build-cpu-arm64-impl.sh -python3 ops/script/rename_whl.py \ - --wheel-path python-package/dist/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard" python3 ops/docker_run.py \ --container-tag ${BUILD_CONTAINER_TAG} \ - -- auditwheel repair --plat ${WHEEL_TAG} python-package/dist/*.whl -python3 ops/script/rename_whl.py \ - --wheel-path wheelhouse/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} + -- auditwheel repair --only-plat \ + --plat ${WHEEL_TAG} python-package/dist/*.whl +python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \ + wheelhouse/*.whl mv -v wheelhouse/*.whl python-package/dist/ if ! unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then @@ -40,6 +35,9 @@ if ! 
unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then exit -1 fi +# Check size of wheel +pydistcheck --config python-package/pyproject.toml python-package/dist/*.whl + echo "--- Upload Python wheel" if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]] then diff --git a/ops/pipeline/build-cuda.sh b/ops/pipeline/build-cuda.sh index 172fa9f85f16..5e2f2401f1eb 100755 --- a/ops/pipeline/build-cuda.sh +++ b/ops/pipeline/build-cuda.sh @@ -60,26 +60,23 @@ python3 ops/docker_run.py \ --container-tag ${BUILD_CONTAINER_TAG} \ --run-args='-e BUILD_ONLY_SM75 -e USE_RMM' \ -- ops/pipeline/build-cuda-impl.sh -python3 ops/script/rename_whl.py \ - --wheel-path python-package/dist/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard" python3 ops/docker_run.py \ --container-tag ${MANYLINUX_CONTAINER_TAG} \ - -- auditwheel repair \ + -- auditwheel repair --only-plat \ --plat ${WHEEL_TAG} python-package/dist/*.whl -python3 ops/script/rename_whl.py \ - --wheel-path wheelhouse/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} +python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \ + wheelhouse/*.whl mv -v wheelhouse/*.whl python-package/dist/ if ! unzip -l ./python-package/dist/*.whl | grep libgomp > /dev/null; then echo "error: libgomp.so was not vendored in the wheel" exit -1 fi +# Check size of wheel +pydistcheck --config python-package/pyproject.toml python-package/dist/*.whl + if [[ $USE_RMM == 0 ]] then # Generate the meta info which includes xgboost version and the commit info diff --git a/ops/pipeline/build-manylinux2014.sh b/ops/pipeline/build-manylinux2014.sh index ae2b7598bf8b..fbc349568e72 100755 --- a/ops/pipeline/build-manylinux2014.sh +++ b/ops/pipeline/build-manylinux2014.sh @@ -38,11 +38,10 @@ git checkout python-package/pyproject.toml python-package/xgboost/core.py python3 ops/docker_run.py \ --container-tag "${CONTAINER_TAG}" \ - -- auditwheel repair --plat ${WHEEL_TAG} python-package/dist/*.whl -python3 ops/script/rename_whl.py \ - --wheel-path wheelhouse/*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} + -- auditwheel repair --only-plat \ + --plat ${WHEEL_TAG} python-package/dist/*.whl +python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \ + wheelhouse/*.whl rm -rf python-package/dist/ mkdir python-package/dist/ mv -v wheelhouse/*.whl python-package/dist/ @@ -59,11 +58,10 @@ git checkout python-package/pyproject.toml # discard the patch python3 ops/docker_run.py \ --container-tag "${CONTAINER_TAG}" \ - -- auditwheel repair --plat ${WHEEL_TAG} python-package/dist/xgboost_cpu-*.whl -python3 ops/script/rename_whl.py \ - --wheel-path wheelhouse/xgboost_cpu-*.whl \ - --commit-hash ${GITHUB_SHA} \ - --platform-tag ${WHEEL_TAG} + -- auditwheel repair --only-plat \ + --plat ${WHEEL_TAG} python-package/dist/xgboost_cpu-*.whl +python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \ + wheelhouse/xgboost_cpu-*.whl rm -v python-package/dist/xgboost_cpu-*.whl mv -v wheelhouse/xgboost_cpu-*.whl python-package/dist/ diff --git a/ops/pipeline/build-python-wheels-macos.sh b/ops/pipeline/build-python-wheels-macos.sh index 697514c0c3ad..ca452a613a64 100755 --- a/ops/pipeline/build-python-wheels-macos.sh +++ b/ops/pipeline/build-python-wheels-macos.sh @@ -13,13 +13,13 @@ commit_id=$2 if [[ "$platform_id" == macosx_* ]]; then if [[ "$platform_id" == 
macosx_arm64 ]]; then # MacOS, Apple Silicon - wheel_tag=macosx_12_0_arm64 + WHEEL_TAG=macosx_12_0_arm64 cpython_ver=310 cibw_archs=arm64 export MACOSX_DEPLOYMENT_TARGET=12.0 elif [[ "$platform_id" == macosx_x86_64 ]]; then # MacOS, Intel - wheel_tag=macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64 + WHEEL_TAG=macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64 cpython_ver=310 cibw_archs=x86_64 export MACOSX_DEPLOYMENT_TARGET=10.15 @@ -42,10 +42,8 @@ export CIBW_REPAIR_WHEEL_COMMAND_MACOS="delocate-wheel --require-archs {delocate python -m pip install cibuildwheel python -m cibuildwheel python-package --output-dir wheelhouse -python ops/script/rename_whl.py \ - --wheel-path wheelhouse/*.whl \ - --commit-hash ${commit_id} \ - --platform-tag ${wheel_tag} +python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \ + wheelhouse/*.whl # List dependencies of libxgboost.dylib mkdir tmp diff --git a/ops/pipeline/build-win64-gpu.ps1 b/ops/pipeline/build-win64-gpu.ps1 index 26c9c0cfcbd1..7e32e28036bc 100644 --- a/ops/pipeline/build-win64-gpu.ps1 +++ b/ops/pipeline/build-win64-gpu.ps1 @@ -31,10 +31,9 @@ pip install --user -v "pip>=23" pip --version pip wheel --no-deps -v . --wheel-dir dist/ if ($LASTEXITCODE -ne 0) { throw "Last command failed" } -python ../ops/script/rename_whl.py ` - --wheel-path (Get-ChildItem dist/*.whl | Select-Object -Expand FullName) ` - --commit-hash $Env:GITHUB_SHA ` - --platform-tag win_amd64 +python -m wheel tags --python-tag py3 --abi-tag none ` + --platform win_amd64 --remove \ + (Get-ChildItem dist/*.whl | Select-Object -Expand FullName) if ($LASTEXITCODE -ne 0) { throw "Last command failed" } Write-Host "--- Upload Python wheel" diff --git a/ops/script/format_wheel_meta.py b/ops/script/format_wheel_meta.py index a7def879905e..8b37e81bc9a7 100644 --- a/ops/script/format_wheel_meta.py +++ b/ops/script/format_wheel_meta.py @@ -27,6 +27,7 @@ def main(args: argparse.Namespace) -> None: version = tokens[1].split("+")[0] meta_info = { + "wheel_path": f"{args.commit_hash}/{wheel_name}", "wheel_name": wheel_name, "platform_tag": args.platform_tag, "version": version, diff --git a/ops/script/rename_whl.py b/ops/script/rename_whl.py deleted file mode 100644 index d4467720c738..000000000000 --- a/ops/script/rename_whl.py +++ /dev/null @@ -1,62 +0,0 @@ -import argparse -import pathlib - - -def main(args: argparse.Namespace) -> None: - wheel_path = pathlib.Path(args.wheel_path).expanduser().resolve() - if not wheel_path.exists(): - raise ValueError(f"Wheel cannot be found at path {wheel_path}") - if not wheel_path.is_file(): - raise ValueError(f"Path {wheel_path} is not a valid file") - wheel_dir, wheel_name = wheel_path.parent, wheel_path.name - - tokens = wheel_name.split("-") - assert len(tokens) == 5 - version = tokens[1].split("+")[0] - keywords = { - "pkg_name": tokens[0], - "version": version, - "commit_id": args.commit_hash, - "platform_tag": args.platform_tag, - } - new_wheel_name = ( - "{pkg_name}-{version}+{commit_id}-py3-none-{platform_tag}.whl".format( - **keywords - ) - ) - new_wheel_path = wheel_dir / new_wheel_name - print(f"Renaming {wheel_name} to {new_wheel_name}...") - if new_wheel_name == wheel_name: - print("Skipping, as the old name is identical to the new name.") - else: - if new_wheel_path.is_file(): - new_wheel_path.unlink() - wheel_path.rename(new_wheel_path) - - filesize = new_wheel_path.stat().st_size / 1024 / 1024 # MiB - print(f"Wheel size: {filesize:.2f} MiB") - - if filesize > 300: - raise RuntimeError( - 
f"Limit of wheel size set by PyPI is exceeded. {new_wheel_name}: {filesize:.2f} MiB" - ) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser( - description="Format a Python wheel's name using the git commit hash and platform tag" - ) - parser.add_argument( - "--wheel-path", type=str, required=True, help="Path to the wheel" - ) - parser.add_argument( - "--commit-hash", type=str, required=True, help="Git commit hash" - ) - parser.add_argument( - "--platform-tag", - type=str, - required=True, - help="Platform tag (e.g. manylinux_2_28_x86_64)", - ) - parsed_args = parser.parse_args() - main(parsed_args) From 1ca27bbc3f8029afe0692da8be09855144236c49 Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 19:12:48 -0800 Subject: [PATCH 10/27] Remove remaining uses of awscli --- .github/workflows/jvm_tests.yml | 10 +++++----- .github/workflows/python_wheels_macos.yml | 6 ++++-- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/.github/workflows/jvm_tests.yml b/.github/workflows/jvm_tests.yml index b059c530b01a..50f8a712f729 100644 --- a/.github/workflows/jvm_tests.yml +++ b/.github/workflows/jvm_tests.yml @@ -84,6 +84,7 @@ jobs: submodules: "true" - run: bash ${{ matrix.script }} - name: Upload libxgboost4j.dylib + if: github.ref == 'refs/heads/master' || contains(github.ref, 'refs/heads/release_') run: | mv -v lib/libxgboost4j.dylib ${{ matrix.libname }} python3 ops/pipeline/manage-artifacts.py upload \ @@ -186,11 +187,10 @@ jobs: mvn test -B -pl :xgboost4j_2.12 - name: Publish artifact xgboost4j.dll to S3 run: | - cd lib/ - Rename-Item -Path xgboost4j.dll -NewName xgboost4j_${{ github.sha }}.dll - python -m awscli s3 cp xgboost4j_${{ github.sha }}.dll ` - s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/libxgboost4j/ ` - --acl public-read --region us-west-2 + python ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${{ env.BRANCH_NAME }}/${{ github.sha }} --make-public \ + lib/xgboost4j.dll if: | (github.ref == 'refs/heads/master' || contains(github.ref, 'refs/heads/release_')) && matrix.os == 'windows-latest' diff --git a/.github/workflows/python_wheels_macos.yml b/.github/workflows/python_wheels_macos.yml index f58847c5f573..33eabbd09dca 100644 --- a/.github/workflows/python_wheels_macos.yml +++ b/.github/workflows/python_wheels_macos.yml @@ -46,8 +46,10 @@ jobs: - name: Upload Python wheel if: github.ref == 'refs/heads/master' || contains(github.ref, 'refs/heads/release_') run: | - python -m pip install awscli - python -m awscli s3 cp wheelhouse/*.whl s3://xgboost-nightly-builds/${{ env.BRANCH_NAME }}/ --acl public-read --region us-west-2 + python ops/pipeline/manage-artifacts.py upload \ + --s3-bucket xgboost-nightly-builds \ + --prefix ${{ env.BRANCH_NAME }}/${{ github.sha }} --make-public \ + wheelhouse/*.whl env: AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID_IAM_S3_UPLOADER }} AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY_IAM_S3_UPLOADER }} From 174fe9f42f5339bc70294f1ae079d4d15e747ed8 Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 19:14:53 -0800 Subject: [PATCH 11/27] Typo --- ops/pipeline/manage-artifacts.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ops/pipeline/manage-artifacts.py b/ops/pipeline/manage-artifacts.py index e847fd8c8824..3f94678421b8 100644 --- a/ops/pipeline/manage-artifacts.py +++ b/ops/pipeline/manage-artifacts.py @@ -26,7 +26,7 @@ def compute_s3_url(*, s3_bucket: str, prefix: str, artifact: str) -> str: return 
f"s3://{s3_bucket}/{prefix}/{artifact}" -def aws_s3_upload(*, src: Path, dest: str, make_public=bool) -> None: +def aws_s3_upload(*, src: Path, dest: str, make_public: bool) -> None: cli_args = ["aws", "s3", "cp", "--no-progress", str(src), dest] if make_public: cli_args.extend(["--acl", "public-read"]) From 640fdc7fa3e89e467497d9f3e77c0328fb52090a Mon Sep 17 00:00:00 2001 From: Hyunsu Cho Date: Mon, 9 Dec 2024 19:19:20 -0800 Subject: [PATCH 12/27] Fix --- ops/pipeline/build-cpu-arm64.sh | 4 ++-- ops/pipeline/build-win64-gpu.ps1 | 3 +-- python-package/pyproject.toml | 6 ++++++ 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/ops/pipeline/build-cpu-arm64.sh b/ops/pipeline/build-cpu-arm64.sh index 248119445e17..9801790baaaa 100755 --- a/ops/pipeline/build-cpu-arm64.sh +++ b/ops/pipeline/build-cpu-arm64.sh @@ -18,12 +18,12 @@ CONTAINER_TAG=${DOCKER_REGISTRY_URL}/xgb-ci.aarch64:main echo "--- Build CPU code targeting ARM64" set -x python3 ops/docker_run.py \ - --container-tag ${BUILD_CONTAINER_TAG} \ + --container-tag ${CONTAINER_TAG} \ -- ops/pipeline/build-cpu-arm64-impl.sh echo "--- Audit binary wheel to ensure it's compliant with ${WHEEL_TAG} standard" python3 ops/docker_run.py \ - --container-tag ${BUILD_CONTAINER_TAG} \ + --container-tag ${CONTAINER_TAG} \ -- auditwheel repair --only-plat \ --plat ${WHEEL_TAG} python-package/dist/*.whl python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \ diff --git a/ops/pipeline/build-win64-gpu.ps1 b/ops/pipeline/build-win64-gpu.ps1 index 7e32e28036bc..2c3e35812315 100644 --- a/ops/pipeline/build-win64-gpu.ps1 +++ b/ops/pipeline/build-win64-gpu.ps1 @@ -27,10 +27,9 @@ if ($LASTEXITCODE -ne 0) { throw "Last command failed" } Write-Host "--- Build binary wheel" cd ../python-package conda activate -pip install --user -v "pip>=23" -pip --version pip wheel --no-deps -v . 
From 4fccb291fc2eb65bae5864a20f475a5e5b844463 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 20:11:00 -0800
Subject: [PATCH 13/27] Install wheel on arm64

---
 ops/pipeline/build-manylinux2014.sh | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/ops/pipeline/build-manylinux2014.sh b/ops/pipeline/build-manylinux2014.sh
index fbc349568e72..de9319a493e7 100755
--- a/ops/pipeline/build-manylinux2014.sh
+++ b/ops/pipeline/build-manylinux2014.sh
@@ -16,6 +16,11 @@ fi
 
 arch="$1"
 
+if [[ "${arch:-}" == "aarch64" ]]
+then
+  sudo pip3 install wheel
+fi
+
 source ops/pipeline/classify-git-branch.sh
 source ops/pipeline/get-docker-registry-details.sh

From a4a313d44a4e36d0f7284f2e9e6ed8d41c6239b5 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 20:28:34 -0800
Subject: [PATCH 14/27] Try python3 -m pip

---
 ops/pipeline/build-manylinux2014.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ops/pipeline/build-manylinux2014.sh b/ops/pipeline/build-manylinux2014.sh
index de9319a493e7..48356cb4d1ac 100755
--- a/ops/pipeline/build-manylinux2014.sh
+++ b/ops/pipeline/build-manylinux2014.sh
@@ -18,7 +18,7 @@ arch="$1"
 
 if [[ "${arch:-}" == "aarch64" ]]
 then
-  sudo pip3 install wheel
+  sudo python3 -m pip install wheel
 fi
 
 source ops/pipeline/classify-git-branch.sh

From eab03350166962da933b284b3f61ec57d2b64e9f Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 20:44:10 -0800
Subject: [PATCH 15/27] [MacOS] Trust cibuildwheel to produce correct tag

---
 ops/pipeline/build-python-wheels-macos.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/ops/pipeline/build-python-wheels-macos.sh b/ops/pipeline/build-python-wheels-macos.sh
index ca452a613a64..ef1cdabaad56 100755
--- a/ops/pipeline/build-python-wheels-macos.sh
+++ b/ops/pipeline/build-python-wheels-macos.sh
@@ -42,8 +42,6 @@ export CIBW_REPAIR_WHEEL_COMMAND_MACOS="delocate-wheel --require-archs {delocate
 
 python -m pip install cibuildwheel
 python -m cibuildwheel python-package --output-dir wheelhouse
-python3 -m wheel tags --python-tag py3 --abi-tag none --platform ${WHEEL_TAG} --remove \
-  wheelhouse/*.whl
 
 # List dependencies of libxgboost.dylib
 mkdir tmp

From f589943db1fa991c16f0be285148ef17c5ffc32e Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 20:46:36 -0800
Subject: [PATCH 16/27] Update patch

---
 ops/patch/cpu_only_pypkg.patch        | 15 +++++++++------
 ops/patch/manylinux2014_warning.patch |  6 +++---
 ops/patch/remove_nccl_dep.patch       |  4 ++--
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/ops/patch/cpu_only_pypkg.patch b/ops/patch/cpu_only_pypkg.patch
index 765ac5c098d0..66d669d161f8 100644
--- a/ops/patch/cpu_only_pypkg.patch
+++ b/ops/patch/cpu_only_pypkg.patch
@@ -34,10 +34,10 @@ index 1fc0bb5a0..f1c68470b 100644
 +Note. ``xgboost-cpu`` does not provide an sdist (source distribution). You may install sdist
 +from https://pypi.org/project/xgboost/.
 diff --git python-package/pyproject.toml python-package/pyproject.toml
-index 46c1451c2..c5dc908d9 100644
+index 32abff1c6..5206f2e31 100644
 --- python-package/pyproject.toml
 +++ python-package/pyproject.toml
-@@ -6,7 +6,7 @@ backend-path = ["."]
+@@ -7,7 +7,7 @@ backend-path = ["."]
  build-backend = "packager.pep517"
 
  [project]
 -name = "xgboost"
 +name = "xgboost-cpu"
  description = "XGBoost Python Package"
  readme = { file = "README.rst", content-type = "text/x-rst" }
  authors = [
-@@ -82,3 +82,6 @@ class-attribute-naming-style = "snake_case"
+@@ -71,6 +71,9 @@ disable = [
+ dummy-variables-rgx = "(unused|)_.*"
+ reports = false
 
- # Allow single-letter variables
- variable-rgx = "[a-zA-Z_][a-z0-9_]{0,30}$"
-+
 +[tool.hatch.build.targets.wheel]
 +packages = ["xgboost/"]
++
+ [tool.pylint.basic]
+ # Enforce naming convention
+ const-naming-style = "UPPER_CASE"
diff --git a/ops/patch/manylinux2014_warning.patch b/ops/patch/manylinux2014_warning.patch
index 679205988b7a..0302b5e10d6c 100644
--- a/ops/patch/manylinux2014_warning.patch
+++ b/ops/patch/manylinux2014_warning.patch
@@ -1,8 +1,8 @@
 diff --git python-package/xgboost/core.py python-package/xgboost/core.py
-index e8bc735e6..030972ef2 100644
+index 079246239..2f1764812 100644
 --- python-package/xgboost/core.py
 +++ python-package/xgboost/core.py
-@@ -262,6 +262,18 @@ Likely cause:
+@@ -281,6 +281,18 @@ Likely cause:
      )
      raise ValueError(msg)
 
@@ -15,7 +15,7 @@ index e8bc735e6..030972ef2 100644
 +        "features such as GPU algorithms or federated learning are not available. "
 +        "To use these features, please upgrade to a recent Linux distro with glibc "
 +        "2.28+, and install the 'manylinux_2_28' variant.",
-+        FutureWarning
++        FutureWarning,
 +    )
 +
      return lib
diff --git a/ops/patch/remove_nccl_dep.patch b/ops/patch/remove_nccl_dep.patch
index c5a8fe3acee1..80fd48cc1faf 100644
--- a/ops/patch/remove_nccl_dep.patch
+++ b/ops/patch/remove_nccl_dep.patch
@@ -1,8 +1,8 @@
 diff --git python-package/pyproject.toml python-package/pyproject.toml
-index 20d3f9974..953087ff4 100644
+index b9f08dda6..32abff1c6 100644
 --- python-package/pyproject.toml
 +++ python-package/pyproject.toml
-@@ -30,7 +30,6 @@ classifiers = [
+@@ -32,7 +32,6 @@ classifiers = [
  dependencies = [
      "numpy",
      "scipy",

From 06666bdc926be6f7a25baa7bcc1e97feceef8bbf Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 21:54:59 -0800
Subject: [PATCH 17/27] No ls -lh on Windows

---
 ops/pipeline/build-win64-gpu.ps1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ops/pipeline/build-win64-gpu.ps1 b/ops/pipeline/build-win64-gpu.ps1
index 2c3e35812315..7e94d0919e83 100644
--- a/ops/pipeline/build-win64-gpu.ps1
+++ b/ops/pipeline/build-win64-gpu.ps1
@@ -29,7 +29,7 @@ cd ../python-package
 conda activate
 pip wheel --no-deps -v . --wheel-dir dist/
 if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
-ls -lh dist/
+ls dist/
 python -m wheel tags --python-tag py3 --abi-tag none `
   --platform win_amd64 --remove \
   (Get-ChildItem dist/*.whl | Select-Object -Expand FullName)

From edfe1b8740150a565ee9f55e84e6c623d9ca579d Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Mon, 9 Dec 2024 23:38:00 -0800
Subject: [PATCH 18/27] Fix Windows

---
 ops/pipeline/build-win64-gpu.ps1 | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/ops/pipeline/build-win64-gpu.ps1 b/ops/pipeline/build-win64-gpu.ps1
index 7e94d0919e83..894304f2c4f9 100644
--- a/ops/pipeline/build-win64-gpu.ps1
+++ b/ops/pipeline/build-win64-gpu.ps1
@@ -29,9 +29,8 @@ cd ../python-package
 conda activate
 pip wheel --no-deps -v . --wheel-dir dist/
 if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
-ls dist/
 python -m wheel tags --python-tag py3 --abi-tag none `
-  --platform win_amd64 --remove \
+  --platform win_amd64 --remove `
   (Get-ChildItem dist/*.whl | Select-Object -Expand FullName)
 if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

From ba78b42db39850aeedcead337c867621978b457f Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 00:49:05 -0800
Subject: [PATCH 19/27] Don't install wheel

---
 ops/pipeline/build-manylinux2014.sh | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/ops/pipeline/build-manylinux2014.sh b/ops/pipeline/build-manylinux2014.sh
index 48356cb4d1ac..fbc349568e72 100755
--- a/ops/pipeline/build-manylinux2014.sh
+++ b/ops/pipeline/build-manylinux2014.sh
@@ -16,11 +16,6 @@ fi
 
 arch="$1"
 
-if [[ "${arch:-}" == "aarch64" ]]
-then
-  sudo python3 -m pip install wheel
-fi
-
 source ops/pipeline/classify-git-branch.sh
 source ops/pipeline/get-docker-registry-details.sh

From 4b5ff84a81d8b5dc3397628646db036db261f4df Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 01:04:43 -0800
Subject: [PATCH 20/27] Don't use backslash on Windows

---
 .github/workflows/windows.yml | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/windows.yml b/.github/workflows/windows.yml
index 41f3d5be53f7..5858851bce26 100644
--- a/.github/workflows/windows.yml
+++ b/.github/workflows/windows.yml
@@ -33,9 +33,9 @@ jobs:
         shell: powershell
         run: |
           conda activate
-          python3 ops/pipeline/manage-artifacts.py upload \
-            --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \
-            --prefix cache/${{ github.run_id }}/build-win64-gpu \
+          python3 ops/pipeline/manage-artifacts.py upload `
+            --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} `
+            --prefix cache/${{ github.run_id }}/build-win64-gpu `
             (Get-ChildItem python-package/dist/*.whl | Select-Object -Expand FullName)
 
   test-win64-gpu:
@@ -53,9 +53,9 @@ jobs:
         shell: powershell
         run: |
           conda activate
-          python3 ops/pipeline/manage-artifacts.py download \
-            --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} \
-            --prefix cache/${{ github.run_id }}/build-win64-gpu \
-            --dest-dir python-package/dist \
+          python3 ops/pipeline/manage-artifacts.py download `
+            --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} `
+            --prefix cache/${{ github.run_id }}/build-win64-gpu `
+            --dest-dir python-package/dist `
             *.whl
       - run: powershell ops/pipeline/test-win64-gpu.ps1

From a64e47e407944f9463f35219065021b1a857d9f3 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 01:19:32 -0800
Subject: [PATCH 21/27] Fix

---
 .github/workflows/windows.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/windows.yml b/.github/workflows/windows.yml
index 5858851bce26..46c1393b09fa 100644
--- a/.github/workflows/windows.yml
+++ b/.github/workflows/windows.yml
@@ -33,7 +33,7 @@ jobs:
         shell: powershell
         run: |
           conda activate
-          python3 ops/pipeline/manage-artifacts.py upload `
+          python ops/pipeline/manage-artifacts.py upload `
             --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} `
             --prefix cache/${{ github.run_id }}/build-win64-gpu `
             (Get-ChildItem python-package/dist/*.whl | Select-Object -Expand FullName)
@@ -53,7 +53,7 @@ jobs:
         shell: powershell
         run: |
           conda activate
-          python3 ops/pipeline/manage-artifacts.py download `
+          python ops/pipeline/manage-artifacts.py download `
             --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} `
             --prefix cache/${{ github.run_id }}/build-win64-gpu `
             --dest-dir python-package/dist `

From 63d097b3bfc07fe4f8c1b74a5525152ffe057daa Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 02:01:27 -0800
Subject: [PATCH 22/27] Fix Windows

---
 .github/workflows/windows.yml | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/windows.yml b/.github/workflows/windows.yml
index 46c1393b09fa..c17279b13f0e 100644
--- a/.github/workflows/windows.yml
+++ b/.github/workflows/windows.yml
@@ -36,6 +36,7 @@ jobs:
           python ops/pipeline/manage-artifacts.py upload `
             --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} `
             --prefix cache/${{ github.run_id }}/build-win64-gpu `
+            build/testxgboost.exe xgboost.exe `
             (Get-ChildItem python-package/dist/*.whl | Select-Object -Expand FullName)
 
   test-win64-gpu:
@@ -56,6 +57,9 @@ jobs:
           python ops/pipeline/manage-artifacts.py download `
             --s3-bucket ${{ env.RUNS_ON_S3_BUCKET_CACHE }} `
             --prefix cache/${{ github.run_id }}/build-win64-gpu `
-            --dest-dir python-package/dist `
-            *.whl
+            --dest-dir build `
+            *.whl testxgboost.exe xgboost.exe
+          Move-Item -Path build/xgboost.exe -Destination .
+          Move-Item -Path (Get-ChildItem build/*.whl | Select-Object -Expand FullName) `
+            -Destination python-package/dist/
       - run: powershell ops/pipeline/test-win64-gpu.ps1

From 47a764e141b47374d9c5ab8e798f938ec16ffa98 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 02:13:28 -0800
Subject: [PATCH 23/27] Cap scikit-learn<=1.5.2

---
 ops/conda_env/macos_cpu_test.yml  | 2 +-
 ops/conda_env/win64_test.yml      | 2 +-
 ops/pipeline/test-python-wheel.sh | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/ops/conda_env/macos_cpu_test.yml b/ops/conda_env/macos_cpu_test.yml
index 29ff99e3504f..8d2d2c9591c1 100644
--- a/ops/conda_env/macos_cpu_test.yml
+++ b/ops/conda_env/macos_cpu_test.yml
@@ -11,7 +11,7 @@ dependencies:
 - numpy
 - scipy
 - llvm-openmp
-- scikit-learn>=1.4.1
+- scikit-learn>=1.4.1,<=1.5.2
 - pandas
 - matplotlib
 - dask<=2024.10.0
diff --git a/ops/conda_env/win64_test.yml b/ops/conda_env/win64_test.yml
index 32b9339e6fc0..2260c521f889 100644
--- a/ops/conda_env/win64_test.yml
+++ b/ops/conda_env/win64_test.yml
@@ -6,7 +6,7 @@ dependencies:
 - numpy
 - scipy
 - matplotlib
-- scikit-learn
+- scikit-learn<=1.5.2
 - pandas
 - pytest
 - boto3
diff --git a/ops/pipeline/test-python-wheel.sh b/ops/pipeline/test-python-wheel.sh
index 56d54fd65d02..84dbe1fb7c4d 100755
--- a/ops/pipeline/test-python-wheel.sh
+++ b/ops/pipeline/test-python-wheel.sh
@@ -20,7 +20,7 @@ else
 fi
 
 source ops/pipeline/get-docker-registry-details.sh
-CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main"
+CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:PR-5"
 
 set -x
 python3 ops/docker_run.py --container-tag "${CONTAINER_TAG}" ${gpu_option} \

From 6e511c15fda6963da031fcc8d4bb2909d0c5f3d8 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 03:11:14 -0800
Subject: [PATCH 24/27] Fix Windows

---
 .github/workflows/windows.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/windows.yml b/.github/workflows/windows.yml
index c17279b13f0e..53a1b5c0520b 100644
--- a/.github/workflows/windows.yml
+++ b/.github/workflows/windows.yml
@@ -60,6 +60,7 @@ jobs:
             --dest-dir build `
             *.whl testxgboost.exe xgboost.exe
           Move-Item -Path build/xgboost.exe -Destination .
+          New-Item -ItemType Directory -Path python-package/dist/ -Force
           Move-Item -Path (Get-ChildItem build/*.whl | Select-Object -Expand FullName) `
             -Destination python-package/dist/
       - run: powershell ops/pipeline/test-win64-gpu.ps1

From 52751390d1f93db07661e6e4587aee8d8e61014b Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 03:41:54 -0800
Subject: [PATCH 25/27] Add missing step in doc

---
 doc/contrib/ci.rst | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index d2636037b8a8..74c2baeec834 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -51,8 +51,10 @@ To make changes to the CI container, carry out the following steps:
    ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:PR-204``.
 6. Now submit a pull request to `dmlc/xgboost `_. The CI will
    run tests using the new container. Verify that all tests pass.
-7. Merge the pull request in ``dmlc/xgboost-devops``.
-8. Merge the pull request in ``dmlc/xgboost``.
+7. Merge the pull request in ``dmlc/xgboost-devops``. Wait the CI completes on the ``main`` branch.
+8. Go back to the the pull request for ``dmlc/xgboost`` and change the container reference back
+   to ``:main``.
+9. Merge the pull request in ``dmlc/xgboost``.
 
 .. _build_run_docker_locally:
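Note for reviewers: step 5 of the instructions above (pointing every ``:main``
container tag at the ``PR-#`` build) is mechanical and can be scripted. A rough
sketch, assuming GNU sed and reusing the example PR number 204 from the doc;
verify the result with ``git diff`` before committing:

    # Rewrite xgb-ci.* container tags from :main to :PR-204 across the repo
    git grep -l 'xgb-ci\.' -- .github ops \
      | xargs sed -i 's/\(xgb-ci\.[A-Za-z0-9_.-]*\):main/\1:PR-204/g'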
From d0cefb07e04623beac9ec99aa8036c657fc9f893 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 03:50:28 -0800
Subject: [PATCH 26/27] doc typo

---
 doc/contrib/ci.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index 74c2baeec834..c9c79231a2ec 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -51,8 +51,8 @@ To make changes to the CI container, carry out the following steps:
    ``492475357299.dkr.ecr.us-west-2.amazonaws.com/xgb-ci.gpu:PR-204``.
 6. Now submit a pull request to `dmlc/xgboost `_. The CI will
    run tests using the new container. Verify that all tests pass.
-7. Merge the pull request in ``dmlc/xgboost-devops``. Wait the CI completes on the ``main`` branch.
-8. Go back to the the pull request for ``dmlc/xgboost`` and change the container reference back
+7. Merge the pull request in ``dmlc/xgboost-devops``. Wait until the CI completes on the ``main`` branch.
+8. Go back to the pull request for ``dmlc/xgboost`` and change the container references back
    to ``:main``.
 9. Merge the pull request in ``dmlc/xgboost``.

From b016a0ccbed96158f9a5156b54dce49caa713899 Mon Sep 17 00:00:00 2001
From: Hyunsu Cho
Date: Tue, 10 Dec 2024 11:28:19 -0800
Subject: [PATCH 27/27] Use latest container

---
 ops/pipeline/test-python-wheel.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ops/pipeline/test-python-wheel.sh b/ops/pipeline/test-python-wheel.sh
index 84dbe1fb7c4d..56d54fd65d02 100755
--- a/ops/pipeline/test-python-wheel.sh
+++ b/ops/pipeline/test-python-wheel.sh
@@ -20,7 +20,7 @@ else
 fi
 
 source ops/pipeline/get-docker-registry-details.sh
-CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:PR-5"
+CONTAINER_TAG="${DOCKER_REGISTRY_URL}/${container_id}:main"
 
 set -x
 python3 ops/docker_run.py --container-tag "${CONTAINER_TAG}" ${gpu_option} \