From 34f036917d83a0805c46c32cf4035a5dffbd5c76 Mon Sep 17 00:00:00 2001
From: Leng Yue
Date: Mon, 4 Mar 2024 03:10:38 -0800
Subject: [PATCH 01/10] Document `ddp_find_unused_parameters_true` in Fabric
 (#19564)

---
 docs/source-fabric/api/fabric_args.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/source-fabric/api/fabric_args.rst b/docs/source-fabric/api/fabric_args.rst
index a126667259a63..4396aa26e908a 100644
--- a/docs/source-fabric/api/fabric_args.rst
+++ b/docs/source-fabric/api/fabric_args.rst
@@ -36,13 +36,16 @@ See also: :doc:`../fundamentals/accelerators`
 strategy
 ========
 
-Choose a training strategy: ``"dp"``, ``"ddp"``, ``"ddp_spawn"``, ``"xla"``, ``"deepspeed"``, ``"fsdp"````.
+Choose a training strategy: ``"dp"``, ``"ddp"``, ``"ddp_spawn"``, ``"ddp_find_unused_parameters_true"``, ``"xla"``, ``"deepspeed"``, ``"fsdp"``.
 
 .. code-block:: python
 
     # Running with the DistributedDataParallel strategy on 4 GPUs
     fabric = Fabric(strategy="ddp", accelerator="gpu", devices=4)
 
+    # Running with the DDP strategy with find unused parameters enabled on 4 GPUs
+    fabric = Fabric(strategy="ddp_find_unused_parameters_true", accelerator="gpu", devices=4)
+
     # Running with the DDP Spawn strategy using 4 CPU processes
     fabric = Fabric(strategy="ddp_spawn", accelerator="cpu", devices=4)
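A note for readers comparing the two DDP variants documented above: plain ``"ddp"`` errors out when some parameters never receive a gradient in a step, which is exactly the case the new strategy string addresses. A minimal sketch of such a model — the ``GatedModel`` name, shapes, and gating logic are illustrative assumptions, not code from the patch:

.. code-block:: python

    import torch.nn as nn
    from lightning.fabric import Fabric


    class GatedModel(nn.Module):
        # Hypothetical module: `self.extra` receives no gradient on most steps.
        def __init__(self):
            super().__init__()
            self.main = nn.Linear(32, 32)
            self.extra = nn.Linear(32, 32)

        def forward(self, x, use_extra=False):
            # With plain "ddp", steps that skip `self.extra` fail the gradient
            # reduction; the find-unused-parameters variant tolerates them.
            return self.extra(self.main(x)) if use_extra else self.main(x)


    fabric = Fabric(strategy="ddp_find_unused_parameters_true", accelerator="cpu", devices=2)
    fabric.launch()
    model = fabric.setup(GatedModel())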
From d9113b61cc120eafb835a97f357cf484db51ad02 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Mon, 4 Mar 2024 14:00:50 +0100
Subject: [PATCH 02/10] Add additional references in compile guides (#19550)

---
 docs/source-fabric/advanced/compile.rst  | 19 ++++++++++++++++++-
 docs/source-pytorch/advanced/compile.rst | 23 ++++++++++++++++++++---
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/docs/source-fabric/advanced/compile.rst b/docs/source-fabric/advanced/compile.rst
index a36384ccc98fe..a8e1cc2db243c 100644
--- a/docs/source-fabric/advanced/compile.rst
+++ b/docs/source-fabric/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your PyTorch model can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile `_ correctly in your code.
 
 .. note::
 
@@ -223,6 +223,9 @@ On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically a
 
 Numbers produced with NVIDIA A100 SXM4 40GB, PyTorch 2.2.0, CUDA 12.1.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch `_ for further investigation.
+
+
 ----
 
@@ -301,4 +304,18 @@ However, should you have issues compiling DDP and FSDP models, you can opt out o
 
     model = fabric.setup(model, _reapply_compile=False)
 
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper `_
+- `GenAI with PyTorch 2.0 blog post series `_
+- `Training Production AI Models with PyTorch 2.0 `_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach `_
+
diff --git a/docs/source-pytorch/advanced/compile.rst b/docs/source-pytorch/advanced/compile.rst
index 6da769ee40279..73d5f4fbc2af4 100644
--- a/docs/source-pytorch/advanced/compile.rst
+++ b/docs/source-pytorch/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your LightningModule can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile `_ correctly in your code.
 
 .. note::
 
@@ -192,6 +192,8 @@ However, when this is not possible, you can request PyTorch to compile the code
 
 A model compiled with ``dynamic=True`` will typically be slower than a model compiled with static shapes, but it will avoid the extreme cost of recompilation every iteration.
 On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically and you should no longer need to set this.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch `_ for further investigation.
+
 
 ----
 
@@ -251,9 +253,9 @@ Always compare the speed and memory usage of the compiled model against the orig
 Limitations
 ***********
 
-There are a few limitations you should be aware of when using ``torch.compile`` in conjunction with the Trainer:
+There are a few limitations you should be aware of when using ``torch.compile`` **in conjunction with the Trainer**:
 
-* ``torch.compile`` currently does not get reapplied over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
+* The Trainer currently does not reapply ``torch.compile`` over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
   This limitation will be lifted in the future.
 
 * In some cases, using ``self.log()`` in your LightningModule will cause compilation errors.
@@ -270,4 +272,19 @@ There are a few limitations you should be aware of when using ``torch.compile``
 
             self.model = torch.compile(self.model)
             ...
 
+
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper `_
+- `GenAI with PyTorch 2.0 blog post series `_
+- `Training Production AI Models with PyTorch 2.0 `_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach `_
+
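The ``dynamic=True`` escape hatch these guides refer to is a single argument to ``torch.compile``. A short sketch, assuming a toy model and arbitrary batch sizes:

.. code-block:: python

    import torch

    model = torch.nn.Linear(16, 2)
    compiled = torch.compile(model, dynamic=True)  # treat input shapes as dynamic up front

    # Varying batch sizes no longer trigger one recompilation per new shape
    for batch_size in (4, 8, 3):
        out = compiled(torch.randn(batch_size, 16))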
From 13f15b38fc65e73dd707aab0db966e5c59bb11f9 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Mon, 4 Mar 2024 14:01:33 +0100
Subject: [PATCH 03/10] Support consolidating sharded checkpoints with the
 `fabric` CLI (#19560)

---
 .../checkpoint/distributed_checkpoint.rst |  4 +-
 src/lightning/fabric/CHANGELOG.md         |  2 +-
 src/lightning/fabric/cli.py               | 34 ++++++++++++++
 tests/tests_fabric/test_cli.py            | 47 +++++++++++++------
 4 files changed, 70 insertions(+), 17 deletions(-)

diff --git a/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst b/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
index da380c26a7783..adb73d228831f 100644
--- a/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
+++ b/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
@@ -187,7 +187,7 @@ It is possible to convert a distributed checkpoint to a regular, single-file che
 
 .. code-block:: bash
 
-    python -m lightning.fabric.utilities.consolidate_checkpoint path/to/my/checkpoint
+    fabric consolidate path/to/my/checkpoint
 
 You will need to do this for example if you want to load the checkpoint into a script that doesn't use FSDP, or need to export the checkpoint to a different format for deployment, evaluation, etc.
 
@@ -202,7 +202,7 @@ You will need to do this for example if you want to load the checkpoint into a s
 
 .. code-block:: bash
 
-    python -m lightning.fabric.utilities.consolidate_checkpoint my-checkpoint.ckpt
+    fabric consolidate my-checkpoint.ckpt
 
 This saves a new file ``my-checkpoint.ckpt.consolidated`` next to the sharded checkpoint which you can load normally in PyTorch:
 
diff --git a/src/lightning/fabric/CHANGELOG.md b/src/lightning/fabric/CHANGELOG.md
index 235c36b82f6b1..0cec850ed3483 100644
--- a/src/lightning/fabric/CHANGELOG.md
+++ b/src/lightning/fabric/CHANGELOG.md
@@ -9,7 +9,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Added
 
--
+- Enabled consolidating distributed checkpoints through `fabric consolidate` in the new CLI ([#19560](https://github.com/Lightning-AI/pytorch-lightning/pull/19560))
 
 -
 
diff --git a/src/lightning/fabric/cli.py b/src/lightning/fabric/cli.py
index 80256a0f088cc..d8c6fe47b6630 100644
--- a/src/lightning/fabric/cli.py
+++ b/src/lightning/fabric/cli.py
@@ -19,14 +19,17 @@
 from argparse import Namespace
 from typing import Any, List, Optional
 
+import torch
 from lightning_utilities.core.imports import RequirementCache
 from typing_extensions import get_args
 
 from lightning.fabric.accelerators import CPUAccelerator, CUDAAccelerator, MPSAccelerator
 from lightning.fabric.plugins.precision.precision import _PRECISION_INPUT_STR, _PRECISION_INPUT_STR_ALIAS
 from lightning.fabric.strategies import STRATEGY_REGISTRY
+from lightning.fabric.utilities.consolidate_checkpoint import _process_cli_args
 from lightning.fabric.utilities.device_parser import _parse_gpu_ids
 from lightning.fabric.utilities.distributed import _suggested_max_num_threads
+from lightning.fabric.utilities.load import _load_distributed_checkpoint
 
 _log = logging.getLogger(__name__)
 
@@ -154,6 +157,37 @@ def _run(**kwargs: Any) -> None:
         script_args = list(kwargs.pop("script_args", []))
         main(args=Namespace(**kwargs), script_args=script_args)
 
+    @_main.command(
+        "consolidate",
+        context_settings={
+            "ignore_unknown_options": True,
+        },
+    )
+    @click.argument(
+        "checkpoint_folder",
+        type=click.Path(exists=True),
+    )
+    @click.option(
+        "--output_file",
+        type=click.Path(exists=True),
+        default=None,
+        help=(
+            "Path to the file where the converted checkpoint should be saved. The file should not already exist."
+            " If no path is provided, the file will be saved next to the input checkpoint folder with the same name"
+            " and a '.consolidated' suffix."
+        ),
+    )
+    def _consolidate(checkpoint_folder: str, output_file: Optional[str]) -> None:
+        """Convert a distributed/sharded checkpoint into a single file that can be loaded with `torch.load()`.
+
+        Only supports FSDP sharded checkpoints at the moment.
+
+        """
+        args = Namespace(checkpoint_folder=checkpoint_folder, output_file=output_file)
+        config = _process_cli_args(args)
+        checkpoint = _load_distributed_checkpoint(config.checkpoint_folder)
+        torch.save(checkpoint, config.output_file)
+
 
 def _set_env_variables(args: Namespace) -> None:
     """Set the environment variables for the new processes.
 
diff --git a/tests/tests_fabric/test_cli.py b/tests/tests_fabric/test_cli.py
index 5560d114d9369..0e58acb3c7267 100644
--- a/tests/tests_fabric/test_cli.py
+++ b/tests/tests_fabric/test_cli.py
@@ -20,7 +20,7 @@
 from unittest.mock import Mock
 
 import pytest
-from lightning.fabric.cli import _get_supported_strategies, _run
+from lightning.fabric.cli import _consolidate, _get_supported_strategies, _run
 
 from tests_fabric.helpers.runif import RunIf
 
@@ -33,7 +33,7 @@ def fake_script(tmp_path):
 
 
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_defaults(monkeypatch, fake_script):
+def test_run_env_vars_defaults(monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script])
@@ -49,7 +49,7 @@ def test_cli_env_vars_defaults(monkeypatch, fake_script):
 
 @pytest.mark.parametrize("accelerator", ["cpu", "gpu", "cuda", pytest.param("mps", marks=RunIf(mps=True))])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
-def test_cli_env_vars_accelerator(_, accelerator, monkeypatch, fake_script):
+def test_run_env_vars_accelerator(_, accelerator, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--accelerator", accelerator])
@@ -60,7 +60,7 @@
 @pytest.mark.parametrize("strategy", _get_supported_strategies())
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
-def test_cli_env_vars_strategy(_, strategy, monkeypatch, fake_script):
+def test_run_env_vars_strategy(_, strategy, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--strategy", strategy])
@@ -68,7 +68,7 @@ def test_cli_env_vars_strategy(_, strategy, monkeypatch, fake_script):
     assert os.environ["LT_STRATEGY"] == strategy
 
 
-def test_cli_get_supported_strategies():
+def test_run_get_supported_strategies():
     """Test to ensure that when new strategies get added, we must consider updating the list of supported ones in the
     CLI."""
     assert len(_get_supported_strategies()) == 7
 
@@ -76,7 +76,7 @@
 @pytest.mark.parametrize("strategy", ["ddp_spawn", "ddp_fork", "ddp_notebook", "deepspeed_stage_3_offload"])
-def test_cli_env_vars_unsupported_strategy(strategy, fake_script):
+def test_run_env_vars_unsupported_strategy(strategy, fake_script):
     ioerr = StringIO()
     with pytest.raises(SystemExit) as e, contextlib.redirect_stderr(ioerr):
         _run.main([fake_script, "--strategy", strategy])
 
@@ -87,7 +87,7 @@
 @pytest.mark.parametrize("devices", ["1", "2", "0,", "1,0", "-1"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
-def test_cli_env_vars_devices_cuda(_, devices, monkeypatch, fake_script):
+def test_run_env_vars_devices_cuda(_, devices, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--accelerator", "cuda", "--devices", devices])
 
@@ -98,7 +98,7 @@ def test_cli_env_vars_devices_cuda(_, devices, monkeypatch, fake_script):
 @RunIf(mps=True)
 @pytest.mark.parametrize("accelerator", ["mps", "gpu"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_devices_mps(accelerator, monkeypatch, fake_script):
+def test_run_env_vars_devices_mps(accelerator, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--accelerator", accelerator])
 
@@ -108,7 +108,7 @@
 @pytest.mark.parametrize("num_nodes", ["1", "2", "3"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_num_nodes(num_nodes, monkeypatch, fake_script):
+def test_run_env_vars_num_nodes(num_nodes, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--num-nodes", num_nodes])
 
@@ -118,7 +118,7 @@
 @pytest.mark.parametrize("precision", ["64-true", "64", "32-true", "32", "16-mixed", "bf16-mixed"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_precision(precision, monkeypatch, fake_script):
+def test_run_env_vars_precision(precision, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--precision", precision])
 
@@ -127,7 +127,7 @@
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_torchrun_defaults(monkeypatch, fake_script):
+def test_run_torchrun_defaults(monkeypatch, fake_script):
     torchrun_mock = Mock()
     monkeypatch.setitem(sys.modules, "torch.distributed.run", torchrun_mock)
     with pytest.raises(SystemExit) as e:
 
@@ -155,7 +155,7 @@ def test_cli_torchrun_defaults(monkeypatch, fake_script):
 )
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=5)
-def test_cli_torchrun_num_processes_launched(_, devices, expected, monkeypatch, fake_script):
+def test_run_torchrun_num_processes_launched(_, devices, expected, monkeypatch, fake_script):
     torchrun_mock = Mock()
     monkeypatch.setitem(sys.modules, "torch.distributed.run", torchrun_mock)
     with pytest.raises(SystemExit) as e:
 
@@ -171,7 +171,7 @@ def test_cli_torchrun_num_processes_launched(_, devices, expected, monkeypatch,
     ])
 
 
-def test_cli_through_fabric_entry_point():
+def test_run_through_fabric_entry_point():
     result = subprocess.run("fabric run --help", capture_output=True, text=True, shell=True)
 
     message = "Usage: fabric run [OPTIONS] SCRIPT [SCRIPT_ARGS]"
 
@@ -179,7 +179,7 @@
 @pytest.mark.skipif("lightning.fabric" == "lightning_fabric", reason="standalone package")
-def test_cli_through_lightning_entry_point():
+def test_run_through_lightning_entry_point():
     result = subprocess.run("lightning run model --help", capture_output=True, text=True, shell=True)
 
     deprecation_message = (
@@ -189,3 +189,22 @@ def test_cli_through_lightning_entry_point():
     message = "Usage: lightning run [OPTIONS] SCRIPT [SCRIPT_ARGS]"
     assert deprecation_message in result.stdout
     assert message in result.stdout or message in result.stderr
+
+
+@mock.patch("lightning.fabric.cli._process_cli_args")
+@mock.patch("lightning.fabric.cli._load_distributed_checkpoint")
+@mock.patch("lightning.fabric.cli.torch.save")
+def test_consolidate(save_mock, _, __, tmp_path):
+    ioerr = StringIO()
+    with pytest.raises(SystemExit) as e, contextlib.redirect_stderr(ioerr):
+        _consolidate.main(["not exist"])
+    assert e.value.code == 2
+    assert "Path 'not exist' does not exist" in ioerr.getvalue()
+
+    checkpoint_folder = tmp_path / "checkpoint"
+    checkpoint_folder.mkdir()
+    ioerr = StringIO()
+    with pytest.raises(SystemExit) as e, contextlib.redirect_stderr(ioerr):
+        _consolidate.main([str(checkpoint_folder)])
+    assert e.value.code == 0
+    save_mock.assert_called_once()
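Putting the new subcommand together with the docs it updates: after running ``fabric consolidate my-checkpoint.ckpt`` in the shell, the resulting single file loads with plain PyTorch. A sketch, assuming the default output name described in the option's help text:

.. code-block:: python

    import torch

    # Shell step (new in this patch): fabric consolidate my-checkpoint.ckpt
    # This writes my-checkpoint.ckpt.consolidated next to the sharded checkpoint.
    checkpoint = torch.load("my-checkpoint.ckpt.consolidated")
    print(checkpoint.keys())  # model/optimizer state plus any extra user data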
From 942e6507287bb0137e255f066dab32ebb65fa894 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 4 Mar 2024 14:51:25 +0100
Subject: [PATCH 04/10] Bump vite from 2.9.16 to 2.9.17 in
 /src/lightning/app/cli/react-ui-template/ui (#19319)

Bump vite in /src/lightning/app/cli/react-ui-template/ui

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 2.9.16 to 2.9.17.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v2.9.17/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v2.9.17/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
 src/lightning/app/cli/react-ui-template/ui/package.json | 2 +-
 src/lightning/app/cli/react-ui-template/ui/yarn.lock    | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lightning/app/cli/react-ui-template/ui/package.json b/src/lightning/app/cli/react-ui-template/ui/package.json
index 71e2cc00f988d..d43665302c55a 100644
--- a/src/lightning/app/cli/react-ui-template/ui/package.json
+++ b/src/lightning/app/cli/react-ui-template/ui/package.json
@@ -24,7 +24,7 @@
     "@vitejs/plugin-react": "^1.0.7",
     "prettier": "^2.5.1",
     "typescript": "^4.5.4",
-    "vite": "^2.9.16"
+    "vite": "^2.9.17"
   },
   "main": "index.js",
   "license": "MIT"
diff --git a/src/lightning/app/cli/react-ui-template/ui/yarn.lock b/src/lightning/app/cli/react-ui-template/ui/yarn.lock
index 3ef6acfa43398..66458662bcdf5 100644
--- a/src/lightning/app/cli/react-ui-template/ui/yarn.lock
+++ b/src/lightning/app/cli/react-ui-template/ui/yarn.lock
@@ -1260,10 +1260,10 @@ update-browserslist-db@^1.0.4:
     escalade "^3.1.1"
     picocolors "^1.0.0"
 
-vite@^2.9.16:
-  version "2.9.16"
-  resolved "https://registry.yarnpkg.com/vite/-/vite-2.9.16.tgz#daf7ba50f5cc37a7bf51b118ba06bc36e97898e9"
-  integrity sha512-X+6q8KPyeuBvTQV8AVSnKDvXoBMnTx8zxh54sOwmmuOdxkjMmEJXH2UEchA+vTMps1xw9vL64uwJOWryULg7nA==
+vite@^2.9.17:
+  version "2.9.17"
+  resolved "https://registry.yarnpkg.com/vite/-/vite-2.9.17.tgz#6b770525e12fa2a2e3a0fa0d028d304f4f7dc7d4"
+  integrity sha512-XxcRzra6d7xrKXH66jZUgb+srThoPu+TLJc06GifUyKq9JmjHkc1Numc8ra0h56rju2jfVWw3B3fs5l3OFMvUw==
   dependencies:
     esbuild "^0.14.27"
     postcss "^8.4.13"
From b19c3a961c79028d7c39a4f1ff1c2df991406d1d Mon Sep 17 00:00:00 2001
From: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Date: Mon, 4 Mar 2024 15:09:04 +0100
Subject: [PATCH 05/10] ci: pin `pytest` for package doctests (#19567)

---
 .github/workflows/ci-pkg-install.yml | 2 +-
 requirements/doctests.txt            | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
 create mode 100644 requirements/doctests.txt

diff --git a/.github/workflows/ci-pkg-install.yml b/.github/workflows/ci-pkg-install.yml
index 20530be8b017a..67a9b9f21b515 100644
--- a/.github/workflows/ci-pkg-install.yml
+++ b/.github/workflows/ci-pkg-install.yml
@@ -115,7 +115,7 @@ jobs:
           done
       - name: Install pytest doctest extension
         run: |
-          pip install -q "pytest-doctestplus>=0.9.0"
+          pip install -q -r requirements/doctests.txt
           pip list
 
       - name: DocTest package
diff --git a/requirements/doctests.txt b/requirements/doctests.txt
new file mode 100644
index 0000000000000..703f221660c70
--- /dev/null
+++ b/requirements/doctests.txt
@@ -0,0 +1,2 @@
+pytest ==7.4.0
+pytest-doctestplus ==1.0.0
From 527d071f4934e8a28239669316ba11d7e104cc79 Mon Sep 17 00:00:00 2001
From: Kashif Rasul
Date: Mon, 4 Mar 2024 16:11:31 +0100
Subject: [PATCH 06/10] Bump bitsandbytes minimum version (#19520)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli
Co-authored-by: Carlos Mocholí
---
 requirements/fabric/strategies.txt                     | 2 +-
 requirements/pytorch/extra.txt                         | 2 +-
 src/lightning/fabric/plugins/precision/bitsandbytes.py | 7 +++----
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/requirements/fabric/strategies.txt b/requirements/fabric/strategies.txt
index 0c7804183e393..6c302f21269e3 100644
--- a/requirements/fabric/strategies.txt
+++ b/requirements/fabric/strategies.txt
@@ -6,4 +6,4 @@
 # note: is a bug around 0.10 with `MPS_Accelerator must implement all abstract methods`
 # shall be resolved by https://github.com/microsoft/DeepSpeed/issues/4372
 deepspeed >=0.8.2, <=0.9.3; platform_system != "Windows" # strict
-bitsandbytes ==0.41.0 # strict
+bitsandbytes >=0.42.0,<0.43.0
diff --git a/requirements/pytorch/extra.txt b/requirements/pytorch/extra.txt
index 39e3ff61d4e00..55960d7cd11cb 100644
--- a/requirements/pytorch/extra.txt
+++ b/requirements/pytorch/extra.txt
@@ -8,4 +8,4 @@
 hydra-core >=1.0.5, <1.4.0
 jsonargparse[signatures] >=4.27.5, <4.28.0
 rich >=12.3.0, <13.6.0
 tensorboardX >=2.2, <2.7.0 # min version is set by torch.onnx missing attribute
-bitsandbytes ==0.41.0 # strict
+bitsandbytes >=0.42.0,<0.43.0
diff --git a/src/lightning/fabric/plugins/precision/bitsandbytes.py b/src/lightning/fabric/plugins/precision/bitsandbytes.py
index 2816f16fdf478..12a0ac3998b6e 100644
--- a/src/lightning/fabric/plugins/precision/bitsandbytes.py
+++ b/src/lightning/fabric/plugins/precision/bitsandbytes.py
@@ -39,8 +39,7 @@
 
 log = logging.getLogger(__name__)
 
-# TODO: unpin after resolving the `quant_state` format breaking changes
-_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes==0.41.0")
+_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes>=0.42.0")
 
 
 class BitsandbytesPrecision(Precision):
@@ -344,7 +343,7 @@ def quantize(
         def to_empty(self, *, device: _DEVICE, recurse: bool = True) -> Self:
             if self.weight.dtype == torch.uint8:  # was quantized
                 # cannot init the quantized params directly
-                weight = torch.empty(self.weight.quant_state[1], device=device, dtype=torch.half)
+                weight = torch.empty(self.weight.quant_state.shape, device=device, dtype=torch.half)
             else:
                 weight = torch.empty_like(self.weight.data, device=device)
             device = torch.device(device)
@@ -366,7 +365,7 @@ def reset_parameters(self) -> None:
             linear_init_finished = isinstance(self.weight, bnb.nn.Params4bit)
             if linear_init_finished and self.weight.dtype == torch.uint8:  # was quantized
                 # cannot init the quantized params directly
-                weight = torch.empty(self.weight.quant_state[1], device=self.weight.device, dtype=torch.half)
+                weight = torch.empty(self.weight.quant_state.shape, device=self.weight.device, dtype=torch.half)
             else:
                 weight = self.weight.data
             torch.nn.init.kaiming_uniform_(weight, a=math.sqrt(5))
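For anyone verifying the new version range locally, the plugin is enabled the same way as before. A minimal sketch — the ``mode="nf4"`` argument follows the plugin's documented quantization modes and is an assumption here, not part of this diff:

.. code-block:: python

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.plugins import BitsandbytesPrecision

    # Requires bitsandbytes >= 0.42.0 after this patch; quantizes Linear layers to NF4
    precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
    fabric = Fabric(accelerator="cuda", devices=1, plugins=precision)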
""" - if not _TORCH_GREATER_EQUAL_2_1: - raise ImportError("Processing distributed checkpoints requires PyTorch >= 2.1.") + if not _TORCH_GREATER_EQUAL_2_3: + raise ImportError("Processing distributed checkpoints requires PyTorch >= 2.3.") from torch.distributed.checkpoint import FileSystemReader - from torch.distributed.checkpoint.metadata import BytesStorageMetadata, TensorStorageMetadata + from torch.distributed.checkpoint.format_utils import _EmptyStateDictLoadPlanner + from torch.distributed.checkpoint.state_dict_loader import _load_state_dict - if _TORCH_GREATER_EQUAL_2_2: - from torch.distributed.checkpoint import load - else: - from torch.distributed.checkpoint import load_state_dict as load # deprecated - - reader = FileSystemReader(checkpoint_folder) - metadata = reader.read_metadata() - - # TODO: Add sequential save to avoid storing the entire checkpoint in memory checkpoint: Dict[str, Any] = {} - for tensor_name, sd_metadata in metadata.state_dict_metadata.items(): - if isinstance(sd_metadata, BytesStorageMetadata): - checkpoint[tensor_name] = "" - elif isinstance(sd_metadata, TensorStorageMetadata): - checkpoint[tensor_name] = torch.empty( - size=sd_metadata.size, - dtype=sd_metadata.properties.dtype, - device=torch.device("cpu"), - memory_format=sd_metadata.properties.memory_format, - layout=sd_metadata.properties.layout, - requires_grad=sd_metadata.properties.requires_grad, - pin_memory=sd_metadata.properties.pin_memory, - ) - - load(state_dict=checkpoint, storage_reader=reader, no_dist=True) - checkpoint = _unflatten_dict(checkpoint, key_map=metadata.planner_data) + _load_state_dict( + checkpoint, + storage_reader=FileSystemReader(checkpoint_folder), + planner=_EmptyStateDictLoadPlanner(), + no_dist=True, + ) # This is the extra file saved by Fabric, with user data separate from weights and optimizer states extra_file = checkpoint_folder / _METADATA_FILENAME @@ -282,29 +263,3 @@ def _load_distributed_checkpoint(checkpoint_folder: Path) -> Dict[str, Any]: checkpoint.update(extra) return checkpoint - - -def _unflatten_dict(checkpoint: Dict[str, Any], key_map: Dict[str, Tuple[str, ...]]) -> Dict[str, Any]: - """Converts the flat dictionary with keys 'x.y.z...' to a nested dictionary using the provided key map. - - Args: - checkpoint: The flat checkpoint dictionary. - key_map: A dictionary that maps the keys in flattened format 'x.y.z...' to a tuple representing - the index path into the nested dictonary that this function should construct. 
- - """ - assert checkpoint.keys() == key_map.keys() - converted: Dict[str, Any] = {} - for flat_key in checkpoint: - key_path = key_map[flat_key] - _set_nested_dict_value(converted, key_path, checkpoint[flat_key]) - return converted - - -def _set_nested_dict_value(nested_dict: Dict[str, Any], key_path: Tuple[str, ...], value: Any) -> None: - result = nested_dict - for key in key_path[:-1]: - if key not in result: - result[key] = {} - result = result[key] - result[key_path[-1]] = value diff --git a/tests/tests_fabric/strategies/test_fsdp_integration.py b/tests/tests_fabric/strategies/test_fsdp_integration.py index cae84957d1e66..4a971294a326d 100644 --- a/tests/tests_fabric/strategies/test_fsdp_integration.py +++ b/tests/tests_fabric/strategies/test_fsdp_integration.py @@ -621,8 +621,7 @@ def test_clip_gradients(clip_type, precision): optimizer.zero_grad() -# TODO: Support checkpoint consolidation with PyTorch >= 2.2 -@RunIf(min_cuda_gpus=2, standalone=True, min_torch="2.1.0", max_torch="2.2.0") +@RunIf(min_cuda_gpus=2, standalone=True, min_torch="2.3.0") def test_save_sharded_and_consolidate_and_load(tmp_path): """Test the consolidation of a FSDP-sharded checkpoint into a single file.""" @@ -639,7 +638,8 @@ def test_save_sharded_and_consolidate_and_load(tmp_path): state = {"model": model, "optimizer": optimizer, "steps": 1} # run one iteration to init the state of the optimizer - model(torch.rand(1, 32, device=fabric.device)).sum().backward() + loss = model(torch.rand(1, 32, device=fabric.device)).sum() + fabric.backward(loss) optimizer.step() checkpoint_path_sharded = fabric.broadcast(str(tmp_path / "checkpoint_sharded")) diff --git a/tests/tests_fabric/utilities/test_consolidate_checkpoint.py b/tests/tests_fabric/utilities/test_consolidate_checkpoint.py index 16feb3d3c1014..216b77e6b9299 100644 --- a/tests/tests_fabric/utilities/test_consolidate_checkpoint.py +++ b/tests/tests_fabric/utilities/test_consolidate_checkpoint.py @@ -39,15 +39,15 @@ def test_parse_cli_args(args, expected): def test_process_cli_args(tmp_path, caplog, monkeypatch): - # PyTorch version < 2.1 - monkeypatch.setattr(lightning.fabric.utilities.consolidate_checkpoint, "_TORCH_GREATER_EQUAL_2_1", False) + # PyTorch version < 2.3 + monkeypatch.setattr(lightning.fabric.utilities.consolidate_checkpoint, "_TORCH_GREATER_EQUAL_2_3", False) with caplog.at_level(logging.ERROR, logger="lightning.fabric.utilities.consolidate_checkpoint"), pytest.raises( SystemExit ): _process_cli_args(Namespace()) - assert "requires PyTorch >= 2.1." in caplog.text + assert "requires PyTorch >= 2.3." 
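Stripped of the Fabric-specific metadata handling, the core of the revised consolidation is the pattern below. Note that ``_load_state_dict`` and ``_EmptyStateDictLoadPlanner`` are private PyTorch 2.3 APIs, used here exactly as in the patch; the paths are placeholders:

.. code-block:: python

    from typing import Any, Dict

    import torch
    from torch.distributed.checkpoint import FileSystemReader
    from torch.distributed.checkpoint.format_utils import _EmptyStateDictLoadPlanner
    from torch.distributed.checkpoint.state_dict_loader import _load_state_dict

    checkpoint: Dict[str, Any] = {}
    _load_state_dict(
        checkpoint,  # filled in-place with the full, unsharded state dict
        storage_reader=FileSystemReader("path/to/sharded/checkpoint"),
        planner=_EmptyStateDictLoadPlanner(),
        no_dist=True,  # single process, no process group needed
    )
    torch.save(checkpoint, "consolidated.ckpt")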
From 7c7f0ee7433abc36e380e8253a6014d1ce69ca1a Mon Sep 17 00:00:00 2001
From: PL Ghost <75324987+pl-ghost@users.noreply.github.com>
Date: Mon, 4 Mar 2024 21:20:18 +0100
Subject: [PATCH 08/10] Adding test for legacy checkpoint created with 2.2.1
 (#19570)

---
 tests/legacy/back-compatible-versions.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/legacy/back-compatible-versions.txt b/tests/legacy/back-compatible-versions.txt
index d150189111304..1d8e1abccfdd1 100644
--- a/tests/legacy/back-compatible-versions.txt
+++ b/tests/legacy/back-compatible-versions.txt
@@ -97,3 +97,4 @@
 2.1.2
 2.1.3
 2.2.0.post0
+2.2.1

From eb24a9ac6589e7cef996a793a0aa197ee8bda3ef Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Mon, 4 Mar 2024 21:44:51 +0100
Subject: [PATCH 09/10] Update changelog after 2.2.1 release (#19571)

---
 src/lightning/fabric/CHANGELOG.md  |  9 ++++++++-
 src/lightning/pytorch/CHANGELOG.md | 15 +++++++++++----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/lightning/fabric/CHANGELOG.md b/src/lightning/fabric/CHANGELOG.md
index 0cec850ed3483..4c1814bf9f11a 100644
--- a/src/lightning/fabric/CHANGELOG.md
+++ b/src/lightning/fabric/CHANGELOG.md
@@ -46,13 +46,20 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Fixed
 
-- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+-
 
 -
 
 -
 
 
+## [2.2.1] - 2024-03-04
+
+### Fixed
+
+- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+
+
 ## [2.2.0] - 2024-02-08
 
 ### Added
diff --git a/src/lightning/pytorch/CHANGELOG.md b/src/lightning/pytorch/CHANGELOG.md
index 5074eaeef49fa..cf3772c8596c2 100644
--- a/src/lightning/pytorch/CHANGELOG.md
+++ b/src/lightning/pytorch/CHANGELOG.md
@@ -42,17 +42,24 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Fixed
 
-- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+-
 
+-
 
-- Fixed the divisibility check for `Trainer.accumulate_grad_batches` and `Trainer.log_every_n_steps` in ThroughputMonitor ([#19470](https://github.com/Lightning-AI/lightning/pull/19470))
+-
 
+-
 
-- Fixed support for Remote Stop and Remote Abort with NeptuneLogger ([#19130](https://github.com/Lightning-AI/pytorch-lightning/pull/19130))
 
+## [2.2.1] - 2024-03-04
 
-- Fixed infinite recursion error in precision plugin graveyard ([#19542](https://github.com/Lightning-AI/pytorch-lightning/pull/19542))
+### Fixed
+
+- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+- Fixed the divisibility check for `Trainer.accumulate_grad_batches` and `Trainer.log_every_n_steps` in ThroughputMonitor ([#19470](https://github.com/Lightning-AI/lightning/pull/19470))
+- Fixed support for Remote Stop and Remote Abort with NeptuneLogger ([#19130](https://github.com/Lightning-AI/pytorch-lightning/pull/19130))
+- Fixed infinite recursion error in precision plugin graveyard ([#19542](https://github.com/Lightning-AI/pytorch-lightning/pull/19542))
 
 ## [2.2.0] - 2024-02-08
From b871f7a826db015b8b909cf86302c6c4cbf18ec1 Mon Sep 17 00:00:00 2001
From: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Date: Mon, 4 Mar 2024 21:51:19 +0100
Subject: [PATCH 10/10] docs: switch NGC link to Nemo (#19568)

---
 docs/source-pytorch/ecosystem/asr_nlp_tts.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/ecosystem/asr_nlp_tts.rst b/docs/source-pytorch/ecosystem/asr_nlp_tts.rst
index af7fd05af709a..5f0ea44117662 100644
--- a/docs/source-pytorch/ecosystem/asr_nlp_tts.rst
+++ b/docs/source-pytorch/ecosystem/asr_nlp_tts.rst
@@ -86,7 +86,7 @@ To install from a local clone of NeMo:
 
     ./reinstall.sh # from cloned NeMo's git root
 
 For Docker users, the NeMo container is available on
-`NGC `_.
+`NGC `_.
 
 .. code-block:: bash