From 34f036917d83a0805c46c32cf4035a5dffbd5c76 Mon Sep 17 00:00:00 2001
From: Leng Yue
Date: Mon, 4 Mar 2024 03:10:38 -0800
Subject: [PATCH 01/10] Document `ddp_find_unused_parameters_true` in Fabric
 (#19564)

---
 docs/source-fabric/api/fabric_args.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/source-fabric/api/fabric_args.rst b/docs/source-fabric/api/fabric_args.rst
index a126667259a63..4396aa26e908a 100644
--- a/docs/source-fabric/api/fabric_args.rst
+++ b/docs/source-fabric/api/fabric_args.rst
@@ -36,13 +36,16 @@ See also: :doc:`../fundamentals/accelerators`
 strategy
 ========
 
-Choose a training strategy: ``"dp"``, ``"ddp"``, ``"ddp_spawn"``, ``"xla"``, ``"deepspeed"``, ``"fsdp"````.
+Choose a training strategy: ``"dp"``, ``"ddp"``, ``"ddp_spawn"``, ``"ddp_find_unused_parameters_true"``, ``"xla"``, ``"deepspeed"``, ``"fsdp"``.
 
 .. code-block:: python
 
     # Running with the DistributedDataParallel strategy on 4 GPUs
     fabric = Fabric(strategy="ddp", accelerator="gpu", devices=4)
 
+    # Running with the DDP strategy with find unused parameters enabled on 4 GPUs
+    fabric = Fabric(strategy="ddp_find_unused_parameters_true", accelerator="gpu", devices=4)
+
     # Running with the DDP Spawn strategy using 4 CPU processes
     fabric = Fabric(strategy="ddp_spawn", accelerator="cpu", devices=4)
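A note for readers comparing the two DDP variants documented above: plain ``"ddp"`` errors out when some parameters never receive a gradient in a step, which is exactly the case the new strategy string addresses. A minimal sketch of such a model — the ``GatedModel`` name, shapes, and gating logic are illustrative assumptions, not code from the patch:

.. code-block:: python

    import torch.nn as nn
    from lightning.fabric import Fabric


    class GatedModel(nn.Module):
        # Hypothetical module: `self.extra` receives no gradient on most steps.
        def __init__(self):
            super().__init__()
            self.main = nn.Linear(32, 32)
            self.extra = nn.Linear(32, 32)

        def forward(self, x, use_extra=False):
            # With plain "ddp", steps that skip `self.extra` fail the gradient
            # reduction; the find-unused-parameters variant tolerates them.
            return self.extra(self.main(x)) if use_extra else self.main(x)


    fabric = Fabric(strategy="ddp_find_unused_parameters_true", accelerator="cpu", devices=2)
    fabric.launch()
    model = fabric.setup(GatedModel())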
From d9113b61cc120eafb835a97f357cf484db51ad02 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Mon, 4 Mar 2024 14:00:50 +0100
Subject: [PATCH 02/10] Add additional references in compile guides (#19550)

---
 docs/source-fabric/advanced/compile.rst  | 19 ++++++++++++++++++-
 docs/source-pytorch/advanced/compile.rst | 23 ++++++++++++++++++++---
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/docs/source-fabric/advanced/compile.rst b/docs/source-fabric/advanced/compile.rst
index a36384ccc98fe..a8e1cc2db243c 100644
--- a/docs/source-fabric/advanced/compile.rst
+++ b/docs/source-fabric/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your PyTorch model can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile `_ correctly in your code.
 
 .. note::
 
@@ -223,6 +223,9 @@ On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically a
 
 Numbers produced with NVIDIA A100 SXM4 40GB, PyTorch 2.2.0, CUDA 12.1.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch `_ for further investigation.
+
+
 ----
 
@@ -301,4 +304,18 @@ However, should you have issues compiling DDP and FSDP models, you can opt out o
 
     model = fabric.setup(model, _reapply_compile=False)
 
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper `_
+- `GenAI with PyTorch 2.0 blog post series `_
+- `Training Production AI Models with PyTorch 2.0 `_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach `_
+
diff --git a/docs/source-pytorch/advanced/compile.rst b/docs/source-pytorch/advanced/compile.rst
index 6da769ee40279..73d5f4fbc2af4 100644
--- a/docs/source-pytorch/advanced/compile.rst
+++ b/docs/source-pytorch/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
 #################################
 
 Compiling your LightningModule can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile `_ correctly in your code.
 
 .. note::
 
@@ -192,6 +192,8 @@ However, when this is not possible, you can request PyTorch to compile the code
 
 A model compiled with ``dynamic=True`` will typically be slower than a model compiled with static shapes, but it will avoid the extreme cost of recompilation every iteration.
 On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically and you should no longer need to set this.
 
+If you still see recompilation issues after dealing with the aforementioned cases, there is a `Compile Profiler in PyTorch `_ for further investigation.
+
 
 ----
 
@@ -251,9 +253,9 @@ Always compare the speed and memory usage of the compiled model against the orig
 Limitations
 ***********
 
-There are a few limitations you should be aware of when using ``torch.compile`` in conjunction with the Trainer:
+There are a few limitations you should be aware of when using ``torch.compile`` **in conjunction with the Trainer**:
 
-* ``torch.compile`` currently does not get reapplied over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
+* The Trainer currently does not reapply ``torch.compile`` over DDP/FSDP, meaning distributed operations can't benefit from speed ups at the moment.
   This limitation will be lifted in the future.
 
 * In some cases, using ``self.log()`` in your LightningModule will cause compilation errors.
@@ -270,4 +272,19 @@ There are a few limitations you should be aware of when using ``torch.compile``
 
             self.model = torch.compile(self.model)
             ...
 
+
+----
+
+
+********************
+Additional Resources
+********************
+
+Here are a few resources for further reading after you complete this tutorial:
+
+- `PyTorch 2.0 Paper `_
+- `GenAI with PyTorch 2.0 blog post series `_
+- `Training Production AI Models with PyTorch 2.0 `_
+- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach `_
+
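The ``dynamic=True`` escape hatch these guides refer to is a single argument to ``torch.compile``. A short sketch, assuming a toy model and arbitrary batch sizes:

.. code-block:: python

    import torch

    model = torch.nn.Linear(16, 2)
    compiled = torch.compile(model, dynamic=True)  # treat input shapes as dynamic up front

    # Varying batch sizes no longer trigger one recompilation per new shape
    for batch_size in (4, 8, 3):
        out = compiled(torch.randn(batch_size, 16))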
From 13f15b38fc65e73dd707aab0db966e5c59bb11f9 Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Mon, 4 Mar 2024 14:01:33 +0100
Subject: [PATCH 03/10] Support consolidating sharded checkpoints with the
 `fabric` CLI (#19560)

---
 .../checkpoint/distributed_checkpoint.rst |  4 +-
 src/lightning/fabric/CHANGELOG.md         |  2 +-
 src/lightning/fabric/cli.py               | 34 ++++++++++++++
 tests/tests_fabric/test_cli.py            | 47 +++++++++++++------
 4 files changed, 70 insertions(+), 17 deletions(-)

diff --git a/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst b/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
index da380c26a7783..adb73d228831f 100644
--- a/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
+++ b/docs/source-fabric/guide/checkpoint/distributed_checkpoint.rst
@@ -187,7 +187,7 @@ It is possible to convert a distributed checkpoint to a regular, single-file che
 
 .. code-block:: bash
 
-    python -m lightning.fabric.utilities.consolidate_checkpoint path/to/my/checkpoint
+    fabric consolidate path/to/my/checkpoint
 
 You will need to do this for example if you want to load the checkpoint into a script that doesn't use FSDP, or need to export the checkpoint to a different format for deployment, evaluation, etc.
 
@@ -202,7 +202,7 @@ You will need to do this for example if you want to load the checkpoint into a s
 
 .. code-block:: bash
 
-    python -m lightning.fabric.utilities.consolidate_checkpoint my-checkpoint.ckpt
+    fabric consolidate my-checkpoint.ckpt
 
 This saves a new file ``my-checkpoint.ckpt.consolidated`` next to the sharded checkpoint which you can load normally in PyTorch:
 
diff --git a/src/lightning/fabric/CHANGELOG.md b/src/lightning/fabric/CHANGELOG.md
index 235c36b82f6b1..0cec850ed3483 100644
--- a/src/lightning/fabric/CHANGELOG.md
+++ b/src/lightning/fabric/CHANGELOG.md
@@ -9,7 +9,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Added
 
--
+- Enabled consolidating distributed checkpoints through `fabric consolidate` in the new CLI ([#19560](https://github.com/Lightning-AI/pytorch-lightning/pull/19560))
 
 -
 
diff --git a/src/lightning/fabric/cli.py b/src/lightning/fabric/cli.py
index 80256a0f088cc..d8c6fe47b6630 100644
--- a/src/lightning/fabric/cli.py
+++ b/src/lightning/fabric/cli.py
@@ -19,14 +19,17 @@
 from argparse import Namespace
 from typing import Any, List, Optional
 
+import torch
 from lightning_utilities.core.imports import RequirementCache
 from typing_extensions import get_args
 
 from lightning.fabric.accelerators import CPUAccelerator, CUDAAccelerator, MPSAccelerator
 from lightning.fabric.plugins.precision.precision import _PRECISION_INPUT_STR, _PRECISION_INPUT_STR_ALIAS
 from lightning.fabric.strategies import STRATEGY_REGISTRY
+from lightning.fabric.utilities.consolidate_checkpoint import _process_cli_args
 from lightning.fabric.utilities.device_parser import _parse_gpu_ids
 from lightning.fabric.utilities.distributed import _suggested_max_num_threads
+from lightning.fabric.utilities.load import _load_distributed_checkpoint
 
 _log = logging.getLogger(__name__)
 
@@ -154,6 +157,37 @@ def _run(**kwargs: Any) -> None:
         script_args = list(kwargs.pop("script_args", []))
         main(args=Namespace(**kwargs), script_args=script_args)
 
+    @_main.command(
+        "consolidate",
+        context_settings={
+            "ignore_unknown_options": True,
+        },
+    )
+    @click.argument(
+        "checkpoint_folder",
+        type=click.Path(exists=True),
+    )
+    @click.option(
+        "--output_file",
+        type=click.Path(exists=True),
+        default=None,
+        help=(
+            "Path to the file where the converted checkpoint should be saved. The file should not already exist."
+            " If no path is provided, the file will be saved next to the input checkpoint folder with the same name"
+            " and a '.consolidated' suffix."
+        ),
+    )
+    def _consolidate(checkpoint_folder: str, output_file: Optional[str]) -> None:
+        """Convert a distributed/sharded checkpoint into a single file that can be loaded with `torch.load()`.
+
+        Only supports FSDP sharded checkpoints at the moment.
+
+        """
+        args = Namespace(checkpoint_folder=checkpoint_folder, output_file=output_file)
+        config = _process_cli_args(args)
+        checkpoint = _load_distributed_checkpoint(config.checkpoint_folder)
+        torch.save(checkpoint, config.output_file)
+
 
 def _set_env_variables(args: Namespace) -> None:
     """Set the environment variables for the new processes.
 
diff --git a/tests/tests_fabric/test_cli.py b/tests/tests_fabric/test_cli.py
index 5560d114d9369..0e58acb3c7267 100644
--- a/tests/tests_fabric/test_cli.py
+++ b/tests/tests_fabric/test_cli.py
@@ -20,7 +20,7 @@
 from unittest.mock import Mock
 
 import pytest
-from lightning.fabric.cli import _get_supported_strategies, _run
+from lightning.fabric.cli import _consolidate, _get_supported_strategies, _run
 
 from tests_fabric.helpers.runif import RunIf
 
@@ -33,7 +33,7 @@ def fake_script(tmp_path):
 
 
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_defaults(monkeypatch, fake_script):
+def test_run_env_vars_defaults(monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script])
@@ -49,7 +49,7 @@ def test_cli_env_vars_defaults(monkeypatch, fake_script):
 
 @pytest.mark.parametrize("accelerator", ["cpu", "gpu", "cuda", pytest.param("mps", marks=RunIf(mps=True))])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
-def test_cli_env_vars_accelerator(_, accelerator, monkeypatch, fake_script):
+def test_run_env_vars_accelerator(_, accelerator, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--accelerator", accelerator])
@@ -60,7 +60,7 @@
 @pytest.mark.parametrize("strategy", _get_supported_strategies())
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
-def test_cli_env_vars_strategy(_, strategy, monkeypatch, fake_script):
+def test_run_env_vars_strategy(_, strategy, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--strategy", strategy])
@@ -68,7 +68,7 @@ def test_cli_env_vars_strategy(_, strategy, monkeypatch, fake_script):
     assert os.environ["LT_STRATEGY"] == strategy
 
 
-def test_cli_get_supported_strategies():
+def test_run_get_supported_strategies():
     """Test to ensure that when new strategies get added, we must consider updating the list of supported ones in the
     CLI."""
     assert len(_get_supported_strategies()) == 7
 
@@ -76,7 +76,7 @@
 @pytest.mark.parametrize("strategy", ["ddp_spawn", "ddp_fork", "ddp_notebook", "deepspeed_stage_3_offload"])
-def test_cli_env_vars_unsupported_strategy(strategy, fake_script):
+def test_run_env_vars_unsupported_strategy(strategy, fake_script):
     ioerr = StringIO()
     with pytest.raises(SystemExit) as e, contextlib.redirect_stderr(ioerr):
         _run.main([fake_script, "--strategy", strategy])
 
@@ -87,7 +87,7 @@
 @pytest.mark.parametrize("devices", ["1", "2", "0,", "1,0", "-1"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=2)
-def test_cli_env_vars_devices_cuda(_, devices, monkeypatch, fake_script):
+def test_run_env_vars_devices_cuda(_, devices, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--accelerator", "cuda", "--devices", devices])
 
@@ -98,7 +98,7 @@ def test_cli_env_vars_devices_cuda(_, devices, monkeypatch, fake_script):
 @RunIf(mps=True)
 @pytest.mark.parametrize("accelerator", ["mps", "gpu"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_devices_mps(accelerator, monkeypatch, fake_script):
+def test_run_env_vars_devices_mps(accelerator, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--accelerator", accelerator])
 
@@ -108,7 +108,7 @@
 @pytest.mark.parametrize("num_nodes", ["1", "2", "3"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_num_nodes(num_nodes, monkeypatch, fake_script):
+def test_run_env_vars_num_nodes(num_nodes, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--num-nodes", num_nodes])
 
@@ -118,7 +118,7 @@
 @pytest.mark.parametrize("precision", ["64-true", "64", "32-true", "32", "16-mixed", "bf16-mixed"])
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_env_vars_precision(precision, monkeypatch, fake_script):
+def test_run_env_vars_precision(precision, monkeypatch, fake_script):
     monkeypatch.setitem(sys.modules, "torch.distributed.run", Mock())
     with pytest.raises(SystemExit) as e:
         _run.main([fake_script, "--precision", precision])
 
@@ -127,7 +127,7 @@
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
-def test_cli_torchrun_defaults(monkeypatch, fake_script):
+def test_run_torchrun_defaults(monkeypatch, fake_script):
     torchrun_mock = Mock()
     monkeypatch.setitem(sys.modules, "torch.distributed.run", torchrun_mock)
     with pytest.raises(SystemExit) as e:
 
@@ -155,7 +155,7 @@ def test_cli_torchrun_defaults(monkeypatch, fake_script):
 )
 @mock.patch.dict(os.environ, os.environ.copy(), clear=True)
 @mock.patch("lightning.fabric.accelerators.cuda.num_cuda_devices", return_value=5)
-def test_cli_torchrun_num_processes_launched(_, devices, expected, monkeypatch, fake_script):
+def test_run_torchrun_num_processes_launched(_, devices, expected, monkeypatch, fake_script):
     torchrun_mock = Mock()
     monkeypatch.setitem(sys.modules, "torch.distributed.run", torchrun_mock)
     with pytest.raises(SystemExit) as e:
 
@@ -171,7 +171,7 @@ def test_cli_torchrun_num_processes_launched(_, devices, expected, monkeypatch,
     ])
 
 
-def test_cli_through_fabric_entry_point():
+def test_run_through_fabric_entry_point():
     result = subprocess.run("fabric run --help", capture_output=True, text=True, shell=True)
 
     message = "Usage: fabric run [OPTIONS] SCRIPT [SCRIPT_ARGS]"
 
@@ -179,7 +179,7 @@
 @pytest.mark.skipif("lightning.fabric" == "lightning_fabric", reason="standalone package")
-def test_cli_through_lightning_entry_point():
+def test_run_through_lightning_entry_point():
     result = subprocess.run("lightning run model --help", capture_output=True, text=True, shell=True)
 
     deprecation_message = (
@@ -189,3 +189,22 @@ def test_cli_through_lightning_entry_point():
     message = "Usage: lightning run [OPTIONS] SCRIPT [SCRIPT_ARGS]"
     assert deprecation_message in result.stdout
     assert message in result.stdout or message in result.stderr
+
+
+@mock.patch("lightning.fabric.cli._process_cli_args")
+@mock.patch("lightning.fabric.cli._load_distributed_checkpoint")
+@mock.patch("lightning.fabric.cli.torch.save")
+def test_consolidate(save_mock, _, __, tmp_path):
+    ioerr = StringIO()
+    with pytest.raises(SystemExit) as e, contextlib.redirect_stderr(ioerr):
+        _consolidate.main(["not exist"])
+    assert e.value.code == 2
+    assert "Path 'not exist' does not exist" in ioerr.getvalue()
+
+    checkpoint_folder = tmp_path / "checkpoint"
+    checkpoint_folder.mkdir()
+    ioerr = StringIO()
+    with pytest.raises(SystemExit) as e, contextlib.redirect_stderr(ioerr):
+        _consolidate.main([str(checkpoint_folder)])
+    assert e.value.code == 0
+    save_mock.assert_called_once()
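Putting the new subcommand together with the docs it updates: after running ``fabric consolidate my-checkpoint.ckpt`` in the shell, the resulting single file loads with plain PyTorch. A sketch, assuming the default output name described in the option's help text:

.. code-block:: python

    import torch

    # Shell step (new in this patch): fabric consolidate my-checkpoint.ckpt
    # This writes my-checkpoint.ckpt.consolidated next to the sharded checkpoint.
    checkpoint = torch.load("my-checkpoint.ckpt.consolidated")
    print(checkpoint.keys())  # model/optimizer state plus any extra user data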
From 942e6507287bb0137e255f066dab32ebb65fa894 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 4 Mar 2024 14:51:25 +0100
Subject: [PATCH 04/10] Bump vite from 2.9.16 to 2.9.17 in
 /src/lightning/app/cli/react-ui-template/ui (#19319)

Bump vite in /src/lightning/app/cli/react-ui-template/ui

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 2.9.16 to 2.9.17.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v2.9.17/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v2.9.17/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
 src/lightning/app/cli/react-ui-template/ui/package.json | 2 +-
 src/lightning/app/cli/react-ui-template/ui/yarn.lock    | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/lightning/app/cli/react-ui-template/ui/package.json b/src/lightning/app/cli/react-ui-template/ui/package.json
index 71e2cc00f988d..d43665302c55a 100644
--- a/src/lightning/app/cli/react-ui-template/ui/package.json
+++ b/src/lightning/app/cli/react-ui-template/ui/package.json
@@ -24,7 +24,7 @@
     "@vitejs/plugin-react": "^1.0.7",
     "prettier": "^2.5.1",
     "typescript": "^4.5.4",
-    "vite": "^2.9.16"
+    "vite": "^2.9.17"
   },
   "main": "index.js",
   "license": "MIT"
diff --git a/src/lightning/app/cli/react-ui-template/ui/yarn.lock b/src/lightning/app/cli/react-ui-template/ui/yarn.lock
index 3ef6acfa43398..66458662bcdf5 100644
--- a/src/lightning/app/cli/react-ui-template/ui/yarn.lock
+++ b/src/lightning/app/cli/react-ui-template/ui/yarn.lock
@@ -1260,10 +1260,10 @@ update-browserslist-db@^1.0.4:
     escalade "^3.1.1"
     picocolors "^1.0.0"
 
-vite@^2.9.16:
-  version "2.9.16"
-  resolved "https://registry.yarnpkg.com/vite/-/vite-2.9.16.tgz#daf7ba50f5cc37a7bf51b118ba06bc36e97898e9"
-  integrity sha512-X+6q8KPyeuBvTQV8AVSnKDvXoBMnTx8zxh54sOwmmuOdxkjMmEJXH2UEchA+vTMps1xw9vL64uwJOWryULg7nA==
+vite@^2.9.17:
+  version "2.9.17"
+  resolved "https://registry.yarnpkg.com/vite/-/vite-2.9.17.tgz#6b770525e12fa2a2e3a0fa0d028d304f4f7dc7d4"
+  integrity sha512-XxcRzra6d7xrKXH66jZUgb+srThoPu+TLJc06GifUyKq9JmjHkc1Numc8ra0h56rju2jfVWw3B3fs5l3OFMvUw==
   dependencies:
     esbuild "^0.14.27"
     postcss "^8.4.13"
From b19c3a961c79028d7c39a4f1ff1c2df991406d1d Mon Sep 17 00:00:00 2001
From: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Date: Mon, 4 Mar 2024 15:09:04 +0100
Subject: [PATCH 05/10] ci: pin `pytest` for package doctests (#19567)

---
 .github/workflows/ci-pkg-install.yml | 2 +-
 requirements/doctests.txt            | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
 create mode 100644 requirements/doctests.txt

diff --git a/.github/workflows/ci-pkg-install.yml b/.github/workflows/ci-pkg-install.yml
index 20530be8b017a..67a9b9f21b515 100644
--- a/.github/workflows/ci-pkg-install.yml
+++ b/.github/workflows/ci-pkg-install.yml
@@ -115,7 +115,7 @@ jobs:
           done
       - name: Install pytest doctest extension
         run: |
-          pip install -q "pytest-doctestplus>=0.9.0"
+          pip install -q -r requirements/doctests.txt
           pip list
 
       - name: DocTest package
diff --git a/requirements/doctests.txt b/requirements/doctests.txt
new file mode 100644
index 0000000000000..703f221660c70
--- /dev/null
+++ b/requirements/doctests.txt
@@ -0,0 +1,2 @@
+pytest ==7.4.0
+pytest-doctestplus ==1.0.0
From 527d071f4934e8a28239669316ba11d7e104cc79 Mon Sep 17 00:00:00 2001
From: Kashif Rasul
Date: Mon, 4 Mar 2024 16:11:31 +0100
Subject: [PATCH 06/10] Bump bitsandbytes minimum version (#19520)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: awaelchli
Co-authored-by: Carlos Mocholí
---
 requirements/fabric/strategies.txt                     | 2 +-
 requirements/pytorch/extra.txt                         | 2 +-
 src/lightning/fabric/plugins/precision/bitsandbytes.py | 7 +++----
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/requirements/fabric/strategies.txt b/requirements/fabric/strategies.txt
index 0c7804183e393..6c302f21269e3 100644
--- a/requirements/fabric/strategies.txt
+++ b/requirements/fabric/strategies.txt
@@ -6,4 +6,4 @@
 # note: is a bug around 0.10 with `MPS_Accelerator must implement all abstract methods`
 # shall be resolved by https://github.com/microsoft/DeepSpeed/issues/4372
 deepspeed >=0.8.2, <=0.9.3; platform_system != "Windows" # strict
-bitsandbytes ==0.41.0 # strict
+bitsandbytes >=0.42.0,<0.43.0
diff --git a/requirements/pytorch/extra.txt b/requirements/pytorch/extra.txt
index 39e3ff61d4e00..55960d7cd11cb 100644
--- a/requirements/pytorch/extra.txt
+++ b/requirements/pytorch/extra.txt
@@ -8,4 +8,4 @@
 hydra-core >=1.0.5, <1.4.0
 jsonargparse[signatures] >=4.27.5, <4.28.0
 rich >=12.3.0, <13.6.0
 tensorboardX >=2.2, <2.7.0 # min version is set by torch.onnx missing attribute
-bitsandbytes ==0.41.0 # strict
+bitsandbytes >=0.42.0,<0.43.0
diff --git a/src/lightning/fabric/plugins/precision/bitsandbytes.py b/src/lightning/fabric/plugins/precision/bitsandbytes.py
index 2816f16fdf478..12a0ac3998b6e 100644
--- a/src/lightning/fabric/plugins/precision/bitsandbytes.py
+++ b/src/lightning/fabric/plugins/precision/bitsandbytes.py
@@ -39,8 +39,7 @@
 
 log = logging.getLogger(__name__)
 
-# TODO: unpin after resolving the `quant_state` format breaking changes
-_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes==0.41.0")
+_BITSANDBYTES_AVAILABLE = RequirementCache("bitsandbytes>=0.42.0")
 
 
 class BitsandbytesPrecision(Precision):
@@ -344,7 +343,7 @@ def quantize(
         def to_empty(self, *, device: _DEVICE, recurse: bool = True) -> Self:
             if self.weight.dtype == torch.uint8:  # was quantized
                 # cannot init the quantized params directly
-                weight = torch.empty(self.weight.quant_state[1], device=device, dtype=torch.half)
+                weight = torch.empty(self.weight.quant_state.shape, device=device, dtype=torch.half)
             else:
                 weight = torch.empty_like(self.weight.data, device=device)
             device = torch.device(device)
@@ -366,7 +365,7 @@ def reset_parameters(self) -> None:
             linear_init_finished = isinstance(self.weight, bnb.nn.Params4bit)
             if linear_init_finished and self.weight.dtype == torch.uint8:  # was quantized
                 # cannot init the quantized params directly
-                weight = torch.empty(self.weight.quant_state[1], device=self.weight.device, dtype=torch.half)
+                weight = torch.empty(self.weight.quant_state.shape, device=self.weight.device, dtype=torch.half)
             else:
                 weight = self.weight.data
             torch.nn.init.kaiming_uniform_(weight, a=math.sqrt(5))
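For anyone verifying the new version range locally, the plugin is enabled the same way as before. A minimal sketch — the ``mode="nf4"`` argument follows the plugin's documented quantization modes and is an assumption here, not part of this diff:

.. code-block:: python

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.plugins import BitsandbytesPrecision

    # Requires bitsandbytes >= 0.42.0 after this patch; quantizes Linear layers to NF4
    precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
    fabric = Fabric(accelerator="cuda", devices=1, plugins=precision)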
""" - if not _TORCH_GREATER_EQUAL_2_1: - raise ImportError("Processing distributed checkpoints requires PyTorch >= 2.1.") + if not _TORCH_GREATER_EQUAL_2_3: + raise ImportError("Processing distributed checkpoints requires PyTorch >= 2.3.") from torch.distributed.checkpoint import FileSystemReader - from torch.distributed.checkpoint.metadata import BytesStorageMetadata, TensorStorageMetadata + from torch.distributed.checkpoint.format_utils import _EmptyStateDictLoadPlanner + from torch.distributed.checkpoint.state_dict_loader import _load_state_dict - if _TORCH_GREATER_EQUAL_2_2: - from torch.distributed.checkpoint import load - else: - from torch.distributed.checkpoint import load_state_dict as load # deprecated - - reader = FileSystemReader(checkpoint_folder) - metadata = reader.read_metadata() - - # TODO: Add sequential save to avoid storing the entire checkpoint in memory checkpoint: Dict[str, Any] = {} - for tensor_name, sd_metadata in metadata.state_dict_metadata.items(): - if isinstance(sd_metadata, BytesStorageMetadata): - checkpoint[tensor_name] = "" - elif isinstance(sd_metadata, TensorStorageMetadata): - checkpoint[tensor_name] = torch.empty( - size=sd_metadata.size, - dtype=sd_metadata.properties.dtype, - device=torch.device("cpu"), - memory_format=sd_metadata.properties.memory_format, - layout=sd_metadata.properties.layout, - requires_grad=sd_metadata.properties.requires_grad, - pin_memory=sd_metadata.properties.pin_memory, - ) - - load(state_dict=checkpoint, storage_reader=reader, no_dist=True) - checkpoint = _unflatten_dict(checkpoint, key_map=metadata.planner_data) + _load_state_dict( + checkpoint, + storage_reader=FileSystemReader(checkpoint_folder), + planner=_EmptyStateDictLoadPlanner(), + no_dist=True, + ) # This is the extra file saved by Fabric, with user data separate from weights and optimizer states extra_file = checkpoint_folder / _METADATA_FILENAME @@ -282,29 +263,3 @@ def _load_distributed_checkpoint(checkpoint_folder: Path) -> Dict[str, Any]: checkpoint.update(extra) return checkpoint - - -def _unflatten_dict(checkpoint: Dict[str, Any], key_map: Dict[str, Tuple[str, ...]]) -> Dict[str, Any]: - """Converts the flat dictionary with keys 'x.y.z...' to a nested dictionary using the provided key map. - - Args: - checkpoint: The flat checkpoint dictionary. - key_map: A dictionary that maps the keys in flattened format 'x.y.z...' to a tuple representing - the index path into the nested dictonary that this function should construct. 
- - """ - assert checkpoint.keys() == key_map.keys() - converted: Dict[str, Any] = {} - for flat_key in checkpoint: - key_path = key_map[flat_key] - _set_nested_dict_value(converted, key_path, checkpoint[flat_key]) - return converted - - -def _set_nested_dict_value(nested_dict: Dict[str, Any], key_path: Tuple[str, ...], value: Any) -> None: - result = nested_dict - for key in key_path[:-1]: - if key not in result: - result[key] = {} - result = result[key] - result[key_path[-1]] = value diff --git a/tests/tests_fabric/strategies/test_fsdp_integration.py b/tests/tests_fabric/strategies/test_fsdp_integration.py index cae84957d1e66..4a971294a326d 100644 --- a/tests/tests_fabric/strategies/test_fsdp_integration.py +++ b/tests/tests_fabric/strategies/test_fsdp_integration.py @@ -621,8 +621,7 @@ def test_clip_gradients(clip_type, precision): optimizer.zero_grad() -# TODO: Support checkpoint consolidation with PyTorch >= 2.2 -@RunIf(min_cuda_gpus=2, standalone=True, min_torch="2.1.0", max_torch="2.2.0") +@RunIf(min_cuda_gpus=2, standalone=True, min_torch="2.3.0") def test_save_sharded_and_consolidate_and_load(tmp_path): """Test the consolidation of a FSDP-sharded checkpoint into a single file.""" @@ -639,7 +638,8 @@ def test_save_sharded_and_consolidate_and_load(tmp_path): state = {"model": model, "optimizer": optimizer, "steps": 1} # run one iteration to init the state of the optimizer - model(torch.rand(1, 32, device=fabric.device)).sum().backward() + loss = model(torch.rand(1, 32, device=fabric.device)).sum() + fabric.backward(loss) optimizer.step() checkpoint_path_sharded = fabric.broadcast(str(tmp_path / "checkpoint_sharded")) diff --git a/tests/tests_fabric/utilities/test_consolidate_checkpoint.py b/tests/tests_fabric/utilities/test_consolidate_checkpoint.py index 16feb3d3c1014..216b77e6b9299 100644 --- a/tests/tests_fabric/utilities/test_consolidate_checkpoint.py +++ b/tests/tests_fabric/utilities/test_consolidate_checkpoint.py @@ -39,15 +39,15 @@ def test_parse_cli_args(args, expected): def test_process_cli_args(tmp_path, caplog, monkeypatch): - # PyTorch version < 2.1 - monkeypatch.setattr(lightning.fabric.utilities.consolidate_checkpoint, "_TORCH_GREATER_EQUAL_2_1", False) + # PyTorch version < 2.3 + monkeypatch.setattr(lightning.fabric.utilities.consolidate_checkpoint, "_TORCH_GREATER_EQUAL_2_3", False) with caplog.at_level(logging.ERROR, logger="lightning.fabric.utilities.consolidate_checkpoint"), pytest.raises( SystemExit ): _process_cli_args(Namespace()) - assert "requires PyTorch >= 2.1." in caplog.text + assert "requires PyTorch >= 2.3." 
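Stripped of the Fabric-specific metadata handling, the core of the revised consolidation is the pattern below. Note that ``_load_state_dict`` and ``_EmptyStateDictLoadPlanner`` are private PyTorch 2.3 APIs, used here exactly as in the patch; the paths are placeholders:

.. code-block:: python

    from typing import Any, Dict

    import torch
    from torch.distributed.checkpoint import FileSystemReader
    from torch.distributed.checkpoint.format_utils import _EmptyStateDictLoadPlanner
    from torch.distributed.checkpoint.state_dict_loader import _load_state_dict

    checkpoint: Dict[str, Any] = {}
    _load_state_dict(
        checkpoint,  # filled in-place with the full, unsharded state dict
        storage_reader=FileSystemReader("path/to/sharded/checkpoint"),
        planner=_EmptyStateDictLoadPlanner(),
        no_dist=True,  # single process, no process group needed
    )
    torch.save(checkpoint, "consolidated.ckpt")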
From 7c7f0ee7433abc36e380e8253a6014d1ce69ca1a Mon Sep 17 00:00:00 2001
From: PL Ghost <75324987+pl-ghost@users.noreply.github.com>
Date: Mon, 4 Mar 2024 21:20:18 +0100
Subject: [PATCH 08/10] Adding test for legacy checkpoint created with 2.2.1
 (#19570)

---
 tests/legacy/back-compatible-versions.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/legacy/back-compatible-versions.txt b/tests/legacy/back-compatible-versions.txt
index d150189111304..1d8e1abccfdd1 100644
--- a/tests/legacy/back-compatible-versions.txt
+++ b/tests/legacy/back-compatible-versions.txt
@@ -97,3 +97,4 @@
 2.1.2
 2.1.3
 2.2.0.post0
+2.2.1

From eb24a9ac6589e7cef996a793a0aa197ee8bda3ef Mon Sep 17 00:00:00 2001
From: awaelchli
Date: Mon, 4 Mar 2024 21:44:51 +0100
Subject: [PATCH 09/10] Update changelog after 2.2.1 release (#19571)

---
 src/lightning/fabric/CHANGELOG.md  |  9 ++++++++-
 src/lightning/pytorch/CHANGELOG.md | 15 +++++++++++----
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/lightning/fabric/CHANGELOG.md b/src/lightning/fabric/CHANGELOG.md
index 0cec850ed3483..4c1814bf9f11a 100644
--- a/src/lightning/fabric/CHANGELOG.md
+++ b/src/lightning/fabric/CHANGELOG.md
@@ -46,13 +46,20 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Fixed
 
-- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+-
 
 -
 
 -
 
 
+## [2.2.1] - 2024-03-04
+
+### Fixed
+
+- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+
+
 ## [2.2.0] - 2024-02-08
 
 ### Added
diff --git a/src/lightning/pytorch/CHANGELOG.md b/src/lightning/pytorch/CHANGELOG.md
index 5074eaeef49fa..cf3772c8596c2 100644
--- a/src/lightning/pytorch/CHANGELOG.md
+++ b/src/lightning/pytorch/CHANGELOG.md
@@ -42,17 +42,24 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Fixed
 
-- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+-
 
+-
 
-- Fixed the divisibility check for `Trainer.accumulate_grad_batches` and `Trainer.log_every_n_steps` in ThroughputMonitor ([#19470](https://github.com/Lightning-AI/lightning/pull/19470))
+-
 
+-
 
-- Fixed support for Remote Stop and Remote Abort with NeptuneLogger ([#19130](https://github.com/Lightning-AI/pytorch-lightning/pull/19130))
 
+## [2.2.1] - 2024-03-04
 
-- Fixed infinite recursion error in precision plugin graveyard ([#19542](https://github.com/Lightning-AI/pytorch-lightning/pull/19542))
+### Fixed
+
+- Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually ([#19446](https://github.com/Lightning-AI/lightning/pull/19446))
+- Fixed the divisibility check for `Trainer.accumulate_grad_batches` and `Trainer.log_every_n_steps` in ThroughputMonitor ([#19470](https://github.com/Lightning-AI/lightning/pull/19470))
+- Fixed support for Remote Stop and Remote Abort with NeptuneLogger ([#19130](https://github.com/Lightning-AI/pytorch-lightning/pull/19130))
+- Fixed infinite recursion error in precision plugin graveyard ([#19542](https://github.com/Lightning-AI/pytorch-lightning/pull/19542))
 
 ## [2.2.0] - 2024-02-08
From b871f7a826db015b8b909cf86302c6c4cbf18ec1 Mon Sep 17 00:00:00 2001
From: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Date: Mon, 4 Mar 2024 21:51:19 +0100
Subject: [PATCH 10/10] docs: switch NGC link to Nemo (#19568)

---
 docs/source-pytorch/ecosystem/asr_nlp_tts.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source-pytorch/ecosystem/asr_nlp_tts.rst b/docs/source-pytorch/ecosystem/asr_nlp_tts.rst
index af7fd05af709a..5f0ea44117662 100644
--- a/docs/source-pytorch/ecosystem/asr_nlp_tts.rst
+++ b/docs/source-pytorch/ecosystem/asr_nlp_tts.rst
@@ -86,7 +86,7 @@ To install from a local clone of NeMo:
 
     ./reinstall.sh # from cloned NeMo's git root
 
 For Docker users, the NeMo container is available on
-`NGC `_.
+`NGC `_.
 
 .. code-block:: bash