Add quick docs for deepspeed infinity (#8323)

Lightning-AI · Jul 7, 2021 · 01f594b
1 parent fc12fe7
Showing 1 changed file with 39 additions and 1 deletion.
diff --git a/docs/source/advanced/advanced_gpu.rst b/docs/source/advanced/advanced_gpu.rst
@@ -416,11 +416,49 @@ DeepSpeed ZeRO Stage 3 Offloads optimizer state, gradients to the host CPU to re
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(stage=3, cpu_offload=True, cpu_offload_params=True),
+        plugins=DeepSpeedPlugin(
+            stage=3,
+            offload_optimizer=True,
+            offload_parameters=True,
+        ),
+        precision=16
+    )
+    trainer.fit(model)
+
+
+DeepSpeed Infinity (NVMe Offloading)
+""""""""""""""""""""""""""""""""""""
+
+DeepSpeed also supports offloading to NVMe drives for even larger models, taking advantage of the large storage capacity of NVMe drives. DeepSpeed `reports <https://www.microsoft.com/en-us/research/blog/zero-infinity-and-deepspeed-unlocking-unprecedented-model-scale-for-deep-learning-training/>`__ the ability to fine-tune models with more than 1 trillion parameters using NVMe offloading on a single 8-GPU machine. The example below shows how to enable this, assuming the NVMe drive is mounted at ``/local_nvme``.
+
+.. code-block:: python
+
+    from pytorch_lightning import Trainer
+    from pytorch_lightning.plugins import DeepSpeedPlugin
+
+    # Enable CPU Offloading
+    model = MyModel()
+    trainer = Trainer(gpus=4, plugins='deepspeed_stage_3_offload', precision=16)
+    trainer.fit(model)
+
+    # Enable NVMe offloading of optimizer state and parameters
+    model = MyModel()
+    trainer = Trainer(
+        gpus=4,
+        plugins=DeepSpeedPlugin(
+            stage=3,
+            offload_optimizer=True,
+            offload_parameters=True,
+            remote_device='nvme',
+            offload_params_device='nvme',
+            offload_optimizer_device='nvme',
+            nvme_path='/local_nvme',
+        ),
         precision=16
     )
     trainer.fit(model)
 
+Training may be slow when offloading to NVMe, because several parameters need to be tuned for the specific drives you are using. Running the `aio_bench_perf_sweep.py <https://github.com/microsoft/DeepSpeed/blob/master/csrc/aio/py_test/aio_bench_perf_sweep.py>`__ script can help you find optimal values. See this `issue <https://github.com/microsoft/DeepSpeed/issues/998>`__ for more information on how to interpret the script's output.
 
 .. _deepspeed-activation-checkpointing: