Add quick docs for deepspeed infinity (#8323)
Sean Naren authored Jul 7, 2021
1 parent fc12fe7 commit 01f594b
Showing 1 changed file with 39 additions and 1 deletion.
40 changes: 39 additions & 1 deletion docs/source/advanced/advanced_gpu.rst
@@ -416,11 +416,49 @@ DeepSpeed ZeRO Stage 3 Offloads optimizer state, gradients to the host CPU to re
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(stage=3, cpu_offload=True, cpu_offload_params=True),
+        plugins=DeepSpeedPlugin(
+            stage=3,
+            offload_optimizer=True,
+            offload_parameters=True,
+        ),
         precision=16
     )
     trainer.fit(model)

DeepSpeed Infinity (NVMe Offloading)
""""""""""""""""""""""""""""""""""""

DeepSpeed also supports offloading to NVMe drives for even larger models, taking advantage of the large storage capacity of NVMe devices. DeepSpeed `reports <https://www.microsoft.com/en-us/research/blog/zero-infinity-and-deepspeed-unlocking-unprecedented-model-scale-for-deep-learning-training/>`__ the ability to fine-tune models with more than 1 trillion parameters using NVMe offloading on a single 8-GPU machine. The example below enables this, assuming the NVMe drive is mounted at ``/local_nvme``.

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DeepSpeedPlugin

    # Enable CPU offloading
    model = MyModel()
    trainer = Trainer(gpus=4, plugins='deepspeed_stage_3_offload', precision=16)
    trainer.fit(model)

    # Enable NVMe offloading of optimizer state and parameters
    model = MyModel()
    trainer = Trainer(
        gpus=4,
        plugins=DeepSpeedPlugin(
            stage=3,
            offload_optimizer=True,
            offload_parameters=True,
            remote_device='nvme',
            offload_params_device='nvme',
            offload_optimizer_device='nvme',
            nvme_path='/local_nvme',
        ),
        precision=16
    )
    trainer.fit(model)

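Before launching a run, it can help to confirm that the offload directory exists, is writable, and has enough free space. The following is a minimal sketch using only the Python standard library; ``check_offload_dir`` and its ``min_free_bytes`` threshold are hypothetical names introduced here for illustration and are not part of Lightning or DeepSpeed.

```python
import os
import shutil


def check_offload_dir(path, min_free_bytes=0):
    """Return the free bytes under `path`, raising if it is unusable.

    Hypothetical helper, not part of Lightning or DeepSpeed.
    """
    # The mount point must exist and be a directory.
    if not os.path.isdir(path):
        raise FileNotFoundError(f"offload directory not found: {path}")
    # The training process needs write access to swap tensors out.
    if not os.access(path, os.W_OK):
        raise PermissionError(f"offload directory is not writable: {path}")
    # Compare the free space against the caller's own estimate.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        raise OSError(
            f"only {free} bytes free under {path}, need {min_free_bytes}"
        )
    return free
```

For the configuration above, this could be called as ``check_offload_dir('/local_nvme')`` before constructing the ``Trainer``.
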
When offloading to NVMe you may notice that training is slow. Several parameters need to be tuned for the drives you are using. Running the `aio_bench_perf_sweep.py <https://github.com/microsoft/DeepSpeed/blob/master/csrc/aio/py_test/aio_bench_perf_sweep.py>`__ script can help you find optimal values. See this `issue <https://github.com/microsoft/DeepSpeed/issues/998>`__ for more information on how to interpret the results.
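
In a raw DeepSpeed configuration, asynchronous I/O settings of this kind live in an ``aio`` section alongside the ZeRO offload settings. The sketch below builds such a configuration as a plain Python dictionary and serialises it to JSON. The key names follow the DeepSpeed configuration documentation as far as we are aware; the numeric values are placeholders rather than recommendations, and ``build_nvme_config`` is a hypothetical helper introduced for illustration.

```python
import json


def build_nvme_config(nvme_path, block_size=1048576, queue_depth=8,
                      thread_count=1, single_submit=False,
                      overlap_events=True):
    """Build a DeepSpeed-style config dict for ZeRO Stage 3 NVMe offloading.

    Hypothetical helper; the defaults are placeholders, so replace them
    with the values found for your own drives.
    """
    return {
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": "nvme", "nvme_path": nvme_path},
            "offload_param": {"device": "nvme", "nvme_path": nvme_path},
        },
        # Asynchronous I/O settings used for NVMe reads and writes.
        "aio": {
            "block_size": block_size,
            "queue_depth": queue_depth,
            "thread_count": thread_count,
            "single_submit": single_submit,
            "overlap_events": overlap_events,
        },
    }


# Example: override one value after a benchmark sweep (16 is illustrative).
config = build_nvme_config("/local_nvme", queue_depth=16)
config_json = json.dumps(config, indent=2)
```

A dictionary or JSON file like this could then be passed through the plugin's ``config`` argument, assuming the installed Lightning version accepts one.
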

.. _deepspeed-activation-checkpointing:
