FSDP integration #6152

Status: Closed. Wants to merge 77 commits.

Commits (77):
78f1eb4  Add initial FSDP integration (Feb 23, 2021)
c36e00a  Fix error in refactor (Feb 23, 2021)
59dbb83  update (tchaton, Feb 24, 2021)
19a1440  Revert "update" (Feb 24, 2021)
3b38615  Address reviews (Feb 24, 2021)
5ff06ab  Fix doc string (Feb 24, 2021)
36434f0  Even moar code review (Feb 24, 2021)
c61a190  Add deprecation (Feb 24, 2021)
1c4f011  Merge branch 'master' into feat/fsdp (Feb 25, 2021)
02599e6  Fix name of test (Feb 25, 2021)
e79977a  Integrate nesting, fix bugs across implementation (Mar 1, 2021)
d15d4b5  Merge branch 'master' into feat/fsdp (Mar 2, 2021)
ebf1818  Formatting types (Mar 2, 2021)
290e8fd  Add additional tests for accelerator model (Mar 2, 2021)
5c5f762  Fix import (Mar 2, 2021)
d28438b  Few test fixes, expose params (Mar 3, 2021)
ab591a8  Allow training_type_plugin to delay optimizer configure (Mar 3, 2021)
23ccdb8  Merge branch 'feat/fsdp_2n' into feat/fsdp (Mar 3, 2021)
a60f2c0  Add missing references to trainer, add a CPU accelerator based test (Mar 3, 2021)
3d4e6df  Merge branch 'feat/fsdp_2n' into feat/fsdp (Mar 4, 2021)
516bd04  Update for latest API changes to fairscale (Mar 9, 2021)
9f8864f  Add base hook for model parallel (Mar 23, 2021)
eac5344  fix callback signature (kaushikb11, Mar 25, 2021)
32df0cb  Simplify hook (Mar 25, 2021)
282a133  Add hook logic (Mar 25, 2021)
7a94e72  add tests (kaushikb11, Mar 25, 2021)
8091481  add property setter (kaushikb11, Mar 25, 2021)
633fc77  add logic for being called once (kaushikb11, Mar 25, 2021)
c99a36f  Update changelog (kaushikb11, Mar 25, 2021)
a68c8d7  Merge branch 'master' into feat/model_parallel_hook (kaushikb11, Mar 25, 2021)
9529a22  Fix (kaushikb11, Mar 25, 2021)
3c1c782  fix return type (kaushikb11, Mar 25, 2021)
7daba43  Merge branch 'master' into feat/fsdp (Mar 25, 2021)
87ec222  Fix property name (Mar 25, 2021)
966b2e5  Merge branch 'feat/model_parallel_hook' into feat/fsdp (Mar 25, 2021)
5f6e039  Updaet wrapper, use latest fixes for hooks (Mar 25, 2021)
b512e72  Swap hook order (Mar 25, 2021)
8ba82df  Merge branch 'master' into feat/fsdp (Mar 29, 2021)
1e5ca37  Small changes (Mar 29, 2021)
936dc1a  Fixes (Mar 29, 2021)
a6de18e  Remove activation checkpointing (Apr 1, 2021)
8684f94  Turn off auto wrap by default (Apr 1, 2021)
76091ae  Move to trainer.model (Apr 7, 2021)
226d498  fix reference (Apr 7, 2021)
cd63c10  Merge branch 'master' into feat/fsdp (Apr 7, 2021)
b881e2f  Remove flag (Apr 7, 2021)
e8959be  Fix imports (Apr 7, 2021)
52478ac  Fix versions, update docs (Apr 7, 2021)
b7f1896  Fix clip gradients (Apr 8, 2021)
a62f8d8  Merge branch 'master' into feat/fsdp (Apr 10, 2021)
69c33f1  Merge branch 'master' into feat/fsdp (Apr 14, 2021)
9fa26c0  Fixes (Apr 14, 2021)
56f23ce  pull (Apr 14, 2021)
9ca3f0c  Few changes across the board (Apr 14, 2021)
b53ba36  Fix imports (Apr 14, 2021)
0da5249  Set none (Apr 14, 2021)
90c6479  Swap to warnings (Apr 14, 2021)
69d8178  Remove fairscale from container (Apr 14, 2021)
a459d10  pull (Apr 14, 2021)
a7842d9  Update dockers/base-cuda/Dockerfile (Apr 14, 2021)
48ee83f  Add defaults, add test to ensure nested wrapper is set correctly (Apr 15, 2021)
57a696c  Remove deprecation as this will be removed completely (Apr 15, 2021)
36889b8  Check for nested FSDP wrappers, and omit wrapping algorithm (Apr 16, 2021)
89b8cb5  Merge branch 'master' into feat/fsdp (Apr 16, 2021)
0c1d2de  Update pytorch_lightning/trainer/connectors/accelerator_connector.py (Apr 21, 2021)
592bb28  Address code review points (Apr 21, 2021)
4e230c9  Merge branch 'master' into feat/fsdp (Apr 26, 2021)
ca8e586  Add back missing model that was removed from clipping signature (Apr 26, 2021)
54f501d  Do not pass model through, accelerator does it (Apr 26, 2021)
02925cc  Merge branch 'master' into feat/fsdp (Apr 27, 2021)
b67f1a9  Fix merge (Apr 27, 2021)
132eb64  Fix imports (Apr 27, 2021)
e6ce3cf  Changes to precision plugin (Apr 27, 2021)
01153af  Require 2 GPU for multi gpu test (Apr 27, 2021)
6cfe57d  Merge branch 'master' into feat/fsdp (May 2, 2021)
efa81ab  Use callback in test, swap to DynamicLossScaler from fairscale to tes… (May 4, 2021)
78d52b5  Disable loss scaler for now (May 4, 2021)
Viewing changes from commit cd63c1037bc06f6b80f78675c09695837c1fd739: Merge branch 'master' into feat/fsdp (SeanNaren, committed Apr 7, 2021)
3 changes: 1 addition & 2 deletions in pytorch_lightning/accelerators/accelerator.py

@@ -322,8 +322,7 @@ def clip_gradients(
         gradient_clip_algorithm: GradClipAlgorithmType = GradClipAlgorithmType.NORM,
     ) -> None:
         """clips all the optimizer parameters to the given value"""
-
-        self.precision_plugin.clip_gradients(self.model, optimizer, clip_val)
+        self.precision_plugin.clip_gradients(self.model, optimizer, clip_val, gradient_clip_algorithm)

     def on_train_epoch_end(self, outputs: Sequence[_STEP_OUTPUT_TYPE]) -> None:
         """Hook to do something on the end of an training epoch
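At the user level, threading gradient_clip_algorithm through the accelerator lets the requested clipping strategy reach the precision plugin. A hedged illustration follows, assuming the matching gradient_clip_algorithm Trainer flag from Lightning releases of this era (verify against your installed version):

import pytorch_lightning as pl

# Hypothetical user code: request element-wise value clipping instead of the
# default norm clipping. The accelerator forwards this choice on to
# precision_plugin.clip_gradients(...), as in the hunk above.
trainer = pl.Trainer(
    gradient_clip_val=0.5,
    gradient_clip_algorithm="value",  # "norm" is the default
)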
6 changes: 5 additions & 1 deletion in pytorch_lightning/plugins/precision/deepspeed_precision.py

@@ -77,7 +77,11 @@ def backward(
         return closure_loss

     def clip_gradients(
-        self, model: Any, optimizer: 'Optimizer', clip_val: Union[int, float], norm_type: float = 2.0
+        self,
+        model: 'LightningModule',
+        optimizer: 'Optimizer',
+        clip_val: Union[int, float],
+        gradient_clip_algorithm: GradClipAlgorithmType = GradClipAlgorithmType.NORM,
     ) -> None:
         """
         DeepSpeed handles clipping gradients via the training type plugin.
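For context on why this override can stay a no-op: under DeepSpeed, gradient clipping is normally requested through the DeepSpeed engine config rather than through the precision plugin. A hedged illustration (the config values below are examples, not taken from this PR):

# Illustrative DeepSpeed config fragment: when "gradient_clipping" is set, the
# DeepSpeed engine clips gradients itself, so Lightning's DeepSpeed precision
# plugin can leave clip_gradients as a pass-through.
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_clipping": 0.5,
    "zero_optimization": {"stage": 2},
}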
8 changes: 6 additions & 2 deletions in pytorch_lightning/plugins/precision/precision_plugin.py

@@ -107,9 +107,13 @@ def post_optimizer_step(self, optimizer: 'Optimizer', optimizer_idx: int) -> None:
         """Hook to do something after each optimizer step."""

     def clip_gradients(
-        self, model: Any, optimizer: 'Optimizer', clip_val: Union[int, float], norm_type: float = 2.0
+        self,
+        model: 'LightningModule',
+        optimizer: 'Optimizer',
+        clip_val: Union[int, float],
+        gradient_clip_algorithm: GradClipAlgorithmType = GradClipAlgorithmType.NORM,
     ) -> None:
-        """Clips the gradients to a specific value"""
+        """Clips the gradients"""
         if clip_val is None:
             return
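For orientation, a minimal sketch of how a base precision plugin can dispatch on the new gradient_clip_algorithm argument. The class and helper structure here is illustrative, not the exact Lightning source, and the GradClipAlgorithmType import path is assumed from Lightning releases of this era:

from typing import Union

import torch

# Assumption: GradClipAlgorithmType is exposed from pytorch_lightning.utilities
# in Lightning ~1.3; adjust the import if your version differs.
from pytorch_lightning.utilities import GradClipAlgorithmType


class SketchPrecisionPlugin:
    """Illustrative only: dispatch gradient clipping based on the requested algorithm."""

    def clip_gradients(
        self,
        model: torch.nn.Module,
        optimizer: torch.optim.Optimizer,
        clip_val: Union[int, float],
        gradient_clip_algorithm: GradClipAlgorithmType = GradClipAlgorithmType.NORM,
    ) -> None:
        if clip_val is None or float(clip_val) <= 0:
            return
        parameters = model.parameters()
        if gradient_clip_algorithm == GradClipAlgorithmType.VALUE:
            # Clamp each gradient element to [-clip_val, clip_val].
            torch.nn.utils.clip_grad_value_(parameters, clip_value=clip_val)
        else:
            # Scale gradients so their global 2-norm does not exceed clip_val.
            torch.nn.utils.clip_grad_norm_(parameters, max_norm=clip_val)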
9 changes: 5 additions & 4 deletions in pytorch_lightning/plugins/precision/sharded_native_amp.py

@@ -32,10 +32,11 @@ def __init__(self) -> None:
         super().__init__()
         self.scaler = ShardedGradScaler()

-    def clip_gradients(
-        self, model: Any, optimizer: 'Optimizer', clip_val: Union[int, float], norm_type: float = 2.0
+    def clip_grad_by_norm(
+        self,
+        optimizer: 'Optimizer',
+        clip_val: Union[int, float],
+        norm_type: float = 2.0
     ) -> None:
-        if clip_val <= 0:
-            return
         optimizer = cast(OSS, optimizer)
         optimizer.clip_grad_norm(clip_val, norm_type=norm_type)
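For readers unfamiliar with fairscale's OSS wrapper, a hedged usage sketch of the clip_grad_norm call the plugin defers to. The single-process gloo setup and the toy model are illustrative assumptions; in real sharded training the process group is set up by the launcher:

import os
import torch
import torch.distributed as dist
from fairscale.optim import OSS

# Minimal single-process group so OSS can query world size/rank (illustrative).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(32, 2)
optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=0.1)

loss = model(torch.randn(4, 32)).sum()
loss.backward()

# OSS reduces the gradient norm across all shards before clipping, which is why
# the sharded AMP plugin delegates to it instead of calling clip_grad_norm_ itself.
optimizer.clip_grad_norm(1.0, norm_type=2.0)
optimizer.step()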
3 changes: 0 additions & 3 deletions in pytorch_lightning/plugins/training_type/ddp.py

@@ -270,9 +270,6 @@ def init_ddp_connection(self, global_rank: int, world_size: int) -> None:
         torch_distrib.init_process_group(self.torch_distributed_backend, rank=global_rank, world_size=world_size)

     def pre_dispatch(self):
-        if self.sync_batchnorm:
-            self.model = self.configure_sync_batchnorm(self.model)
-
         if self.move_to_device_in_prefetch:
             # move the model to the correct device
             self.model_to_device()
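For context on what the removed lines were doing, a hedged sketch of a typical configure_sync_batchnorm implementation; this mirrors the standard PyTorch pattern and is not necessarily the exact Lightning code:

import torch


def configure_sync_batchnorm(model: torch.nn.Module) -> torch.nn.Module:
    # Swap every BatchNorm*d layer for SyncBatchNorm so running statistics are
    # synchronized across all DDP processes (requires an initialized process
    # group at training time).
    return torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)


# Hypothetical usage mirroring the removed pre_dispatch lines:
# if self.sync_batchnorm:
#     self.model = configure_sync_batchnorm(self.model)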
pytorch_lightning/trainer/connectors/accelerator_connector.py

@@ -259,7 +259,8 @@ def use_dp(self) -> bool:
     def use_ddp(self) -> bool:
         return self._distrib_type in (
             DistributedType.DDP, DistributedType.DDP_SPAWN, DistributedType.DDP_SHARDED,
-            DistributedType.DDP_SHARDED_SPAWN, DistributedType.FULLY_SHARDED, DistributedType.DEEPSPEED
+            DistributedType.DDP_SHARDED_SPAWN, DistributedType.FULLY_SHARDED, DistributedType.DEEPSPEED,
+            DistributedType.TPU_SPAWN
         )

     @property

Review thread on the use_ddp change:

Contributor: It's unclear to me what use_ddp stands for at this point, with all these distributed types supported.

Contributor (author): I think the primary usage is knowing whether to use a distributed sampler, but this logic should ideally be rewritten to be a property of the training type plugin. I was hoping to get to that in #6090.
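To make the author's point concrete, a hedged sketch of the kind of decision use_ddp feeds: when a DDP-style plugin is active, the dataloader's sampler is replaced with a DistributedSampler so each process sees a distinct shard. The helper name and arguments below are illustrative, not the actual trainer code:

from torch.utils.data import DataLoader, DistributedSampler


def prepare_dataloader(dataloader: DataLoader, use_ddp: bool, num_replicas: int, rank: int) -> DataLoader:
    # Hypothetical helper: under a DDP-style plugin, swap in a DistributedSampler
    # so each process draws a disjoint shard of the dataset.
    if not use_ddp:
        return dataloader
    sampler = DistributedSampler(dataloader.dataset, num_replicas=num_replicas, rank=rank)
    return DataLoader(
        dataloader.dataset,
        batch_size=dataloader.batch_size,
        sampler=sampler,
        num_workers=dataloader.num_workers,
    )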
2 changes: 2 additions & 0 deletions in requirements/extra.txt

@@ -8,3 +8,5 @@ torchtext>=0.5
 onnxruntime>=1.3.0
 hydra-core>=1.0
 fairscale>=0.3.2
+jsonargparse[signatures]>=3.3.1
+deepspeed>=0.3.13
You are viewing a condensed version of this merge commit.