
Set smarter default for DDP sharded for performance optimization #6937

Merged

Conversation

Contributor

@shuyingsunshine21 shuyingsunshine21 commented Apr 9, 2021

What does this PR do?

Fixes #6992

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

Shuying Sun and others added 30 commits March 23, 2021 12:06
…oint_consolidate

Update test_all_gather_grad.py
…1-checkpoint_consolidate"

This reverts commit c5053da, reversing
changes made to 0d23d75.
This reverts commit 70fe5da.
This reverts commit a9aae99.
Contributor

@ananthsub ananthsub left a comment

very nice!

@@ -42,6 +47,12 @@ def _reinit_optimizers_with_oss(self):
if not isinstance(optimizer, OSS):
optim_class = type(optimizer)
zero_optimizer = OSS(params=optimizer.param_groups, optim=optim_class, **optimizer.defaults)
if _FAIRSCALE_OSS_FP16_BROADCAST_AVAILABLE:
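For readers following along, here is a minimal, self-contained sketch of the wrapping logic shown in the diff above. The `OSS` class below is only a stand-in for fairscale's `fairscale.optim.OSS` (the real one shards optimizer state across data-parallel workers), and the free-standing function signature is hypothetical; only the re-wrapping pattern and the fp16-broadcast default mirror the diff.

```python
# Stand-in for fairscale.optim.OSS, just to make this sketch runnable.
# The real class shards optimizer state across workers and exposes a
# broadcast_fp16 flag that compresses shard broadcasts to fp16.
class OSS:
    def __init__(self, params, optim, **defaults):
        self.param_groups = params
        self.optim_class = optim
        self.defaults = defaults
        self.broadcast_fp16 = False


def reinit_optimizers_with_oss(optimizers, is_fp16, num_nodes):
    """Re-wrap plain optimizers in OSS, mirroring the diff above."""
    wrapped = []
    for optimizer in optimizers:
        if not isinstance(optimizer, OSS):
            optim_class = type(optimizer)
            optimizer = OSS(params=optimizer.param_groups,
                            optim=optim_class, **optimizer.defaults)
        # The smarter default: compress shards to fp16 before broadcast
        # only when it helps, i.e. fp16 training across multiple nodes.
        optimizer.broadcast_fp16 = is_fp16 and num_nodes > 1
        wrapped.append(optimizer)
    return wrapped
```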
@carmocca carmocca changed the title [DRAFT]Set smarter default for DDP sharded for performance optimization [WIP] Set smarter default for DDP sharded for performance optimization Apr 13, 2021
@shuyingsunshine21 shuyingsunshine21 marked this pull request as ready for review April 13, 2021 21:07
@ananthsub
Contributor

I think the tests are failing since `reduce_buffer_size` isn't available in the fairscale version Lightning has pinned for CI. @SeanNaren is this PR blocked until we deprecate the DDP sequential plugin and use the latest fairscale PyPI version for testing?

@SeanNaren
Contributor

> I think the tests are failing since `reduce_buffer_size` isn't available in the fairscale version Lightning has pinned for CI. @SeanNaren is this PR blocked until we deprecate the DDP sequential plugin and use the latest fairscale PyPI version for testing?

I'm going to make a PR to update to the latest fairscale version, and start the deprecation!

@SeanNaren SeanNaren mentioned this pull request Apr 14, 2021
11 tasks
@ananthsub ananthsub added this to the 1.3 milestone Apr 17, 2021
@ananthsub
Contributor

This should be good to go now that #7017 is merged?

@mergify mergify bot removed the has conflicts label Apr 26, 2021
@pep8speaks

pep8speaks commented Apr 26, 2021

Hello @shuyingsunshine21! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-26 20:25:46 UTC

Shuying Sun added 2 commits April 26, 2021 12:55
CHANGELOG.md Outdated
@@ -144,7 +144,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Changed warnings and recommendations for dataloaders in `ddp_spawn` ([#6762](https://github.com/PyTorchLightning/pytorch-lightning/pull/6762/))


- `pl.seed_everyting` will now also set the seed on the `DistributedSampler` ([#7024](https://github.com/PyTorchLightning/pytorch-lightning/pull/7024))
- `pl.seed_eveing` will now also set the seed on the `DistributedSampler` ([#7024](https://github.com/PyTorchLightning/pytorch-lightning/pull/7024))
Contributor

Suggested change
- `pl.seed_eveing` will now also set the seed on the `DistributedSampler` ([#7024](https://github.com/PyTorchLightning/pytorch-lightning/pull/7024))
- `pl.seed_everything` will now also set the seed on the `DistributedSampler` ([#7024](https://github.com/PyTorchLightning/pytorch-lightning/pull/7024))

# For multi-node training, compressing the model shards in fp16 before broadcasting
# improves performance. When using PyTorch AMP, it will not degrade
# the model performance.
zero_optimizer.broadcast_fp16 = is_fp16 and self.num_nodes > 1
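As a quick sanity check of the comment above, the new default can be enumerated over the four precision/topology combinations. This is a standalone sketch; `is_fp16` and `num_nodes` stand in for the plugin's precision flag and node count:

```python
# Enumerate the broadcast_fp16 default over precision x topology.
# It is only enabled for fp16 training spanning more than one node.
for is_fp16 in (False, True):
    for num_nodes in (1, 2):
        broadcast = is_fp16 and num_nodes > 1
        print(f"fp16={is_fp16}, nodes={num_nodes} -> broadcast_fp16={broadcast}")
```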
Contributor

Could we add a test for this? It being True, for 16bit precision and multi node.

Contributor Author

> Could we add a test for this? It being True, for 16bit precision and multi node.

Was wondering, do we have a multi-node testing example?

Contributor Author

Let me know if I should add one, since multi-node testing is currently disabled. Maybe I could add it in the same file for now (which might make it easier to re-enable).

Contributor

Yup, that should work. Thanks!

Member

Yes, we are in the process of adding multi-node testing back...


@SeanNaren
Contributor

Might be overkill for now, but a way to mock multi-node on a single GPU could be a potential stopgap here. I haven't delved too deep into how this would look!
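A sketch of what such a mocked single-GPU test could look like, using `unittest.mock.MagicMock` to fake a multi-node setup. The plugin attributes and the helper function are illustrative assumptions, not the actual Lightning API:

```python
from unittest.mock import MagicMock


def set_broadcast_fp16(plugin, zero_optimizer):
    # The default under discussion: compress shards to fp16 before
    # broadcast only for fp16 training that spans more than one node.
    zero_optimizer.broadcast_fp16 = plugin.is_fp16 and plugin.num_nodes > 1


def test_broadcast_fp16_enabled_for_multi_node_fp16():
    # Fake a two-node fp16 run on a single-GPU machine.
    plugin = MagicMock(is_fp16=True, num_nodes=2)
    optimizer = MagicMock()
    set_broadcast_fp16(plugin, optimizer)
    assert optimizer.broadcast_fp16 is True


def test_broadcast_fp16_disabled_for_single_node():
    plugin = MagicMock(is_fp16=True, num_nodes=1)
    optimizer = MagicMock()
    set_broadcast_fp16(plugin, optimizer)
    assert optimizer.broadcast_fp16 is False
```

The point of the mock is that the node count is just an attribute read off the plugin, so no second node is needed to exercise the default under pytest.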

@kaushikb11 kaushikb11 changed the title [WIP] Set smarter default for DDP sharded for performance optimization Set smarter default for DDP sharded for performance optimization Apr 26, 2021
@kaushikb11 kaushikb11 merged commit 52a5cee into Lightning-AI:master Apr 26, 2021
Development

Successfully merging this pull request may close these issues.

Performance Optimization for DDP sharded
9 participants