[deepspeed doc] install issues + 1-gpu deployment #9582

Merged: 3 commits, Jan 14, 2021

@stas00 (Contributor) commented Jan 14, 2021

This PR extends the DeepSpeed/FairScale integration documentation to:

  • add extensive general troubleshooting for CUDA extensions (applies to FairScale, DeepSpeed, apex or any other Python PyTorch extension with CUDA C++ code). Our users are very likely to encounter these issues. All notes are based on my first-hand encounters with them, 2 of which I ran into yesterday while trying to build FairScale and DeepSpeed on Sylvain's hardware, which he let me use to run the recent benchmarks. I figured others are likely to have similar issues, and neither FairScale nor DeepSpeed documents them anywhere.
  • add notes on deploying DeepSpeed with 1 GPU
  • reformat sub-headers so that it's easier to link to specific sections

@sgugger

@LysandreJik (Member) left a comment

Great! Thanks for writing down the fixes to the issues you fixed, that's helpful!

@sgugger (Collaborator) left a comment

Thanks a lot for expanding the doc! Just left some nits (mostly about the correct case of some names).

stas00 and others added 2 commits January 14, 2021 10:00
Co-authored-by: Lysandre Debut <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
@stas00 (Contributor, Author) commented Jan 14, 2021

Thank you for your awesome suggestions and tweaks - all done.

@stas00 merged commit 82498cb into huggingface:master Jan 14, 2021
@stas00 deleted the ds-docs2 branch January 14, 2021 19:05
3 participants