-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed the model size calculation using ByteCounter
#10123
Conversation
Hey @rohitgr7, I believe we should also have a version for DeepSpeed FSDP which reduces the number of bytes across processes. Best, |
will have a look. maybe in a separate one? |
@tchaton couldn't find a way to support this for Deepspeed/FSDP. Discussed this with @SeanNaren too and seems like its right not practically possible since weights are sharded and are not kept in a readable format. I'd suggest for now we can merge this for v1.5 and later we can explore more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Could you add it to the class that it's not supported by DeepSpeed and Sharded plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Codecov Report
@@ Coverage Diff @@
## master #10123 +/- ##
========================================
- Coverage 92% 92% -1%
========================================
Files 181 182 +1
Lines 16408 17108 +700
========================================
+ Hits 15175 15700 +525
- Misses 1233 1408 +175 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! can we add an issue around FSDP/DeepSpeed support for summarization of model size? As @tchaton mentioned it is possible, just will require some thought/implementation
ByteCounter
to calculate model sizeByteCounter
What does this PR do?
Fixes #10074
all credits to @ruro :)
Tests are already there for
get_model_size_mb
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃