deepspeed enhancements and fixes #676
The documentation is not available anymore as the PR was closed or merged.
Thanks for all those fixes!
Hmm, I tried to remove the fake-dataloader workaround that was discussed at https://discuss.huggingface.co/t/when-using-deepspeed-why-do-i-need-to-pass-dataloaders-to-the-accelerator-prepare/22432, and it's super cumbersome. It appears that the only way to get the batch size is from the dataloader? Why can't it be derived from a …

Specifically to this PR:
As a reminder, the intention of creating …

I was trying to remove the originally used workaround.
Should it perhaps say:
or something of the kind? Otherwise the feature is there but nobody knows about it. The API doc could also say that, except it's private, so there is no documentation. Thank you for reading.
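For context on the workaround referenced above: the batch size DeepSpeed needs is read off a dataloader passed to `prepare`, so code that has no real dataloader at that point passes a placeholder instead. The sketch below illustrates the idea in plain Python. `DummyDataLoader` and `infer_micro_batch_size` are hypothetical names, not Accelerate's API, and the library's actual inference logic may differ.

```python
from typing import Iterable, Iterator, Optional


class DummyDataLoader:
    """Hypothetical placeholder that only carries a batch size.

    It yields no batches; it exists so that code which inspects
    ``batch_size`` on the prepared objects has something to read.
    """

    def __init__(self, batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[object]:
        # An empty iterator: nothing is ever loaded.
        return iter(())

    def __len__(self) -> int:
        return 0


def infer_micro_batch_size(objects: Iterable[object]) -> Optional[int]:
    """Hypothetical stand-in for the inference step: return the batch size
    of the first object exposing an integer ``batch_size`` attribute,
    or None if no such object was passed."""
    for obj in objects:
        bs = getattr(obj, "batch_size", None)
        if isinstance(bs, int):
            return bs
    return None


if __name__ == "__main__":
    model, optimizer = object(), object()  # stand-ins for real objects
    print(infer_micro_batch_size([model, optimizer, DummyDataLoader(batch_size=8)]))
```

The placeholder works the same way whether or not the DeepSpeed backend is active, because it only adds an attribute that a non-DeepSpeed code path simply ignores.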
Hello Stas,
Would the alternative suggestion work?
Hi Sourab! It appears that … Perhaps there should be another wrapper that the user calls explicitly for DeepSpeed, with args like the batch size, early in the code, so that no …
That would definitely work. My first reaction, though, is that the suggestion could be much more problematic if the user sets the value in the DeepSpeed config file: it might act as an unexpected override (even though, if written correctly, it should be the same value). Somehow this feels like replacing one hack with another. I think the dataloader wrapping a dummy dataset is a much cleaner way than the above, especially if the code isn't always using the DeepSpeed backend. If this is the best that can be done and there is no simpler way, let's just leave it as is.
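The override concern raised above can be made concrete with a small resolution rule. This is a sketch under stated assumptions: the DeepSpeed config is a plain dict, and its `train_micro_batch_size_per_gpu` entry is either a number or the placeholder string `"auto"` (the convention used by the Hugging Face DeepSpeed integrations). `resolve_micro_batch_size` is a hypothetical helper, not part of Accelerate. An explicitly passed batch size only fills in `"auto"`; a value that conflicts with the config file raises instead of silently overriding it.

```python
from typing import Any, Dict, Optional

# Real DeepSpeed config key; the resolution rules below are a hypothetical sketch.
KEY = "train_micro_batch_size_per_gpu"


def resolve_micro_batch_size(ds_config: Dict[str, Any], explicit_bs: Optional[int]) -> int:
    """Return the micro batch size to use, writing it into ``ds_config``.

    - config holds a number, no explicit value -> keep the config value
    - config holds "auto" (or no entry), explicit value given -> fill it in
    - both given and they disagree -> raise instead of overriding
    """
    configured = ds_config.get(KEY, "auto")
    if configured == "auto":
        if explicit_bs is None:
            raise ValueError(f"{KEY} is 'auto' but no batch size was provided")
        ds_config[KEY] = explicit_bs
        return explicit_bs
    if explicit_bs is not None and explicit_bs != configured:
        raise ValueError(
            f"{KEY}={configured} in the DeepSpeed config conflicts with "
            f"the explicit batch size {explicit_bs}"
        )
    return configured


if __name__ == "__main__":
    cfg = {KEY: "auto"}
    print(resolve_micro_batch_size(cfg, 16), cfg)
```

Raising on a mismatch keeps the config file as the single source of truth, which addresses the "unexpected override" worry, at the cost of making the user reconcile the two values by hand.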
There is already a way to do this
should …
Hello @stas00, in the example above, that is the user code; I was just showcasing that …
Thank you for clarifying, @pacman100! It's crystal clear now.
What does this PR do?
… `backward`, fixing the related bug.