error when trying to train raindrop classification on multiple gpu #147
Comments
Hi there 👋, Thank you so much for your attention to PyPOTS! If you find PyPOTS helpful to your work, please star⭐️ this repository. Your star is your recognition, which can help more people notice PyPOTS and grow the PyPOTS community. It matters and is definitely a kind of contribution to the community. I have received your message and will respond ASAP. Thank you for your patience! 😃 Best, |
Hey Max, thank you for reporting this issue! Let me first confirm one thing with you: Raindrop runs smoothly with a single GPU on your machine but fails on multiple GPUs, right? |
Yes, that's right. On one GPU it runs without errors (when I just set device='cuda'). Thanks for your answer and your work :) |
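I just pushed a commit to branch fix_raindrop to fix this bug. I've tested on my local machine. Could you please try it as well and then give me your feedback? Please first install code from the given branch with the command pip install https://github.com/WenjieDu/PyPOTS/archive/fix_raindrop.zip then run your test. |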
Yes, I began to work on one approach to solve it but this is perfect, I
will try it out and give you feedback.
I am currently on vacation, so I will reply next week.
Thanks a lot for your work!
Wenjie Du ***@***.***> wrote on Fri, June 30, 2023 at 14:29:
… I just pushed a commit to branch fix_raindrop to fix this bug. I've
tested on my local machine. Could you please try it as well and then give
me your feedback? Please first install code from the given branch with the
command pip install
https://github.com/WenjieDu/PyPOTS/archive/fix_raindrop.zip then run your
test.
|
Many thanks. After confirming it works well for you, I'll merge PR #149 into the |
Hi Max, did you have a chance to give it a shot? |
Yes, I came back yesterday. I ran multiple tests and everything seems to work fine.
Excellent work 👍
Wenjie Du ***@***.***> wrote on Tue, July 4, 2023 at 07:19:
… Hi Max, did you have a chance to give it a shot?
|
Great! Thanks for your reply. Will merge this PR. |
1. System Info + Information
System info: torch 2.0.1, pypots 0.1.1; GPU: 8x RTX 4090
Problem: when training the Raindrop model as usual, I wanted to make use of all my GPUs. I did everything as described in the documentation but got the following error after changing the device variable to a list:
Thanks a lot!
2. Reproduction
raindrop = Raindrop(
    n_steps=X.shape[1],
    n_features=X.shape[2],
    ...
    num_workers=8,
    ...
    device=['cuda:0', 'cuda:1'],
    ...
)
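For anyone reproducing this, it may help to build the device list from the GPUs that are actually visible rather than typing the names by hand. The sketch below is only an illustration: `build_device_list` is a hypothetical helper, not part of PyPOTS, and it assumes the model accepts a list of strings in the 'cuda:N' form used in the report.

```python
from typing import List, Optional


def build_device_list(n_gpus: Optional[int] = None) -> List[str]:
    """Return device strings such as ['cuda:0', 'cuda:1'].

    Hypothetical helper for illustration only; it is not a PyPOTS function.
    """
    if n_gpus is None:
        # Fall back to asking PyTorch how many GPUs are visible.
        try:
            import torch

            n_gpus = torch.cuda.device_count()
        except ImportError:
            n_gpus = 0
    if n_gpus < 0:
        raise ValueError("n_gpus must be non-negative")
    return [f"cuda:{i}" for i in range(n_gpus)]


# Two GPUs, matching the device argument in the reproduction.
devices = build_device_list(2)
print(devices)  # ['cuda:0', 'cuda:1']
```

The resulting list could then be passed as the device argument in place of the hand-written one; calling the helper with no argument uses every GPU PyTorch can see.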
4. Expected behavior
no error