Memory leaks: processes still remain in the background even after the code has finished #2590
The following repro triggers the memory leak in my code reliably (100%), for your reference:
Each run leaves behind a process that never exits. My environment:
I'm not sure whether the cause is in PyTorch or in Lightning.
Does it also happen with
Hi, I have tried different backend settings:
In addition, if
I found this in the code:
this does not look right. LightningModule also has a self.device attribute; these calls could leave data on the wrong device and might cause the memory leak.
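For context, the pattern being questioned can be avoided by always allocating new tensors on the module's own device via `self.device` (e.g. `torch.zeros(n, device=self.device)`), instead of relying on a default CUDA device. A minimal stand-in sketch of the idea in plain Python, with no torch import; the `ToyModule` class here is a hypothetical illustration, not Lightning code:

```python
class ToyModule:
    """Hypothetical stand-in for a LightningModule that records its device."""

    def __init__(self, device="cpu"):
        self.device = device

    def new_zeros(self, n):
        # The torch equivalent would be:
        #     torch.zeros(n, device=self.device)
        # i.e. allocate on the module's own device, never the default one.
        return {"values": [0.0] * n, "device": self.device}


module = ToyModule(device="cuda:1")
t = module.new_zeros(3)
```

Here `t` carries the module's device (`"cuda:1"`) regardless of any process-wide default, which is the behavior the questioned calls would break.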
I have just closed another issue about how to put new tensors on the right device: #2585
Here
there is no attribute
@Borda Hi,
I can confirm this is still an issue. I also run into it very often when I kill a DDP training run. The problem is that the kill signal (a keyboard interrupt, for example) is not sent to the child processes in DDP, and they keep running.
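The orphaned-children problem described above can be sketched with the standard library alone: a parent launches workers, and a signal handler forwards termination to them so none survive the parent. This is a hedged, Lightning-free illustration of the general technique (the `run_with_cleanup` helper and the dummy sleep-loop children are assumptions for the sketch, not the actual DDP code):

```python
import signal
import subprocess
import sys
import time


def run_with_cleanup(num_children=2):
    # Each child stands in for a DDP worker: it sleeps in a loop forever.
    cmd = [sys.executable, "-c", "import time\nwhile True: time.sleep(0.05)"]
    children = [subprocess.Popen(cmd) for _ in range(num_children)]

    def forward_signal(signum, frame):
        # Without a handler like this, killing the parent can leave the
        # children running in the background, still holding GPU memory.
        for p in children:
            p.terminate()

    signal.signal(signal.SIGTERM, forward_signal)

    time.sleep(0.2)  # pretend training happens here

    # Explicit cleanup on a normal exit as well.
    for p in children:
        p.terminate()
        p.wait()
    # True for each child that has actually exited.
    return [p.poll() is not None for p in children]
```

The key point is that the parent owns the list of child handles and is responsible for terminating them on every exit path, signal-triggered or normal.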
@awaelchli Thanks for your great effort; it is indeed a critical issue.
Hi!
I'm new to Lightning and have used it for one day. However, I found some critical memory-leak issues, especially in multi-GPU settings:
(1) Even after the code has finished and exited, the processes are still in the background.
(2) After I kill those processes manually one by one, some processes still seem to occupy GPU memory, for example:
BTW, there are some other issues in multi-GPU settings.
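Leftover workers that hold GPU memory still show up as ordinary processes, so they can be found and terminated by matching their command line. A hedged stdlib-only sketch; the `kill_stale` helper, its `pgrep` dependency, and the pattern argument are assumptions for illustration, not part of this thread:

```python
import os
import signal
import subprocess


def kill_stale(pattern):
    """Send SIGTERM to leftover processes whose command line matches
    `pattern` (e.g. the training script's file name). Relies on pgrep -f."""
    out = subprocess.run(["pgrep", "-f", pattern],
                         capture_output=True, text=True)
    # Never signal ourselves, even if our own command line matches.
    pids = [int(p) for p in out.stdout.split() if int(p) != os.getpid()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

This avoids killing processes one by one by hand, but double-check the pattern before running it, since `pgrep -f` matches the full command line.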