[Issue #1359] hipModuleLoad/hipModuleUnload is leaking file descriptor #2223
Comments
One quick workaround that comes to mind is to replace
I think the "leak" can be legitimate due to the lazy binding of executable code. Suppose you have a DLL and you loaded one symbol from it for use. Could the DLL then ever be fully unloaded? No, because the lazy loader cannot guarantee that the symbol will never jump to an arbitrary address inside that DLL. Therefore, a single lazily loaded symbol will hold the DLL's file descriptor forever.
Thanks @dmikushin for the quick reply!
This is a regression relative to ROCm 5.4.3 that began to manifest in ROCm 5.5. I guess something has indeed changed in
A fix in the HIP runtime/ROCr has been proposed and should be merged soon.
ROCm 5.6.1 should include the fix, as should all releases thereafter, such as ROCm 5.7.
(Thanks to @JehandadKhan for finding the issue and creating the minimal reproducible example; I am creating this issue on GitHub on his behalf.)
During MIOpen exhaustive tuning (e.g. MIOPEN_FIND_ENFORCE=3), MIOpen needs to open a large number of modules. Even though we proactively close the ones that are no longer in use (e.g. #1221), the program still fails with
This issue was previously discussed in #1359 and #1360.
Using the minimal reproducible example below:
module_api.zip
(This needs a HIP-enabled Docker image; in this case any public one should be able to reproduce the issue.)
ROCm version where this issue can be reproduced:
ROCm 5.5
cannot reproduce in ROCm 5.4.3
and we will see
The test code is very simple, essentially:
However, using the following monitoring command:
we can observe that the number of FDs left open (leaked) keeps increasing until the HIP program crashes.
Is hipModuleUnload leaking FD?
ulimit , so is there any HIP limitation on the # of FD?
CC: @atamazov @averinevg @DrizztDoUrden @dmikushin @CAHEK7
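For the question about limits, the per-process descriptor limit that `ulimit -n` reports can also be read programmatically with POSIX `getrlimit`. This is only a sketch for comparing the OS limit against the FD count at the time of the crash; `FdLimits` and `query_fd_limits` are hypothetical names, and nothing here says whether HIP imposes a limit of its own.

```cpp
// Sketch (POSIX): read the soft and hard limits on open file descriptors,
// i.e. the values behind `ulimit -n` (soft) and `ulimit -Hn` (hard).
#include <stdexcept>
#include <sys/resource.h>

struct FdLimits {
    rlim_t soft;  // limit currently enforced for this process
    rlim_t hard;  // ceiling up to which the soft limit may be raised
};

FdLimits query_fd_limits() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        throw std::runtime_error("getrlimit(RLIMIT_NOFILE) failed");
    }
    return {lim.rlim_cur, lim.rlim_max};
}
```

If the program fails while its FD count is still well below `soft`, the OS limit is unlikely to be the cause.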