-
Notifications
You must be signed in to change notification settings - Fork 886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hook madvise when using patcher memory hooks #4026
Conversation
The current VMA cache implementation backing rcache/grdma can run into a deadlock situation in multi-threaded code when madvise is hooked and the c library uses locks. In this case we may run into the following situation: Thread 1: ... free () <- Holding libc lock madvice_hook () vma_iteration () <- Blocked waiting for vma lock Thread 2: ... vma_insert () <- Holding vma lock vma_item_new () malloc () <- Blocked waiting for libc lock To fix this problem we chose to remove the madvise () hook but that fix is causing issue open-mpi#3685. This commit aims to greatly reduce the chance that the deadlock will be hit by putting vma items into a free list. This moves the allocation outside the vma lock. In general there are a relatively small number of vma items so the default is to allocate 2048 vma items. This default is configurable but it is likely the number is too large not too small. Signed-off-by: Nathan Hjelm <[email protected]>
It is not possible to use the patcher based memory hooks without hooking madvise (MADV_DONTNEED). This commit updates the patcher memory hooks to always hook madvise. This should be safe with recent rcache updates. References open-mpi#3685. Close when merged into v2.0.x, v2.x, and v3.0.x. Signed-off-by: Nathan Hjelm <[email protected]>
:bot:retest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch lacks of any type of documentation. This is a critical area, one where we already had a lot of issues, and having a decent level of documentation is a necessity.
@@ -131,6 +132,17 @@ mca_rcache_base_vma_item_t *mca_rcache_base_vma_new (mca_rcache_base_vma_module_ | |||
return vma_item; | |||
} | |||
|
|||
/* NTH: this function MUST not allocate or deallocate memory */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know thatOBJ_RELEASE is not putting the refcount to 0 (and free the memory) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should make it clear. It is impossible (due to the lock) for the OBJ_RELEASE in that loop to be the last release on an object. I will add a comment. As for the documentation I did it through the commit message. I do no intend to let the VMA tree exist in its current form for much longer. It has serious threading performance issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait. Sorry about that. The comment was on a function that was deleted before the final version of the patch. Will eliminate the incorrect comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsquyres Yes. Will open a PR to clear the erroneous comment now.
It is not possible to use the patcher based memory hooks without
hooking madvise (MADV_DONTNEED). This commit updates the patcher
memory hooks to always hook madvise. This should be safe with recent
rcache updates.
References #3685. Close when merged into v2.0.x, v2.x, and v3.0.x.