Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hook madvise when using patcher memory hooks #4026

Merged
merged 2 commits into from
Aug 7, 2017
Merged

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Aug 4, 2017

It is not possible to use the patcher based memory hooks without
hooking madvise (MADV_DONTNEED). This commit updates the patcher
memory hooks to always hook madvise. This should be safe with recent
rcache updates.

References #3685. Close when merged into v2.0.x, v2.x, and v3.0.x.

hjelmn added 2 commits August 4, 2017 13:26
The current VMA cache implementation backing rcache/grdma can run into
a deadlock situation in multi-threaded code when madvise is hooked and
the c library uses locks. In this case we may run into the following
situation:

Thread 1:

    ...
    free ()           <- Holding libc lock
    madvice_hook ()
    vma_iteration ()  <- Blocked waiting for vma lock

Thread 2:
    ...
    vma_insert ()     <- Holding vma lock
    vma_item_new ()
    malloc ()         <- Blocked waiting for libc lock

To fix this problem we chose to remove the madvise () hook but that
fix is causing issue open-mpi#3685. This commit aims to greatly reduce the
chance that the deadlock will be hit by putting vma items into a free
list. This moves the allocation outside the vma lock. In general there
are a relatively small number of vma items so the default is to
allocate 2048 vma items. This default is configurable but it is likely
the number is too large not too small.

Signed-off-by: Nathan Hjelm <[email protected]>
It is not possible to use the patcher based memory hooks without
hooking madvise (MADV_DONTNEED). This commit updates the patcher
memory hooks to always hook madvise. This should be safe with recent
rcache updates.

References open-mpi#3685. Close when merged into v2.0.x, v2.x, and v3.0.x.

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn
Copy link
Member Author

hjelmn commented Aug 4, 2017

:bot:retest:

@hjelmn hjelmn merged commit 8137623 into open-mpi:master Aug 7, 2017
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch lacks of any type of documentation. This is a critical area, one where we already had a lot of issues, and having a decent level of documentation is a necessity.

@@ -131,6 +132,17 @@ mca_rcache_base_vma_item_t *mca_rcache_base_vma_new (mca_rcache_base_vma_module_
return vma_item;
}

/* NTH: this function MUST not allocate or deallocate memory */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know thatOBJ_RELEASE is not putting the refcount to 0 (and free the memory) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should make it clear. It is impossible (due to the lock) for the OBJ_RELEASE in that loop to be the last release on an object. I will add a comment. As for the documentation I did it through the commit message. I do no intend to let the VMA tree exist in its current form for much longer. It has serious threading performance issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. Sorry about that. The comment was on a function that was deleted before the final version of the patch. Will eliminate the incorrect comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjelmn Does this mean you'll have a follow-on commit here on master, and then add that commit to #4041 and #4042?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsquyres Yes. Will open a PR to clear the erroneous comment now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants