-
Notifications
You must be signed in to change notification settings - Fork 885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osc/pt2pt multithread fixes #2615
Conversation
* The `opal_list_remove_first` is protected by the accumulation lock not the module lock. So make sure to use the same lock to protect the `opal_list_append` to prevent current access, which can cause a segv and/or lost enties in the list. Signed-off-by: Joshua Hursey <[email protected]>
* The `ompi_osc_pt2pt_sync_pscw_peer` call needs ot be protected when called, and the `sync` structure need to be protected in the `start` function. - This prevents the `ompi_osc_pt2pt_sync_pscw_peer` function from processing while `start` is also in progress on the same window in two different threads (e.g., `progress` and `main` thread) - This seems to happen when the 'main' thread is part way through the `start` function then the progress thread starts processing the `post` message received from another peer for this window. Both functions try to access the `peer_list` portion of the structure and a NULL is stepped on in the `ompi_osc_pt2pt_sync_pscw_peer` function. - This patch locks the `sync` structure for the duration of the `start` and ensures that whenever `ompi_osc_pt2pt_sync_pscw_peer` is called the caller is holding the `sync->lock`. This provides exclusivity to this structure ensuring that the latter function sees a fully updated structure. Signed-off-by: Joshua Hursey <[email protected]>
@jjhursey I am addressed these issues. Looks like I forgot to change some things when I added the synchronization structure. Will have a PR up soon. |
@jjhursey Sorry it took so long for me to look at this. The commits here helped me narrow down some of the more egregious errors... Did I ever mention I intensely dislike active target? We really need to remove that from the standard. What an odd concept. |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/ac6a1f2f8a43992f5432c6f872195c8d |
This PR has grown stale. We can reopen if we need it later. @markalle If the problem is still present let's make sure there is an Issue to track progress. |
Fixes for Issue #2614
Currently a work in progress.