Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Argobots support #4421

Merged
merged 5 commits into from
Apr 13, 2020

Conversation

shintaro-iwasaki
Copy link
Contributor

@shintaro-iwasaki shintaro-iwasaki commented Mar 30, 2020

Pull Request Description

This PR fixes Argobots support.

  1. Fix mpl/include/mpl_thread_argobots.h
    This change will not affect the default Pthreads behavior.

  2. Call MPID_Thread_init() in MPI_T_init_thread()
    Any MPID_Thread_ operations should be called after MPID_Thread_init(). This will not affect the default Pthreads behavior since the Pthreads version does nothing in MPID_Thread_init().

    • Note MPI_T_init_thread() can be called before MPI_Init_thread(), so MPI_T_init_thread() needs its own MPID_Thread_init().
  3. Add yield() to a busy loop in test/mpi/threads/comm/idup_deadlock
    Nonpreemptive threads (e.g., Argobots threads) can cause a deadlock if there is a busy loop. This PR adds a new yield function (MTest_thread_yield()) and inserts it into a busy loop in test/mpi/threads/comm/idup_deadlock.
    This will not affect the default Pthreads behavior since the Pthreads version does nothing in MTest_thread_yield().

Note

Regarding 3. MTest_thread_yield(), there are many options.

  1. Add yield
    1.a. [What this PR does] Add MTest_thread_yield(), which calls yield() only when the underlying thread is nonpreemptive.
    1.b. Add MTest_thread_scheduling_point() (instead of MTest_thread_yield()). The functionality is the same as that of 1.a.
    1.c. Add MTest_thread_yield(), which calls yield() even with Pthreads (e.g., pthread_yield()).
  2. Add another synchronization method that has a scheduling point when the underlying thread is nonpreemptive (e.g., MTest_thread_lock()).

Expected Impact

No performance difference for the default Pthreads version.

Author Checklist

  • Reference appropriate issues (with "Fixes" or "See" as appropriate)
    Fixes test/mpich/threads: MTest_init_thread_pkg() is not called. #4422.
  • Remove xfail from the test suite when fixing a test
  • Commits are self-contained and do not do two things at once
  • Commit message is of the form: module: short description and follows good practice
  • Passes whitespace checkers
  • Passes warning tests
  • Passes all tests
  • Add comments such that someone without knowledge of the code could understand
  • Add Devel Docs in the doc/ directory for any new code design

@shintaro-iwasaki
Copy link
Contributor Author

test:jenkins/ch3/tcp

@shintaro-iwasaki
Copy link
Contributor Author

test:jenkins/ch4/argobots

@shintaro-iwasaki
Copy link
Contributor Author

test:jenkins/ch4/ofi

@shintaro-iwasaki
Copy link
Contributor Author

shintaro-iwasaki commented Mar 30, 2020

The Argobots test failure is caused by an mtest issue. I created a new issue #4422
Once #4422 is solved, it should work well; at least the Argobots version works in my testing environment.

@@ -1192,6 +1192,8 @@ extern MPID_Thread_mutex_t mpi_t_mutex;
do { \
int err_; \
MPIR_T_THREAD_CHECK_BEGIN \
MPID_Thread_init(&err_); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can argobot init be called multiple times?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is okay to call it multiple times (internally there's a refcount).
If it wouldn't be the case, MPL_thread_init() will take care of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concern is mentioned in src/mpl/include/mpl_thread_argobots.h.

@shintaro-iwasaki
Copy link
Contributor Author

I added test: use refcount to initialize/finalize Argobots runtime, which addresses #4422 by adopting the solution 0 (removing MTest_init_thread_pkg() from Argobots).

@shintaro-iwasaki
Copy link
Contributor Author

test:jenkins/ch4/argobots

@shintaro-iwasaki
Copy link
Contributor Author

test:jenkins/ch4/argobots
test:jenkins/ch4/ofi

/* Reference counter */

static void MTest_abt_incr_refcount(void);
static void MTest_abt_decr_refcount(void);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this is only included in mtest_thread.c, but it is concerning to have static implementation inside a .h header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of coding practice, I believe non-static/inline definitions should be avoided since it often causes a "multiple definition" error.

In this specific case, static means these functions cannot be called by individual tests.


static void MTest_abt_incr_refcount(void)
{
MTest_abt_acquire_spinlock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it work without the lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lock initialization might happen in created threads, so this spinlock is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is beyond simple. It adds the lazy initialization logic, then the concern of checking and initialization inside the spawned threads, then the concern of using spin-lock, and then the concern of atomics. I much prefer solution #2 at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I got it. I will adopt solution #2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed by #4429, so this PR no longer contains it.

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots
test:mpich/ch4/ofi

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots

1 similar comment
@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots

When multithreaded MPI is enabled, MPI_T_init_thread() needs to initialize a
mutex, mpi_t_mutex.  This mutex should be created after initializing MPL/thread,
which is especially necessary when the underlying threads are not Pthreads
(e.g., Argobots threads).

This patch adds initialization and finalization of MPL/thread in
MPI_T_init_thread() and MPI_T_finalize().
When underlying threads are nonpreemptive (e.g., Argobots threads), a busy loop
can cause a deadlock.  This patch fixes a busy loop in threads/idup_deadlock by
adding MTest_thread_yield(), a new function that is translated into yield when
threads are nonpreemptive.
The default stack size of Argobots threads is too small and can cause a stack
overflow bug.  This patch fixes it.
@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots
test:mpich/ch3/most
test:mpich/ch4/most

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots
test:mpich/ch3/most
test:mpich/ch4/most

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots
test:mpich/ch3/most
test:mpich/ch4/most

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/argobots

@shintaro-iwasaki
Copy link
Contributor Author

test:mpich/ch4/ofi

@shintaro-iwasaki
Copy link
Contributor Author

Finally all checks passed. @hzhou could you please take a look at this PR when you have time?

@hzhou
Copy link
Contributor

hzhou commented Apr 12, 2020

Finally all checks passed. @hzhou could you please take a look at this PR when you have time?

Sure, I'll review it tomorrow. Thank you for the effort.

hzhou
hzhou previously approved these changes Apr 13, 2020
Copy link
Contributor

@hzhou hzhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a minor comment. I am approving it. There is no need to rerun the tests if you just amend the comment.

@@ -101,6 +101,10 @@ int main(int argc, char *argv[])
#endif /* USE_THREADS */

#ifdef USE_THREADS
/* Initialize the thread package. It should not be initialized via
* MTest_Init_thread since MTest_Finalize cannot be called. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because both MTest_Init_thread and MTest_Finalize are not called rather than cannot be called, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I will change the comment. Thanks a lot for reviewing this PR!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hzhou I fixed the comment.

th_taskmaster.c uses MTest thread functions for forking and joining threads, but
the thread package is not initialized.  This patch fixes it.

Note that we cannot explicitly call MTest_Init_thread() since the corresponding
MTest_Finalize() prints "No Errors" as many as spawned MPI processes, breaking
this test.
@shintaro-iwasaki shintaro-iwasaki merged commit a5865d0 into pmodels:master Apr 13, 2020
@@ -41,6 +41,7 @@ MTEST_THREAD_RETURN_TYPE test_comm_dup(void *arg)
wait = 0;
for (i = 0; i < NUM_THREADS; i++)
wait += start_idup[i];
MTest_thread_yield();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shintaro-iwasaki MTest_thread_yield() is not in a header file that is included in idup_deadlock.c, causing a compiler warning. Could you fix it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I missed it. #4503 should fix it. Thanks!

shintaro-iwasaki added a commit to shintaro-iwasaki/mpich that referenced this pull request May 6, 2020
shintaro-iwasaki added a commit to shintaro-iwasaki/mpich that referenced this pull request May 15, 2020
This patch adds MTest_thread_yield() to tests that I missed in
pmodels#4421.

Note this does not affect the program behavior if OS-level threads (Pthreads)
are used because MTest_thread_yield() is mapped to no-op.
shintaro-iwasaki added a commit to shintaro-iwasaki/mpich that referenced this pull request May 18, 2020
This patch adds MTest_thread_yield() to tests that I missed in
pmodels#4421.

Note this does not affect the program behavior if OS-level threads (Pthreads)
are used because MTest_thread_yield() is mapped to no-op.
@shintaro-iwasaki shintaro-iwasaki deleted the FixArgobotsSupport branch October 30, 2020 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test/mpich/threads: MTest_init_thread_pkg() is not called.
2 participants