-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fixing horace mex and disentangling various mex options. #1754
Conversation
…rm.m and parallel_config.m # Conflicts: # herbert_core/classes/MPIFramework/@ClusterSlurm/ClusterSlurm.m
1ee8f78
to
95ae066
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one query about mpi labelling as herbert
horace_core/admin/check_horace_mex.m
Outdated
% function checks if horace mex files are compiled correctly and returns | ||
% their version string. | ||
% | ||
% Usage: | ||
% >>[rez, n_errors] = check_horace_mex(); | ||
% >>[rez, n_errors, can_use_mex_4_combine] = check_horace_mex(); | ||
% >>[rez, n_errors, can_use_mex_4_combine,can_use_herbert_mpi] = check_horace_mex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear why herbert is appearing here in the description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because parallel computing toolbox uses Matlab MPI and Herbert may use Poor man MPI which is not an MPI but pretends to be such (writes/reads/deletes files pretending to be MPI messages). And may use cpp_communicator
which supports proper MPI implementation compiled for Herbert. The last option, is validated here.
@@ -296,14 +296,20 @@ | |||
if use | |||
% Configure mex usage | |||
% -------------------- | |||
[~, n_errors, can_combine_with_mex] = check_horace_mex(); | |||
[~, n_errors, can_combine_with_mex,can_use_herbert_mpi] = check_horace_mex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above unclear why herbert is appearing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be gave this variable better name
@@ -1,17 +1,29 @@ | |||
function [rez, n_errors, can_use_mex_4_combine] = check_horace_mex() | |||
function [rez, n_errors, can_use_mex_4_combine,can_use_custom_MPI] = check_horace_mex() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is custom
here? What makes it custom
? Do you mean system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
custom means created by us, not provided by Matlab.
@@ -491,7 +491,13 @@ | |||
function obj = set.parallel_threads(obj,n_threads) | |||
n_threads = floor(n_threads); | |||
n_workers = get_or_restore_field(obj, 'parallel_workers_number'); | |||
n_poss_threads = floor(obj.n_cores/n_workers); | |||
if ispc % let's assume pc is Intel so threades are double number of cores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean "assume it's intel"? Do you mean it has hyperthreading? You should be careful
a) because hyperthreading may be captured in n_cores
or it may not. Sometimes you have hyperthreading mapped as a logical (but not a physical) processor other times it may just be a hidden feature handled internally as in some ARM systems.
b) Because hyperthreading may not always be advantageous to use if you're already saturating the processors with work and can result in slowdown.
if ispc % let's assume pc is Intel so threades are double number of cores | ||
% this estimeat will not be too wrong even if it is AMD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ispc % let's assume pc is Intel so threades are double number of cores | |
% this estimeat will not be too wrong even if it is AMD, | |
if ispc % let's assume pc is Intel so threads are double number of cores | |
% this estimate will not be too wrong even if it is AMD, |
admin/horace_mex_mpi.m
Outdated
@@ -5,7 +5,7 @@ | |||
% Manually modify this script to specify the mpi libraries location in your | |||
% system. | |||
% | |||
use_her_mpich = false; | |||
use_her_mpich = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing defaults here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because I was trying to build cpp_communicator
using the library, we provide with Herbert
and it worked. Probably it is a bit easier to build cpp_communicator
with the libraties we provide with Herbert, than install external MPI on a system and then build cpp_communicator
(use_her_mpich=false
-- look for MPI installed on a system)
There are actually little difference between these two options as compiling cpp_communicator
is a simpler task than making it to work, which I will try to explain in a fresh ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As from Slack earlier: two things. One, if it needs that much explanation, then it should be documented in the code. Please add a substantial comment saying what is going on there. Two, I don't understand your explanation. It sounds more as if there was a temporary glitch in the build process rather than something fundamental that requires this change. Please can you write a paragraph for comments and run them by us. If need be we can use the sprint review slot tomorrow to discuss it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are generally not advisable to describe in words the things the code is doing. It is usually clear enough from the code itself. You should describe the meanings if input/ouptut variables and describe intentions. All this is present here.
This requests violates this principle, but it is not difficult to add what the code is doing adding couple of sentences, so I am adding it in commit 4205739
end | ||
if ~can_use_custom_mpi | ||
pc = parallel_config; | ||
if ismember(pc.parallel_cluster,{'mpiexec_mpi','slurm_mpi'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why if these two are present in parallel_cluster
would it default to herbert? What is this if
statement actually checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because ~can_use_custom_mpi
means that cpp_communicator
does not work. These two frameworks use cpp_communicator
as the transport, so they are not avaliable if it does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreeing with Jacob that the issues he comments on need explaining. Not adding any changes required myself.
I think this needs a more comprehensive explanation going back to first principles; the changes and comments assume the reader knows something about these issues, which may not be the case. Please work out how to tell this story in the code.
…(but related to the alignment ticket) and made to push the build with changed integration scripts
Fixed Re #1753.
These are small changes necessary for using simple mex and MPI mex independently.
also fixed horace_mex to be able to help users without
cmake
should be dealt with after Re #1752 is merged