Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing horace mex and disentangling various mex options. #1754

Merged
merged 11 commits into from
Oct 21, 2024
Merged

Conversation

abuts
Copy link
Member

@abuts abuts commented Sep 27, 2024

Fixed Re #1753.

These are small changes necessary for using simple mex and MPI mex independently.

also fixed horace_mex to be able to help users without cmake

should be dealt with after Re #1752 is merged

@abuts abuts changed the title fixing horace mex and dientangling various mex options. fixing horace mex and disentangling various mex options. Sep 27, 2024
Copy link
Collaborator

@cmarooney-stfc cmarooney-stfc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one query about mpi labelling as herbert

% function checks if horace mex files are compiled correctly and returns
% their version string.
%
% Usage:
% >>[rez, n_errors] = check_horace_mex();
% >>[rez, n_errors, can_use_mex_4_combine] = check_horace_mex();
% >>[rez, n_errors, can_use_mex_4_combine,can_use_herbert_mpi] = check_horace_mex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear why herbert is appearing here in the description

Copy link
Member Author

@abuts abuts Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because parallel computing toolbox uses Matlab MPI and Herbert may use Poor man MPI which is not an MPI but pretends to be such (writes/reads/deletes files pretending to be MPI messages). And may use cpp_communicator which supports proper MPI implementation compiled for Herbert. The last option, is validated here.

@@ -296,14 +296,20 @@
if use
% Configure mex usage
% --------------------
[~, n_errors, can_combine_with_mex] = check_horace_mex();
[~, n_errors, can_combine_with_mex,can_use_herbert_mpi] = check_horace_mex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above unclear why herbert is appearing here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be gave this variable better name

@abuts abuts requested a review from cmarooney-stfc October 2, 2024 12:52
@@ -1,17 +1,29 @@
function [rez, n_errors, can_use_mex_4_combine] = check_horace_mex()
function [rez, n_errors, can_use_mex_4_combine,can_use_custom_MPI] = check_horace_mex()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is custom here? What makes it custom? Do you mean system?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom means created by us, not provided by Matlab.

@@ -491,7 +491,13 @@
function obj = set.parallel_threads(obj,n_threads)
n_threads = floor(n_threads);
n_workers = get_or_restore_field(obj, 'parallel_workers_number');
n_poss_threads = floor(obj.n_cores/n_workers);
if ispc % let's assume pc is Intel so threades are double number of cores
Copy link
Collaborator

@oerc0122 oerc0122 Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean "assume it's intel"? Do you mean it has hyperthreading? You should be careful
a) because hyperthreading may be captured in n_cores or it may not. Sometimes you have hyperthreading mapped as a logical (but not a physical) processor other times it may just be a hidden feature handled internally as in some ARM systems.
b) Because hyperthreading may not always be advantageous to use if you're already saturating the processors with work and can result in slowdown.

Comment on lines 494 to 495
if ispc % let's assume pc is Intel so threades are double number of cores
% this estimeat will not be too wrong even if it is AMD,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ispc % let's assume pc is Intel so threades are double number of cores
% this estimeat will not be too wrong even if it is AMD,
if ispc % let's assume pc is Intel so threads are double number of cores
% this estimate will not be too wrong even if it is AMD,

@@ -5,7 +5,7 @@
% Manually modify this script to specify the mpi libraries location in your
% system.
%
use_her_mpich = false;
use_her_mpich = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing defaults here?

Copy link
Member Author

@abuts abuts Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I was trying to build cpp_communicator using the library, we provide with Herbert and it worked. Probably it is a bit easier to build cpp_communicator with the libraties we provide with Herbert, than install external MPI on a system and then build cpp_communicator (use_her_mpich=false -- look for MPI installed on a system)

There are actually little difference between these two options as compiling cpp_communicator is a simpler task than making it to work, which I will try to explain in a fresh ticket.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As from Slack earlier: two things. One, if it needs that much explanation, then it should be documented in the code. Please add a substantial comment saying what is going on there. Two, I don't understand your explanation. It sounds more as if there was a temporary glitch in the build process rather than something fundamental that requires this change. Please can you write a paragraph for comments and run them by us. If need be we can use the sprint review slot tomorrow to discuss it.

Copy link
Member Author

@abuts abuts Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are generally not advisable to describe in words the things the code is doing. It is usually clear enough from the code itself. You should describe the meanings if input/ouptut variables and describe intentions. All this is present here.

This requests violates this principle, but it is not difficult to add what the code is doing adding couple of sentences, so I am adding it in commit 4205739

end
if ~can_use_custom_mpi
pc = parallel_config;
if ismember(pc.parallel_cluster,{'mpiexec_mpi','slurm_mpi'})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why if these two are present in parallel_cluster would it default to herbert? What is this if statement actually checking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because ~can_use_custom_mpi means that cpp_communicator does not work. These two frameworks use cpp_communicator as the transport, so they are not avaliable if it does not work.

Copy link
Collaborator

@cmarooney-stfc cmarooney-stfc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with Jacob that the issues he comments on need explaining. Not adding any changes required myself.

I think this needs a more comprehensive explanation going back to first principles; the changes and comments assume the reader knows something about these issues, which may not be the case. Please work out how to tell this story in the code.

@abuts abuts merged commit 8f71d35 into master Oct 21, 2024
5 checks passed
@abuts abuts deleted the 1753_horace_mex branch October 21, 2024 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants