Tacho - team size adjustment for cuda 11 and cuda 10 #9192

Merged 1 commit on May 29, 2021
@@ -1726,7 +1726,18 @@ namespace Tacho {
const ordinal_type half_level = _nlevel/2;
//const ordinal_type team_size_factor[2] = { 64, 16 }, vector_size_factor[2] = { 8, 8};
//const ordinal_type team_size_factor[2] = { 16, 16 }, vector_size_factor[2] = { 32, 32};
#if defined (CUDA_VERSION)
#if (11000 > CUDA_VERSION)
/// CUDA older than 11.0 (CUDA_VERSION < 11000)
const ordinal_type team_size_factor[2] = { 32, 64 }, vector_size_factor[2] = { 8, 4};
#else
/// CUDA 11.0 and higher
const ordinal_type team_size_factor[2] = { 64, 64 }, vector_size_factor[2] = { 8, 4};
#endif
#else
/// not CUDA: use default values
const ordinal_type team_size_factor[2] = { 64, 64 }, vector_size_factor[2] = { 8, 4};
#endif
const ordinal_type team_size_update[2] = { 16, 8 }, vector_size_update[2] = { 32, 32};
{
typedef TeamFunctor_FactorizeLDL<supernode_info_type> functor_type;
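
The hunk above declares two-element arrays next to `half_level` but does not show how they are indexed. One plausible reading is that slot 0 serves the levels up to `half_level` and slot 1 serves the remaining levels. The sketch below illustrates only that reading; `select_slot`, `team_size_for_level`, and the comparison direction are assumptions, not code taken from Tacho.

```cpp
#include <cassert>

typedef int ordinal_type;  // assumption: stand-in for Tacho's ordinal_type

// Hypothetical helper: map a level to one of the two tuning slots.
// Levels up to nlevel/2 use slot 0; the remaining levels use slot 1.
inline ordinal_type select_slot(ordinal_type lvl, ordinal_type nlevel) {
  assert(nlevel > 0 && lvl >= 0 && lvl < nlevel);
  const ordinal_type half_level = nlevel / 2;
  return lvl > half_level ? 1 : 0;
}

// Hypothetical helper: look up the team size for a level from a
// two-element table such as team_size_factor in the diff.
inline ordinal_type team_size_for_level(const ordinal_type (&table)[2],
                                        ordinal_type lvl,
                                        ordinal_type nlevel) {
  return table[select_slot(lvl, nlevel)];
}
```

With the pre-11.0 table `{ 32, 64 }` from the diff and ten levels, this reading would give a team size of 32 for level 0 and 64 for level 9.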
@@ -1848,7 +1859,18 @@ namespace Tacho {
#endif
// this should be considered with average problem sizes in levels
const ordinal_type half_level = _nlevel/2;
#if defined (CUDA_VERSION)
#if (11000 > CUDA_VERSION)
/// CUDA older than 11.0 (CUDA_VERSION < 11000)
const ordinal_type team_size_solve[2] = { 32, 16 }, vector_size_solve[2] = { 8, 8};
Contributor

It appears that lines 1865 and 1868 are identical. Was this intended?

Contributor Author

Yes, it is intended. The separate branch is a reminder that 1) I can change (or specialize) the team size for CUDA 11, whose maximum team size setting differs from previous generations, and 2) the best solve performance for a particular test problem (a 100k symmetric indefinite complex problem) is obtained with this team size. When I test the code with the newest CUDA later, I can still tune it without disturbing the team settings for the previous-generation CUDA.
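
The pattern described in this reply can be sketched in isolation as follows. This is a minimal illustration, not the Tacho source: `SolveLaunchParams`, `select_solve_params`, and `checked_solve_params` are hypothetical names, and `ordinal_type` is assumed to be a plain `int`. The numeric values are the ones in this diff. `CUDA_VERSION` is the macro from the CUDA toolkit header `cuda.h`, encoded as major * 1000 + minor * 10, so 11000 means CUDA 11.0; the sketch does not include that header, so a plain host build takes the non-CUDA branch.

```cpp
#include <cassert>

typedef int ordinal_type;  // assumption: stand-in for Tacho's ordinal_type

// Hypothetical container for the two-slot solve tuning values.
struct SolveLaunchParams {
  ordinal_type team_size[2];
  ordinal_type vector_size[2];
};

// Select the tuning values at compile time, one branch per toolchain.
inline SolveLaunchParams select_solve_params() {
#if defined(CUDA_VERSION)
#if (11000 > CUDA_VERSION)
  // CUDA older than 11.0
  return SolveLaunchParams{{32, 16}, {8, 8}};
#else
  // CUDA 11.0 and higher: same values for now, kept as a separate
  // branch so they can be retuned without touching the older setting
  return SolveLaunchParams{{32, 16}, {8, 8}};
#endif
#else
  // not CUDA: default values
  return SolveLaunchParams{{64, 16}, {8, 8}};
#endif
}

// Hypothetical wrapper that rejects non-positive launch sizes.
inline SolveLaunchParams checked_solve_params() {
  const SolveLaunchParams p = select_solve_params();
  for (int i = 0; i < 2; ++i) {
    assert(p.team_size[i] > 0 && p.vector_size[i] > 0);
  }
  return p;
}
```

Keeping the two CUDA branches distinct costs a duplicated line today, but a later retune for newer CUDA releases then becomes a one-line change that cannot affect builds against older toolkits.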

#else
/// CUDA 11.0 and higher (intentionally the same values for now; kept separate for later tuning)
const ordinal_type team_size_solve[2] = { 32, 16 }, vector_size_solve[2] = { 8, 8};
#endif
#else
/// not CUDA: use default values
const ordinal_type team_size_solve[2] = { 64, 16 }, vector_size_solve[2] = { 8, 8};
#endif
const ordinal_type team_size_update[2] = { 128, 32}, vector_size_update[2] = { 1, 1};
{
typedef TeamFunctor_SolveLowerLDL<supernode_info_type> functor_type;