-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX SW 396203] check launch kernel grid size not beyond 32bit integer #2263
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Structure bindings are just a suggestion but making Get***XdlopsNHWCConfigLargestTile()
dynamically configurable through a data size can improve code quality and should be addressed.
@@ -875,6 +897,19 @@ bool ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC::IsApplicable( | |||
miopen::GetTypeSize(problem.GetInDataType()))) | |||
return false; | |||
|
|||
{ | |||
auto largest_config = problem.IsFp32() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If GetFwdXdlopsNHWCConfigLargestTile
would depend on data size, we can simplify this code to
auto largest_config = GetFwdXdlopsNHWCConfigLargestTile(dataSize);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like described above, not exactly depending on data size.
@@ -683,6 +698,13 @@ bool PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC::IsValid( | |||
if(problem.IsFp16() && gemm_k_global_split != 0 && vector_store != 1 && splits_4G > 1) | |||
return false; | |||
|
|||
size_t current_block_size, current_grid_size, current_splits_4G; | |||
std::tie(current_block_size, current_grid_size, current_splits_4G) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have got C++17 available, structure binding seems to be less wordy:
auto [current_block_size, current_grid_size, current_splits_4G] = GetImplicitGemmGtcDynamicBwdXdlopsNHWCKernel(problem, *this);
vs
size_t current_block_size, current_grid_size, current_splits_4G;
std::tie(current_block_size, current_grid_size, current_splits_4G) = GetImplicitGemmGtcDynamicBwdXdlopsNHWCKernel(problem, *this);
But it doesn't allow to use std::ignore.
: (problem.IsFp16() ? GetFwdXdlopsNHWCConfigLargestTileFp16() | ||
: GetFwdXdlopsNHWCConfigLargestTileBf16()); | ||
size_t current_block_size, current_grid_size, current_splits_4G; | ||
std::tie(current_block_size, current_grid_size, current_splits_4G) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using structure binding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Notice] @CAHEK7 @carlushuang I am still wondering why we use tuples as return types, which is error prone (sensitive to the order) and to enforces us to use comments to keep the code understandable (like this):
static std::tuple<size_t, // block_size
size_t, // grid_size
size_t> // splits_4G
GetImplicitGemmGtcDynamicFwdXdlopsNHWCKernel(
const ProblemDescription& problem,
const PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC& config);
instead of structures that contain members with "normal" names.
Wrt binding -- it is better sometimes, but looks as another crutch to me.
size_t block_size; | ||
size_t grid_size; | ||
|
||
int splits_4G; | ||
|
||
std::tie(kernel_name, block_size, grid_size, splits_4G) = | ||
GetImplicitGemmGtcDynamicFwdXdlopsNHWCKernel(ctx, problem, config); | ||
std::tie(block_size, grid_size, splits_4G) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using structure binding here.
Shouldn't you add a test if it is meant to fix a bug? |
@carlushuang could you deal with the above review comments? For test, I suggest very large tensor and force MISA kernels. |
#2263) * check 32bit launch size
#2263) * check 32bit launch size
I second this comment whole-heartedly :) |
Indeed, we do not check the grid size, this is known potential problem. Perhaps we have to systematically apply this kind of check to all existing solvers. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
[Note] It seems a little questionable whether it's better to explicitly keep copies of the "largest configs" (as done in this PR) or just keep an index of the "largest config". Both approaches have drawbacks.
* [FIX SW 396203] check launch kernel grid size not beyond 32bit integer (#2263) * check 32bit launch size * Revert "Do not fail the generic search if n_runs_total is zero; turns warnings into infos (#2266)" This reverts commit 6795a81. * Patch half.hpp file location reorg (#2275) * [Tuning][MI100][MI210][MI250] Gold18 (#2264) * gold18 db update, remove detectron2 configs to allow miopen heuristic * remove invalid performance configs --------- Co-authored-by: carlushuang <[email protected]> Co-authored-by: Jun Liu <[email protected]>
Our HIP launch kernel API support launch grid size no larger than 32bit integer (4294967295), otherwise HIP runtime will through exception.
We need to check this number and set inside
isApplicable()
/isValid()