Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX SW 396203] check launch kernel grid size not beyond 32bit integer #2263

Merged
merged 3 commits into from
Jul 25, 2023

Conversation

carlushuang
Copy link
Contributor

Our HIP launch kernel API support launch grid size no larger than 32bit integer (4294967295), otherwise HIP runtime will through exception.
We need to check this number and set inside isApplicable()/isValid()

@carlushuang carlushuang requested a review from junliume July 18, 2023 12:35
Copy link
Contributor

@CAHEK7 CAHEK7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structure bindings are just a suggestion but making Get***XdlopsNHWCConfigLargestTile() dynamically configurable through a data size can improve code quality and should be addressed.

@@ -875,6 +897,19 @@ bool ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC::IsApplicable(
miopen::GetTypeSize(problem.GetInDataType())))
return false;

{
auto largest_config = problem.IsFp32()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If GetFwdXdlopsNHWCConfigLargestTile would depend on data size, we can simplify this code to

 auto largest_config = GetFwdXdlopsNHWCConfigLargestTile(dataSize);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like described above, not exactly depending on data size.

@@ -683,6 +698,13 @@ bool PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC::IsValid(
if(problem.IsFp16() && gemm_k_global_split != 0 && vector_store != 1 && splits_4G > 1)
return false;

size_t current_block_size, current_grid_size, current_splits_4G;
std::tie(current_block_size, current_grid_size, current_splits_4G) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have got C++17 available, structure binding seems to be less wordy:

auto [current_block_size, current_grid_size, current_splits_4G] = GetImplicitGemmGtcDynamicBwdXdlopsNHWCKernel(problem, *this);

vs

size_t current_block_size, current_grid_size, current_splits_4G;
std::tie(current_block_size, current_grid_size, current_splits_4G) =  GetImplicitGemmGtcDynamicBwdXdlopsNHWCKernel(problem, *this);

But it doesn't allow to use std::ignore.

: (problem.IsFp16() ? GetFwdXdlopsNHWCConfigLargestTileFp16()
: GetFwdXdlopsNHWCConfigLargestTileBf16());
size_t current_block_size, current_grid_size, current_splits_4G;
std::tie(current_block_size, current_grid_size, current_splits_4G) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using structure binding here.

Copy link
Contributor

@atamazov atamazov Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notice] @CAHEK7 @carlushuang I am still wondering why we use tuples as return types, which is error prone (sensitive to the order) and to enforces us to use comments to keep the code understandable (like this):

static std::tuple<size_t, // block_size
                  size_t, // grid_size
                  size_t> // splits_4G
GetImplicitGemmGtcDynamicFwdXdlopsNHWCKernel(
    const ProblemDescription& problem,
    const PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC& config);

instead of structures that contain members with "normal" names.

Wrt binding -- it is better sometimes, but looks as another crutch to me.

size_t block_size;
size_t grid_size;

int splits_4G;

std::tie(kernel_name, block_size, grid_size, splits_4G) =
GetImplicitGemmGtcDynamicFwdXdlopsNHWCKernel(ctx, problem, config);
std::tie(block_size, grid_size, splits_4G) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using structure binding here.

@Epliz
Copy link

Epliz commented Jul 22, 2023

Shouldn't you add a test if it is meant to fix a bug?

@junliume
Copy link
Contributor

Shouldn't you add a test if it is meant to fix a bug?

@carlushuang could you deal with the above review comments? For test, I suggest very large tensor and force MISA kernels.

@junliume junliume merged commit 07e3e78 into develop Jul 25, 2023
junliume pushed a commit that referenced this pull request Jul 28, 2023
junliume pushed a commit that referenced this pull request Jul 29, 2023
@jithunnair-amd
Copy link

Shouldn't you add a test if it is meant to fix a bug?

I second this comment whole-heartedly :)

@atamazov
Copy link
Contributor

atamazov commented Aug 4, 2023

@JehandadKhan @junliume

Our HIP launch kernel API support launch grid size no larger than 32bit integer (4294967295), otherwise HIP runtime will through exception. We need to check this number and set inside isApplicable()/isValid()

Indeed, we do not check the grid size, this is known potential problem. Perhaps we have to systematically apply this kind of check to all existing solvers. What do you think?

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

[Note] It seems a little questionable whether it's better to explicitly keep copies of the "largest configs" (as done in this PR) or just keep an index of the "largest config". Both approaches have drawbacks.

@junliume junliume deleted the fix_sw_396203 branch August 28, 2023 23:26
cderb added a commit that referenced this pull request Sep 18, 2023
* [FIX SW 396203] check launch kernel grid size not beyond 32bit integer (#2263)

* check 32bit launch size

* Revert "Do not fail the generic search if n_runs_total is zero; turns warnings into infos (#2266)"

This reverts commit 6795a81.

* Patch half.hpp file location reorg (#2275)

* [Tuning][MI100][MI210][MI250] Gold18 (#2264)

* gold18 db update, remove detectron2 configs to allow miopen heuristic

* remove invalid performance configs

---------

Co-authored-by: carlushuang <[email protected]>
Co-authored-by: Jun Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants