[FIX SW 396203] check launch kernel grid size not beyond 32bit integer #2263

Merged

junliume merged 3 commits into develop from fix_sw_396203

Jul 25, 2023

Contributor

carlushuang commented Jul 18, 2023

Our HIP kernel launch API supports a launch grid size no larger than a 32-bit integer (4294967295); otherwise the HIP runtime will throw an exception.
We need to check this number inside isApplicable()/isValid().
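
For illustration only, the kind of check described above could look like the sketch below. It is not the code from this PR: the helper name `IsGridSizeLaunchable` and the constant `kMaxLaunchGridSize` are hypothetical, and it assumes that "grid size" means the total number of work-items (number of workgroups times block size).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest launch grid size quoted in the PR description: 4294967295 (0xFFFFFFFF).
constexpr std::uint64_t kMaxLaunchGridSize = std::numeric_limits<std::uint32_t>::max();

// Hypothetical helper: returns true if num_blocks * block_size fits in 32 bits.
// Assumption: the grid size is the total work-item count.
inline bool IsGridSizeLaunchable(std::uint64_t num_blocks, std::uint64_t block_size)
{
    assert(block_size > 0);
    // Comparing against the quotient avoids overflowing the 64-bit product.
    return num_blocks <= kMaxLaunchGridSize / block_size;
}
```

A solver's applicability check could then return false whenever such a helper reports that the grid does not fit, rather than letting the launch fail at run time.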


          check 32bit launch size

cfa0050

carlushuang requested a review from junliume

July 18, 2023 12:35


          add ULL

3ccb9f5

junliume reviewed

src/solver/conv_asm_implicit_gemm_gtc_bwd_nhwc.cpp (outdated, resolved)

junliume added value_high urgency_blocker bug labels


          update tidy

651a6b3

CAHEK7 suggested changes

Contributor

CAHEK7 left a comment

Structured bindings are just a suggestion, but making Get***XdlopsNHWCConfigLargestTile() dynamically configurable through a data size can improve code quality and should be addressed.

src/solver/conv_asm_implicit_gemm_gtc_fwd_nhwc.cpp (resolved)

src/solver/conv_asm_implicit_gemm_gtc_fwd_nhwc.cpp

                                                  miopen::GetTypeSize(problem.GetInDataType())))
                       return false;
+                  {
+                      auto largest_config = problem.IsFp32()

Contributor

CAHEK7 Jul 20, 2023

If GetFwdXdlopsNHWCConfigLargestTile depended on the data size, we could simplify this code to:

 auto largest_config = GetFwdXdlopsNHWCConfigLargestTile(dataSize);

Contributor Author

carlushuang Jul 20, 2023

As described above, it does not depend exactly on the data size.

src/solver/conv_asm_implicit_gemm_gtc_fwd_nhwc.cpp

                   if(problem.IsFp16() && gemm_k_global_split != 0 && vector_store != 1 && splits_4G > 1)
                       return false;
+                  size_t current_block_size, current_grid_size, current_splits_4G;
+                  std::tie(current_block_size, current_grid_size, current_splits_4G) =

Contributor

CAHEK7 Jul 20, 2023

Since we have C++17 available, structured bindings seem less wordy:

auto [current_block_size, current_grid_size, current_splits_4G] = GetImplicitGemmGtcDynamicBwdXdlopsNHWCKernel(problem, *this);

vs

size_t current_block_size, current_grid_size, current_splits_4G;
std::tie(current_block_size, current_grid_size, current_splits_4G) =  GetImplicitGemmGtcDynamicBwdXdlopsNHWCKernel(problem, *this);

But structured bindings do not allow the use of std::ignore.
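
A minimal sketch of the trade-off discussed in this comment. `GetKernelLaunchInfo` is a hypothetical stand-in for the real kernel-selection functions, and the values it returns are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical stand-in returning (block_size, grid_size, splits_4G).
inline std::tuple<std::size_t, std::size_t, std::size_t> GetKernelLaunchInfo()
{
    return std::make_tuple(std::size_t{256}, std::size_t{65536}, std::size_t{1});
}

inline std::size_t GridSizeViaTie()
{
    std::size_t grid_size = 0;
    // std::tie needs a pre-declared variable, but std::ignore can discard
    // the elements that are not needed.
    std::tie(std::ignore, grid_size, std::ignore) = GetKernelLaunchInfo();
    return grid_size;
}

inline std::size_t GridSizeViaStructuredBinding()
{
    // Structured bindings declare and initialize in one statement,
    // but every element has to be given a name.
    auto [block_size, grid_size, splits_4G] = GetKernelLaunchInfo();
    assert(block_size > 0 && splits_4G > 0);
    return grid_size;
}
```

Both functions return the same value; the difference is only in how much has to be declared up front and whether unused elements can be skipped.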

src/solver/conv_asm_implicit_gemm_gtc_fwd_nhwc.cpp

+                                                : (problem.IsFp16() ? GetFwdXdlopsNHWCConfigLargestTileFp16()
+                                                                    : GetFwdXdlopsNHWCConfigLargestTileBf16());
+                      size_t current_block_size, current_grid_size, current_splits_4G;
+                      std::tie(current_block_size, current_grid_size, current_splits_4G) =

Contributor

CAHEK7 Jul 20, 2023

Consider using structured bindings here.

Contributor

atamazov Aug 4, 2023 (edited)

[Notice] @CAHEK7 @carlushuang I am still wondering why we use tuples as return types, which is error-prone (sensitive to the order) and forces us to use comments to keep the code understandable (like this):

static std::tuple<size_t, // block_size
                  size_t, // grid_size
                  size_t> // splits_4G
GetImplicitGemmGtcDynamicFwdXdlopsNHWCKernel(
    const ProblemDescription& problem,
    const PerformanceConfigAsmImplicitGemmGTCFwdXdlopsNHWC& config);

instead of structures that contain members with "normal" names.

Wrt structured bindings: they are sometimes better, but they look like another crutch to me.
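
A sketch of the alternative suggested here: a small struct with named members instead of a tuple. The type `KernelLaunchInfo` and the function `MakeLaunchInfo` are hypothetical and do not exist in the PR. The grid-size formula (workgroups times block size) and the fixed `splits_4G` value are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical named return type replacing std::tuple<size_t, size_t, size_t>.
// The member names carry the meaning that the tuple needed comments for.
struct KernelLaunchInfo
{
    std::size_t block_size;
    std::size_t grid_size;
    std::size_t splits_4G;
};

// Hypothetical stand-in for a kernel-selection function.
inline KernelLaunchInfo MakeLaunchInfo(std::size_t num_blocks, std::size_t block_size)
{
    assert(block_size > 0);
    KernelLaunchInfo info{};
    info.block_size = block_size;
    info.grid_size  = num_blocks * block_size; // assumed meaning of "grid size"
    info.splits_4G  = 1;                       // illustrative value only
    return info;
}
```

Callers can read `info.grid_size` directly, so swapping two members in the declaration cannot silently change their meaning. Because the struct is a plain aggregate, it can still be unpacked with a structured binding where that is convenient.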

src/solver/conv_asm_implicit_gemm_gtc_fwd_nhwc.cpp

                   size_t block_size;
                   size_t grid_size;
                   int splits_4G;
-                  std::tie(kernel_name, block_size, grid_size, splits_4G) =
-                      GetImplicitGemmGtcDynamicFwdXdlopsNHWCKernel(ctx, problem, config);
+                  std::tie(block_size, grid_size, splits_4G) =

Contributor

CAHEK7 Jul 20, 2023

Consider using structured bindings here.

CAHEK7 approved these changes

Epliz commented Jul 22, 2023 (edited)

Shouldn't you add a test if it is meant to fix a bug?

Contributor

junliume commented Jul 22, 2023

Shouldn't you add a test if it is meant to fix a bug?

@carlushuang could you deal with the above review comments? For the test, I suggest using a very large tensor and forcing the MISA kernels.

junliume merged commit 07e3e78 into develop

junliume pushed a commit that referenced this pull request


          [FIX SW 396203] check launch kernel grid size not beyond 32bit integer (#2263)

171cb5d

* check 32bit launch size

junliume pushed a commit that referenced this pull request


          [FIX SW 396203] check launch kernel grid size not beyond 32bit integer (#2263)

ab31cce

* check 32bit launch size

jithunnair-amd commented Aug 4, 2023

Shouldn't you add a test if it is meant to fix a bug?

I second this comment whole-heartedly :)

Contributor

atamazov commented Aug 4, 2023

@JehandadKhan @junliume

Our HIP kernel launch API supports a launch grid size no larger than a 32-bit integer (4294967295); otherwise the HIP runtime will throw an exception. We need to check this number inside isApplicable()/isValid().

Indeed, we do not check the grid size; this is a known potential problem. Perhaps we have to systematically apply this kind of check to all existing solvers. What do you think?

atamazov reviewed

Contributor

atamazov left a comment

LGTM!

[Note] It seems a little questionable whether it's better to explicitly keep copies of the "largest configs" (as done in this PR) or just keep an index of the "largest config". Both approaches have drawbacks.
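
To make the "index" option in this note concrete, here is a rough sketch under several assumptions: the `TileConfig` type, its two fields, and the idea that "largest" means the largest product of the two tile dimensions are all hypothetical simplifications, not the PR's actual configuration type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical, heavily simplified tile configuration.
struct TileConfig
{
    std::size_t gemm_m_per_block;
    std::size_t gemm_n_per_block;
};

inline std::size_t TileArea(const TileConfig& c)
{
    return c.gemm_m_per_block * c.gemm_n_per_block;
}

// Returns the index of the config with the largest tile, so that no separate
// copy of the "largest config" has to be kept in sync with the list.
inline std::size_t LargestTileIndex(const std::vector<TileConfig>& configs)
{
    assert(!configs.empty());
    const auto it = std::max_element(
        configs.begin(), configs.end(), [](const TileConfig& a, const TileConfig& b) {
            return TileArea(a) < TileArea(b);
        });
    return static_cast<std::size_t>(std::distance(configs.begin(), it));
}
```

The drawback of this variant is that the index is only valid for one particular ordering of the list, while an explicit copy can drift from the list it was taken from. Either way, something has to be kept consistent by hand or recomputed.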

junliume deleted the fix_sw_396203 branch

August 28, 2023 23:26

cderb added a commit that referenced this pull request


          Promote from public (#42)

5788ebf

* [FIX SW 396203] check launch kernel grid size not beyond 32bit integer (#2263)

* check 32bit launch size

* Revert "Do not fail the generic search if n_runs_total is zero; turns warnings into infos (#2266)"

This reverts commit 6795a81.

* Patch half.hpp file location reorg (#2275)

* [Tuning][MI100][MI210][MI250] Gold18 (#2264)

* gold18 db update, remove detectron2 configs to allow miopen heuristic

* remove invalid performance configs

---------

Co-authored-by: carlushuang <[email protected]>
Co-authored-by: Jun Liu <[email protected]>
