[ROCm] Fused convolution+bias+activation #9666

ekuznetsov139 · 2024-02-20T15:49:18Z

This PR enables MIOpen convolution+bias+activation fusion for ROCm and updates fusion unit tests accordingly.

xla/stream_executor/rocm/rocm_dnn.cc

xla/service/gpu/cudnn_fused_conv_rewriter.cc

xla/service/gpu/cudnn_fused_conv_rewriter_test.cc

thomasjoerg · 2024-02-21T12:16:08Z

xla/service/gpu/cudnn_fused_conv_rewriter_test.cc

+    };
+    std::thread threads[4];
+    for(int i=0; i<4; i++)
+      threads[i] = std::thread(executor, i);


What is the motivation for the multi thread test? All threads run the same module, right?

This could flag race conditions caused by MIOpen (or cuDNN) executing identical fused operations on different threads.

The race condition would need to make Run(...) return false for this test to catch it, which may or may not happen. Wouldn't running under ThreadSanitizer give you a better chance of finding such issues? Also, in my opinion, this test class is not the right place to test MIOpen or cuDNN for thread safety.

Did you reach an agreement about this with @thomasjoerg?
It looks like the test is still there.

The test is still there, but it was changed to use RunAndCompare instead of Run, so it would catch mismatches. I don't see any other places where backend implementations of fused convolutions are tested or could be tested for concurrency issues.

As it stands, the worst that can happen is that an issue exists and the test fails to catch it. I can certainly remove the test, but I'm not sure how dropping it would improve matters.
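The difference between checking a boolean status and comparing results, as discussed above, can be illustrated with a minimal standalone sketch. It does not use XLA's test fixtures: `Computation`, `CountConcurrentMismatches`, and `ToyConvolution` are hypothetical names introduced here, and the exact float comparison is a simplification chosen for the toy data.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the computation under test. In the PR this would
// be the compiled fused-convolution module, which is not reproduced here.
using Computation = std::function<std::vector<float>()>;

// Runs `compute` once on the calling thread to obtain a reference result,
// then runs it concurrently on `num_threads` threads and returns how many
// threads produced a result that differs from the reference.
int CountConcurrentMismatches(const Computation& compute, int num_threads) {
  assert(num_threads > 0);
  const std::vector<float> reference = compute();

  std::vector<std::vector<float>> results(num_threads);
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  for (int i = 0; i < num_threads; ++i) {
    // Each thread writes only to its own slot, so the harness itself is
    // race-free; any mismatch has to come from `compute`.
    threads.emplace_back([&results, &compute, i] { results[i] = compute(); });
  }
  for (std::thread& t : threads) t.join();

  int mismatches = 0;
  for (const std::vector<float>& r : results) {
    if (r != reference) ++mismatches;
  }
  return mismatches;
}

// Toy deterministic computation used only to exercise the harness.
std::vector<float> ToyConvolution() {
  std::vector<float> out(8);
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = 0.5f * static_cast<float>(i) + 1.0f;
  }
  return out;
}
```

Calling `CountConcurrentMismatches(ToyConvolution, 4)` mirrors the four-thread loop in the diff. A result of zero only shows that no mismatch was observed in that run; it does not prove the absence of a race.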

Tests need to be maintained, and in its current form this test will be hard to maintain. When it breaks, the person looking at the breakage needs the right information to understand the issue.

A code comment should explain the motivation for this test, because it is different from the non-threaded tests around it.

Please explain where the race condition would occur. I'm not sure here - the threads don't seem to share anything and don't communicate.

This test does not test XLA's conv rewriter, but MIOpen / cuDNN, right? This is not obvious and deserves a comment too. Would XLA actually call convs on multiple threads (inter-op parallelism)?

The threads don't share anything and don't communicate, but, at least in the MIOpen case, the stream executor caches and reuses the same "fusion plan" handle for operations with identical parameters (https://github.com/openxla/xla/blob/main/xla/stream_executor/rocm/rocm_dnn.cc#L560), so the same handle may potentially be used in parallel on different threads. cuDNN may be doing something similar at some level.

Since this PR is almost 2 months old by now, rather than argue the point further, I'm going to drop the test. If race conditions are really possible, someone will have fun tracking them down the hard way.
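The caching concern raised in this comment can be sketched in isolation. The following is not the code in rocm_dnn.cc: `PlanKey`, `Plan`, `PlanCache`, and `AllThreadsShareOnePlan` are hypothetical names, and the sketch only assumes a cache that returns one shared object per distinct parameter set.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <tuple>
#include <vector>

// Hypothetical parameter key; the real cache key is not reproduced here.
struct PlanKey {
  int in_channels, out_channels, kernel_size;
  bool operator<(const PlanKey& o) const {
    return std::tie(in_channels, out_channels, kernel_size) <
           std::tie(o.in_channels, o.out_channels, o.kernel_size);
  }
};

// Hypothetical stand-in for a cached fusion plan handle.
struct Plan {
  PlanKey key;
};

class PlanCache {
 public:
  // Returns the cached plan for `key`, creating it on first use. The mutex
  // protects only the map. Callers that execute the returned plan
  // concurrently still share one object and must synchronize that themselves.
  std::shared_ptr<Plan> GetOrCreate(const PlanKey& key) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = plans_.find(key);
    if (it == plans_.end()) {
      it = plans_.emplace(key, std::make_shared<Plan>(Plan{key})).first;
    }
    return it->second;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return plans_.size();
  }

 private:
  mutable std::mutex mu_;
  std::map<PlanKey, std::shared_ptr<Plan>> plans_;
};

// Looks up the same key from several threads and reports whether every
// thread received the same Plan object.
bool AllThreadsShareOnePlan(PlanCache& cache, const PlanKey& key,
                            int num_threads) {
  assert(num_threads > 0);
  std::vector<std::shared_ptr<Plan>> seen(num_threads);
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back(
        [&cache, &seen, key, i] { seen[i] = cache.GetOrCreate(key); });
  }
  for (std::thread& t : threads) t.join();
  for (const std::shared_ptr<Plan>& p : seen) {
    if (p != seen[0]) return false;
  }
  return true;
}
```

In this sketch, threads that never communicate directly still end up holding the same `Plan` whenever their parameters match. Whether that sharing is a problem depends on whether the underlying library allows a single handle to be executed concurrently, which the thread above leaves open.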

xla/service/gpu/cudnn_fused_conv_rewriter.cc

xla/service/gpu/cudnn_fused_conv_rewriter_test.cc

xla/stream_executor/rocm/rocm_dnn.cc

xla/tests/convolution_test.cc

ekuznetsov139 · 2024-03-05T10:48:53Z

Bump

kamaljeeti · 2024-03-18T08:11:52Z

Hi @tdanyluk, could you take a look at this? Thanks.

tdanyluk

Thanks for the fixes and sorry for the delay on this.

Could you fix these 2 macro/ifdef related issues?

xla/service/gpu/cudnn_fused_conv_rewriter_test.cc

kamaljeeti · 2024-05-07T04:21:42Z

Hi @thomasjoerg, could you please take a look at this? Thanks.

Imported from GitHub PR #9666 This PR enables MIOpen convolution+bias+activation fusion for ROCm and updates fusion unit tests accordingly. Copybara import of the project: -- bd5be49 by Eugene Kuznetsov <[email protected]>: Switch to NHWC for ROCm and F16 -- 8792f89 by Eugene Kuznetsov <[email protected]>: Fused convolution+bias+activation Merging this change closes #9666 FUTURE_COPYBARA_INTEGRATE_REVIEW=#9666 from ROCm:ci_fused_conv 8792f89 PiperOrigin-RevId: 633859923

Imported from GitHub PR openxla/xla#9666 This PR enables MIOpen convolution+bias+activation fusion for ROCm and updates fusion unit tests accordingly. Copybara import of the project: -- bd5be494abe6621dfd2c4ebcca4f8992077d8a89 by Eugene Kuznetsov <[email protected]>: Switch to NHWC for ROCm and F16 -- 8792f892770aba54e38a5352d9bec1d003e341d3 by Eugene Kuznetsov <[email protected]>: Fused convolution+bias+activation Merging this change closes #9666 FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#9666 from ROCm:ci_fused_conv 8792f892770aba54e38a5352d9bec1d003e341d3 PiperOrigin-RevId: 633859923

Imported from GitHub PR openxla/xla#9666 This PR enables MIOpen convolution+bias+activation fusion for ROCm and updates fusion unit tests accordingly. Copybara import of the project: -- bd5be494abe6621dfd2c4ebcca4f8992077d8a89 by Eugene Kuznetsov <[email protected]>: Switch to NHWC for ROCm and F16 -- 8792f892770aba54e38a5352d9bec1d003e341d3 by Eugene Kuznetsov <[email protected]>: Fused convolution+bias+activation Merging this change closes #9666 PiperOrigin-RevId: 633923105

…strategy scope. In some cases `v._distribute_strategy` can be None, which was causing an AttributeError when trying to check if the `extended` field matched the expected value. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#9666 from ROCm:ci_fused_conv 8792f892770aba54e38a5352d9bec1d003e341d3 PiperOrigin-RevId: 633574887

The OneDNN matmul rewriter starts a thread pool if one wasn't provided as a compile option, spawning many threads per compilation. If we pass a preexisting thread pool, we can save the cost of creating and tearing down threads. FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#9666 from ROCm:ci_fused_conv 8792f892770aba54e38a5352d9bec1d003e341d3 PiperOrigin-RevId: 633926908

github-actions bot added the kokoro:force-run Forces CI to rerun label Feb 20, 2024

github-actions bot assigned kamaljeeti and xla-rotation Feb 20, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Feb 20, 2024

kamaljeeti requested review from ddunl and thomasjoerg February 21, 2024 05:50

thomasjoerg requested changes Feb 21, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 6123d58 to 3e74da9 Compare February 22, 2024 15:49

github-actions bot added the kokoro:force-run Forces CI to rerun label Feb 22, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Feb 22, 2024

tdanyluk self-requested a review February 23, 2024 10:01

tdanyluk reviewed Feb 23, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 3e74da9 to 03ea211 Compare February 24, 2024 23:30

github-actions bot added the kokoro:force-run Forces CI to rerun label Feb 24, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Feb 24, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 03ea211 to 6b38aed Compare February 24, 2024 23:38

github-actions bot added the kokoro:force-run Forces CI to rerun label Feb 24, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Feb 24, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 6b38aed to 630debf Compare February 25, 2024 02:48

github-actions bot added the kokoro:force-run Forces CI to rerun label Feb 25, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Feb 25, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 630debf to 8fd3748 Compare February 25, 2024 03:36

github-actions bot added the kokoro:force-run Forces CI to rerun label Feb 25, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Feb 25, 2024

tdanyluk suggested changes Mar 21, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 8fd3748 to b56d07d Compare March 27, 2024 11:56

github-actions bot added the kokoro:force-run Forces CI to rerun label Mar 27, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Mar 27, 2024

github-actions bot added the kokoro:force-run Forces CI to rerun label Apr 19, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label Apr 19, 2024

thomasjoerg approved these changes May 7, 2024

ekuznetsov139 force-pushed the ci_fused_conv branch from 7cb82b9 to 4fb5629 Compare May 8, 2024 16:20

github-actions bot added the kokoro:force-run Forces CI to rerun label May 8, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label May 8, 2024

ekuznetsov139 added 2 commits May 13, 2024 20:24

Switch to NHWC for ROCm and F16

bd5be49

Fused convolution+bias+activation

8792f89

ekuznetsov139 force-pushed the ci_fused_conv branch from 4fb5629 to 8792f89 Compare May 13, 2024 20:29

github-actions bot added the kokoro:force-run Forces CI to rerun label May 13, 2024

kokoro-team removed the kokoro:force-run Forces CI to rerun label May 13, 2024

copybara-service bot mentioned this pull request May 15, 2024

PR #9666: [ROCm] Fused convolution+bias+activation #12498

Merged

copybara-service bot mentioned this pull request May 15, 2024

PR #9666: [ROCm] Fused convolution+bias+activation tensorflow/tensorflow#67624

Merged

copybara-service bot closed this in cb16451 May 15, 2024

copybara-service bot mentioned this pull request May 15, 2024

Fix a bug in validate_colocate when a variable is not created in a strategy scope. tensorflow/tensorflow#67561

Merged

copybara-service bot mentioned this pull request May 15, 2024

[PJRT:CPU] Provide a compile-time thread pool. tensorflow/tensorflow#67638

Closed

i-chaochen added a commit to ROCm/tensorflow-upstream that referenced this pull request May 17, 2024

disable cudnn_fused_conv_rewriter_test but this can be enable next we…

07b3278

…ekly-sync due to openxla/xla#9666

i-chaochen mentioned this pull request May 29, 2024

Develop upstream sync 240521 ROCm/tensorflow-upstream#2548

Merged

ekuznetsov139 commented Feb 20, 2024

thomasjoerg Feb 21, 2024

ekuznetsov139 Feb 22, 2024

thomasjoerg Feb 22, 2024

tdanyluk Mar 28, 2024

ekuznetsov139 Apr 2, 2024

thomasjoerg Apr 5, 2024

ekuznetsov139 Apr 9, 2024

ekuznetsov139 commented Mar 5, 2024

kamaljeeti commented Mar 18, 2024

tdanyluk left a comment

kamaljeeti commented May 7, 2024
