Simplify gating check for CUDA Graph usage #16491

hariharans29 · 2023-06-27T01:53:25Z

Description

As part of relaxing the node EP check for CUDA Graphs in #16358, logic was introduced to collect all shape massaging nodes, that is, all nodes between Shape and Reshape nodes. This covers the most common shape massaging pattern, but it isn't exhaustive. A shape massaging subgraph may not end at a Reshape node; it may end in other nodes that consume shape info (like Expand, ConstantOfShape, etc.). In fact, a Reshape node may itself be part of the shape massaging nodes (see illustration below).

The gating check now is as follows:

(1) For CUDA and TRT EP: Ensure that there are no control flow nodes (same as before)

(2) For TRT EP: Ensure all nodes have been placed on the TRT EP (same as before)

(New logic below)
For CUDA EP: Ensure that all nodes have been partitioned to the CUDA or CPU EP and that there are no memcpy nodes. The reasoning is that certain shape nodes will be forced onto CPU, and as long as there are no memcpy nodes, this confirms that no compute nodes have been placed on the CPU EP.
Additionally, for the CUDA EP, we log a warning so the user knows that shape subgraphs will execute on CPU and can decide whether to use CUDA Graphs. In most cases, shape subgraphs on CPU should mean it is safe to use CUDA Graphs.
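
The CUDA EP part of the check described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in inference_session.cc: `NodeInfo`, `GateResult`, and `CheckCudaGraphGateForCudaEp` are hypothetical names, and the graph is reduced to a flat list of (op type, EP) pairs.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a graph node (not the ONNX Runtime Node class).
struct NodeInfo {
  std::string op_type;  // e.g. "Shape", "Conv", "If", "MemcpyToHost"
  std::string ep;       // execution provider the node was assigned to
};

// Hypothetical result type: whether CUDA Graph usage is allowed, and whether
// a warning about CPU-resident (assumed shape) nodes should be logged.
struct GateResult {
  bool allowed;
  bool warn_cpu_nodes;
};

inline bool IsControlFlow(const std::string& op) {
  return op == "If" || op == "Loop" || op == "Scan";
}

inline bool IsMemcpy(const std::string& op) {
  return op == "MemcpyToHost" || op == "MemcpyFromHost";
}

GateResult CheckCudaGraphGateForCudaEp(const std::vector<NodeInfo>& nodes) {
  bool has_cpu_node = false;
  for (const auto& node : nodes) {
    // (1) No control flow nodes.
    if (IsControlFlow(node.op_type)) return {false, false};
    // No memcpy nodes: their absence is taken as confirmation that no
    // compute node was placed on the CPU EP.
    if (IsMemcpy(node.op_type)) return {false, false};
    // Every node must be on the CUDA EP or the CPU EP.
    if (node.ep == "CPUExecutionProvider") {
      has_cpu_node = true;
    } else if (node.ep != "CUDAExecutionProvider") {
      return {false, false};
    }
  }
  // Allowed; warn when some nodes (assumed to be shape subgraphs) run on CPU.
  return {true, has_cpu_node};
}
```

In this sketch, a graph with a Shape node on the CPU EP and everything else on the CUDA EP is allowed with a warning, while any memcpy or control flow node rejects CUDA Graph usage outright.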

Motivation and Context

Refine logic introduced in #16358

tianleiwu

Sometimes it could miss cases that should not run with CUDA Graphs (for example, a node that does not have a CUDA implementation and consumes a shape input, but whose output is not constant).

Since we have a warning message, it is fine to apply fewer constraints to unblock most users (assuming that users will verify the accuracy themselves).

hariharans29 · 2023-06-27T18:16:47Z

Regarding the case where a node does not have a CUDA implementation: yes, that is true. I thought about this case myself last night. When a shape input is eventually consumed by some node (for example, Expand) and that node is placed on CPU because it has no CUDA implementation, we will still treat it as a "shape node" with the above logic and allow CUDA Graph usage. In fact, it is hard to write perfect logic that differentiates between a shape node that has been forced to CPU and a node assigned to CPU because there is no CUDA implementation.

I think I will simplify this logic so that we allow CUDA Graph capture as long as there are no memcpy nodes, and we log a warning asking the user to check results if we see a Shape node and some nodes are assigned to the CPU EP (we assume that these nodes are shape subgraphs).
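
A rough sketch of the simplified rule proposed in this comment, for illustration only. `PlacedNode`, `CaptureDecision`, and `DecideCapture` are hypothetical names and this is not the merged implementation; it only covers the two conditions mentioned here (no memcpy nodes, and a warning when a Shape node coexists with CPU-assigned nodes).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified view of a partitioned node.
struct PlacedNode {
  std::string op_type;
  std::string ep;
};

// Hypothetical decision type for this sketch.
struct CaptureDecision {
  bool allow_capture;
  bool log_warning;
};

CaptureDecision DecideCapture(const std::vector<PlacedNode>& nodes) {
  // Any memcpy node means data crosses between host and device: disallow capture.
  const bool has_memcpy = std::any_of(
      nodes.begin(), nodes.end(), [](const PlacedNode& n) {
        return n.op_type == "MemcpyToHost" || n.op_type == "MemcpyFromHost";
      });
  if (has_memcpy) return {false, false};

  const bool has_shape_node = std::any_of(
      nodes.begin(), nodes.end(),
      [](const PlacedNode& n) { return n.op_type == "Shape"; });
  const bool has_cpu_node = std::any_of(
      nodes.begin(), nodes.end(),
      [](const PlacedNode& n) { return n.ep == "CPUExecutionProvider"; });

  // CPU nodes are assumed (not proven) to be shape subgraphs, so the user
  // is only warned to verify results rather than blocked.
  return {true, has_shape_node && has_cpu_node};
}
```

The trade-off is the one discussed above: a CPU-only node that merely consumes shape info is not distinguished from a true shape node, so the warning is what prompts the user to verify results.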

onnxruntime/core/session/inference_session.cc

hariharans29 added 2 commits June 26, 2023 18:31

Initial change

f508584

a

86babf5

hariharans29 requested a review from pranavsharma June 27, 2023 01:54

pranavsharma previously approved these changes Jun 27, 2023

tianleiwu previously approved these changes Jun 27, 2023

hariharans29 changed the title ~~Shape massaging nodes collection logic refinement for CUDA Graph~~ Simplify gating check for CUDA Graph usage Jun 27, 2023

Refine

f49e546

hariharans29 dismissed stale reviews from tianleiwu and pranavsharma via f49e546 June 27, 2023 23:59

a

e8f304e

hariharans29 requested a review from tianleiwu June 28, 2023 02:15

tianleiwu approved these changes Jun 28, 2023

pranavsharma reviewed Jun 28, 2023

onnxruntime/core/session/inference_session.cc (resolved)

pranavsharma approved these changes Jun 28, 2023

hariharans29 merged commit ff0894e into main Jun 28, 2023

hariharans29 deleted the hari/cuda_graph_debug branch June 28, 2023 22:25

MaxMood96 mentioned this pull request May 23, 2023

[Snyk] Security upgrade webpack from 5.36.2 to 5.76.0 MaxMood96/onnxruntime#411

Open
