
Simplify gating check for CUDA Graph usage #16491

Merged · 4 commits · Jun 28, 2023

Conversation

@hariharans29 (Member) commented Jun 27, 2023

Description

As part of relaxing the node EP check for CUDA Graphs in #16358, logic was introduced to collect all shape massaging nodes: all nodes between Shape and Reshape nodes. This covers the most common shape massaging pattern, but it isn't exhaustive. A shape massaging subgraph may not end at a Reshape node; it may end at other nodes that consume shape info (like Expand, ConstantOfShape, etc.). In fact, a Reshape node may itself be part of the shape massaging nodes (see illustration below).

(image: illustration of a shape massaging subgraph)

The gating check now is as follows:

(1) For the CUDA and TRT EPs: ensure that there are no control flow nodes (same as before)

(2) For the TRT EP: ensure that all nodes have been placed on the TRT EP (same as before)

(3) (New logic) For the CUDA EP: ensure that all nodes have been partitioned to the CUDA or CPU EP and that there are no Memcpy nodes. The reasoning behind this is that certain shape nodes will be forced onto CPU, and as long as there are no Memcpy nodes, that is confirmation that no compute nodes have been placed on the CPU EP.

Additionally, for the CUDA EP, we log a warning so the user knows there are shape subgraphs that will execute on CPU and can decide whether they want to use CUDA Graphs. In most cases, shape subgraphs on CPU should mean it is safe to use CUDA Graphs.
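The gating check above can be sketched as a small predicate. This is an illustrative Python model, not the actual ONNX Runtime C++ implementation; the `Node` class, EP name strings, and the `can_use_cuda_graph` helper are hypothetical stand-ins for the real partitioning data structures.

```python
from dataclasses import dataclass

CONTROL_FLOW_OPS = {"If", "Loop", "Scan"}

@dataclass
class Node:
    op_type: str
    ep: str  # execution provider the node was assigned to

def can_use_cuda_graph(nodes, target_ep):
    # (1) No control flow nodes, for either the CUDA or TRT EP.
    if any(n.op_type in CONTROL_FLOW_OPS for n in nodes):
        return False
    if target_ep == "TensorrtExecutionProvider":
        # (2) Every node must have been placed on the TRT EP.
        return all(n.ep == target_ep for n in nodes)
    if target_ep == "CUDAExecutionProvider":
        # (3) A Memcpy node would indicate a compute node landed on CPU,
        # so its absence confirms only shape subgraphs run there.
        if any(n.op_type.startswith("Memcpy") for n in nodes):
            return False
        # All nodes must be partitioned to either the CUDA or CPU EP.
        return all(
            n.ep in ("CUDAExecutionProvider", "CPUExecutionProvider")
            for n in nodes
        )
    return False
```

For example, a graph with a MatMul on CUDA and a Shape node forced onto CPU passes the check, while the same graph with a Memcpy node inserted does not.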

Motivation and Context

Refines the logic introduced in #16358.

pranavsharma previously approved these changes Jun 27, 2023
tianleiwu previously approved these changes Jun 27, 2023

@tianleiwu (Contributor) left a comment


Sometimes it could miss a case that should not run with CUDA Graph (e.g., a node that has no CUDA implementation and consumes some shape input, but whose output is not constant).

Since we have a warning message, it is fine to apply fewer constraints to unblock most users (assuming that users will verify accuracy themselves).

@hariharans29 (Member, Author) replied:

> Sometimes it could miss a case that should not run with CUDA Graph (e.g., a node that has no CUDA implementation and consumes some shape input, but whose output is not constant).
>
> Since we have a warning message, it is fine to apply fewer constraints to unblock most users (assuming that users will verify accuracy themselves).

"like some node does not have CUDA implementation": Yes, that is true; I thought about this case myself last night. When a shape input is eventually consumed by some node (for example, Expand) that is placed on CPU because it has no CUDA implementation, we will still treat it as a "shape node" under the above logic and allow CUDA Graph usage. In fact, it is hard to write perfect logic that differentiates between a shape node that has been forced onto CPU and a node assigned to CPU because it has no CUDA implementation.

I think I will simplify this logic so that we allow CUDA Graph capture as long as there are no Memcpy nodes, and we will log a warning asking the user to check results if we see a Shape node along with some nodes assigned to the CPU EP (assuming those nodes are shape subgraphs).

@hariharans29 hariharans29 changed the title Shape massaging nodes collection logic refinement for CUDA Graph Simplify gating check for CUDA Graph usage Jun 27, 2023
@hariharans29 hariharans29 dismissed stale reviews from tianleiwu and pranavsharma via f49e546 June 27, 2023 23:59
@hariharans29 hariharans29 requested a review from tianleiwu June 28, 2023 02:15