-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix location planning for initializers used only in nested subgraphs #8642
Conversation
…lanForInitializers
continue; | ||
} | ||
|
||
ORT_TRY { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we surround GeneratePlanForWeightsHelper in a try/catch at the point of invocation inside GeneratePlanForWeights instead of doing it in every recursive call?
Secondly, not clear why do we even need a try/catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. The top level exception handler will catch an exception and return a status so unless you're going to do something more than that there's no point catching anything here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surrounded the call to GeneratePlanForWeightsHelper()
in a try/catch. Is there another top level exception catcher that would make even this unnecessary ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of InferenceSesion::Initialize is in a try/catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the superfluous try/catch
OrtValueIndex init_data_index; | ||
main_graph_ort_value_index_map.GetIdx("init_data", init_data_index); | ||
|
||
EXPECT_EQ(main_graph_plan->allocation_plan[init_data_index].location.device.Type(), OrtDevice::GPU); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this would have been "planned" to CPU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description: The current logic for planning the location for all initializers at a graph level only involves walking the nodes at that graph level. This is incomplete as the initializers could actually be used in any of the nested subgraphs relative to that graph level. We need to walk all the nested subgraphs to have the necessary information needed to decide the correct location that the initializers needs to reside on.
The negative impact of not doing this is that if the initializers were only used in one of the nested subgraphs (not used in the graph level they are introduced in) and used in a provider node (e.g. CUDA), with the current logic keeping these initializers on CPU, the controlflow nodes' implementation takes them from CPU to the device at runtime (and the impact is much worse for Loops as these copies are done on every iteration).
This change adds logic to walk all the nested subgraphs while processing initializers at a given graph level and take them statically to the right device thus saving all the runtime copies.
Motivation and Context
~50% gains for a 1P model. Should also benefit other models having a similar structure.