Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix location planning for initializers used only in nested subgraphs #8642

Merged
merged 14 commits into from
Sep 1, 2021

Conversation

hariharans29
Copy link
Member

@hariharans29 hariharans29 commented Aug 6, 2021

Description: The current logic for planning the location for all initializers at a graph level only involves walking the nodes at that graph level. This is incomplete as the initializers could actually be used in any of the nested subgraphs relative to that graph level. We need to walk all the nested subgraphs to have the necessary information needed to decide the correct location that the initializers needs to reside on.

The negative impact of not doing this is that if the initializers were only used in one of the nested subgraphs (not used in the graph level they are introduced in) and used in a provider node (e.g. CUDA), with the current logic keeping these initializers on CPU, the controlflow nodes' implementation takes them from CPU to the device at runtime (and the impact is much worse for Loops as these copies are done on every iteration).

This change adds logic to walk all the nested subgraphs while processing initializers at a given graph level and take them statically to the right device thus saving all the runtime copies.

Motivation and Context
~50% gains for a 1P model. Should also benefit other models having a similar structure.

@hariharans29 hariharans29 requested a review from a team as a code owner August 6, 2021 18:38
@hariharans29 hariharans29 changed the title Fix location planning for initializers Fix location planning for initializers used in nested subgraphs Aug 11, 2021
@hariharans29 hariharans29 changed the title Fix location planning for initializers used in nested subgraphs Fix location planning for initializers used only in nested subgraphs Aug 27, 2021
onnxruntime/core/framework/session_state.h Show resolved Hide resolved
onnxruntime/core/framework/allocation_planner.cc Outdated Show resolved Hide resolved
continue;
}

ORT_TRY {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we surround GeneratePlanForWeightsHelper in a try/catch at the point of invocation inside GeneratePlanForWeights instead of doing it in every recursive call?
Secondly, not clear why do we even need a try/catch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. The top level exception handler will catch an exception and return a status so unless you're going to do something more than that there's no point catching anything here.

Copy link
Member Author

@hariharans29 hariharans29 Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surrounded the call to GeneratePlanForWeightsHelper() in a try/catch. Is there another top level exception catcher that would make even this unnecessary ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of InferenceSesion::Initialize is in a try/catch

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the superfluous try/catch

OrtValueIndex init_data_index;
main_graph_ort_value_index_map.GetIdx("init_data", init_data_index);

EXPECT_EQ(main_graph_plan->allocation_plan[init_data_index].location.device.Type(), OrtDevice::GPU);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this would have been "planned" to CPU

pranavsharma
pranavsharma previously approved these changes Aug 31, 2021
Copy link
Contributor

@skottmckay skottmckay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@hariharans29 hariharans29 merged commit acd9db7 into master Sep 1, 2021
@hariharans29 hariharans29 deleted the hari/fixAllocationPlanForInitializers branch September 1, 2021 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants