[ENV] update runtime setting default values #18987
Conversation
Hey @szha , Thanks for submitting the PR
CI supported jobs: [centos-cpu, windows-gpu, centos-gpu, unix-cpu, sanity, windows-cpu, miscellaneous, clang, website, edge, unix-gpu]
@sxjscience @zhreshold it would be great to have some data on how such defaults work for the cv and nlp models.
src/storage/storage.cc
Outdated
-    if (type == nullptr)
-      type = "Naive";  // default pool
+    if (type == nullptr) {
+      type = "Round";  // default pool
It's obvious that round can help speed up certain dynamic-input workloads, but it tends to OOM more frequently. I suggest we be very cautious about changing the default to round unless there's a good fallback for OOM handling.
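As a rough illustration of the trade-off being discussed (this is a sketch, not MXNet's actual allocator): a round-style pool buckets requests by rounding them up, e.g. to the next power of two, so slightly different dynamic-shape requests hit the same bucket and reuse a cached buffer, at the cost of holding more bytes than were asked for.

```python
# Hedged sketch of round-style pooling; the real MXNet pool differs in detail.
def round_up_pow2(nbytes):
    """Round a request up to the next power of two, as a round pool might."""
    size = 1
    while size < nbytes:
        size *= 2
    return size

# Two dynamic-shape requests of slightly different sizes land in the same
# bucket, so the second can reuse the first buffer instead of allocating again.
a = round_up_pow2(5_000_000)  # 8388608 (2**23)
b = round_up_pow2(6_000_000)  # 8388608 -> same bucket, buffer reused
assert a == b
# The cost: up to ~2x the requested bytes may be held, which is what can push
# models already near the GPU memory limit into OOM.
```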
I share the concern on this. This change will impact static-size, static-graph models that were at the boundary of the GPU memory limit. Which of the current GluonCV model training scripts fall in this category?
My hope is of course to provide a good out-of-the-box experience to mxnet users. From what I observed, there seem to be more models with dynamic-shape inputs than static ones, and many of the static-shape models can still run in this setting, hence the proposal.
I think CNNs are generally static shape while models in NLP are generally dynamic shape. Do we have any plan for improving the memory usage?
I don't think we can generalize like this. For example, object detection and segmentation are based on CNNs and are usually not static-shaped.
Of course we do. I think @ArmageddonKnight is currently fixing some missed allocation entries in the memory profiler, and plans to develop a memory usage visualization tool later this week to help narrow down the focus for memory optimization. We also intend to add a mirror option to cached op to allow training larger models.
I can approve from the NLP side: because NLP workloads are usually dynamic, it's beneficial to have round memory management.
thanks for the reviews. @zhreshold how about setting the pool strategy to be naive when shapes are static?
This is actually fine when shapes are static. My major concern is that with round enabled by default, mxnet can be faster in most use cases but consumes more memory than expected.
@zhreshold we could consider adding an interface to allocate the exact size and use it in cached op for static shape only.
I reverted the round pool change first to merge the rest of the changes. I will work on a cached op path to enable exact size allocation to avoid the memory waste in the static graph case.
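Since the default stays naive after this revert, users with dynamic-shape workloads can still opt into the round pool per process via the `MXNET_GPU_MEM_POOL_TYPE` environment variable that the `CreateStorageManager` path above reads. A minimal sketch (assuming the variable is set before mxnet is imported, so the storage manager picks it up at initialization):

```python
import os

# Select the round pool strategy for this process; must happen before
# importing mxnet, since the storage manager reads it via getenv at startup.
os.environ["MXNET_GPU_MEM_POOL_TYPE"] = "Round"

# import mxnet as mx  # import only after setting the variable
```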
Force-pushed from ff5a940 to 9267d11.
Description
Update runtime setting default values for resource copies and memory pool type.