[CPU] Add dumping of the memory statistics #28441
base: master
Conversation
@@ -66,6 +66,12 @@ void DebugCapsConfig::readProperties() {

    if ((envVarValue = readEnv("OV_CPU_AVERAGE_COUNTERS")))
        averageCountersPath = envVarValue;

    if ((envVarValue = readEnv("OV_CPU_MEMORY_STATISTICS_LEVEL")))
        memoryStatisticsDumpLevel = std::stoi(envVarValue);
Shouldn't we align the handling of these environment variables with the ones we already have?
I mean, OV_CPU_MEMORY_STATISTICS_PATH could have options like:
- cout
- *.csv
- *.etc
and automatically enable level 1 if specified.
And OV_CPU_MEMORY_STATISTICS_LEVEL could be used to increase the level.
This way one could dump the statistics to a CSV file without needing to specify two environment variables.
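The combined handling suggested here could look roughly like the sketch below. The struct and function names are illustrative assumptions, not the actual DebugCapsConfig interface; only the environment variable names come from the discussion above.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>

// Hypothetical config holder; field names are assumptions for illustration.
struct MemStatsConfig {
    std::string path;  // "cout", "stats.csv", ...
    int level = 0;     // 0 == statistics disabled
};

// Sketch of the combined handling: specifying a path implies at least
// level 1, and OV_CPU_MEMORY_STATISTICS_LEVEL can only raise the level.
inline MemStatsConfig parseMemStatsEnv(const char* pathEnv, const char* levelEnv) {
    MemStatsConfig cfg;
    if (pathEnv) {
        cfg.path = pathEnv;
        cfg.level = 1;  // a path enables level 1 automatically
    }
    if (levelEnv)
        cfg.level = std::max(cfg.level, std::stoi(levelEnv));
    return cfg;
}

inline MemStatsConfig readMemStatsEnv() {
    return parseMemStatsEnv(std::getenv("OV_CPU_MEMORY_STATISTICS_PATH"),
                            std::getenv("OV_CPU_MEMORY_STATISTICS_LEVEL"));
}
```

Splitting the parsing from the `getenv` calls keeps the precedence logic (path implies level 1, level can only increase) testable without touching the process environment.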
Done
@EgorDuplensky, do you have any further comments?
}

private:
    std::vector<std::shared_ptr<MemoryBlockWithRelease>> m_unique_blocks;
It should be possible to count unique_blocks even without this data structure, isn't it?
I mean, it will be slower for sure, but do we care how fast we collect those statistics?
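Counting the unique blocks on demand, as suggested here, could be sketched as below. The two block types are simplified stand-ins for the plugin's classes, not the actual implementations; only `MemoryBlockWithRelease` is named in the diff above.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Simplified stand-ins for the plugin's memory block types (assumptions).
struct MemoryBlockWithRelease {};

struct BlockWrapper {
    std::shared_ptr<MemoryBlockWithRelease> parent;  // underlying allocation
};

// Walk all wrappers and count distinct underlying blocks on demand,
// trading a per-call O(n) pass for not storing m_unique_blocks at all.
inline std::size_t countUniqueBlocks(const std::vector<BlockWrapper>& wrappers) {
    std::unordered_set<const MemoryBlockWithRelease*> seen;
    for (const auto& w : wrappers)
        seen.insert(w.parent.get());
    return seen.size();
}
```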
This is also used to calculate the actually allocated memory size. The wrapper stores the memory size requested by each tensor, but not the memory that is really allocated.
But we do have access to the parent memory block via the individual memory block, don't we? I mean, to get the requested memory size.
Just trying to understand whether we can leave the production code untouched (or maybe refactored?), so that we can get all the necessary information without introducing extra logic (perhaps only adding memory block substitution, that is it). But maybe it is not feasible here.
Yes, that's true. Technically we can extend the wrapper interface with an accessor to the parent block, then build a set of unique blocks once requested. Theoretically it may help to avoid using two MemoryBlocks classes, at the cost of a more complex data retrieval algorithm. But in any case the compile-time type has to be different, since in a DEBUG_CAPS build we still need the wrapper.
I don't think we really need to set a goal of leaving the production code untouched, as it doesn't make sense to keep debug information in the production build.
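The compile-time type difference mentioned here is commonly expressed with a conditional alias. A minimal sketch, assuming hypothetical wrapper and accessor names (only `MemoryBlockWithRelease` and the `CPU_DEBUG_CAPS` macro appear in the PR itself):

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Simplified stand-in for the production block type.
struct MemoryBlockWithRelease {};

#ifdef CPU_DEBUG_CAPS
// Debug-only wrapper (hypothetical): records the size requested by each
// tensor and exposes the parent block so statistics can be built on demand.
class MemoryBlockStatsWrapper {
public:
    explicit MemoryBlockStatsWrapper(std::shared_ptr<MemoryBlockWithRelease> p)
        : m_parent(std::move(p)) {}
    const MemoryBlockWithRelease* parentBlock() const { return m_parent.get(); }
    std::size_t requestedSize() const { return m_requested; }
private:
    std::shared_ptr<MemoryBlockWithRelease> m_parent;
    std::size_t m_requested = 0;  // updated on each resize request
};
using BlockType = MemoryBlockStatsWrapper;
#else
using BlockType = MemoryBlockWithRelease;  // production build: no wrapper overhead
#endif
```

With the alias, production code only ever names `BlockType`, so the wrapper substitution happens in exactly one place.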
From my perspective it would just look cleaner to only substitute the memory block implementation to achieve memory statistics collection, like we injected a different implementation and achieved what we need. I can imagine using such an approach in the scope of unit testing.
Thank you for highlighting this problem. I made an attempt to simplify MemoryManagerNonOverlappingSets by substituting the proxy type in only one place. Hope this helps to make the implementation cleaner.
@@ -371,10 +618,20 @@ void MemoryControl::releaseMemory() {
    m_allocated = false;
}

edgeClusters MemoryControl::findEdgeClusters(const std::vector<EdgePtr>& graphEdges) {
#ifdef CPU_DEBUG_CAPS
MemoryStatistics MemoryControl::dumpStatistics() const {
What about using free friend functions instead, which would be able to access the private fields? Then we could move all the debug-caps-related logic into a separate file to avoid cluttering the production logic.
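The free-friend-function idea could look like the miniature below. `MemoryControlSketch` and `dumpStatistics` are illustrative stand-ins, not the PR's actual types; in the PR the dump would additionally sit behind `CPU_DEBUG_CAPS`.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>

// Illustrative stand-in for MemoryControl; names are assumptions.
class MemoryControlSketch {
public:
    explicit MemoryControlSketch(std::size_t total) : m_total_size(total) {}
    // Free friend function: it can read private members directly, yet its
    // definition may live in a separate (debug-only) file, so the class
    // needs no extra accessors for serialization.
    friend void dumpStatistics(std::ostream& os, const MemoryControlSketch& mc);
private:
    std::size_t m_total_size;
};

inline void dumpStatistics(std::ostream& os, const MemoryControlSketch& mc) {
    os << "total_size," << mc.m_total_size << '\n';
}
```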
Yes, it could make the code cleaner, but my idea was that this statistics calculation is strictly bound to the specific memory manager type (in terms of data members and the underlying algorithm), so once the main memory management implementation is changed, this memory statistics collection subroutine most likely needs to be changed too.
Moreover, I didn't even want to wrap them into the CPU_DEBUG_CAPS macro, but since we want to keep the main version as lightweight and fast as possible, some of these implementations become ill-formed, as they access debug-only data members.
Thus, if you are still sure that it's better to move them into a separate file even though it will be more difficult to keep them up to date, I'll do it.
What do you think?
I am just trying to figure out what the common way is of achieving this kind of serialization of a complex data structure.
From my understanding there are 3 options:
- The one implemented here, where we say "this thing is quite complex, so we will serialize it with some additional processing and do it internally, without introducing any additional interfaces". Thus, we have to put the debug serialization logic under ifdefs, which never looks good.
- Expose all the internal data structures via const methods, so the serializer can access them in the normal way. In this case we are treating the serialization process almost as production logic (as if we needed this serializer as a production feature, or maybe even multiple serialization implementations in the production code).
- Similar to 2, but without introducing extra access methods, using 'friend' functionality. Personally, I like this approach simply because it is basically the established C++ guideline for ostream serialization: implement an external friend operator overload. I consider this memory statistics dumping just a fancy serialization, so this approach fits well here, I think. And it could become our development guideline for implementing serialization mechanisms, so we don't do it in a unique way every time.
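Option 3, the conventional ostream pattern referred to above, looks like this in miniature. The class is an illustrative stand-in; the statistics type in the PR is of course richer.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Illustrative stats holder; names are assumptions for the sketch.
class MemoryStatsSketch {
public:
    explicit MemoryStatsSketch(std::vector<std::size_t> sizes)
        : m_sizes(std::move(sizes)) {}

    // External friend operator<<: the canonical C++ serialization idiom.
    // It reads private state directly, so no accessor methods are needed,
    // and any ostream (std::cout, a CSV ofstream, ...) can be the target.
    friend std::ostream& operator<<(std::ostream& os, const MemoryStatsSketch& s) {
        for (std::size_t v : s.m_sizes)
            os << v << '\n';
        return os;
    }

private:
    std::vector<std::size_t> m_sizes;
};
```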
I don't mind extracting these methods into external functions, but it simply means that in the case of any change we will have to modify two parts of the code base, and it is easier to make a mistake (forget to modify one of them) when the code is spread across separate files, modules, and so on. Also, these serialization functions depend on classes defined in this translation unit, and moving them into another translation unit would mean exposing these internal types (due to limitations of the programming language). To me, that isn't worth the effort. Therefore I propose extracting these serialization methods, but defining them in the same translation unit.
But on the other hand, every time we need debug capabilities in our code we have to put them near the production logic. Imagine every major class of the plugin had that (extra logic and data structures with a bunch of #ifdefs for the debug code); it would be quite a mess.
It would be great to establish an approach where we keep things clean.
Thank you for raising this concern. I made an attempt to separate the dumping implementations from the classes, but I decided to leave these implementations in the same translation unit just to avoid exposing internal types. Hope this makes the class implementations cleaner.
Details:
Add yet another debug capability: dumping of the following memory statistics:
Standard output and *.csv file dump are supported.
ToDo:
Tickets:
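In spirit, the *.csv dump described in the PR can be as simple as the sketch below. The row struct and column names are assumptions for illustration, not the PR's actual output format; writing to a generic ostream covers both the standard output and file targets.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One row per memory block; field and column names are illustrative only.
struct StatRow {
    std::string name;
    std::size_t requested;
    std::size_t allocated;
};

// Emit a CSV with a header line; pass std::cout or an std::ofstream,
// matching the "cout or *.csv" choice discussed in the review.
inline void writeCsv(std::ostream& os, const std::vector<StatRow>& rows) {
    os << "name,requested,allocated\n";
    for (const auto& r : rows)
        os << r.name << ',' << r.requested << ',' << r.allocated << '\n';
}
```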