
Cecilia/itensor stage2 #64

Open
ceciliapeng2011 wants to merge 5 commits into in_place_dynamic from cecilia/itensor_stage2

Conversation

ceciliapeng2011

Details:

  • item1
  • ...

Tickets:

  • ticket-id

@ceciliapeng2011 ceciliapeng2011 marked this pull request as draft June 28, 2023 02:48
Comment on lines 18 to 23
// WA: the resize stage might not run because there is no shape change,
// but the underlying actual memory manager has changed.
bool validated = (_previous != m_pMngr);
if (validated && m_pMngr->getSize() < m_Size) {
    m_pMngr->resize(m_Size);
}
Author

@maxnick Here is the WA for #62.
It doesn't sound like a good solution either, as it assumes something happens in the resizing stage and may cause an extra resize in some situations.

Author

done

Comment on lines 889 to 918
// Determine a group with outputs.
size_t isOutGrp = 0;
int64_t outBoxId = -1;
for (auto& box : group) {
    if (std::any_of(edge_clusters[box.id].begin(),
                    edge_clusters[box.id].end(),
                    [box](const ov::intel_cpu::EdgePtr edge) {
                        return edge->getChild()->getType() == Type::Output;
                    })) {
        isOutGrp++;
        outBoxId = box.id;
    }
}
if (isOutGrp) {
    IE_ASSERT(isOutGrp == 1);  // reuse_io_tensors false
    grpMemMngr = std::make_shared<OutputMemoryMngr>(grpMemMngr);
    DEBUG_LOG(grpMemMngr, " ", this);

    // Store the output memory managers so that the infer requests can access them.
    for (auto& edge : edge_clusters[outBoxId]) {
        const auto child = edge->getChild();
        if (child->getType() == Type::Output) {
            for (auto& output : outputNodesMap) {
                if (output.second == child)
                    outputNodesMemMngrMap[output.first] = grpMemMngr;
            }
        }
    }
}
Author

@maxnick Indeed, this is not a good place to add OutputMemoryMngr. I just realized that there might be situations where the graph input has a dynamic shape but the output has a static shape (e.g. some interpolation ops).
This needs further discussion as a follow-up to #62.

Author

There could also be corner cases where more than one output shares the same memory, e.g. when each output comes from a Split.

Owner

> There could also be corner cases where more than one output shares the same memory, e.g. when each output comes from a Split.

That is why I recommended checking the Edge::Status::NeedAllocation status. If the edge status is different, it means the edge shares memory with other edges, and in that case we can't set the proxy memory manager.

> I just realized that there might be situations where the graph input has a dynamic shape but the output has a static shape (e.g. some interpolation ops).

To avoid affecting static tensors, this step may be performed after the static tensors are initialized (L831).
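
A minimal sketch of the suggested check (the getStatus() accessor name is an assumption here, not taken from this PR):

// Only wrap the group manager for an output edge that owns its allocation;
// edges with a different status share memory with other edges, so wrapping
// them would also affect the sharing edges.
if (edge->getStatus() == Edge::Status::NeedAllocation &&
    edge->getChild()->getType() == Type::Output) {
    grpMemMngr = std::make_shared<OutputMemoryMngr>(grpMemMngr);
}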

Author

done

namespace ov {
namespace intel_cpu {

class OutputMemoryMngr : public IMemoryMngrObserver {
Owner

Let's rename it to ProxyMemoryMngr so that its purpose is not bound only to outputs.

Author

done


using namespace ov::intel_cpu;

void OutputMemoryMngr::setTensor(std::shared_ptr<Tensor> tensor) {
Owner

We do not need this method at all. The class is a simple proxy that allows replacing the underlying memory manager without changing the proxy memory manager reference, which is shared between the partitioned memory managers and the memory objects.
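
For illustration, a minimal sketch of such a proxy, assuming simplified method names (the real IMemoryMngr interface has more methods; this is not the final implementation):

class ProxyMemoryMngrSketch {
public:
    explicit ProxyMemoryMngrSketch(MemoryMngrPtr mngr) : m_pMngr(mngr) {}

    // Consumers always hold the proxy; all calls are delegated to whichever
    // manager is currently installed.
    void* getRawPtr() const { return m_pMngr->getRawPtr(); }
    bool resize(size_t size) { return m_pMngr->resize(size); }

    // Swap the underlying manager; every holder of the proxy reference
    // (partitioned memory managers, memory objects) sees the new memory.
    void setManager(MemoryMngrPtr mngr) { m_pMngr = mngr; }

private:
    MemoryMngrPtr m_pMngr;
};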

Author

done

data = make_blob_with_precision(desc);
data->allocate();

auto mem_ptr = create_memory(InferenceEngine::details::convertPrecision(outputNode->second->get_input_element_type(0)), dims);
Owner

We have to allow creating the memory from the actual shape. The Memory object may have a dynamic shape.

Author
@ceciliapeng2011 ceciliapeng2011 Jun 29, 2023

We probably have to defer this until API 2.0, as we need to wrap this tensor into a TensorMemoryBlob, whose constructor calls the tensor's get_shape and get_strides. These calls would throw an exception if the shape is dynamic.
We simply use dims=0 here as a WA.

data = make_blob_with_precision(desc);
data->allocate();

auto mem_ptr = create_memory(InferenceEngine::details::convertPrecision(outputNode->second->get_input_element_type(0)), dims);
const auto &tensor_ptr = std::make_shared<Tensor>(mem_ptr);
Owner

As we discussed before, the data blob should be instantiated from the tensor. We should use the Blob wrapper over the Tensor object, which is already available in the core part.

Author

The wrapper is not usable for now. We have discussed this in the email thread. It is only temporary. We need a fix, or API 2.0.

Owner

As I remember, it can be easily fixed to unblock these changes. Then it will be removed after moving to the API 2.0 entities.

Author
@ceciliapeng2011 ceciliapeng2011 Jun 30, 2023

Done, though not as easy as thought.

CPUTensor is a wrapper over the memory object, but it is also wrapped by a TensorMemoryBlob object. The problem is that when the memory object is resized, the change is not reflected in the TensorMemoryBlob. So at the Graph::PullOutputData stage, set_shape is called on the CPUTensor again, because the blob's TensorDesc still has the old (empty) dims.

The second problem is that the buffer() and rwmap() methods of Blob return a LockedMemory instance instead of a void*, and we also have to keep the blob and the tensor it wraps in sync, so a NutshellAllocator is introduced to blindly lock memory without checks.

Comment on lines 272 to 273
auto inferrequest = std::dynamic_pointer_cast<InferRequest>(this->shared_from_this());
OPENVINO_ASSERT(inferrequest, "should be a InferRequest instance");
Owner

This is not how we handle polymorphism.
If something should be called only on the InferRequest level, then it shall be implemented on that level!

Author

Done. It was a leftover from debugging.


if (canBeInPlace) {
    auto tt = std::get<0>(outputsTensor2BlobMap[it.first]);
    outputMemMngr->setTensor(tt);
Owner

As we discussed, we must not set the Tensor. We have to use the memory manager of the memory object underlying tt!

Author

done

outputMemMngr->setTensor(tt);
DEBUG_LOG("setTensor ", tt, " graph ", graph, " inferrequest ", this);
} else {
outputMemMngr->setTensor(nullptr);
Owner

Here we simply set a new instance of the memory manager.

Author
@ceciliapeng2011 ceciliapeng2011 Jun 29, 2023

There is a memory allocation at the graph init stage for the output edges. In situations where zero-copy is not feasible, we could reuse that memory manager instead of creating a new instance frequently, as in the sketch below. What do you think?

const MemoryMngrPtr m_pOrigMngr; in the proxy manager is for this purpose.
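
A sketch of this proposal at the call site (names taken from this PR; treating nullptr as "fall back to m_pOrigMngr" is the assumption of this variant):

if (canBeInPlace) {
    // zero-copy: expose the user tensor's memory manager through the proxy
    outputMemMngr->setManager(memptr->getMemoryMngr());
} else {
    // zero-copy not feasible: reuse the manager allocated at graph init
    // (m_pOrigMngr inside the proxy) instead of creating a new one every request
    outputMemMngr->setManager(nullptr);
}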

@@ -246,6 +248,8 @@ class Graph {
std::map<std::string, NodePtr> inputNodesMap;
std::map<std::string, NodePtr> outputNodesMap;

std::map<std::string, MemoryMngrPtr> outputNodesMemMngrMap;
Owner

Use the specific memory manager proxy type to avoid a dynamic downcast.

Author

done


@@ -67,6 +67,8 @@ class IMemoryMngr {
* @return status whether the object has control over underlying memory buffer
*/
virtual bool hasExtBuffer() const noexcept = 0;

virtual size_t getSize() const noexcept = 0;
Owner

This call is not necessary. The proxy may keep the last requested size internally and call resize every time a new memory manager is set, to ensure that enough memory is allocated.
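
A minimal sketch of that alternative (member names assumed), which removes the need for getSize() on the interface:

bool ProxyMemoryMngr::resize(size_t size) {
    if (size > m_Size) m_Size = size;   // remember the largest size requested so far
    return m_pMngr->resize(size);       // delegate to the currently installed manager
}

void ProxyMemoryMngr::setManager(MemoryMngrPtr mngr) {
    m_pMngr = mngr;
    m_pMngr->resize(m_Size);            // ensure the newly set manager has enough memory
}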

Author

done

@ceciliapeng2011 ceciliapeng2011 force-pushed the cecilia/itensor_stage2 branch from 5cabcc0 to abce82d on June 29, 2023 02:13
Comment on lines 19 to 25
{
    std::lock_guard<std::mutex> guard(m_lock);
    if (m_validated) {
        m_pMngr->resize(m_Size);
        m_validated = false;
    }
}
Owner

We can avoid this complexity with an unconditional resize call inside the setManager method.
It may cost a possible extra malloc (though that is rather a speculative conclusion), but I think we can accept this possibility for the sake of lower code complexity. If we really face any perf issues caused by this call, we can refine it later. Also, technically getRawPtr is a more frequent call than setManager, so having synchronization here may have a more significant perf impact.

Author

done


private:
    // We keep the original MemMngr, as we may fall back to copying the output.
    const MemoryMngrPtr m_pOrigMngr;
Owner

I think it is better to extract the responsibility of setting a default (fallback) memory manager for the proxy out to the user. Explicit is better than implicit in this case. And anyway, the decision is still up to the caller, since the latter must set nullptr to force the proxy to switch to the fallback memory manager.

Author

I'll leave this to your decision.

@@ -732,6 +732,9 @@ class Node {
std::unordered_map<std::string, MemoryPtr> privateWeightCache;

CPU_DEBUG_CAP_ENABLE(friend class Verbose);

public:
bool forceUpdateShape = false;
Owner

Why did you bring back this flag? It looks like an unnecessary extra responsibility that may be resolved on the proxy memory manager level by storing the size from the previous run.

Author

removed

outputMemMngr->setManager(memptr->getMemoryMngr());
DEBUG_LOG("setManager ", memptr->getMemoryMngr(), " graph ", graph, " inferrequest ", this);
} else {
outputMemMngr->setManager(nullptr);
Owner

Here we can just change the call signature to:

   outputMemMngr->setManager(std::make_shared<DnnlMemoryMngr>(make_unique<MemoryManagerWithReuse>()));

}
}
}
IE_ASSERT(outputNodesMemMngrMap.size() <= outputNodesMap.size());
Owner

Please add a meaningful error message for easier debugging in the future.

Author

unnecessary. removed.

if (!isDynamic && !externalPtr.count(name) &&
data->getTensorDesc() == MemoryDescUtils::convertToTensorDesc(output->second->getParentEdgesAtPort(0)[0]->getMemory().getDesc())) {
if (!externalPtr.count(name) &&
(isDynamic ||(data->getTensorDesc() == MemoryDescUtils::convertToTensorDesc(output->second->getParentEdgesAtPort(0)[0]->getMemory().getDesc()) && !isDynamic))) { // TODO: handle desc incompatible if isDynamic.
Owner

Suggested change
(isDynamic ||(data->getTensorDesc() == MemoryDescUtils::convertToTensorDesc(output->second->getParentEdgesAtPort(0)[0]->getMemory().getDesc()) && !isDynamic))) { // TODO: handle desc incompatible if isDynamic.
(isDynamic || (data->getTensorDesc() == MemoryDescUtils::convertToTensorDesc(output->second->getParentEdgesAtPort(0)[0]->getMemory().getDesc())) { // TODO: handle desc incompatible if isDynamic.

We do not need additional !isDynamic check since || operator is used.

Author
@ceciliapeng2011 ceciliapeng2011 Jun 29, 2023

Yes, right. Done.

But there is a question: do we need the tensor desc compatibility check for the dynamic case?

@ceciliapeng2011 ceciliapeng2011 force-pushed the cecilia/itensor_stage2 branch from af162b0 to 931020e on June 30, 2023 03:55
@ceciliapeng2011 ceciliapeng2011 requested a review from maxnick June 30, 2023 03:57
@ceciliapeng2011 ceciliapeng2011 marked this pull request as ready for review June 30, 2023 03:58
@maxnick maxnick force-pushed the in_place_dynamic branch 2 times, most recently from 97b8873 to 8c87601 on July 10, 2023 16:03
fix

Update src/plugins/intel_cpu/src/cpu_tensor.cpp

Co-authored-by: Maksim Kutakov <[email protected]>

Update src/plugins/intel_cpu/src/cpu_tensor.cpp

Co-authored-by: Maksim Kutakov <[email protected]>

remove unit test of check empty tensor data().

add mock IMemory for CPUTensor unit test.

fix function template inherit issue

unit test with mock memory

update_strides fix

fix

fix

CR fix and add unit test.

trivial fix

CR fix. code cleanup.

CR fix

fix afer rebase
…put memory to graph via ITensor implementation.

fix

throw exception

CR fix

remove irrelated headers.

CR fix: instance ProxyMemManager for output edge which NeedAllocation

stage tensor_to_blob

tensor_to_blob wrap to use api 1.0

fix
@ceciliapeng2011 ceciliapeng2011 force-pushed the cecilia/itensor_stage2 branch from 931020e to 8fb04bc on July 11, 2023 01:25