
inference overheads optimizations #1392

Merged: 5 commits merged into microsoft:master on Jul 19, 2019

Conversation

@fs-eire (Contributor) commented Jul 12, 2019

Description:
This change makes optimizations in various places. It consists of part of PR #1240 (with the problematic part removed) plus some other trivial fixes.

  • Reduce unnecessary copies when constructing vectors, or objects that contain a vector as a member; use std::move where applicable (see the sketch after this list).
  • Use std::vector<std::reference_wrapper<const TensorShape>> instead of std::vector<TensorShape> when the shapes are only accessed by constant reference.
  • Fix the move constructor of TensorShape.
  • Calculate the key BEFORE (instead of AFTER) acquiring the lock in SessionState::GetMemoryPatternGroup (see the sketch after the Motivation section).
  • Other trivial fixes (the code should be straightforward and self-explanatory).
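
A minimal sketch of the first two patterns. Shape, MakeShape, and CollectShapes are illustrative stand-ins, not onnxruntime code, so the snippet stays self-contained:

    #include <cstdint>
    #include <functional>
    #include <utility>
    #include <vector>

    struct Shape {
      std::vector<int64_t> dims;
    };

    // Moving the argument into the member skips a second heap allocation and
    // an element-wise copy of the vector's buffer.
    Shape MakeShape(std::vector<int64_t> dims) {
      return Shape{std::move(dims)};
    }

    // When shapes are only read, reference_wrapper stores a cheap reference
    // per element instead of a deep copy of each vector.
    std::vector<std::reference_wrapper<const Shape>> CollectShapes(
        const std::vector<Shape>& inputs) {
      std::vector<std::reference_wrapper<const Shape>> refs;
      refs.reserve(inputs.size());
      for (const auto& s : inputs) refs.push_back(std::cref(s));
      return refs;
    }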

Motivation and Context

  • Profiling shows that framework overheads are significant when the model is small enough; many CPU instructions go to allocating and freeing small objects. This change is part of the work to reduce those overheads.
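
The fourth item in the description narrows the critical section in SessionState::GetMemoryPatternGroup. A minimal sketch of that pattern, with hypothetical names (PatternCache, cache_), not the real session-state code:

    #include <cstdint>
    #include <map>
    #include <mutex>
    #include <vector>

    class PatternCache {
     public:
      const std::vector<int64_t>* Get(const std::vector<int64_t>& shapes) {
        // Compute the key BEFORE acquiring the lock, so this work does not
        // serialize concurrent callers; only the map lookup is guarded.
        int64_t key = 0;
        for (int64_t d : shapes) key = key * 31 + d;

        std::lock_guard<std::mutex> guard(mutex_);
        auto it = cache_.find(key);
        return it == cache_.end() ? nullptr : &it->second;
      }

     private:
      std::mutex mutex_;
      std::map<int64_t, std::vector<int64_t>> cache_;
    };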

@fs-eire requested a review from a team as a code owner on July 12, 2019 08:51
@fs-eire force-pushed the trivial_perf_fix branch from 2a8ec78 to 896804a on July 12, 2019 08:55
@@ -189,7 +189,7 @@ ExecutionFrame::ExecutionFrame(const std::vector<int>& feed_mlvalue_idxs, const
   // and we have execution plan generated, try to setup
   // memory pattern optimization.
   if (session_state.GetEnableMemoryPattern() && session_state.GetExecutionPlan()) {
-    std::vector<TensorShape> input_shapes;
+    std::vector<std::reference_wrapper<const TensorShape>> input_shapes;
     bool all_tensors = true;
Contributor

This is awesome. Can you please close Task 3344 on aiinfra when this is checked in? There are a number of other potential small improvements in tasks under the same parent if you're interested.

@@ -182,10 +178,8 @@ template <typename T>
OrtStatus* CreateTensorImpl(const int64_t* shape, size_t shape_len, const OrtAllocatorInfo* info,
                            void* p_data, size_t p_data_len, std::unique_ptr<Tensor>* out) {
  size_t elem_count = 1;
  std::vector<int64_t> shapes(shape_len);
  for (size_t i = 0; i != shape_len; ++i) {
Contributor

Given TensorShape has code to do this can we create a TensorShape instance here, get Size() from it, and then std::move it when calling the Tensor ctor?

Should we also have a check somewhere to ensure there's no symbolic dimension with value of -1 leading to a negative total size?
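
A hedged sketch of what that suggestion could look like, using a stand-in Shape type since the exact Tensor/TensorShape signatures of this revision may differ:

    #include <cstddef>
    #include <cstdint>
    #include <numeric>
    #include <utility>
    #include <vector>

    struct Shape {
      std::vector<int64_t> dims;
      int64_t Size() const {
        // Product of all dimensions; can go negative when a symbolic -1
        // placeholder is present.
        return std::accumulate(dims.begin(), dims.end(), int64_t{1},
                               [](int64_t a, int64_t b) { return a * b; });
      }
    };

    // Build the shape once, validate the total size, then move it onward
    // instead of copying the underlying vector again.
    bool ValidateAndMove(const int64_t* shape, size_t shape_len, Shape* out) {
      Shape s{std::vector<int64_t>(shape, shape + shape_len)};
      if (s.Size() < 0) return false;  // reject a negative total size
      *out = std::move(s);
      return true;
    }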

Contributor Author

I will figure this out in a separate PR.

linkerzhang previously approved these changes Jul 12, 2019
@@ -466,8 +466,7 @@ Status TensorProtoToMLValue(const Env& env, const ORTCHAR_T* tensor_proto_path,
   }
   std::vector<int64_t> tensor_shape_vec = GetTensorShapeFromTensorProto(tensor_proto);
   // Note: We permit an empty tensor_shape_vec, and treat it as a scalar (a tensor of size 1).
-  TensorShape tensor_shape{tensor_shape_vec};
-  value.Init(new Tensor(type, tensor_shape, tensor_data, allocator), DataTypeImpl::GetType<Tensor>(),
+  value.Init(new Tensor(type, TensorShape(std::move(tensor_shape_vec)), tensor_data, allocator), DataTypeImpl::GetType<Tensor>(),
Member

I would say Pranav created a bad example there. If he hadn't added the move constructor for TensorShape, you wouldn't be using it.

Contributor Author

From a performance point of view, I am neutral on std::move vs reinterpret_cast<>. However, the copy constructor of std::vector is much heavier than either of them and should be avoided when possible.
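
For illustration (not from the PR), the cost difference being pointed at here:

    #include <cstdint>
    #include <utility>
    #include <vector>

    void Demo() {
      std::vector<int64_t> dims(1000, 1);
      std::vector<int64_t> copied = dims;            // heap alloc + copy of ~8 KB
      std::vector<int64_t> moved = std::move(dims);  // O(1): steals the buffer
    }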

@pranavsharma (Contributor) commented Jul 18, 2019

What's so "bad example" about the move constructor? What's bad is exposing implementation details of a class like this, using reinterpret_cast and then doing additional checks to ensure the cast is successful. Really?

Member

We don't need the move constructor. If you didn't add it, nobody would use it.

@fs-eire (Contributor Author) commented Jul 18, 2019

The change on this line is exactly why a move constructor is needed.

I mean, I don't have to use a move constructor. My goal is to reduce the unnecessary copying of a std::vector object. Is there a better way to do it here?

BTW, profiling shows that the major overhead comes from the memory allocation when copying a std::vector. The few instructions in a move constructor are trivial.

@snnn (Member) commented Jul 12, 2019

Indeed, I really don't like the move constructor of TensorShape. I think it's unnecessary. It's harmless as long as nobody uses it.

TensorShape is std::vector. They have the same memory layout. You can reinterpret_cast one of them to the other. So, there is no real use case for a move. You can still do it, it's cheap, but it's not the best way.
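
A sketch of the layout-cast contract described here; ShapeAlias mirrors the idea, not the exact onnxruntime implementation:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    class ShapeAlias : private std::vector<int64_t> {
     public:
      using std::vector<int64_t>::vector;  // reuse the vector constructors
      using std::vector<int64_t>::size;
      int64_t operator[](size_t i) const {
        return std::vector<int64_t>::operator[](i);
      }
    };

    // Sound only while ShapeAlias adds no data members or virtual functions;
    // the real tensor_shape.h guards this with a compile-time check like this.
    static_assert(sizeof(ShapeAlias) == sizeof(std::vector<int64_t>),
                  "ShapeAlias must stay layout-compatible with the vector");

    inline const ShapeAlias* AsShape(const std::vector<int64_t>* dims) {
      return reinterpret_cast<const ShapeAlias*>(dims);
    }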

@@ -111,7 +111,8 @@ OrtStatus* OrtTypeInfo::FromDataTypeImpl(const ONNX_NAMESPACE::TypeProto* input,
     auto& t = s.dim(i);
     shape_data[i] = t.has_dim_value() ? t.dim_value() : -1;
   }
   st = GetTensorShapeAndType(reinterpret_cast<const TensorShape*>(&shape_data), type, &info);
Member

The original code is good. Casting is a no-op at runtime.

Member

The original code is tricky. It's error-prone.



Member

But that is the design.

Member

See: https://github.com/microsoft/onnxruntime/blob/master/include/onnxruntime/core/framework/tensor_shape.h#L135

This function has been there for a long, long time. It was added by a WinML developer, and it is widely used in WinML. We must guarantee it always works.

Contributor Author

TensorShape is std::vector. They have the same memory layout. You can reinterpret_cast one of them to the other. So, there is no real use case for a move. You can still do it, it's cheap, but it's not the best way.

This is an implementation detail of the TensorShape type. Since TensorShape privately inherits from std::vector, that detail is hidden from users of the class. Anyone who uses TensorShape does not need to know, and should not assume, this detail.

Member

@fs-eire In general, you're right. But in this project, for this class (TensorShape), we can treat it as special. Because it's so important, every dev should know it is just an alias of std::vector<int64_t>, like a typedef.
For example, you may find this class strange. In general, every class used as a base must have a virtual destructor, otherwise objects may not get deleted correctly. But this class doesn't have one, and std::vector<int64_t> doesn't have one either. It kind of violates the basic rule, but that doesn't mean the code is bad. It was designed that way, so every user should know the contract and obey it. Otherwise, what would be the point of inheriting from std::vector? We could easily implement this class without involving std::vector. We do the inheritance only because we want to do the casting.

@pranavsharma (Contributor) commented Jul 18, 2019

The virtual destructor rule applies to public inheritance only, so nothing is being violated here. If you're publicly inheriting without a virtual destructor, it is indeed bad code, unless you know for sure no one is going to store a derived-class pointer in a base-class pointer.
Private inheritance means "is implemented in terms of". This means the vector is purely an implementation detail of the TensorShape class.
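
A small illustration of that distinction (not from the code base):

    #include <cstdint>
    #include <vector>

    struct Base {
      ~Base() {}  // non-virtual on purpose
    };
    struct PubDerived : Base {};                          // public inheritance
    class PrivDerived : private std::vector<int64_t> {};  // private inheritance

    int main() {
      Base* b = new PubDerived();
      delete b;  // formally undefined behavior: non-virtual dtor via base ptr
      // std::vector<int64_t>* v = new PrivDerived();  // ill-formed: the base
      // is inaccessible, so nobody can delete PrivDerived through the vector.
      delete new PrivDerived();  // fine: destroyed as its own complete type
      return 0;
    }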

@yufenglee (Member)

I would say reinterpret_cast is dangerous. If there is ever a need to add a new member variable to TensorShape and the author who adds it doesn't know this trick, it will be a disaster.



yufenglee previously approved these changes Jul 12, 2019

@yufenglee (Member) left a comment

:shipit:

@snnn (Member) commented Jul 12, 2019

I would say reinterpret_cast is dangerous. If there is ever a need to add a new member variable to TensorShape and the author who adds it doesn't know this trick, it will be a disaster.


That's safe; we have the check.

@snnn (Member) left a comment

Usually, we should avoid using std::move

@fs-eire (Contributor Author) commented Jul 13, 2019

Usually, we should avoid using std::move

Is there any context for this guideline? I think it helps in a lot of places in our code base where a copy of a std::vector can be avoided.

@snnn (Member) commented Jul 13, 2019

Usually, we should avoid using std::move

Is there any context for this guideline? I think it helps in a lot of places in our code base where a copy of a std::vector can be avoided.

A guy on the VC++ team said: "The C++ Core Guidelines recommend avoiding std::move(), so that is your red flag when the junior dev does the code transformation."

The transformation he was referring to changes

    std::vector<MyClass> v;
    v.push_back(func());

To

    std::vector<MyClass> v;
    MyClass c = func();
    v.push_back(std::move(c));

I guess he was saying, in most cases, std::move is unnecessary.
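
For the record, here is why that rewrite buys nothing: func() returns a prvalue, so push_back already picks the rvalue overload without any std::move. MyClass below is illustrative:

    #include <vector>

    struct MyClass {
      std::vector<int> data;
    };

    MyClass func() { return MyClass{{1, 2, 3}}; }

    int main() {
      std::vector<MyClass> v;
      v.push_back(func());  // binds to push_back(MyClass&&): already a move
      return 0;
    }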

@fs-eire dismissed stale reviews from yufenglee and linkerzhang via 625a319 on July 18, 2019 07:01
@fs-eire (Contributor Author) commented Jul 18, 2019

I reverted the changes related to TensorShape. I think these points need further discussion before we reach a concrete conclusion. A separate PR will be created after those concerns are resolved.

The remaining optimizations should be straightforward and clear.

@snnn merged commit 887930e into microsoft:master on Jul 19, 2019
@snnn (Member) commented Jul 19, 2019

Thanks.
