[Bug] Segmentation Fault in libMKLDNNPlugin.so when using blobs with pre-allocated buffers #4742

Closed
tanmayv25 opened this issue Mar 11, 2021 · 5 comments · Fixed by #5000
Labels: bug (Something isn't working), ONNX (Related to support for ONNX standard), PSE, support_request

Comments

tanmayv25 commented Mar 11, 2021

System information (version)
  • OpenVINO => 2021.2.185
  • Operating System / Platform => 18.04.1-Ubuntu
  • Compiler => gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • Problem classification => Inference
Detailed description

For my use case, the input data is already in memory, so I create InferenceEngine::Blob objects from a pointer to the pre-allocated buffer and its size. I free the buffer once the inference results are retrieved, then create and set another blob for the next request. The first several runs succeed with the expected results; however, I see a segmentation fault after some iterations.

Under valgrind, it looks like libMKLDNNPlugin.so, during the Infer() call, accesses the memory address of one of the previous requests, which was already freed when that request completed. I expect that once SetBlob is called on the InferRequest it should completely override the previous SetBlob call, and the Infer call should use the latest blob for inference.
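In sketch form, the lifecycle I expect to be safe looks like this (desc, byte_size, and num_elems are placeholders here; the full reproducer appears below):

float* buf = static_cast<float*>(malloc(byte_size));  // caller-owned buffer
auto blob = InferenceEngine::make_shared_blob<float>(desc, buf, num_elems);
infer_request.SetBlob(input_name, blob);  // should fully replace the prior blob
infer_request.Infer();                    // should touch only the latest buffer
free(buf);                                // expected to be safe once Infer() returns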

See the valgrind log here:

==1842== Command: ./repeated_wraps /tmp/host/model.xml 10000000 CPU
==1842== 
==1842== Thread 2:
==1842== Invalid write of size 2
==1842==    at 0x4C38753: memmove (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1842==    by 0x7A1E24A: ??? (in /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so)
==1842==    by 0x7A7311C: ??? (in /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so)
==1842==    by 0x7A8D4DA: ??? (in /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so)
==1842==    by 0x7A2C59C: ??? (in /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/lib/intel64/libMKLDNNPlugin.so)
==1842==    by 0x5ACAAC1: tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const (in /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/external/tbb/lib/libtbb.so.2)
==1842==    by 0x4EDC5F0: ??? (in /opt/intel/openvino_2021.2.185/deployment_tools/inference_engine/lib/intel64/libinference_engine.so)
==1842==    by 0x51CE6DE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==1842==    by 0x62946DA: start_thread (pthread_create.c:463)
==1842==    by 0x57D371E: clone (clone.S:95)
==1842==  Address 0x9d810e0 is 0 bytes inside a block of size 4 free'd
==1842==    at 0x4C32D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1842==    by 0x10D4CB: main (in /root/inference_engine_cpp_samples_build/intel64/Release/repeated_wraps)
==1842==  Block was alloc'd at
==1842==    at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1842==    by 0x10CFE9: main (in /root/inference_engine_cpp_samples_build/intel64/Release/repeated_wraps)

I hacked the hello_classification sample code a little to repeatedly issue Infer requests against an IR model converted from an ONNX Identity model. The main section looks like:

for (int i = 0; i < 10000; i++) {
        // --------------------------- 6. Prepare input --------------------------------------------------------
        float* input_ptr = (float*)malloc(4);
        *input_ptr = 2.2f;

        // Blob::Ptr imgBlob = wrapMat2Blob(image);  // just wrap Mat data by Blob::Ptr without allocating of new memory
        InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
        auto tensor_desc = input_info->getTensorDesc();
        Blob::Ptr data_blob = make_shared_blob<float>(tensor_desc, input_ptr, 4);
        infer_request.SetBlob(input_name, data_blob);  // infer_request accepts input blob of any size
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 7. Do inference --------------------------------------------------------
        /* Running the request synchronously */
        infer_request.Infer();
        // -----------------------------------------------------------------------------------------------------

        // --------------------------- 8. Process output ------------------------------------------------------
        Blob::Ptr output = infer_request.GetBlob(output_name);
        float* output_ptr = output->buffer().as<float *>();
        if (*output_ptr != 2.2f) {
           std::cout << "mismatch found " << *output_ptr << std::endl;
        }
        // Print classification results
        free(input_ptr);
        // -----------------------------------------------------------------------------------------------------
}
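For reference, a possible caller-side workaround sketch (untested against this exact failure; it reuses infer_request and input_name from above) is to copy the value into the blob the request already owns instead of wrapping an external pointer with SetBlob:

InferenceEngine::Blob::Ptr input = infer_request.GetBlob(input_name);
auto minput = InferenceEngine::as<InferenceEngine::MemoryBlob>(input);
if (minput) {
    auto holder = minput->wmap();   // write-mapped view, valid in this scope
    holder.as<float*>()[0] = 2.2f;  // copy from the caller-owned buffer
}
infer_request.Infer();              // the plugin only ever sees its own memory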

Steps to reproduce

ov_segf_repro.zip
The attached zip file contains:

  1. The hacked sample, in the repeated_wraps directory.
  2. The model.onnx file.
  3. The compiled repeated_wraps binary. [redundant if compiling from the above source]
  4. The IR model directory. [redundant if converting from the above model.onnx]

Follow the below steps to reproduce the segmentation fault:

  1. Extract and copy the attached archive file into openvino/ubuntu18_dev container image.
  2. Convert the model.onnx to IR using model optimizer in the container:

root@1b6004fab03d:/opt/intel/openvino_2021.2.185/deployment_tools/model_optimizer# python3 mo.py --input_model model.onnx --input_shape [1]

  3. Copy the repeated_wraps folder to /opt/intel/openvino_2021.2.185/inference_engine/samples/cpp.
  4. Install the build tools (apt install build-essential), then change to the above directory.
  5. Run the ./build_samples.sh script.
  6. Move to ~/inference_engine_cpp_samples_build/intel64/Release.
  7. Run the compiled binary as below, pointing to the IR model generated in step 2.

./repeated_wraps <path_to_model.xml>/model.xml 10000000 CPU

A segmentation fault will occur.

Some example runs:

root@5309ddf0990e:/tmp/host/ov_segf_repro# ./repeated_wraps_bin IRModels/model.xml 1 CPU
This sample is an API example, for any performance measurements please use the dedicated benchmark_app tool
root@5309ddf0990e:/tmp/host/ov_segf_repro# ./repeated_wraps_bin IRModels/model.xml 2 CPU
This sample is an API example, for any performance measurements please use the dedicated benchmark_app tool
root@5309ddf0990e:/tmp/host/ov_segf_repro# ./repeated_wraps_bin IRModels/model.xml 100 CPU
Segmentation fault (core dumped)
Issue submission checklist
  • I report the issue; it's not a question
  • I checked the problem with documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution
  • There is reproducer code and related data files: images, videos, models, etc.
tanmayv25 added the bug and support_request labels Mar 11, 2021
Iffa-Intel self-assigned this Mar 12, 2021
Iffa-Intel added the ONNX label and removed the bug label Mar 15, 2021
jgespino (Contributor) commented:

Hi @tanmayv25

Thanks for reporting and for providing the necessary files and steps to reproduce! I was able to reproduce the segmentation fault with niter set to 3 and above. Interestingly, your application works fine when using a GPU device. We have opened a bug under the CPU plugin for the development team to investigate.

Regards,
Jesus

Ref. 51087

jgespino added the bug and PSE labels Mar 15, 2021
jgespino self-assigned this Mar 15, 2021
maxnick (Contributor) commented Mar 17, 2021

Hi @tanmayv25

Thanks for the reproducer. The attached model is a degenerate case that triggers an unusual call sequence with little relation to real-world use cases; it should work anyway, though, and will be fixed. However, I suspect this reproducer does not reveal the real problem. I suppose your original model is somewhat more complex, and the problem with it may have a different source. Am I right? If so, could you please provide the original model with which the problem initially occurred?

Regards,
Maksim

tanmayv25 (Author) commented Mar 17, 2021

Hi @maxnick,
Thanks for the quick response.
I am working on the OpenVINO backend for Triton. The model that I shared (the identity model) comes from Triton's CI testing; it is an essential test for productizing the backend. The identity model helps in understanding the overhead Triton adds in the maximum-throughput case. It is also the case I have been able to consistently reproduce outside Triton.

That being said, I have tried running valgrind on Triton+OV with a resnet50_int8 model and saw the following reports:

==5608== Invalid read of size 32
==5608==    at 0x182A64090: ???
==5608==    by 0x2: ???
==5608==    by 0x17AB3F63F: ???
==5608==    by 0x2: ???
==5608==  Address 0x189c5bf90 is 3,984 bytes inside a block of size 4,004 alloc'd
==5608==    at 0x483E0F0: memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==5608==    by 0x483E212: posix_memalign (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==5608==    by 0x17466FB5D: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x17463F05D: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x17463F554: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x17306A3B7: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x1730A5D53: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x174512F48: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x1746BE1CF: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x1746C00C7: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x174689CED: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)
==5608==    by 0x174689EBB: ??? (in /opt/tritonserver/backends/openvino/libMKLDNNPlugin.so)

This doesn't cause segfaults like the model I shared before, and I don't find any inconsistencies in the results. The output is 1001 FP32 values, one probability per class. I don't understand why libMKLDNNPlugin.so would try to read 32 bytes at the margin; it looks like an artifact of AVX instructions. Unfortunately, I don't have a simple reproducer for this latter case, but I can share one if it is of interest.
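For context, a tail read like this is typically a vectorized kernel loading a full 32-byte lane past the last valid element. A speculative caller-side mitigation (the padding is my assumption, not a documented OpenVINO requirement) would be to over-allocate the output buffer to a 32-byte multiple:

#include <cstdlib>  // posix_memalign

// Hypothetical mitigation: pad a 1001-float allocation so a full 32-byte
// AVX load at the tail stays inside the block.
size_t bytes = 1001 * sizeof(float);
size_t padded = (bytes + 31) & ~static_cast<size_t>(31);  // round up to a 32-byte multiple
void* out_buf = nullptr;
if (posix_memalign(&out_buf, 32, padded) == 0) {
    // ... wrap out_buf in a Blob as before, then free(out_buf) when done
}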

tanmayv25 (Author) commented Apr 12, 2021

@maxnick Has there been any progress on resolving this issue? An identity model is the simplest case and serves various utilities for framework use cases.

jgespino (Contributor) commented:

@tanmayv25 Take a look at PR #5000 for updates.

Regards,
Jesus
