
OnnxRuntime_test_all fails when building with TensorRT #5232

Closed
mayani-nv opened this issue Sep 21, 2020 · 18 comments
Labels
ep:TensorRT issues related to TensorRT execution provider

@mayani-nv

I followed the instructions described here to build with TensorRT and got the following error

1:   YOU HAVE 4 DISABLED TESTS
1:
1:
1: ----- MEMORY LEAKS: 216 bytes of memory leaked in 6 allocations
1/3 Test #1: onnxruntime_test_all .............***Failed  229.37 sec
test 2
    Start 2: onnx_test_pytorch_converted

2: Test command: C:\Users\AzureUser\onnxruntime\build\Windows\Debug\Debug\onnx_test_runner.exe "C:/Users/AzureUser/onnxruntime/cmake/external/onnx/onnx/backend/test/data/pytorch-converted"
2: Test timeout computed to be: 10000000
2: 2020-09-20 16:46:09.6963656 [E:onnxruntime:Default, testcase_driver.cc:41 onnxruntime::test::TestCaseDriver::RunParallel] Running tests in parallel: at most 6 models at any time
2: 2020-09-20 16:46:17.2699163 [E:onnxruntime:Default, testcase_driver.cc:63 onnxruntime::test::TestCaseDriver::RunModelsAsync] Running tests finished. Generating report
2: result:
2:      Models: 57
2:      Total test cases: 57
2:              Succeeded: 57
2:              Not implemented: 0
2:              Failed: 0
2:      Stats by Operator type:
2:              Not implemented(0):
2:              Failed:
2: Failed Test Cases:
2/3 Test #2: onnx_test_pytorch_converted ......   Passed    9.40 sec
test 3
    Start 3: onnx_test_pytorch_operator

3: Test command: C:\Users\AzureUser\onnxruntime\build\Windows\Debug\Debug\onnx_test_runner.exe "C:/Users/AzureUser/onnxruntime/cmake/external/onnx/onnx/backend/test/data/pytorch-operator"
3: Test timeout computed to be: 10000000
3: 2020-09-20 16:46:18.6645857 [E:onnxruntime:Default, testcase_driver.cc:41 onnxruntime::test::TestCaseDriver::RunParallel] Running tests in parallel: at most 6 models at any time
3: 2020-09-20 16:46:22.0578970 [E:onnxruntime:Default, testcase_driver.cc:63 onnxruntime::test::TestCaseDriver::RunModelsAsync] Running tests finished. Generating report
3: result:
3:      Models: 24
3:      Total test cases: 24
3:              Succeeded: 24
3:              Not implemented: 0
3:              Failed: 0
3:      Stats by Operator type:
3:              Not implemented(0):
3:              Failed:
3: Failed Test Cases:
3/3 Test #3: onnx_test_pytorch_operator .......   Passed    4.78 sec

67% tests passed, 1 tests failed out of 3

Total Test time (real) = 243.60 sec

The following tests FAILED:
          1 - onnxruntime_test_all (Failed)
Errors while running CTest
Traceback (most recent call last):
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 1800, in <module>
    sys.exit(main())
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 1741, in main
    run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs)
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 1195, in run_onnxruntime_tests
    run_subprocess(ctest_cmd, cwd=cwd, dll_path=dll_path)
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 446, in run_subprocess
    env=my_env, shell=shell)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['C:\\Program Files\\CMake\\bin\\ctest.EXE', '--build-config', 'Debug', '--verbose']' returned non-zero exit status 8.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 18.04): WS 2016
Visual Studio : 2017 Community
ONNX Runtime installed from (source or binary): source
ONNX Runtime version: 1.7
Python version: 3.7
CUDA version: 10.1
cuDNN version: 7.6.5
GPU card: P40
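
For reference, a TensorRT-enabled Windows build is typically invoked along the lines of the sketch below. This is illustrative only, not the exact command that was run; the CUDA, cuDNN, and TensorRT paths are placeholders, and the flag names follow the linked build instructions, so adjust them if they have since changed.

.\build.bat --config Debug --parallel --use_tensorrt --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1" --cudnn_home "C:\tools\cudnn" --tensorrt_home "C:\tools\TensorRT-7.1.3.4"

build.py then runs the unit tests through CTest after the build (unless --skip_tests is passed), which is where the onnxruntime_test_all failure above is reported.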

Thanks in advance

@stevenlix
Contributor

We've upgraded TensorRT to 7.1.3.4. Please download that package and use it with CUDA 11 and cuDNN 8 on Windows.

@mayani-nv
Author

@stevenlix Thanks for the suggestion. I was already using TensorRT 7.1.3.4 and got the error I posted in the previous message. So do you mean that upgrading from CUDA 10.2 to CUDA 11 and from cuDNN 7.6.5 to cuDNN 8 will resolve this error?

@RandySheriffH added the ep:TensorRT (issues related to TensorRT execution provider) label Sep 21, 2020
@ppyun

ppyun commented Sep 21, 2020

@stevenlix, thanks for your reply, Steven. Mohit (@mayani-nv) at Nvidia is trying to build ONNXRT on Windows. His question is whether ORT-TRT with TensorRT 7.1.3.4, CUDA 10.2, and cuDNN 7.6.5 on Windows is verified as working or not.

@jywu-msft
Member

jywu-msft commented Sep 21, 2020

"His question is whether ORT-TRT with TensorRT 7.1.3.4, CUDA 10.2, and cuDNN 7.6.5 on Windows is verified as working or not."

On Windows, I thought TensorRT 7.1.3.4 was only available with CUDA 11.0 (at least that is what the Nvidia download page indicates), and that is the configuration we've tested.

@mayani-nv
Author

@jywu-msft and @stevenlix, thanks for the suggestion. I updated to CUDA 11.0, cuDNN 8.0, and TensorRT 7.1.3.4 and still got the same error:

67% tests passed, 1 tests failed out of 3

Total Test time (real) = 521.52 sec

The following tests FAILED:
          1 - onnxruntime_test_all (Failed)
Errors while running CTest
Traceback (most recent call last):
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 1800, in <module>
    sys.exit(main())
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 1741, in main
    run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs)
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 1195, in run_onnxruntime_tests
    run_subprocess(ctest_cmd, cwd=cwd, dll_path=dll_path)
  File "C:\Users\AzureUser\onnxruntime\\tools\ci_build\build.py", line 446, in run_subprocess
    env=my_env, shell=shell)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['C:\\Program Files\\CMake\\bin\\ctest.EXE', '--build-config', 'Debug', '--verbose']' returned non-zero exit status 8.

@jywu-msft
Member

Can you attach the full output log?

@kzuiderveld

kzuiderveld commented Sep 22, 2020

Please note that I submitted a bug report weeks ago that TensorRT was broken on Windows with the latest NVIDIA binaries (#4841). I have been waiting for a follow-up.

Can someone confirm that TensorRT 7.1.3.4/Cuda11.0/CuDnn8 is now working correctly on Windows?

@mayani-nv
Author

@jywu-msft The following is the full output log.
log.txt

@snnn
Member

snnn commented Sep 23, 2020

" ----- MEMORY LEAKS: 216 bytes of memory leaked in 6 allocations"

That's the reason.

Please try running 'onnxruntime_test_all' in Visual Studio; it will give you more information.
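
If the Visual Studio debugger is inconvenient, the gtest binary can also be run directly from the build output directory as a first step to narrow things down. This is only a sketch: the path matches the one in the log above, and the --gtest_filter pattern is just an illustration of the standard gtest option. Running it under Visual Studio, as suggested above, is what gives the extra per-allocation detail.

cd C:\Users\AzureUser\onnxruntime\build\Windows\Debug\Debug
onnxruntime_test_all.exe --gtest_filter=*Tensorrt*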

@mayani-nv
Author

@snnn Thanks for the suggestion. I tried running 'onnxruntime_test_all' in Visual Studio and got the following information:
vslog.txt

@snnn
Member

snnn commented Sep 25, 2020

@mayani-nv Thank you for the valuable information; we'll start fixing these.

@Linux13524

" ----- MEMORY LEAKS: 216 bytes of memory leaked in 6 allocations"

I have the exact same error message, but I'm not building with TensorRT:

.\build.bat --parallel --config Debug --build_dir build --cmake_generator "Visual Studio 16 2019" 

full.log

@snnn
Member

snnn commented Nov 4, 2020

Please try running 'onnxruntime_test_all' in Visual Studio; it will tell us which allocation leaked.

@snnn
Member

snnn commented Nov 4, 2020

@Linux13524 Could you please give us the onnxruntime commit id you used in the build?

@Linux13524

@snnn Here is the log from Visual Studio: vs.log

The commit id is 5de47af, which is v1.5.1.

@snnn self-assigned this Nov 5, 2020
@snnn
Member

snnn commented Nov 5, 2020

Hi @Linux13524

Thank you. I can reproduce the error, but I think the latest master is fine. As a workaround, you may go to https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/build.py#L965 and set "-Donnxruntime_ENABLE_MEMLEAK_CHECKER=OFF" for all the cases.
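
If you would rather not edit build.py in place, the same define can usually be passed on the command line through build.py's --cmake_extra_defines option. A sketch, reusing the build command from above; note that build.py also sets this define itself at the linked line, so verify that the OFF value is the one CMake actually picks up (editing the script as suggested is the surest route):

.\build.bat --parallel --config Debug --build_dir build --cmake_generator "Visual Studio 16 2019" --cmake_extra_defines onnxruntime_ENABLE_MEMLEAK_CHECKER=OFF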

@Linux13524

@snnn Thanks a lot for the info! I will try the latest master and see if it works; otherwise I will use your workaround.

@Linux13524

Latest master works. Thanks again @snnn!

@snnn closed this as completed Nov 9, 2020