[TensorRT EP] support TensorRT 8.5 #13867
Conversation
add back "--gpus all"
FYI, the main branch doesn't use git submodules anymore after #13523.
@@ -24,13 +24,12 @@ microsoft_wil;https://github.com/microsoft/wil/archive/5f4caba4e7a9017816e47becd
mimalloc;https://github.com/microsoft/mimalloc/archive/refs/tags/v2.0.3.zip;e4f37b93b2da78a5816c2495603a4188d316214b
mp11;https://github.com/boostorg/mp11/archive/refs/tags/boost-1.79.0.zip;c8f04e378535ededbe5af52c8f969d2dedbe73d5
onnx;https://github.com/onnx/onnx/archive/5a5f8a5935762397aa68429b5493084ff970f774.zip;edc8e1338c02f3ab222f3d803a24e17608c13895
-#Branch name: 8.4-GA
-onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/87c7a70688fd98fb355b8976f41425b40e4fe52f.zip;b97d112d9d6efa180c9b94e05268f2ff3294a534
+onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/369d6676423c2a6dbf4a5665c4b5010240d99d3c.zip;62119892edfb78689061790140c439b111491275
Please leave a comment indicating which branch it's from; previously there was a comment for 8.4-GA.
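For example, a comment in the same style as the removed one could be restored above the updated `onnx_tensorrt` entry, e.g. `#Branch name: 8.5-GA` (assuming the new commit comes from the 8.5-GA branch of onnx-tensorrt).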
# made test name contain the "ep" and "model path" information, so we can easily filter the tests using cuda ep or other ep with *cpu__* or *xxx__*.
list(APPEND test_all_args "--gtest_filter=-*cpu__*:*cuda__*" )
if (onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS)
  # TRT EP package pipelines take much longer to run tests with TRT 8.5. We can't use the placeholder to reduce testing time due to an application test deadlock.
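A minimal sketch of what the guarded branch might look like, assuming the reduction is done by narrowing the gtest filter; the filter expression is illustrative, not the verbatim PR change:

```cmake
# Hedged sketch (illustrative): when the package pipelines opt in to the reduced
# test set, replace the filter so that only TensorRT-related unit tests such as
# TensorrtExecutionProviderTest remain; model tests are still executed separately.
if (onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS)
  set(test_all_args "--gtest_filter=TensorrtExecutionProviderTest.*")
endif()
```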
What's the impact of this? How much did test time increase, and what test coverage do we lose?
Can we make the timeout configurable and schedule a daily run that goes through all the tests?
> What's the impact of this? How much did test time increase, and what test coverage do we lose?
The test run takes 2.5 hours to finish with TRT 8.4, but with TRT 8.5 it increases to more than 9 hours and still hasn't finished (I think it needs several more hours).
With this change, we won't run any unit tests except TensorrtExecutionProviderTest, but we will still run the model tests.
> Can we make the timeout configurable and schedule a daily run that goes through all the tests?
Yes, we can.
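A minimal sketch of how a configurable timeout could be wired on the CMake side, assuming the value passed via the new `--test_all_timeout` build flag is surfaced as a cache variable (the variable name below is illustrative) and applied through CTest's `TIMEOUT` property:

```cmake
# Hedged sketch: expose the unit-test timeout as a configurable cache value.
# The variable name is illustrative; the PR's actual plumbing may differ.
set(onnxruntime_TEST_ALL_TIMEOUT "10800" CACHE STRING
    "Timeout in seconds for the onnxruntime_test_all run")

# Package pipelines can raise this (e.g. to 72000 seconds for TRT 8.5),
# while regular CI keeps a shorter default.
set_tests_properties(onnxruntime_test_all PROPERTIES
    TIMEOUT "${onnxruntime_TEST_ALL_TIMEOUT}")
```

A scheduled daily pipeline could then run with the default (unfiltered) test set and a generous timeout to recover full coverage.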
Update the following package pipelines to support TRT 8.5 after #13867:
- [Linux Multi GPU TensorRT CI Pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1016&_a=summary)
- [Python packaging pipeline](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841&_a=summary)
- [build-perf-test-binaries](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1130&_a=summary)
- [Linux-GPU-EP-Perf](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=841&_a=summary)
Two modifications:
- After [TRT 8.5](#13867) is merged, we can manually set the timeout and have the TRT EP run only a small portion of the unit tests (`onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS=ON`), due to the additional TRT kernel overhead introduced by TRT 8.5, which increases test time a lot. This PR modifies the check condition so that TensorRT CIs (which can enable the builder placeholder) still run most of the unit tests.
- Exclude the TRT EP from the [Resize Opset 18](#13890) unit tests, since TensorRT 8.5 supports operators only up to opset 17.
Integrate TensorRT 8.5
- Update TensorRT EP to support TensorRT 8.5
- Update relevant CI pipelines
- Disable known unsupported ops for TensorRT
- Make the timeout configurable. We observe more than [20 hours](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=256729&view=logs&j=71ce39d8-054f-502a-dcd0-e89fa9931f40) of unit-test runtime with TensorRT 8.5 in the package pipelines. Because we can't use the placeholder to significantly reduce testing time in the package pipelines (the C API application test will deadlock), we only run the subsets of model tests and unit tests that are related to TRT (a new build flag, `--test_all_timeout`, is added and set to 72000 seconds by the package pipelines). Note that we still run all the tests in the TensorRT CI pipelines to keep full test coverage.
- Include "Use onnxruntime_fetchcontent_makeavailable cmake function for TRT" #13918 to fix an onnx-tensorrt compile error.