[TensorRT EP] support TensorRT 8.5 #13867

Merged: chilo-ms merged 52 commits into main from chi_trt85 on Dec 14, 2022

Contributor

@chilo-ms chilo-ms commented Dec 6, 2022

Integrate TensorRT 8.5

  • Update TensorRT EP to support TensorRT 8.5

  • Update relevant CI pipelines

  • Disable known non-supported ops for TensorRT

  • Make the test timeout configurable.
    Unit tests take more than 20 hours to run with TensorRT 8.5 in the package pipelines. Because we can't use the builder placeholder there to significantly reduce testing time (the C API application test would deadlock), the package pipelines run only the subsets of model tests and unit tests that are related to TRT. A new build flag, --test_all_timeout, is added and set to 72000 seconds by the package pipelines. Note that the TensorRT CI pipelines still run all the tests, so full test coverage is kept.

  • Include #13918 ("Use onnxruntime_fetchcontent_makeavailable cmake function for TRT") to fix the onnx-tensorrt compile error.
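
As a rough illustration of the configurable timeout described above, the sketch below parses a `--test_all_timeout` flag and enforces it around a test command. Only the flag name and the 72000-second value come from this PR; the default of 10800 seconds, the helper `run_with_timeout`, and the use of `subprocess` are assumptions made for the example and are not taken from `tools/ci_build/build.py`.

```python
import argparse
import subprocess
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Flag name from the PR description; the default value is a hypothetical placeholder.
    parser.add_argument("--test_all_timeout", type=int, default=10800,
                        help="Timeout in seconds for the full test run")
    return parser.parse_args(argv)


def run_with_timeout(cmd, timeout_seconds):
    """Hypothetical helper: return True if cmd exits with code 0 within the timeout."""
    try:
        return subprocess.run(cmd, timeout=timeout_seconds).returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return False


if __name__ == "__main__":
    args = parse_args(["--test_all_timeout", "72000"])
    print(args.test_all_timeout / 3600, "hours")
    ok = run_with_timeout([sys.executable, "-c", "pass"], args.test_all_timeout)
    print("finished in time:", ok)
```

72000 seconds is 20 hours, which matches the run time observed in the package pipelines.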

@jywu-msft
Member

FYI, the main branch doesn't use git submodules anymore after #13523.
You need to update https://github.com/microsoft/onnxruntime/blob/main/cmake/deps.txt instead.

@@ -24,13 +24,12 @@ microsoft_wil;https://github.com/microsoft/wil/archive/5f4caba4e7a9017816e47becd
 mimalloc;https://github.com/microsoft/mimalloc/archive/refs/tags/v2.0.3.zip;e4f37b93b2da78a5816c2495603a4188d316214b
 mp11;https://github.com/boostorg/mp11/archive/refs/tags/boost-1.79.0.zip;c8f04e378535ededbe5af52c8f969d2dedbe73d5
 onnx;https://github.com/onnx/onnx/archive/5a5f8a5935762397aa68429b5493084ff970f774.zip;edc8e1338c02f3ab222f3d803a24e17608c13895
-#Branch name: 8.4-GA
-onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/87c7a70688fd98fb355b8976f41425b40e4fe52f.zip;b97d112d9d6efa180c9b94e05268f2ff3294a534
+onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/369d6676423c2a6dbf4a5665c4b5010240d99d3c.zip;62119892edfb78689061790140c439b111491275
Member

Leave a comment indicating which branch it's from. Previously there was a comment for 8.4-GA.
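
For readers unfamiliar with the file under review, the entries shown in the hunk above follow a `name;url;hash` layout, with `#` lines used as comments (such as the branch note being requested). The sketch below parses that layout and checks an archive's digest. It assumes the third field is a SHA-1 hex digest, which its 40-character length suggests but this thread does not state; `Dep`, `parse_deps`, and `archive_matches` are hypothetical names for illustration and are not part of the onnxruntime build scripts.

```python
import hashlib
from typing import List, NamedTuple


class Dep(NamedTuple):
    name: str
    url: str
    sha1: str  # assumed to be a SHA-1 hex digest of the archive


def parse_deps(text: str) -> List[Dep]:
    """Parse 'name;url;hash' lines, skipping blank lines and '#' comments."""
    deps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, url, digest = line.split(";")
        deps.append(Dep(name, url, digest))
    return deps


def archive_matches(dep: Dep, archive_bytes: bytes) -> bool:
    """Compare the archive's SHA-1 digest with the recorded one."""
    return hashlib.sha1(archive_bytes).hexdigest() == dep.sha1.lower()


sample = """#Branch name: 8.4-GA
onnx_tensorrt;https://github.com/onnx/onnx-tensorrt/archive/87c7a70688fd98fb355b8976f41425b40e4fe52f.zip;b97d112d9d6efa180c9b94e05268f2ff3294a534
"""
deps = parse_deps(sample)
print(deps[0].name, deps[0].sha1)
```

Because the parser drops comment lines, a branch-name comment costs nothing at build time and only serves human readers, which is the point of the review request.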

@chilo-ms chilo-ms requested a review from a team as a code owner December 12, 2022 02:24
# made test name contain the "ep" and "model path" information, so we can easily filter the tests using cuda ep or other ep with *cpu__* or *xxx__*.
list(APPEND test_all_args "--gtest_filter=-*cpu__*:*cuda__*" )
if (onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS)
# TRT EP package pipelines takes much longer time to run tests with TRT 8.5. We can't use placeholder to reduce testing time due to application test deadlock.
Member

what's the impact of this?
how much did test time increase and what test coverage do we lose?

Member

@jywu-msft jywu-msft Dec 12, 2022

can we make the timeout configurable and schedule a daily run which runs through all the tests?

Contributor Author

@chilo-ms chilo-ms Dec 12, 2022

> what's the impact of this? how much did test time increase and what test coverage do we lose?

The tests take 2.5 hours to finish with TRT 8.4. With TRT 8.5 they had run for more than 9 hours and still had not finished (I think they need several more hours).

With this change, we won't run any unit tests other than TensorrtExecutionProviderTest, but we will still run the model tests.
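
The `--gtest_filter=-*cpu__*:*cuda__*` argument in the hunk above uses GoogleTest's filter syntax: `:`-separated positive patterns, optionally followed by `-` and `:`-separated negative patterns, where `*` matches any string and `?` matches one character. The sketch below approximates that selection rule with Python's `fnmatch`. It is an approximation only, since `fnmatch` also gives `[` a special meaning that GoogleTest does not, and the test names used are made-up examples rather than real onnxruntime test names.

```python
from fnmatch import fnmatchcase


def gtest_filter_selects(test_name: str, gtest_filter: str) -> bool:
    """Approximate GoogleTest's rule: match a positive pattern and no negative one."""
    positive, _, negative = gtest_filter.partition("-")
    # An empty positive section means "all tests".
    positive_patterns = [p for p in positive.split(":") if p] or ["*"]
    negative_patterns = [p for p in negative.split(":") if p]
    matches_positive = any(fnmatchcase(test_name, p) for p in positive_patterns)
    matches_negative = any(fnmatchcase(test_name, p) for p in negative_patterns)
    return matches_positive and not matches_negative


FILTER = "-*cpu__*:*cuda__*"
for name in (
    "ModelTests/ModelTest.Run/cpu__example_model",       # hypothetical name
    "ModelTests/ModelTest.Run/cuda__example_model",      # hypothetical name
    "ModelTests/ModelTest.Run/tensorrt__example_model",  # hypothetical name
):
    print(name, gtest_filter_selects(name, FILTER))
```

Under this filter alone, model tests tagged with the CPU or CUDA EP are skipped and everything else still runs. Restricting the unit tests to TensorrtExecutionProviderTest would need an additional positive pattern.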

Contributor Author

@chilo-ms chilo-ms Dec 12, 2022

> can we make the timeout configurable and schedule a daily run which runs through all the tests?

Yes, we can.

@chilo-ms chilo-ms merged commit 5b492cb into main Dec 14, 2022
@chilo-ms chilo-ms deleted the chi_trt85 branch December 14, 2022 21:06
henrywu2019 pushed a commit to henrywu2019/onnxruntime that referenced this pull request Dec 26, 2022
Integrate TensorRT 8.5

- Update TensorRT EP to support TensorRT 8.5
- Update relevant CI pipelines
- Disable known non-supported ops for TensorRT
- Make the test timeout configurable.
Unit tests take more than [20
hours](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=256729&view=logs&j=71ce39d8-054f-502a-dcd0-e89fa9931f40)
to run with TensorRT 8.5 in the package pipelines. Because we can't use
the builder placeholder there to significantly reduce testing time (the
C API application test would deadlock), the package pipelines run only
the subsets of model tests and unit tests that are related to TRT. A new
build flag, --test_all_timeout, is added and set to 72000 seconds by the
package pipelines. Note that the TensorRT CI pipelines still run all the
tests, so full test coverage is kept.

- Include microsoft#13918 to fix the
onnx-tensorrt compile error.

Co-authored-by: George Wu <[email protected]>
henrywu2019 pushed a commit to henrywu2019/onnxruntime that referenced this pull request Dec 26, 2022
chilo-ms added a commit that referenced this pull request Jan 19, 2023
Two modifications:

- Since [TRT 8.5](#13867) was merged, we can set the timeout manually
and make the TRT EP run only a small portion of the unit tests
(`onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS=ON`). This was
needed because the additional TRT kernel overhead introduced by TRT 8.5
increases test time considerably. This PR modifies the checking
condition so that the TensorRT CIs, which can enable the builder
placeholder, still run most of the unit tests.
- Exclude the TRT EP from the [Resize Opset
18](#13890) unit tests,
since TensorRT 8.5 supports operators only up to Opset 17.