Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-703] Update to TensorRT 5, ONNX IR 3. Fix inference bugs. #13310

Merged
merged 3 commits into from
Jan 16, 2019

Conversation

KellenSunderland
Copy link
Contributor

@KellenSunderland KellenSunderland commented Nov 17, 2018

Description

This PR updates the IR which is passed to TensorRT to use version3 of the spec, which aligns much better with MXNet defaults and results in a decrease in boilerplate code. This update also fixes some bugs when building inference engines that were resulting in feature vectors that were very different from what they should have been.

Fixes #12598
Fixes #13113

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Marco:
Resolves #13459

@KellenSunderland KellenSunderland force-pushed the master branch 3 times, most recently from 3a9b7e7 to f663c47 Compare November 17, 2018 01:21
@KellenSunderland KellenSunderland changed the title WIP [MXNET-703] Update TensorRT ONNX IR to v3. Fix inference bugs. WIP [MXNET-703] Update TensorRT to v5 ONNX IR to v3. Fix inference bugs. Nov 17, 2018
@KellenSunderland KellenSunderland force-pushed the master branch 2 times, most recently from e410dd8 to f6133bf Compare November 18, 2018 01:46
@KellenSunderland KellenSunderland force-pushed the master branch 9 times, most recently from 06a47fa to 6150f4a Compare November 18, 2018 16:53
@kalyc
Copy link
Contributor

kalyc commented Nov 19, 2018

@KellenSunderland please take a look at the failed CI build

@mxnet-label-bot add [pr-work-in-progress]

@kalyc
Copy link
Contributor

kalyc commented Nov 19, 2018

@mxnet-label-bot add [pr-work-in-progress]

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Nov 19, 2018
@KellenSunderland KellenSunderland force-pushed the master branch 6 times, most recently from 22f738f to 2260f59 Compare November 20, 2018 04:40
@KellenSunderland
Copy link
Contributor Author

Note: will rebase this PR before merging.

// More information on ONNX versions and opsets can be found at:
// https://github.com/onnx/onnx/blob/master/docs/IR.md

auto opset_proto = model_proto.add_opset_import();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is copy ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I believe so, proto editing in C++ is a little strange but I've seen this pattern in several places. I've run model validation and it certainly failed for me right away if I did not have a properly set opset proto associated with my model proto for onnx v3 models.

Copy link
Contributor

@larroy larroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments

@KellenSunderland
Copy link
Contributor Author

Thanks for the review @larroy! Will update.

@marcoabreu do you feel your comments were addressed sufficiently? Anything you'd still like to see changed?

@marcoabreu
Copy link
Contributor

Lets wait until the Jenkins refactor has been finalized, ok?

@NRauschmayr
Copy link
Contributor

I have two minor comments:

  1. It would be good to have an environment variable to change the verbosity of nvinfer1::ILogger::Severity. The verbosity level kINFO gives details about the optimizations TensorRT has applied which can be useful for debugging purposes.
  2. When I was investigating the MXNet TensorRT issue MXNet TensorRT generates wrong results #13113 , I was previsouly running on a GPU model that was not supported. It would be good if a warning would be given in this situation.

@Roshrini
Copy link
Member

@marcoabreu Is jenking refactor finalized now? @KellenSunderland Can you rebase this PR?

@Roshrini
Copy link
Member

Roshrini commented Jan 2, 2019

@mxnet-label-bot Update [pr-awaiting-response]

@marcoabreu marcoabreu added pr-awaiting-response PR is reviewed and waiting for contributor to respond and removed pr-awaiting-review PR is waiting for code review labels Jan 2, 2019
This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.
@KellenSunderland
Copy link
Contributor Author

Rebased (sorry took a while), I want to do one more pass addressing Pedro, Marco and Nathalie's comments.

@KellenSunderland KellenSunderland added the pr-work-in-progress PR is still work in progress label Jan 14, 2019
@KellenSunderland KellenSunderland force-pushed the master branch 2 times, most recently from a1030a2 to d4d9ba3 Compare January 14, 2019 19:19
@KellenSunderland
Copy link
Contributor Author

@NRauschmayr Totally agree with your comments here. Would it be ok if we make those changes in a follow up PR?

@NRauschmayr
Copy link
Contributor

@KellenSunderland Sure, that's ok! There is no hurry for the changes which I suggested.

@KellenSunderland KellenSunderland added pr-awaiting-review PR is waiting for code review and removed pr-awaiting-response PR is reviewed and waiting for contributor to respond pr-work-in-progress PR is still work in progress labels Jan 16, 2019
@KellenSunderland KellenSunderland merged commit cc15d9a into apache:master Jan 16, 2019
stephenrawls pushed a commit to stephenrawls/incubator-mxnet that referenced this pull request Feb 16, 2019
…che#13310)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
…che#13310)

* [MXNET-703] Install CUDA 10 compatible cmake

This works around a CUDA 10 cmake issue documented here:
clab/dynet#1457

This fix is temporary; once an updated cmake package is published to Ubuntu's
package repo it may be reverted.

* [MXNET-703] Update to TensorRT 5 ONNX IR 3. Fix inference bugs.

* [MXNET-703] Describe onnx opsets and major version
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
6 participants