This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Merge upstream/master into release #3

Merged
merged 150 commits into release from upstream/master on Aug 6, 2019

Conversation


@tomdol tomdol commented Jul 31, 2019

No description provided.

mayeut and others added 30 commits June 27, 2019 15:45
* Update cuda for python wheels

* Update cuda for python wheels

* Update cuda for python wheels

* Update azure-pipelines-py-packaging.yml

* Update to cuda 10

* Only test win gpu

* Update cuda for python wheels

* Use manylinux2010 image to build linux python wheels

Allow built wheels to be truly compliant with a manylinux policy
* Fix NMS const_cast that modified kernel state, creating a
  thread safety issue. Refactor for a future CUDA implementation.
…nation for OneHot op (microsoft#1317)

* Handle nondefault negative axis value

* Support more intuitive data types for this op
…icrosoft#1289)

* Add logging messages

Implements logging messages at INFO, WARNING and ERROR levels.
Utilizes ONNX-RT's logging infrastructure.
Reversed IsGraphSupported check logic to facilitate logging.

* Add MO exception text to WARNING log message
* Update Versioning.md

* Update Versioning.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update BUILD.md

* Update HighLevelDesign.md

* Update Versioning.md

* Update README.md

* Update tool compat table

* typo

* Updates based on feedback

* Update template to include model

* Updates based on feedback

* Typos
* init

* Update DNNLibrary

* Update DNNLibrary, set compiler flags, it compiles now

* Add more missing flags, add test

* Update DNNLibrary

* Update Compile method, fix allocator and some other bugs

* Update DNNLibrary

* Implement CopyTensor

* Do not delete state explicitly since it is managed by unique_ptr

* Add the missing files when SingleUnitTestProjct is ON

* misc changes

* Fix wrong name in provider factory

* Add my own test

* Update the code that adds nodes to the graph, and add the missing initializers to the graph

* Fix the bug where re-building the graph produces an extra output

* Update DNNLibrary

* Transpose nchw (ONNX) -> nhwc (NNAPI)

* Add license

* Add GetSupportedNodes method (implement it later)

* Rename onnxruntime_nnapi_test->onnxruntime_nnapi_squeezenet_test

* Update squeezenet_test.cpp after rebase master

* Remove squeezenet_test.cpp since it is almost same with the c++ sample

* Update DNNLibrary for GetSupportedNodes

* Update GetSupportedNodes

* Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample"

This reverts commit a97575f.

* Update DNNLibrary

* Fix multiple outputs bug

* Remove GetKernelRegistry

* Revert "Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample""

This reverts commit 2a0670e.

* Set default memory type of NNAPI EP

* Add CPUOutput allocator

* Update DNNLibrary for multiple outputs

* Fix bug of nhwc->nchw

* Remove GetExecutionHandle()
* add scikit example

* format text

* format doc
* Improve CUDA kernel performance for Concat. Implement the kernel code instead of using cudaMemCpy in a loop.

* Update the index lookup part for Concat & Split
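The Concat change above replaces a loop of cudaMemcpy calls with a single kernel that computes, for each output element, which input it comes from. A rough CPU-side sketch of that index lookup (pure Python with hypothetical names, not the real CUDA kernel):

```python
import bisect

# Illustrative sketch: a fused concat maps each flat output index directly
# to its source input, the way a single kernel launch with one "thread" per
# output element would, instead of issuing one memcpy per input.
def concat_1d(inputs):
    # Exclusive prefix sums of input lengths, computed once up front.
    offsets = [0]
    for x in inputs:
        offsets.append(offsets[-1] + len(x))
    out = [None] * offsets[-1]
    for i in range(len(out)):
        # Each output element finds its owning input via the offset table.
        src = bisect.bisect_right(offsets, i) - 1
        out[i] = inputs[src][i - offsets[src]]
    return out
```
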
Description:

Disallow overriding an initializer via a graph input if the IR version is < 4. This enforces an implicit assumption that initializers should be treated as constant, and allows constant folding to be done on a model with an older IR version.
Separate constant and overridable initializers so that it's clear which ones constant folding can utilize.
Update Graph to not add all initializers to the graph inputs when the graph is manually created (i.e. not loaded from a GraphProto) and the IR version is >= 4.
Motivation and Context
In order to do constant folding we need to know which initializers can be treated as constant and which are overridable. All initializers were required to have a matching graph input prior to IR version 4, technically making all of them overridable. The intention however was for them to be treated as constants, and this change enforces that intent.

The benefit of doing so is that constant folding will work for models with IR version < 4. The cost is that if someone is actually overriding an initializer they will need to update the IR version of their model to version 4 in order to keep doing so. The belief is that this is a very small subset of usage (e.g. models involving feeding in a truncated sequence) and the cost to update that small subset is warranted by the benefit of constant folding being able to be enabled on all older models without them needing an IR version update.
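The classification rule described above can be sketched in a few lines; the function name and flattened model representation here are illustrative, not onnxruntime's actual API:

```python
# Hypothetical sketch of the rule: before IR version 4 every initializer is
# treated as constant (enabling constant folding on older models); from IR
# version 4 on, an initializer is overridable only if it also appears
# explicitly in the graph inputs.
def split_initializers(ir_version, initializer_names, graph_input_names):
    """Return (constant, overridable) sets of initializer names."""
    if ir_version < 4:
        return set(initializer_names), set()
    overridable = set(initializer_names) & set(graph_input_names)
    return set(initializer_names) - overridable, overridable
```
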
Description:

This change adds the common part of the TVM based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing ONNX graph and generate TVM ops
* Compiler pass for traversing generated graph and specify TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging

Motivation and Context:

TVM is an open deep learning compiler stack for CPUs, GPUs and specialized accelerators. To leverage it in ONNX Runtime, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.

This codegen library was part of Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.
* Cleanup naming of test input to use .onnx for models.

* Remove file deleted on master
* replace log sinks

* limit headers to include dir

* first changes to do dynamic linking

* wip for using cxx api

* remove weird dangling dependency

* building with tests failing

* finish updating converters

* fix const

* intital introduction of typedef

* change logging to use spdlog

* get tests passing

* clang format

* map logging levels better

* clean up unused imports

* trent cr comments

* clang-format

* code review comments

* changing buffer use to reserve

* Dynamically link

* revert tvm

* update binary uploading

* catch exceptions by const-ref

* Revert "revert tvm"

This reverts commit 387676d.

* fix typo

* update versioning of lib
* Fix unnecessary memory allocation in MKLDNN 1x1 convolution.

* remove the patch header.
* Fix link

* Update PyOp.md
* Update to include urgency

* Wording update

* Wording update
This change removes a number of unused math helpers from core/util/math.h. Most operators are already using MLAS or Eigen directly.
…rosoft#1356)

* Check for empty string as dim_param in allocation planner.
* Validate shape is compatible at runtime when re-using Tensor.
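The two checks above can be sketched together: an empty string is not a valid dim_param, while a named symbolic dimension matches any concrete value. This is a hypothetical helper, not the allocation planner's real code:

```python
def shapes_compatible(planned, actual):
    """Sketch: symbolic dims (non-empty dim_param strings) match any
    concrete value; an empty string is rejected as a malformed dim_param;
    concrete dims must match exactly."""
    if len(planned) != len(actual):
        return False
    for p, a in zip(planned, actual):
        if isinstance(p, str):
            if p == "":          # empty dim_param: invalid model
                return False
            continue             # named symbolic dim matches anything
        if p != a:
            return False
    return True
```
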
* dockerfile updates for BYOC scenario

* updates for 3 different build versions

* updating to remove libopenblas, python3, python3-pip

* Including LICENSE-IMAGE.txt for CUDA/TensorRT dockerfiles

* remove unnecessary cmake files

* fixing comment typo

* optimizing dockerfile.source as per review suggestions (not working currently)

* Optimizing dockerfiles with install_dependencies script

* update dockerfile with --cmake_extra_defines version number

* add &&\ for license copy lines

* updates, adding miniconda to path, reincluded clearing the pycache

* adding maintainer note

* update readme instructions

* update tensorrt versioning in dockerfile
microsoft#1361)

Fix the random UT failure for RNN/GRU cases which have padded sequences, e.g. max_seq = 2, batch_size = 2, sequence_lengths = {2, 1}. For the output beyond the shorter sequence {1}, we should initialize the value to 0.

Root cause:
The cuDNN library doesn't guarantee the values beyond the shorter sequence.
Fix:
Initialize the output Y data to all 0 before calling the cuDNN library.
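The fix can be sketched as a pure-Python stand-in for the CUDA path (shapes and names are illustrative, not onnxruntime's actual buffers):

```python
# Zero-fill Y before running the recurrence so positions past each
# sequence's valid length are 0 rather than uninitialized memory.
def run_rnn(max_seq, batch, seq_lens, step_fn):
    Y = [[0.0] * batch for _ in range(max_seq)]  # the fix: pre-initialize to 0
    for b, length in enumerate(seq_lens):
        for t in range(length):                  # only valid steps are written
            Y[t][b] = step_fn(t, b)
    return Y
```
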
hariharans29 and others added 8 commits July 26, 2019 11:16
* Make Squeeze operator support no axes attribute cases

* Fix build break

* Resolve PR comments and exclude tensorrt for the new tests
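With no axes attribute, Squeeze removes every dimension of size 1, per the ONNX spec. A shape-level sketch (hypothetical helper, not the operator's real code):

```python
def squeeze(shape, axes=None):
    """Sketch of the no-axes case: when axes is absent, every dimension
    of size 1 is removed; otherwise only the listed axes are removed."""
    if axes is None:
        return [d for d in shape if d != 1]
    axes = [a % len(shape) for a in axes]   # normalize negative axes
    return [d for i, d in enumerate(shape) if i not in axes]
```
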
Rename Tensor.Size() to Tensor.SizeInBytes()
* Remove GetStream from the CUDA EP.

* fix comments
* Mention OrtCreateSessionFromArray in C API doc

* Cleanup a few inconsistencies in the C API.

* updates

* More updates
Python script and necessary changes in the azure-pipelines yaml file to post the binary size data from NuGet package build. Currently only posted from CPU pipeline. GPU and other pipelines may be added as necessary.
Avoid use of Hungarian naming convention for cross-platform API code.

I'm taking my cue here from the "ONNX Runtime coding conventions and standards" document, which says we use the "Google C++ style guide", and that in turn says "Do not use Hungarian notation":
https://github.com/microsoft/onnxruntime/blob/master/docs/Coding_Conventions_and_Standards.md
https://google.github.io/styleguide/cppguide.html#Windows_Code

X-ref: internal PR 4824
…rosoft#1516)

Couple of performance cleanups
  - don't create debug label string unless dumping matrices
  - use raw pointer in fill_n calls
@tomdol tomdol self-assigned this Jul 31, 2019
tracysh and others added 20 commits July 31, 2019 12:30
Update the NCHWc graph transformer to allow Conv/Add fusion for convolutions where stride=2.
Publish daily build NuGet package to Azure blob store for sharing among internal partners
…osoft#1521)

* Fix inclusion of ARM binary in the release pkg

* Add lib and pdb as well
…osoft#1526)

* Add MacOS leg of Python packaging job

* Update copy files source directory for Mac OS leg

* Add a task to display the binaries directories contents after build wheel creation

* Revert some changes

* Add task to log

* Update

* Remove unnecessary logs
Description:
Crash if the output shape has a 0 in it, because the code divides by output_shape[i].
Fix:
If the output shape has a 0, output_shape.Size() is 0, so the output should be null.
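The failure mode and the guard can be sketched in pure Python (hypothetical helper names; the real code is the resize/upsample kernel's index arithmetic):

```python
def flat_to_coords(i, shape):
    """Convert a flat index to coordinates by repeated div/mod with
    each output dimension; this is the path that divides by zero
    when a dimension is 0."""
    coords = []
    for d in reversed(shape):
        coords.append(i % d)
        i //= d
    return list(reversed(coords))

def fill_output(shape):
    size = 1
    for d in shape:
        size *= d
    if size == 0:   # the fix: a 0 in the shape means an empty output,
        return []   # so skip the per-element loop entirely
    return [flat_to_coords(i, shape) for i in range(size)]
```
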
* Bug fix for shape of optional output in Dropout op

* Exclude new test from NGraph EP

* Account for the fact that mask could be of different type in different opset variants of the op

* Make accompanying Cuda changes

* Fix build break

* Exclude Opset 7 test for tensorRT EP

* PR comments
* Memcpy is not necessary for the MKL-DNN EP to copy from/to the host.

* update
…aph input with the same name. (microsoft#1186)

* If there is an outer scope value that matches a subgraph input, don't create an implicit input from the outer scope value.

Minor unrelated change for issue noticed while debugging: Use unordered_set for implicit inputs so we don't add them multiple times.

* Add unit test based on onnx issue.
Register int64 for Greater and refactor the register code
* Add capability for the input and output of Shrink op to share a common buffer

* Cosmetic change
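Shrink is elementwise (y = x - bias if x > lambd, x + bias if x < -lambd, else 0), which is what makes the input/output buffer sharing above safe. A sketch with hypothetical names, writing back into the input list:

```python
def shrink(xs, bias=0.0, lambd=0.5, out=None):
    """ONNX Shrink semantics; out may alias xs, mirroring the shared
    input/output buffer the change above enables."""
    if out is None:
        out = xs                 # write back into the input buffer
    for i, x in enumerate(xs):
        if x > lambd:
            out[i] = x - bias
        elif x < -lambd:
            out[i] = x + bias
        else:
            out[i] = 0.0
    return out
```
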

Replaces all occurrences of VAD-R/VAD_R with VAD-M/VAD_M.
Aligns with the official hardware branding.
@tomdol tomdol merged commit cba9e61 into release Aug 6, 2019