This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Merge upstream/master into release #3

Merged
merged 150 commits into release from upstream/master on Aug 6, 2019

Conversation


@tomdol tomdol commented Jul 31, 2019

No description provided.

mayeut and others added 30 commits June 27, 2019 15:45
* Update cuda for python wheels

* Update cuda for python wheels

* Update cuda for python wheels

* Update azure-pipelines-py-packaging.yml

* Update to cuda 10

* Only test win gpu

* Update cuda for python wheels

* Use manylinux2010 image to build linux python wheels

Allow built wheels to be truly compliant with a manylinux policy
* Fix NMS const_cast that modified kernel state, creating a
  thread safety issue. Refactor for a future CUDA implementation.
…nation for OneHot op (microsoft#1317)

* Handle nondefault negative axis value

* Support more intuitive data types for this op
…icrosoft#1289)

* Add logging messages

Implements logging messages at INFO, WARNING and ERROR levels.
Utilizes ONNX-RT's logging infrastructure.
Reversed IsGraphSupported check logic to facilitate logging.

* Add MO exception text to WARNING log message
* Update Versioning.md

* Update Versioning.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update BUILD.md

* Update HighLevelDesign.md

* Update Versioning.md

* Update README.md

* Update tool compat table

* typo

* Updates based on feedback

* Update template to include model

* Updates based on feedback

* Typos
* init

* Update DNNLibrary

* Update DNNLibrary, set compiler flags, it compiles now

* Add more missing flags, add test

* Update DNNLibrary

* Update Compile method, fix allocator and some other bugs

* Update DNNLibrary

* Implement CopyTensor

* Do not delete state explicitly since it is managed by unique_ptr

* Add the missing files when SingleUnitTestProjct is ON

* misc changes

* Fix wrong name in provider factory

* Add my own test

* Update the code that adds nodes to the graph, and add the missing initializers to the graph

* Fix the bug where re-building the graph produces an extra output

* Update DNNLibrary

* Transpose nchw (ONNX) -> nhwc (NNAPI)

* Add license

* Add GetSupportedNodes method (implement it later)

* Rename onnxruntime_nnapi_test->onnxruntime_nnapi_squeezenet_test

* Update squeezenet_test.cpp after rebase master

* Remove squeezenet_test.cpp since it is almost same with the c++ sample

* Update DNNLibrary for GetSupportedNodes

* Update GetSupportedNodes

* Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample"

This reverts commit a97575f.

* Update DNNLibrary

* Fix multiple outputs bug

* Remove GetKernelRegistry

* Revert "Revert "Remove squeezenet_test.cpp since it is almost same with the c++ sample""

This reverts commit 2a0670e.

* Set default memory type of NNAPI EP

* Add CPUOutput allocator

* Update DNNLibrary for multiple outputs

* Fix bug of nhwc->nchw

* Remove GetExecutionHandle()
* add scikit example

* format text

* format doc
* Improve CUDA kernel performance for Concat. Implement the kernel code instead of using cudaMemCpy in a loop.

* Update the index lookup part for Concat & Split
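The Concat change above replaces a loop of cudaMemcpy calls with a single kernel that computes, for each output element, which input it comes from. A rough CPU-side sketch of that index lookup (pure Python with hypothetical names, not the real CUDA kernel):

```python
import bisect

# Illustrative sketch: a fused concat maps each flat output index directly
# to its source input, the way a single kernel launch with one "thread" per
# output element would, instead of issuing one memcpy per input.
def concat_1d(inputs):
    # Exclusive prefix sums of input lengths, computed once up front.
    offsets = [0]
    for x in inputs:
        offsets.append(offsets[-1] + len(x))
    out = [None] * offsets[-1]
    for i in range(len(out)):
        # Each output element finds its owning input via the offset table.
        src = bisect.bisect_right(offsets, i) - 1
        out[i] = inputs[src][i - offsets[src]]
    return out
```
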
Description:

Disallow overriding an initializer via a graph input if the IR version is < 4. This enforces an implicit assumption that initializers should be treated as constant, and allows constant folding to be done on a model with an older IR version.
Separate constant and overridable initializers so that it's clear which ones constant folding can utilize.
Update Graph to not add all initializers to the graph inputs when the graph is manually created (i.e. not loaded from a GraphProto) and the IR version is >= 4.
Motivation and Context
In order to do constant folding we need to know which initializers can be treated as constant and which are overridable. All initializers were required to have a matching graph input prior to IR version 4, technically making all of them overridable. The intention however was for them to be treated as constants, and this change enforces that intent.

The benefit of doing so is that constant folding will work for models with IR version < 4. The cost is that if someone is actually overriding an initializer they will need to update the IR version of their model to version 4 in order to keep doing so. The belief is that this is a very small subset of usage (e.g. models involving feeding in a truncated sequence) and the cost to update that small subset is warranted by the benefit of constant folding being able to be enabled on all older models without them needing an IR version update.
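The classification rule described above can be sketched in a few lines; the function name and flattened model representation here are illustrative, not onnxruntime's actual API:

```python
# Hypothetical sketch of the rule: before IR version 4 every initializer is
# treated as constant (enabling constant folding on older models); from IR
# version 4 on, an initializer is overridable only if it also appears
# explicitly in the graph inputs.
def split_initializers(ir_version, initializer_names, graph_input_names):
    """Return (constant, overridable) sets of initializer names."""
    if ir_version < 4:
        return set(initializer_names), set()
    overridable = set(initializer_names) & set(graph_input_names)
    return set(initializer_names) - overridable, overridable
```
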
Description:

This change adds the common part of the TVM based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing ONNX graph and generate TVM ops
* Compiler pass for traversing generated graph and specify TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging

Motivation and Context:

TVM is an open deep learning compiler stack for CPUs, GPUs and specialized accelerators. To leverage it in ONNX Runtime, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.

This codegen library was part of Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM in more devices.
* Cleanup naming of test input to use .onnx for models.

* Remove file deleted on master
* replace log sinks

* limit headers to include dir

* first changes to do dynamic linking

* wip for using cxx api

* remove weird dangling dependency

* building with tests failing

* finish updating converters

* fix const

* intital introduction of typedef

* change logging to use spdlog

* get tests passing

* clang format

* map logging levels better

* clean up unused imports

* trent cr comments

* clang-format

* code review comments

* changing buffer use to reserve

* Dynamically link

* revert tvm

* update binary uploading

* catch exceptions by const-ref

* Revert "revert tvm"

This reverts commit 387676d.

* fix typo

* update versioning of lib
* Fix unnecessary memory allocation in MKLDNN 1x1 convolution.

* remove the patch header.
* Fix link

* Update PyOp.md
* Update to include urgency

* Wording update

* Wording update
This change removes a number of unused math helpers from core/util/math.h. Most operators are already using MLAS or Eigen directly.
…rosoft#1356)

* Check for empty string as dim_param in allocation planner.
* Validate shape is compatible at runtime when re-using Tensor.
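The two checks above can be sketched together: an empty string is not a valid dim_param, while a named symbolic dimension matches any concrete value. This is a hypothetical helper, not the allocation planner's real code:

```python
def shapes_compatible(planned, actual):
    """Sketch: symbolic dims (non-empty dim_param strings) match any
    concrete value; an empty string is rejected as a malformed dim_param;
    concrete dims must match exactly."""
    if len(planned) != len(actual):
        return False
    for p, a in zip(planned, actual):
        if isinstance(p, str):
            if p == "":          # empty dim_param: invalid model
                return False
            continue             # named symbolic dim matches anything
        if p != a:
            return False
    return True
```
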
* dockerfile updates for BYOC scenario

* updates for 3 different build versions

* updating to remove libopenblas, python3, python3-pip

* Including LICENSE-IMAGE.txt for CUDA/TensorRT dockerfiles

* remove unnecessary cmake files

* fixing comment typo

* optimizing dockerfile.source as per review suggestions (not working currently)

* Optimizing dockerfiles with install_dependencies script

* update dockerfile with --cmake_extra_defines version number

* add &&\ for license copy lines

* updates, adding miniconda to path, reincluded clearing the pycache

* adding maintainer note

* update readme instructions

* update tensorrt versioning in dockerfile
microsoft#1361)

Fix the random UT failure for RNN/GRU cases which have padded sequences, e.g. max_seq = 2, batch_size = 2, sequence_lengths = {2, 1}. For the output beyond the shorter sequence {1}, we should initialize the value to 0.

Root cause:
The cuDNN library doesn't guarantee the values beyond the shorter sequence.
Fix:
Initialize the output Y data to all 0 before calling the cuDNN library.
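The fix can be sketched as a pure-Python stand-in for the CUDA path (shapes and names are illustrative, not onnxruntime's actual buffers):

```python
# Zero-fill Y before running the recurrence so positions past each
# sequence's valid length are 0 rather than uninitialized memory.
def run_rnn(max_seq, batch, seq_lens, step_fn):
    Y = [[0.0] * batch for _ in range(max_seq)]  # the fix: pre-initialize to 0
    for b, length in enumerate(seq_lens):
        for t in range(length):                  # only valid steps are written
            Y[t][b] = step_fn(t, b)
    return Y
```
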
hariharans29 and others added 8 commits July 26, 2019 11:16
* Make Squeeze operator support no axes attribute cases

* Fix build break

* Resolve PR comments and exclude tensorrt for the new tests
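With no axes attribute, Squeeze removes every dimension of size 1, per the ONNX spec. A shape-level sketch (hypothetical helper, not the operator's real code):

```python
def squeeze(shape, axes=None):
    """Sketch of the no-axes case: when axes is absent, every dimension
    of size 1 is removed; otherwise only the listed axes are removed."""
    if axes is None:
        return [d for d in shape if d != 1]
    axes = [a % len(shape) for a in axes]   # normalize negative axes
    return [d for i, d in enumerate(shape) if i not in axes]
```
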
Rename Tensor.Size() to Tensor.SizeInBytes()
* Remove GetStream from the CUDA EP.

* fix comments
* Mention OrtCreateSessionFromArray in C API doc

* Cleanup a few inconsistencies in the C API.

* updates

* More updates
Python script and necessary changes in the azure-pipelines yaml file to post the binary size data from NuGet package build. Currently only posted from CPU pipeline. GPU and other pipelines may be added as necessary.
Avoid use of Hungarian naming convention for cross-platform API code.

I'm taking my cue here from the "ONNX Runtime coding conventions and standards" document, which says we use the "Google C++ style guide", and that in turn says "Do not use Hungarian notation":
https://github.com/microsoft/onnxruntime/blob/master/docs/Coding_Conventions_and_Standards.md
https://google.github.io/styleguide/cppguide.html#Windows_Code

X-ref: internal PR 4824
…rosoft#1516)

Couple of performance cleanups
  - don't create debug label string unless dumping matrices
  - use raw pointer in fill_n calls
@tomdol tomdol self-assigned this Jul 31, 2019
tracysh and others added 20 commits July 31, 2019 12:30
Update the NCHWc graph transformer to allow Conv/Add fusion for convolutions where stride=2.
Publish daily build NuGet package to Azure blob store for sharing among internal partners
…osoft#1521)

* Fix inclusion of ARM binary in the release pkg

* Add lib and pdb as well
…osoft#1526)

* Add MacOS leg of Python packaging job

* Update copy files source directory for Mac OS leg

* Add a task to display the binaries directories contents after build wheel creation

* Revert some changes

* Add task to log

* Update

* Remove unnecessary logs
Description:
Crash if the output shape has a 0 in it, because the code divides by output_shape[i].
Fix:
If the output shape has a 0, output_shape.Size() is 0, so the output should be null.
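The failure mode and the guard can be sketched in pure Python (hypothetical helper names; the real code is the resize/upsample kernel's index arithmetic):

```python
def flat_to_coords(i, shape):
    """Convert a flat index to coordinates by repeated div/mod with
    each output dimension; this is the path that divides by zero
    when a dimension is 0."""
    coords = []
    for d in reversed(shape):
        coords.append(i % d)
        i //= d
    return list(reversed(coords))

def fill_output(shape):
    size = 1
    for d in shape:
        size *= d
    if size == 0:   # the fix: a 0 in the shape means an empty output,
        return []   # so skip the per-element loop entirely
    return [flat_to_coords(i, shape) for i in range(size)]
```
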
* Bug fix for shape of optional output in Dropout op

* Exclude new test from NGraph EP

* Account for the fact that mask could be of different type in different opset variants of the op

* Make accompanying Cuda changes

* Fix build break

* Exclude Opset 7 test for tensorRT EP

* PR comments
* Memcpy is not necessary for the MKL-DNN EP to copy from/to the host.

* update
…aph input with the same name. (microsoft#1186)

* If there is an outer scope value that matches a subgraph input, don't create an implicit input from the outer scope value.

Minor unrelated change for issue noticed while debugging: Use unordered_set for implicit inputs so we don't add them multiple times.

* Add unit test based on onnx issue.
Register int64 for Greater and refactor the register code
* Add capability for the input and output of Shrink op to share a common buffer

* Cosmetic change
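Shrink is elementwise (y = x - bias if x > lambd, x + bias if x < -lambd, else 0), which is what makes the input/output buffer sharing above safe. A sketch with hypothetical names, writing back into the input list:

```python
def shrink(xs, bias=0.0, lambd=0.5, out=None):
    """ONNX Shrink semantics; out may alias xs, mirroring the shared
    input/output buffer the change above enables."""
    if out is None:
        out = xs                 # write back into the input buffer
    for i, x in enumerate(xs):
        if x > lambd:
            out[i] = x - bias
        elif x < -lambd:
            out[i] = x + bias
        else:
            out[i] = 0.0
    return out
```
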

Replaces all occurrences of VAD-R/VAD_R with VAD-M/VAD_M.
Aligns with the official hardware branding.
@tomdol tomdol merged commit cba9e61 into release Aug 6, 2019