User/xianz/merge windowsai (#2883)

* Packaging pipeline changes for VS 2019 (#2711)
* Tiny fix to codegen
* Simplify cache implementation and avoid static variables that may carry over between models
* Extend DML kernels (#2641)
* Additional DML operators
* Check unsupported attributes and inputs
* Address PR comments
* Add kernel capability function used for partitioning, and re-enable stride-based int64 support based on value range
* Fix test failures
* Build fix
* PR comments
* Update Nuphar tutorial notebook (#2721)
  1. Reflect int8 GEMV improvements for multi-threading from #2696
  2. Add notes on multi-threading control using OpenMP
  3. Add samples of running multi-isa AOT, and show int8 GEMM differences between AVX and AVX2
  4. Add rnn_benchmark example to resolve #1993
* Add schema for new Qops (#2611)
* Add schema for new Qops
* adding shape inference + qlinearaveragepool
* plus review comments
* plus review comments
* updates per review comments
* plus review comments
* [server] Add support for model_name and model_version as cli parameter (#2708)
* remove 64bit warning message from python validation. (#2727)
* MLAS: ARM64 build fix (#2734)
  fix bad usage of vreinterpret to cast vector element types
* Fix broken python docs links (#2740)
* Fix build on Mac OS (#2731)
  mac os ld doesn't support --whole-archive, correct option is -all_load
* fix ngraph wheel (#2737)
* fix ngraph wheel
  1.1.0 onnxruntime_ngraph wheel doesn't work
* remove libdnnl.so in nGraph Libs
* make it easy to compare
* Split onnxruntime server to a separated folder (#2744)
* Fix build for Python 3.8 (#2747)
* Fix build for Python 3.8
* Update protobuf to 3.11.2 (#1928)
  Update protobuf to 3.11.2 (#1928)
* Change default optimization level to All (from Basic) (#2745)
* change default optimization level to All (from Basic)
* fix test
* fix c# test
* Update numpy to 1.18 (#2758)
* Update numpy to 1.18
* Pipeline changes for python 3.8 (#2753)
  1. Pipeline changes for python 3.8
  2. Fix a regression in setup.py which was just introduced in the previous commit.
  Please note, we still haven't made python 3.8 + Windows + CUDA work.
* Add basic stacktrace output for posix debug builds. (#2749)
* [NupharEP] fix a race condition when multiple sessions are running different models concurrently (#2772)
* Revert "Change default optimization level to All (from Basic) (#2745)"
  This reverts commit 56bb503.
* Fix typo in error message (#2736)
* Rename MKL-DNN to DNNL to fix broken link (#2730)
* Fix nightly build version number issue
* Pass BUILD_BUILDNUMBER to linux docker
* Disable featurizers in python packages
* Import more featurizers (#2781)
  Make kernels non-template. Add input constraint for learnt data.
  Add min_max_scalar_transformer, robust_scalar_transformer, inputation_marker_transfomer, label_encoder_transformer, missing_dummies_transformer along with tests.
  Advance Featurizers library commit.
* Implement a more stable softmax (#2715)
* Implement a more stable SoftMax
  e^x is represented as infinity if x is large enough, like 100.f. Infinity divided by infinity is a NaN. Thus, softmax produces a NaN if one or more items are large enough.
  The following math transform is used to get a stable softmax:
  e^xi / (e^x1 + ... + e^xn) = e^(xi - max) / (e^(x1 - max) + ... + e^(xn - max))
  For convenience, force max to 0.f if all xi are negative.
* Contributing: Fix a typo (#2784)
* ACL EP GEMM improvements (#2780)
  When possible, we use a fully connected layer instead of the gemm implementation. This lets the library use the best implementation based on the input data.
* ACL EP convolution improvements (#2774)
  Added the optimized implementation for depthwise convolution for both ACL v19.02 and ACL 19.05.
  Also, the pointwise convolution seems to be more optimal in the CPU implementation, so we opted for that instead.
* Add script for release Nuget validation (#2719)
* Initial commit
* Nits
* Disable a test temporarily
* Change working directory
* Test
* Add download python step
* Test update
* More changes
* Fix space issue
* Fix
* Verify nuget signing
* Fix
* Spaces
* PR feedback
* Nit
* Fix
* Fix
* Remove temporary changes
* add uint8 support to where op (#2792)
* Improve bert optimization script: (#2712)
  (1) Move input int64=>int32 conversion to embed layer fusion.
  (2) Output epsilon attribute for LayerNormalization fusion.
* add session creation time cost. (#2798)
* ML.NET team needs featurizers within a package (#2789)
  Add auto ml featurizers to Windows, MacOS as well as to GPU packaging-pipelines.
* Initialize max of softmax with lowest of float (#2786)
* MLAS: update SGEMM threading parameters (#2808)
* add interface to copy batch tensors. (#2807)
* add interface to copy batch tensors.
* onnxruntime
* speed up Windows TRT CI (#2811)
* don't run cuda tests if building with tensorrt
* remove unnecessary build options for win trt ci
* refactor win gpu tensorrt ci yml
* --numpy_version=1.17
* update
* update
* azcopy and cuda path
* Update test data (#2356)
* Add timeseries imputer transformer featurizer kernel (#2813)
  Make kernels non-template. Add input constraint for learnt data.
  Fixup tests.
  Add two more featurizers along with tests. Tests fail.
  min_max_scalar_transformer
  robust_scalar_transformer
  Fix tests serialized stream by prepending version bytes.
  Add inputation_marker_transfomer and the test.
  Fix up float/double type designations.
  Added label_encoder_transformer along with a test.
  string_throw case is broken at the moment.
  Fix labelencodertransfomer_test.cc string_throw case
  Rename maxabsscalertransformer_test.cc
  Add MissingDummiesTransformer along with the test.
  Update manifest.
  Add TimeSeriesImputerTransformer definition, implementation and tests
* Fix memory leak in TRT (#2815)
* fix memory leak issue
* revert EP_FAIL on enqueueV2
* Add manifest missing comma
* Run static code analyzer on most of our code (#2817)
* Scenario Test : Build Google Test and Taef Test based on preprocessor definition (#2809)
* Add winml macro wrappers on top of google test macros
* change test methods to disabled
* Add custom winml macros for both taef and google tests
* PR comments
* update quantization doc (#2783)
* update documentation for quantization script
* plus some spell corrections
* Filter CPU case for IsFloat16Supported (#2802)
* update default optimization level + fix gemm_activation fusion (#2791)
* update default optimization level + fix gemm_activation fusion
* fix typo
* add unit test and incorporate review comments
* fix test comment
* Fix dnnl wheel package name (#2823)
* Append '-dnnl' to whl package name when --use_dnnl
* Update build.py
* Update Ubuntu & TensorRT version in README (#2820)
  Dockerfile.tensorrt is using nvcr.io/nvidia/tensorrt:19.09-py3 as base image; update Ubuntu and TensorRT version according to https://docs.nvidia.com/deeplearning/sdk/tensorrt-container-release-notes/rel_19-09.html#rel_19-09
* Merge fixes
* Add OneHotEncoder and HashOneHotEncoder kernels. (#2830)
  Add defs and implementation for OneHotEncoders, adjust date_time_transformer kernel and test.
  Add OneHotEncoder kernel test.
  Add HashOneHotVectorizerTransformer unit test.
  This does not link due to multiple definitions of functions that are included into header from a CPP file.
* Upgrade gtest to the latest version (#2827)
  WinML would like to update the googletest submodule. They want some newer features (namely GTEST_SKIP to skip tests programmatically and be able to skip entire fixtures easily) and would need to update the submodule version.
  However, the new version of the code hits a bug in gcc. The bug is already fixed in the latest gcc, but we're using gcc 4.8.x, which won't be patched, so as a compromise we change our code slightly to make it work.
  The gcc bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51213
* Add support for int64_t for topk CPU. Fixes github issue #2806. (#2833)
* Ignore allocator type in ExecutionProviders allocator map. Make default initialization of OrtMemoryInfo more clearly invalid. (#2768)
* Remove allocator type from the key comparison in ExecutionProviders. Remove usage of DummyArena as it's no longer necessary.
* Fix x86 tests where arena allocator is disabled. Make initialization of OrtMemoryInfo clearer by adding Invalid enum value.
* Make OrtValueNameIdxMap::MaxIdx more intuitive.
* Convert ExternalProject Featurizers into git submodule (#2834)
  Add git submodule for Featurizer library.
  Update cmake to build for git submodule.
* add domain check for nodes + update documentation (#2831)
* Fix cgmanifest.json generating script (#2770)
* Fix protobuf submodule name
* Workaround pygit2 bug
* User/orilevari/32bit comparison warning (#2800)
* use correct type for for loop
* explicitly specify void for parameters of OrtGetApiBase because the function is defined in C, so when the parameter list is just (), it is interpreted as having an unknown number of parameters. This was causing compiler warning C4276.
* CMake cross-generator fixes (#2790)
* Fix compilation w/ non-VS CMake generators
* Fix custom WINMD target in Ninja
* Remove usage of msbuild .targets file
* Fix linking using DML in Ninja
* Automate SDK kit version choice
* Cleanup DML package install
* Fix SDK version detection
* Fix comment
* Revert unittest linkage changes
* Fix latest SDK detection
* Don't link to non-uapcore libraries
* Remove MessageBoxA reference and unused link libs
* Fix Linux CUDA nuget packaging pipeline break
* Refactor WinMLAPI Tests to build both google and taef test based on preprocessor definition (#2829)
* Add winml macro wrappers on top of google test macros
* change test methods to disabled
* Add custom winml macros for both taef and google tests
* PR comments
* Refactor winml api tests
* Move additional gtest specific macro definition into googleTestMacros.h
* Fix test build break since winml_lib_api needs to be statically linked to tests since winmlp::learningmodeldevice::iscpu() is being used in devicehelpers.cpp (#2837)
* Enforce WINML_TEST_CLASS_BEGIN_* matches w/ a WINML_TEST_CLASS_END (#2841)
* update optimization doc for BERT related fusions (#2819)
* Add bert related transformers to doc
* Add execution provider and comment for bert optimizations
* Add comment about accuracy impact of approximation
* Fix warnings that cause build to fail
* MLAS: enable threading for quantized GEMMs (#2844)
* Fix test warnings and delayload linking (#2843)
* Ortmemoryinfo struct changed
* mark the camera scenario test as edgecore because it uses d3d11 (#2852)
* User/orilevari/pipeline fi breaks (#2853)
* remove conflicting artifact names. Decided to stop using drop-nuget-cuda since this may have implications on other dependent pipelines.
* change job name in gpu.yml back to Windows_CI_GPU_CUDA_Dev
* Remove internal libs from tests (#2864)
* Support custom DML in onnxruntime_providers.cmake (#2867)
* remove old winmladapter cpp

Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: KeDengMS <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: Ashwini Khade <[email protected]>
Co-authored-by: Andrey <[email protected]>
Co-authored-by: George Wu <[email protected]>
Co-authored-by: Tracy Sharpe <[email protected]>
Co-authored-by: Faith Xu <[email protected]>
Co-authored-by: zhanyi-ms <[email protected]>
Co-authored-by: Changyoung Koh <[email protected]>
Co-authored-by: Scott McKay <[email protected]>
Co-authored-by: Takeshi Watanabe <[email protected]>
Co-authored-by: Dmitri Smirnov <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
Co-authored-by: Maher Jendoubi <[email protected]>
Co-authored-by: Andrews548 <[email protected]>
Co-authored-by: Hariharan Seshadri <[email protected]>
Co-authored-by: Nathan <[email protected]>
Co-authored-by: Tianlei Wu <[email protected]>
Co-authored-by: Ke Zhang <[email protected]>
Co-authored-by: stevenlix <[email protected]>
Co-authored-by: Ryan Lai <[email protected]>
Co-authored-by: Ori Levari <[email protected]>
Co-authored-by: Yingge WAN <[email protected]>
Co-authored-by: Qing <[email protected]>
Co-authored-by: Pranav Sharma <[email protected]>
Co-authored-by: Tiago Koji Castro Shibata <[email protected]>
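
Note on the softmax change (#2715): the max-subtraction identity quoted in the commit message can be sketched as follows. This is an illustrative sketch only, not the kernel code from this commit; the function name `stable_softmax` is hypothetical, and the sketch always subtracts the true maximum rather than clamping it to 0 for all-negative inputs.

```python
import math


def stable_softmax(xs):
    """Softmax via e^(xi - max) / sum_j e^(xj - max).

    Hypothetical helper for illustration; assumes xs is a non-empty
    sequence of finite floats.
    """
    m = max(xs)  # largest input; every exponent below is <= 0
    exps = [math.exp(x - m) for x in xs]  # each term is in (0, 1], so no overflow
    total = sum(exps)  # at least 1.0, because the max element contributes e^0
    return [e / total for e in exps]


if __name__ == "__main__":
    # Inputs this large would overflow a direct e^x evaluation.
    print(stable_softmax([1000.0, 1000.0]))
```

Because the shifted exponents are never positive, the denominator is always at least 1, which avoids the infinity-divided-by-infinity case the commit message describes.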
microsoft · Jan 21, 2020 · 59c187a
1 parent 852bf6f
commit 59c187a
Showing 313 changed files with 10,419 additions and 7,059 deletions.
diff --git a/.gitmodules b/.gitmodules
@@ -25,15 +25,9 @@
 [submodule "cmake/external/eigen"]
 	path = cmake/external/eigen
 	url = https://github.com/eigenteam/eigen-git-mirror.git
-[submodule "cmake/external/grpc"]
-	path = cmake/external/grpc
-	url = https://github.com/grpc/grpc
 [submodule "cmake/external/DNNLibrary"]
 	path = cmake/external/DNNLibrary
 	url = https://github.com/JDAI-CV/DNNLibrary
-[submodule "cmake/external/spdlog"]
-	path = cmake/external/spdlog
-	url = https://github.com/gabime/spdlog.git
 [submodule "cmake/external/mimalloc"]
 	path = cmake/external/mimalloc
 	url = https://github.com/microsoft/mimalloc.git
@@ -49,3 +43,9 @@
 [submodule "cmake/external/json"]
 	path = cmake/external/json
 	url = https://github.com/nlohmann/json
+[submodule "server/external/spdlog"]
+	path = server/external/spdlog
+	url = https://github.com/gabime/spdlog.git
+[submodule "cmake/external/FeaturizersLibrary"]
+	path = cmake/external/FeaturizersLibrary
+	url = https://github.com/microsoft/FeaturizersLibrary.git
diff --git a/cgmanifest.json b/cgmanifest.json
@@ -22,7 +22,7 @@
          "component": {
             "type": "git",
             "git": {
-               "commitHash": "9bda90b7e5e08c4c37a832d0cea218aed6af6470",
+               "commitHash": "703bd9caab50b139428cea1aaff9974ebee5742e",
                "repositoryUrl": "https://github.com/google/googletest.git"
             }
          }
@@ -247,7 +247,7 @@
          "component": {
             "type": "git",
             "git": {
-               "commitHash": "48cb18e5c419ddd23d9badcfe4e9df7bde1979b2",
+               "commitHash": "fe1790ca0df67173702f70d5646b82f48f412b99",
                "repositoryUrl": "https://github.com/protocolbuffers/protobuf.git"
             }
          }
@@ -450,7 +450,7 @@
       {
          "component": {
             "git": {
-               "commitHash": "3f0f9802553944b75015aad098d856b2d17220df",
+               "commitHash": "ebec32ef06859b6399bf8854f18b91158c87760b",
                "repositoryUrl": "https://github.com/microsoft/FeaturizersLibrary.git"
             },
             "type": "git"

diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt
@@ -222,12 +222,7 @@ else()
     string(APPEND CMAKE_C_FLAGS_RELEASE " -march=native -mtune=native")
     string(APPEND CMAKE_CXX_FLAGS_RELWITHDEBINFO " -march=native -mtune=native")
     string(APPEND CMAKE_C_FLAGS_RELWITHDEBINFO " -march=native -mtune=native")
-  endif()
-  if(onnxruntime_BUILD_x86)
-    set (CMAKE_SYSTEM_PROCESSOR "x86")
-    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -msse2 -mfpmath=sse -Wno-narrowing")
-    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -msse2 -mfpmath=sse -Wno-narrowing")
-  endif()
+  endif()  
 endif()
 
 if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
@@ -279,10 +274,14 @@ list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/external)
 
 
 # use protobuf as a submodule
+if (CMAKE_SYSTEM_NAME STREQUAL "Android")
+  set(protobuf_BUILD_PROTOC_BINARIES OFF CACHE BOOL "Build protobuf tests" FORCE)
+endif()
+
 add_subdirectory(${PROJECT_SOURCE_DIR}/external/protobuf/cmake EXCLUDE_FROM_ALL)
 set_target_properties(libprotobuf PROPERTIES FOLDER "External/Protobuf")
 set_target_properties(libprotobuf-lite PROPERTIES FOLDER "External/Protobuf")
-set_target_properties(libprotoc PROPERTIES FOLDER "External/Protobuf")
+set_target_properties(libprotoc PROPERTIES FOLDER "External/Protobuf")	
 set_target_properties(protoc PROPERTIES FOLDER "External/Protobuf")
 if (onnxruntime_USE_FULL_PROTOBUF)
   add_library(protobuf::libprotobuf ALIAS libprotobuf)
@@ -295,7 +294,6 @@ if(UNIX AND onnxruntime_ENABLE_LTO)
   #https://github.com/protocolbuffers/protobuf/issues/5923
   target_link_options(protoc PRIVATE "-Wl,--no-as-needed")
 endif()
-
 include(protobuf_function.cmake)
 
 if (onnxruntime_DISABLE_CONTRIB_OPS)
@@ -422,6 +420,14 @@ if (onnxruntime_USE_TVM)
   list(APPEND onnxruntime_EXTERNAL_DEPENDENCIES tvm nnvm_compiler)
 endif()
 
+if (APPLE)
+  #onnx/onnx/proto_utils.h:34:16: error: 'SetTotalBytesLimit' is deprecated: Please use the single 
+  #parameter version of SetTotalBytesLimit(). The second parameter is ignored. 
+  #  coded_stream.SetTotalBytesLimit((2048LL << 20) - 1, 512LL << 20);
+  #TODO: fix the warning in ONNX and re-enable this flag
+  string(APPEND CMAKE_CXX_FLAGS " -Wno-deprecated")
+  string(APPEND CMAKE_C_FLAGS " -Wno-deprecated")
+endif()
 # ONNX
 add_subdirectory(onnx)
 
@@ -498,6 +504,7 @@ else()
   endif()
   check_cxx_compiler_flag(-Wunused-but-set-variable HAS_UNUSED_BUT_SET_VARIABLE)
   check_cxx_compiler_flag(-Wunused-parameter HAS_UNUSED_PARAMETER)
+  check_cxx_compiler_flag(-Wunused-variable HAS_UNUSED_VARIABLE)
   check_cxx_compiler_flag(-Wcast-function-type HAS_CAST_FUNCTION_TYPE)
   check_cxx_compiler_flag(-Wparentheses HAS_PARENTHESES)
   check_cxx_compiler_flag(-Wuseless-cast HAS_USELESS_CAST)

diff --git a/cmake/external/FeaturizersLibrary b/cmake/external/FeaturizersLibrary
diff --git a/cmake/external/dml.cmake b/cmake/external/dml.cmake
@@ -20,19 +20,17 @@ if (NOT onnxruntime_USE_CUSTOM_DIRECTML)
   set(NUGET_CONFIG ${PROJECT_SOURCE_DIR}/../NuGet.config)
   set(PACKAGES_CONFIG ${PROJECT_SOURCE_DIR}/../packages.config)
   set(PACKAGES_DIR ${CMAKE_CURRENT_BINARY_DIR}/packages)
+  set(DML_PACKAGE_DIR ${PACKAGES_DIR}/DirectML.0.0.1)
 
   # Restore nuget packages, which will pull down the DirectML redist package
   add_custom_command(
-    OUTPUT restore_packages.stamp
+    OUTPUT ${DML_PACKAGE_DIR}/bin/x64/DirectML.lib ${DML_PACKAGE_DIR}/bin/x86/DirectML.lib
     DEPENDS ${PACKAGES_CONFIG} ${NUGET_CONFIG}
     COMMAND ${CMAKE_CURRENT_BINARY_DIR}/nuget/src/nuget restore ${PACKAGES_CONFIG} -PackagesDirectory ${PACKAGES_DIR} -ConfigFile ${NUGET_CONFIG}
-    COMMAND ${CMAKE_COMMAND} -E touch restore_packages.stamp
     VERBATIM)
 
-  add_custom_target(RESTORE_PACKAGES ALL DEPENDS restore_packages.stamp)
+  add_custom_target(RESTORE_PACKAGES ALL DEPENDS ${DML_PACKAGE_DIR}/bin/x64/DirectML.lib ${DML_PACKAGE_DIR}/bin/x86/DirectML.lib)
   add_dependencies(RESTORE_PACKAGES nuget)
-
-  list(APPEND onnxruntime_EXTERNAL_DEPENDENCIES RESTORE_PACKAGES)
 else()
   include_directories(${dml_INCLUDE_DIR})
 endif()
diff --git a/cmake/external/featurizers.cmake b/cmake/external/featurizers.cmake
@@ -2,50 +2,17 @@
 # Licensed under the MIT License.
 # This source code should not depend on the onnxruntime and may be built independently
 
-set(featurizers_URL "https://github.com/microsoft/FeaturizersLibrary.git")
-set(featurizers_TAG "3f0f9802553944b75015aad098d856b2d17220df")
-
 set(featurizers_pref FeaturizersLibrary)
 set(featurizers_ROOT ${PROJECT_SOURCE_DIR}/external/${featurizers_pref})
 set(featurizers_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR}/external/${featurizers_pref})
 
-# Only due to GIT_CONFIG
-# Uncoment UPDATE_COMMAND if you work locally
-# on the featurizers so cmake does not undo your changes.
-if (WIN32)
-    ExternalProject_Add(featurizers_lib
-            PREFIX ${featurizers_pref}
-            GIT_REPOSITORY ${featurizers_URL}
-            GIT_TAG ${featurizers_TAG}
-            # Need this to properly checkout crlf
-            GIT_CONFIG core.autocrlf=input
-            SOURCE_DIR ${featurizers_ROOT}
-            # Location of CMakeLists.txt
-            SOURCE_SUBDIR src/Featurizers
-            BINARY_DIR ${featurizers_BINARY_DIR}
-            CMAKE_ARGS -Dfeaturizers_MSVC_STATIC_RUNTIME=${onnxruntime_MSVC_STATIC_RUNTIME}
-#            UPDATE_COMMAND ""
-            INSTALL_COMMAND ""
-        )
-else()
-    ExternalProject_Add(featurizers_lib
-            PREFIX ${featurizers_pref}
-            GIT_REPOSITORY ${featurizers_URL}
-            GIT_TAG ${featurizers_TAG}
-            SOURCE_DIR ${featurizers_ROOT}
-            # Location of CMakeLists.txt
-            SOURCE_SUBDIR src/Featurizers
-            BINARY_DIR ${featurizers_BINARY_DIR}
-            CMAKE_ARGS -DCMAKE_POSITION_INDEPENDENT_CODE=ON
-#            UPDATE_COMMAND ""
-            INSTALL_COMMAND ""
-        )
-endif()
+add_subdirectory(external/FeaturizersLibrary/src/Featurizers ${featurizers_BINARY_DIR} EXCLUDE_FROM_ALL)
+set_target_properties(FeaturizersCode PROPERTIES FOLDER "External/FeaturizersLibrary")
 
 add_library(onnxruntime_featurizers STATIC IMPORTED)
-add_dependencies(onnxruntime_featurizers featurizers_lib)
-target_include_directories(onnxruntime_featurizers INTERFACE ${featurizers_ROOT}/src)
+add_dependencies(onnxruntime_featurizers FeaturizersCode)
 
+target_include_directories(onnxruntime_featurizers INTERFACE ${featurizers_ROOT}/src)
 if(MSVC)
   set_property(TARGET onnxruntime_featurizers PROPERTY IMPORTED_LOCATION
     ${CMAKE_CURRENT_BINARY_DIR}/external/${featurizers_pref}/${CMAKE_BUILD_TYPE}/FeaturizersCode.lib)
@@ -54,6 +21,7 @@ else()
     ${CMAKE_CURRENT_BINARY_DIR}/external/${featurizers_pref}/libFeaturizersCode.a)
 endif()
 
+
 if (WIN32)
     # Add Code Analysis properties to enable C++ Core checks. Have to do it via a props file include.
     set_target_properties(onnxruntime_featurizers PROPERTIES VS_USER_PROPS ${PROJECT_SOURCE_DIR}/ConfigureVisualStudioCodeAnalysis.props)

diff --git a/cmake/external/grpc b/cmake/external/grpc
diff --git a/cmake/onnx/CMakeLists.txt b/cmake/onnx/CMakeLists.txt
@@ -6,7 +6,11 @@ target_include_directories(onnx_proto PUBLIC $<TARGET_PROPERTY:protobuf::libprot
 target_compile_definitions(onnx_proto PUBLIC $<TARGET_PROPERTY:protobuf::libprotobuf,INTERFACE_COMPILE_DEFINITIONS>)
 onnxruntime_protobuf_generate(APPEND_PATH IMPORT_DIRS ${ONNXRUNTIME_ROOT}/core/protobuf TARGET onnx_proto)
 if (WIN32)
-  target_compile_options(onnx_proto PRIVATE /wd4146) # unary minus operator applied to unsigned type
+  target_compile_options(onnx_proto PRIVATE "/wd4146" "/wd4125" "/wd4456" "/wd4267")
+else()
+  if(HAS_UNUSED_VARIABLE)
+    target_compile_options(onnx_proto PRIVATE "-Wno-unused-variable")
+  endif()
 endif()
 # Cpp Tests were added and they require googletest
 # since we have our own copy, try using that

diff --git a/cmake/onnxruntime_common.cmake b/cmake/onnxruntime_common.cmake
@@ -44,6 +44,22 @@ else()
     endif()
 endif()
 
+if(CMAKE_GENERATOR_PLATFORM)
+    # Multi-platform generator
+    set(onnxruntime_target_platform ${CMAKE_GENERATOR_PLATFORM})
+else()
+    set(onnxruntime_target_platform ${CMAKE_SYSTEM_PROCESSOR})
+endif()
+if(onnxruntime_target_platform STREQUAL "ARM64")
+    set(onnxruntime_target_platform "ARM64")
+elseif(onnxruntime_target_platform STREQUAL "ARM" OR CMAKE_GENERATOR MATCHES "ARM")
+    set(onnxruntime_target_platform "ARM")
+elseif(onnxruntime_target_platform STREQUAL "x64" OR onnxruntime_target_platform STREQUAL "x86_64" OR onnxruntime_target_platform STREQUAL "AMD64" OR CMAKE_GENERATOR MATCHES "Win64")
+    set(onnxruntime_target_platform "x64")
+elseif(onnxruntime_target_platform STREQUAL "x86" OR onnxruntime_target_platform STREQUAL "i386" OR onnxruntime_target_platform STREQUAL "i686")
+    set(onnxruntime_target_platform "x86")
+endif()
+
 file(GLOB onnxruntime_common_src CONFIGURE_DEPENDS
     ${onnxruntime_common_src_patterns}
     )

diff --git a/cmake/onnxruntime_mlas.cmake b/cmake/onnxruntime_mlas.cmake
@@ -19,7 +19,7 @@ set(mlas_common_srcs
 )
 
 if(MSVC)
-  if(CMAKE_GENERATOR_PLATFORM STREQUAL "ARM64")
+  if(onnxruntime_target_platform STREQUAL "ARM64")
     set(asm_filename ${ONNXRUNTIME_ROOT}/core/mlas/lib/arm64/SgemmKernelNeon.asm)
     set(pre_filename ${CMAKE_CURRENT_BINARY_DIR}/SgemmKernelNeon.i)
     set(obj_filename ${CMAKE_CURRENT_BINARY_DIR}/SgemmKernelNeon.obj)
@@ -38,11 +38,11 @@ if(MSVC)
             armasm64.exe ${ARMASM_FLAGS} ${pre_filename} ${obj_filename}
     )
     set(mlas_platform_srcs ${obj_filename})
-  elseif(CMAKE_GENERATOR_PLATFORM STREQUAL "ARM" OR CMAKE_GENERATOR MATCHES "ARM")
+  elseif(onnxruntime_target_platform STREQUAL "ARM")
     set(mlas_platform_srcs
       ${ONNXRUNTIME_ROOT}/core/mlas/lib/arm/sgemmc.cpp
     )
-  elseif(CMAKE_GENERATOR_PLATFORM STREQUAL "x64" OR CMAKE_GENERATOR MATCHES "Win64")
+  elseif(onnxruntime_target_platform STREQUAL "x64")
     enable_language(ASM_MASM)
 
     set(mlas_platform_srcs

diff --git a/cmake/onnxruntime_providers.cmake b/cmake/onnxruntime_providers.cmake
@@ -217,7 +217,7 @@ if (onnxruntime_USE_TENSORRT)
   if ( CMAKE_COMPILER_IS_GNUCC )
     set(CMAKE_CXX_FLAGS  "${CMAKE_CXX_FLAGS} -Wno-unused-parameter -Wno-missing-field-initializers")
   endif()
-  set(CXX_VERSION_DEFINED TRUE)  
+  set(CXX_VERSION_DEFINED TRUE)
   add_subdirectory(${ONNXRUNTIME_ROOT}/../cmake/external/onnx-tensorrt)
   set(CMAKE_CXX_FLAGS ${OLD_CMAKE_CXX_FLAGS})
   if (WIN32)
@@ -303,7 +303,7 @@ if (onnxruntime_USE_OPENVINO)
     if(WIN32)
      set(OPENVINO_LIB_DIR $ENV{INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/Release)
      set(OPENVINO_TBB_DIR $ENV{INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/Release)
-     set(OPENVINO_MKL_TINY_DIR $ENV{INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/bin/intel64/Release)	 
+     set(OPENVINO_MKL_TINY_DIR $ENV{INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/bin/intel64/Release)
     else()
      set(OPENVINO_LIB_DIR $ENV{INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/lib/intel64/)
      set(OPENVINO_TBB_DIR $ENV{INTEL_OPENVINO_DIR}/deployment_tools/inference_engine/external/tbb/lib)
@@ -327,9 +327,9 @@ if (onnxruntime_USE_OPENVINO)
   else()
     target_include_directories(onnxruntime_providers_openvino SYSTEM PUBLIC ${ONNXRUNTIME_ROOT} ${eigen_INCLUDE_DIRS} ${OPENVINO_INCLUDE_DIR} ${OPENVINO_EXTENSIONS_DIR} ${OPENVINO_LIB_DIR} ${OPENVINO_TBB_INCLUDE_DIR} ${PYTHON_INCLUDE_DIRS})
   endif()
-  
-   if (WIN32)   
-     string(REPLACE "include" "libs" PYTHON_LIB ${PYTHON_INCLUDE_DIRS})	
+
+   if (WIN32)
+     string(REPLACE "include" "libs" PYTHON_LIB ${PYTHON_INCLUDE_DIRS})
 	   find_package(InferenceEngine 2.1 REQUIRED)
      set(PYTHON_LIBRARIES ${PYTHON_LIB})
      set(OPENVINO_CPU_EXTENSION_DIR ${onnxruntime_BINARY_DIR}/ie_cpu_extension/${CMAKE_BUILD_TYPE})
@@ -430,22 +430,41 @@ if (onnxruntime_USE_DML)
   onnxruntime_add_include_to_target(onnxruntime_providers_dml onnxruntime_common onnxruntime_framework onnx onnx_proto protobuf::libprotobuf)
   add_dependencies(onnxruntime_providers_dml ${onnxruntime_EXTERNAL_DEPENDENCIES})
   target_include_directories(onnxruntime_providers_dml PRIVATE ${ONNXRUNTIME_ROOT} ${ONNXRUNTIME_ROOT}/../cmake/external/wil/include)
-
-  target_link_libraries(onnxruntime_providers_dml ${CMAKE_CURRENT_BINARY_DIR}/packages/DirectML.0.0.1/build/DirectML.targets)
-  target_link_libraries(onnxruntime_providers_dml d3d12.lib dxgi.lib)
+
+  if (NOT onnxruntime_USE_CUSTOM_DIRECTML)
+    if(NOT onnxruntime_target_platform STREQUAL "x86" AND NOT onnxruntime_target_platform STREQUAL "x64")
+      message(FATAL_ERROR "Target platform ${onnxruntime_target_platform} is not supported by DML")
+    endif()
+    foreach(file "DirectML.dll" "DirectML.pdb" "DirectML.Debug.dll" "DirectML.Debug.pdb")
+      add_custom_command(TARGET onnxruntime_providers_dml
+        POST_BUILD
+        COMMAND ${CMAKE_COMMAND} -E copy_if_different
+          "${DML_PACKAGE_DIR}/bin/${onnxruntime_target_platform}/${file}" $<TARGET_FILE_DIR:onnxruntime_providers_dml>)
+    endforeach()
+  endif()
+
+  function(target_add_dml target)
+    if (NOT onnxruntime_USE_CUSTOM_DIRECTML)
+      target_link_libraries(${target} PRIVATE "${DML_PACKAGE_DIR}/bin/${onnxruntime_target_platform}/DirectML.lib")
+      target_include_directories(${target} PRIVATE "${DML_PACKAGE_DIR}/include")
+    endif()
+  endfunction()
+
+  target_add_dml(onnxruntime_providers_dml)
+  target_link_libraries(onnxruntime_providers_dml PRIVATE d3d12.lib dxgi.lib delayimp.lib)
   list(APPEND ONNXRUNTIME_LINKER_FLAGS "/DELAYLOAD:DirectML.dll /DELAYLOAD:d3d12.dll /DELAYLOAD:dxgi.dll")
 
   # The DML EP requires C++17
   set_target_properties(onnxruntime_providers_dml PROPERTIES CXX_STANDARD 17)
   set_target_properties(onnxruntime_providers_dml PROPERTIES CXX_STANDARD_REQUIRED ON)
-  
+
   target_compile_definitions(onnxruntime_providers_dml PRIVATE ONNX_NAMESPACE=onnx ONNX_ML LOTUS_LOG_THRESHOLD=2 LOTUS_ENABLE_STDERR_LOGGING PLATFORM_WINDOWS)
   target_compile_definitions(onnxruntime_providers_dml PRIVATE UNICODE _UNICODE NOMINMAX)
   if (MSVC)
     target_compile_definitions(onnxruntime_providers_dml PRIVATE _SILENCE_CXX17_ITERATOR_BASE_CLASS_DEPRECATION_WARNING)
     target_compile_options(onnxruntime_providers_dml PRIVATE "/W3")
   endif()
-  
+
   install(DIRECTORY ${PROJECT_SOURCE_DIR}/../include/onnxruntime/core/providers/dml  DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/onnxruntime/core/providers)
 
   set_target_properties(onnxruntime_providers_dml PROPERTIES LINKER_LANGUAGE CXX)