Description:
This change implements Conv+Clip activation fusion for FusedConv and NCHWc convolutions. The Clip operation runs in the thread context that is producing the convolution output.
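The idea of running the activation in the producing thread can be sketched as follows. This is a hypothetical, simplified illustration in numpy (single channel, no padding, no strides), not the actual MLAS/ORT kernel; the function name `conv2d_clip` and its parameters are invented for this example.

```python
import numpy as np

def conv2d_clip(x, w, clip_min=0.0, clip_max=6.0):
    """Naive 2D convolution with the Clip activation applied to each
    output value as it is produced, instead of in a separate pass over
    the finished output tensor."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.empty((H - kh + 1, W - kw + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            acc = np.sum(x[i:i + kh, j:j + kw] * w)
            # Fused activation: clamp while the accumulator is still
            # hot in cache, so no extra traversal of the output is needed.
            out[i, j] = min(max(acc, clip_min), clip_max)
    return out
```

Because the clamp happens inside the per-output loop, a multithreaded convolution gets the activation parallelized for free across the same thread partitioning, which is the cache-efficiency argument made above.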
Motivation and Context
This change optimizes models exported from TF that use Relu6, which is converted to Clip in the ONNX model. As with the other convolution + activation fusions, running the activation in the convolution threads lets it be cheaply parallelized and is also more cache efficient.
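For reference, Relu6 is exactly Clip with bounds [0, 6], which is why the TF-sourced models end up with Clip nodes after conversion. A quick numpy check of the equivalence:

```python
import numpy as np

x = np.array([-2.0, 3.0, 8.0])
relu6 = np.minimum(np.maximum(x, 0.0), 6.0)  # Relu6(x) = min(max(x, 0), 6)
clip = np.clip(x, 0.0, 6.0)                  # Clip(x, min=0, max=6)
# Both yield [0., 3., 6.]
```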
For the mobilenet model from mlperf, model time drops from 5 ms to 4 ms, and ssd_mobilenet_v1_coco drops from 28 ms to 25 ms. Similar drops are seen when running just "-o 2" with the older NCHW convolution.