```python
import time

import numpy as np
import onnxruntime

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations
input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6
print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
```
The BiasGelu fusion currently only matches Gelu from the Microsoft (`com.microsoft`) domain:

```cpp
if (!(graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Gelu", {1}, kMSDomain) ||
```

We will update the fusion to support Gelu from the ONNX domain.

If this is urgent, a temporary workaround is to use an older version of onnxruntime, or to save the optimized model with an older version of onnxruntime and then run that optimized model with the latest version.
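A minimal sketch of that workaround (file names are illustrative): with an older onnxruntime installed, write the optimized model to disk once, then run the saved model with the latest onnxruntime.

```python
# Step 1: with an older onnxruntime (one that still fuses Gelu), write the
# optimized graph to disk during session creation.
import onnxruntime

so = onnxruntime.SessionOptions()
so.optimized_model_filepath = "model_optimized.onnx"  # fused graph is saved here
onnxruntime.InferenceSession("model.onnx", so, providers=["CPUExecutionProvider"])

# Step 2: with the latest onnxruntime, run the already-fused model directly.
ort_session = onnxruntime.InferenceSession(
    "model_optimized.onnx", providers=["CPUExecutionProvider"]
)
```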
### Description

(1) Update the BiasGelu fusion to support ONNX Gelu-20 (see the verification sketch after this list).

Since ONNX Gelu-20 supports float/double/bf16/fp16, we update the related ops to support these data types in the CUDA and ROCm execution providers:

(2) Add double support for the Gelu/FastGelu ops in the CUDA and ROCm execution providers.
(3) Add BFloat16 support for the Gelu ops in the CUDA execution provider.
(4) Add unit tests.
(5) Update the operator documents.
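To make item (1) concrete, here is a minimal sketch (graph, tensor, and file names are illustrative, not from the PR) that builds an Add followed by an ONNX-domain Gelu-20 and checks whether the graph optimizer fuses the pair into a single BiasGelu node:

```python
# Build a tiny Add + Gelu (onnx domain, opset 20) model and inspect the
# graph that ORT's optimizer produces for it.
import onnx
import onnxruntime
from onnx import TensorProto, helper

x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])
bias = helper.make_tensor("bias", TensorProto.FLOAT, [4], [0.1, 0.2, 0.3, 0.4])

graph = helper.make_graph(
    [
        helper.make_node("Add", ["x", "bias"], ["sum"]),
        helper.make_node("Gelu", ["sum"], ["y"]),  # Gelu from the onnx domain (opset 20)
    ],
    "bias_gelu_test",
    [x],
    [y],
    initializer=[bias],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
onnx.checker.check_model(model)
onnx.save(model, "bias_gelu_test.onnx")

# Let ORT optimize the graph and dump the optimized model to disk.
so = onnxruntime.SessionOptions()
so.optimized_model_filepath = "bias_gelu_test_opt.onnx"
onnxruntime.InferenceSession("bias_gelu_test.onnx", so, providers=["CPUExecutionProvider"])

optimized = onnx.load("bias_gelu_test_opt.onnx")
# Expect a single BiasGelu node once the fusion handles ONNX Gelu-20.
print([node.op_type for node in optimized.graph.node])
```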
### Motivation and Context
#23491
### Describe the issue

Starting from commit 2cdc05f, ONNX Runtime (ORT) no longer performs Gelu fusion, resulting in a 4X performance slowdown.

Bisect range: de7a02b .. 2cdc05f.

- Optimized model of de7a02b
- Optimized model of 2cdc05f
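For a quick structural diff of the two attached optimized models, a small helper (file names here are placeholders for the attachments) can count the op types in each graph; the healthy model should contain a fused Gelu node, while the regressed one keeps the decomposed Erf/Mul/Add pattern:

```python
# Compare op-type counts of the two optimized models attached above.
# The file names are assumed, not the actual attachment names.
from collections import Counter

import onnx

for path in ["model_opt_de7a02b.onnx", "model_opt_2cdc05f.onnx"]:
    model = onnx.load(path)
    print(path, Counter(node.op_type for node in model.graph.node))
```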
### Performance Comparison
### To reproduce

Run the benchmark script at the top of this issue against the attached model.
### Urgency

No response

### Platform

Linux

### OS Version

6.8.0

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.20.1

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

No response

### Model File

model.zip

### Is this a quantized model?

No