```python
import time

import numpy as np
import onnxruntime

# Set the random seed
np.random.seed(0)

onnx_model_path = 'model.onnx'

# Load the ONNX model with the CPUExecutionProvider
ort_session = onnxruntime.InferenceSession(onnx_model_path, providers=['CPUExecutionProvider'])
ort_session.get_modelmeta()
inputs = ort_session.get_inputs()

nth = 100000

# Warm-up inference to cache optimizations
input_data = np.load("input.npy", allow_pickle=True).item()
ort_session.run(None, input_data)

# Measure inference time excluding input creation
total_time_ns = 0
for _ in range(nth):
    start_ns = time.perf_counter_ns()
    ort_session.run(None, input_data)
    end_ns = time.perf_counter_ns()
    total_time_ns += end_ns - start_ns

avg_time_ns = total_time_ns / nth
avg_time_ms = avg_time_ns / 1e6
print(f'[{onnxruntime.__version__}] Average inference time: {avg_time_ms:.5f} ms')
```
The BiasGelu fusion currently only matches Gelu from the Microsoft (`com.microsoft`) domain:

```cpp
if (!(graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Gelu", {1}, kMSDomain) ||
```

We will update the fusion to support Gelu from the ONNX domain.

If this is urgent, a temporary workaround is to use an older version of onnxruntime, or to save the optimized model with an older version of onnxruntime and then run that optimized model with the latest version.
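A minimal sketch of that workaround (file names are illustrative): with an older onnxruntime installed, write the optimized model to disk once, then run the saved model with the latest onnxruntime.

```python
# Step 1: with an older onnxruntime (one that still fuses Gelu), write the
# optimized graph to disk during session creation.
import onnxruntime

so = onnxruntime.SessionOptions()
so.optimized_model_filepath = "model_optimized.onnx"  # fused graph is saved here
onnxruntime.InferenceSession("model.onnx", so, providers=["CPUExecutionProvider"])

# Step 2: with the latest onnxruntime, run the already-fused model directly.
ort_session = onnxruntime.InferenceSession(
    "model_optimized.onnx", providers=["CPUExecutionProvider"]
)
```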
### Description

(1) Update the BiasGelu fusion to support ONNX Gelu-20 (see the verification sketch after this list).

Since ONNX Gelu-20 supports float/double/bf16/fp16, we update the related ops to support these data types in the CUDA and ROCm execution providers:

(2) Add double support for the Gelu/FastGelu ops in the CUDA and ROCm execution providers.
(3) Add BFloat16 support for the Gelu ops in the CUDA execution provider.
(4) Add unit tests.
(5) Update the operator documents.
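To make item (1) concrete, here is a minimal sketch (graph, tensor, and file names are illustrative, not from the PR) that builds an Add followed by an ONNX-domain Gelu-20 and checks whether the graph optimizer fuses the pair into a single BiasGelu node:

```python
# Build a tiny Add + Gelu (onnx domain, opset 20) model and inspect the
# graph that ORT's optimizer produces for it.
import onnx
import onnxruntime
from onnx import TensorProto, helper

x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])
y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])
bias = helper.make_tensor("bias", TensorProto.FLOAT, [4], [0.1, 0.2, 0.3, 0.4])

graph = helper.make_graph(
    [
        helper.make_node("Add", ["x", "bias"], ["sum"]),
        helper.make_node("Gelu", ["sum"], ["y"]),  # Gelu from the onnx domain (opset 20)
    ],
    "bias_gelu_test",
    [x],
    [y],
    initializer=[bias],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 20)])
onnx.checker.check_model(model)
onnx.save(model, "bias_gelu_test.onnx")

# Let ORT optimize the graph and dump the optimized model to disk.
so = onnxruntime.SessionOptions()
so.optimized_model_filepath = "bias_gelu_test_opt.onnx"
onnxruntime.InferenceSession("bias_gelu_test.onnx", so, providers=["CPUExecutionProvider"])

optimized = onnx.load("bias_gelu_test_opt.onnx")
# Expect a single BiasGelu node once the fusion handles ONNX Gelu-20.
print([node.op_type for node in optimized.graph.node])
```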
### Motivation and Context
#23491
### Describe the issue

Starting from commit 2cdc05f, ONNX Runtime (ORT) no longer performs Gelu fusion, resulting in a 4X performance slowdown.

Bisect range: de7a02b .. 2cdc05f.

- Optimized model of de7a02b
- Optimized model of 2cdc05f
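For a quick structural diff of the two attached optimized models, a small helper (file names here are placeholders for the attachments) can count the op types in each graph; the healthy model should contain a fused Gelu node, while the regressed one keeps the decomposed Erf/Mul/Add pattern:

```python
# Compare op-type counts of the two optimized models attached above.
# The file names are assumed, not the actual attachment names.
from collections import Counter

import onnx

for path in ["model_opt_de7a02b.onnx", "model_opt_2cdc05f.onnx"]:
    model = onnx.load(path)
    print(path, Counter(node.op_type for node in model.graph.node))
```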
### Performance Comparison
### To reproduce

Run the benchmark script at the top of this issue against the attached model.
### Urgency

No response

### Platform

Linux

### OS Version

6.8.0

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.20.1

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

No response

### Model File

model.zip

### Is this a quantized model?

No