optimize com.microsoft.MatMulNbits operator #28504

Merged
merged 3 commits into from
Feb 18, 2025


bopeng1234
Contributor

This PR optimizes the ONNX frontend's com.microsoft.MatMulNbits operator.

With these changes:

  1. It disables constant folding, which used 75 GB of memory for the phi3 INT4 model and 200+ GB for the llama3 INT4 model.
  2. It triggers the oneDNN matmul primitives, which greatly benefits GPU performance.

We tested these changes together with another PR, #28163, and confirmed that the phi3/llama3 INT4 models run well on LNL.
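To illustrate why folding the dequantization into constants inflates memory (item 1), here is a rough size estimate. It is not a reproduction of the 75 GB or 200+ GB figures. The matrix shape, the block size of 32, the 2-byte scales, and the 4-byte folded elements are all assumptions chosen for illustration, and `packed_bytes` and `folded_bytes` are hypothetical helpers, not OpenVINO APIs.

```python
def packed_bytes(rows, cols, bits=4, block_size=32, scale_bytes=2):
    """Bytes for packed weights plus one scale and one zero point per block.

    Assumes cols is divisible by block_size (illustrative simplification).
    """
    n_blocks = rows * (cols // block_size)
    weight_bytes = rows * cols * bits // 8          # packed low-bit weights
    zero_point_bytes = (n_blocks * bits + 7) // 8   # packed low-bit zero points
    return weight_bytes + n_blocks * scale_bytes + zero_point_bytes


def folded_bytes(rows, cols, element_bytes=4):
    """Bytes for the same matrix after folding into a dense float constant."""
    return rows * cols * element_bytes


# Hypothetical 4096 x 4096 weight matrix.
packed = packed_bytes(4096, 4096)
folded = folded_bytes(4096, 4096)
print(packed, folded, round(folded / packed, 2))
```

Under these assumptions a single folded matrix is roughly seven times larger than its packed form, which is the kind of growth that keeping the weights packed avoids.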

@bopeng1234 bopeng1234 requested a review from a team as a code owner January 17, 2025 06:47
@github-actions github-actions bot added the category: ONNX FE OpenVINO ONNX FrontEnd label Jan 17, 2025
@sys-openvino-ci sys-openvino-ci added the ExternalIntelPR External contributor from Intel label Jan 17, 2025
@ilya-lavrenov ilya-lavrenov added this to the 2025.1 milestone Jan 17, 2025
@ilya-lavrenov
Contributor

build_jenkins

    ### Details:
        - Use convert instead of the convert_like op. This disables
          constant folding, so int2/4/8 dequantization runs at inference
          time rather than being folded at compile time, which benefits
          compile-time memory usage and inference latency.
        - Use uint2/4/8 zero points. This triggers the oneDNN kernel,
          which greatly benefits GPU performance.
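As a rough illustration of the dequantization that now runs at inference time, here is a pure-Python sketch for one block of 4-bit weights: convert the packed values, subtract the zero point, then multiply by the scale. The low-nibble-first packing and the default zero point of 8 are assumptions, and `unpack_u4` and `dequantize_block` are hypothetical helpers, not part of the OpenVINO code in this PR.

```python
def unpack_u4(packed):
    """Split each byte into two unsigned 4-bit values (low nibble first, assumed)."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)  # low nibble
        values.append(byte >> 4)    # high nibble
    return values


def dequantize_block(packed, scale, zero_point=8):
    """Dequantize one block: (q - zero_point) * scale for each 4-bit value q."""
    return [(float(q) - float(zero_point)) * scale for q in unpack_u4(packed)]


# Two packed bytes hold four 4-bit weights: 8, 9, 15, 0.
block = bytes([0x98, 0x0F])
print(dequantize_block(block, scale=0.5))  # [0.0, 0.5, 3.5, -4.0]
```

The packed bytes stay small in memory, and the float values only exist transiently while the matmul runs.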
@gkrivor
Contributor

gkrivor commented Feb 17, 2025

build_jenkins

@bopeng1234
Contributor Author

Hi @gkrivor, the CI checks passed.

@gkrivor gkrivor added this pull request to the merge queue Feb 18, 2025
Merged via the queue into openvinotoolkit:master with commit 68ecdfb Feb 18, 2025
183 checks passed