Add GroupQueryAttention on CPU in model builder #420

kunal-vaishnavi · 2024-05-08T21:23:02Z

Description

This PR adds GroupQueryAttention to ONNX models generated for CPU.

Motivation and Context

This PR is a follow-up to #270.

src/python/py/models/builder.py

kunal-vaishnavi added 4 commits April 24, 2024 16:22

Add support for GroupQueryAttention on CPU

1cebb90

Merge branch 'main' into kvaishnavi/gqa-cpu

d973702

Remove unnecessary extra option

c00a480

Update comment for repeat KV

2fc036a

yufenglee reviewed May 8, 2024

src/python/py/models/builder.py (outdated, resolved)

yufenglee reviewed May 8, 2024

src/python/py/models/builder.py (outdated, resolved)

kunal-vaishnavi added 2 commits May 8, 2024 17:01

Remove comment for unsupported GQA config

f12e4cc

Simplify packed MatMul creation and enable for all models where possible

9a6967d

yufenglee approved these changes May 9, 2024

kunal-vaishnavi merged commit e2aa89e into main May 9, 2024
12 checks passed

kunal-vaishnavi deleted the kvaishnavi/gqa-cpu branch May 9, 2024 22:02

kunal-vaishnavi mentioned this pull request May 10, 2024

Fix packed QKV Add after packed QKV MatMul in model builder #432

Merged

baijumeswani added the rel-0.2.0-rc7 label May 10, 2024

baijumeswani pushed a commit that referenced this pull request May 10, 2024

Add GroupQueryAttention on CPU in model builder (#420)

0eaf0b9

### Description

This PR adds `GroupQueryAttention` to ONNX models generated for CPU.

### Motivation and Context

This PR is a follow up to [this PR](#270).
