
Resolved ALiBi bias regression caused by the flat PA port #503

Open · wants to merge 1 commit into base: habana_main

Conversation

@tannervoas742 commented Nov 15, 2024

Requires the associated changes in the vllm-hpu-extension PR.

Changes:

  • Added ALiBi biases back to the decode stage.
  • Optimized ALiBi memory usage (see the bias-construction sketch below).
    • Added environment variable "VLLM_PROMPT_ALIBI_MAX_SEQ_LEN" to allow
      large models to run with restricted prompt lengths.
    • Prompt biases are instantiated once at init rather than on each
      forward pass.
    • Prompt and decode biases are shared across encoder/decoder layers.
  • Added environment variable "VLLM_ALIBI_USE_FLOAT32_BIASES" to resolve
    an accuracy issue on long sequences.
  • Updated jais, mpt, falcon, baichuan, and bloom to work with ALiBi.
    • Due to bloom's 176B-parameter size I was unable to test that model,
      though its changes are the simplest.
  • Works in both lazy and eager mode.
  • ALiBi is restricted to "VLLM_PROMPT_USE_FUSEDSDPA=false" and
    "VLLM_CONTIGUOUS_PA=true".
  • Added position offsets to improve quality at BS > 1 with sequences of
    varying length.
  • BS > 1 may have accuracy issues on FW < 1.19.0 due to a limitation in
    softmax; resolved on FW >= 1.19.0.
  • NTT patch for GQA.

Co-authored-by: Tanner Voas [email protected]
Co-authored-by: Haihao Xiang [email protected]
Signed-off-by: Tanner Voas [email protected]
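
For illustration, here is a minimal sketch of how the cached prompt biases and the two environment variables described above could fit together. The function names (`alibi_slopes`, `build_prompt_alibi_bias`) are hypothetical, not the actual vllm-hpu-extension API; the slope formula is the standard one from the ALiBi paper:

```python
import math
import os

import torch


def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Standard ALiBi slopes: a geometric sequence per head, extended
    for head counts that are not a power of two."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-8.0 / closest_pow2)
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        # Interleave slopes drawn from the next power of two.
        extra_base = 2.0 ** (-4.0 / closest_pow2)
        num_extra = num_heads - closest_pow2
        slopes += [extra_base ** (2 * i + 1) for i in range(num_extra)]
    return torch.tensor(slopes)


def build_prompt_alibi_bias(num_heads: int, max_seq_len: int) -> torch.Tensor:
    # VLLM_PROMPT_ALIBI_MAX_SEQ_LEN caps the bias table so very large
    # models can still run, at the cost of restricted prompt lengths.
    max_seq_len = min(
        max_seq_len,
        int(os.environ.get("VLLM_PROMPT_ALIBI_MAX_SEQ_LEN", max_seq_len)),
    )
    # bf16 biases lose precision on long sequences, hence the fp32 opt-in.
    use_fp32 = os.environ.get("VLLM_ALIBI_USE_FLOAT32_BIASES", "0") == "1"
    dtype = torch.float32 if use_fp32 else torch.bfloat16

    # Relative-position matrix (j - i); only the causal lower triangle
    # is consumed by prompt attention.
    positions = torch.arange(max_seq_len)
    distances = (positions[None, :] - positions[:, None]).to(dtype)
    slopes = alibi_slopes(num_heads).to(dtype)
    # Built once at init and reused on every forward pass; the same
    # tensor can be shared across encoder/decoder layers.
    return slopes[:, None, None] * distances  # [num_heads, S, S]
```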

@tannervoas742 (Author)

@itaraban @madamczykhabana @kzawora-intel has anyone had a chance to review this PR and the associated one on vllm-hpu-extension? I just pushed a significant update that minimizes changes to non-ALiBi code sections; it also includes substantial accuracy and memory optimizations.

With the current changes, ALiBi is fully functional as long as FW >= 1.19.0 is used.

Please help review. Any feedback would be appreciated.

@tannervoas742 (Author)

@michalkuligowski @kwisniewski98 conflicts have been resolved. The yapf and ruff issues should also be resolved now.

@PatrykWo commented Feb 5, 2025

@kwisniewski98 please resolve conflicts before merge, plus some tests failed.

@tannervoas742 (Author)

> @kwisniewski98 please resolve conflicts before merge, plus some tests failed.

Resolved merge conflicts. Waiting for tests to re-run.

@kwisniewski98

@tannervoas742 Could you please switch the vllm-hpu-extension version to the branch you've added, for now, to check whether the tests pass?

@tannervoas742 (Author)

> @tannervoas742 Could you please switch the vllm-hpu-extension version to the branch you've added, for now, to check whether the tests pass?

Updated the reference. The cpu-test passed when I ran it locally.
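
For reference, pointing vLLM at a development branch of the extension is typically done by editing the git pin in the HPU requirements file (assuming the repo pins the extension via a pip git requirement; the branch name below is a placeholder):

```
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@<branch-name>
```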

@PatrykWo

@tannervoas742 please resolve the yapf issue:

yapf.....................................................................Failed

  • hook id: yapf
  • exit code: 1

@tannervoas742 (Author)

> @tannervoas742 please resolve the yapf issue:
>
> yapf.....................................................................Failed
>
>   • hook id: yapf
>   • exit code: 1

Fixed. I set up my pre-commit hooks locally and they all pass now, including yapf.

@PatrykWo

@tannervoas742 please rebase

@tannervoas742 (Author)

> @tannervoas742 please rebase

Rebased.

@tannervoas742 (Author)

Freshly rebased. ALiBi fixes still work. Pre-commit hooks all passed.

@tannervoas742 (Author)

@kwisniewski98 @PatrykWo @michalkuligowski any progress toward getting this merged? There are customers in the PRC who care about ALiBi support in vLLM.
