
Resolved ALiBi bias regression caused by the flat PA port #503

Open · wants to merge 1 commit into base: habana_main

Conversation

@tannervoas742 commented Nov 15, 2024

Requires the associated changes in the vllm-hpu-extension PR.

Changes:

  • Added ALiBi biases back to the decode stage.
  • Optimized ALiBi memory usage (see the bias-construction sketch below).
    • Added environment variable "VLLM_PROMPT_ALIBI_MAX_SEQ_LEN" to allow
      large models to run with restricted prompt lengths.
    • Prompt biases are instantiated once at init rather than on each
      forward pass.
    • Prompt and decode biases are shared across encoder/decoder layers.
  • Added environment variable "VLLM_ALIBI_USE_FLOAT32_BIASES" to resolve
    an accuracy issue on long sequences.
  • Updated jais, mpt, falcon, baichuan, and bloom to work with ALiBi.
    • Due to bloom's 176B-parameter size I was unable to test that model,
      though its changes are the simplest.
  • Works in both lazy and eager mode.
  • ALiBi is restricted to "VLLM_PROMPT_USE_FUSEDSDPA=false" and
    "VLLM_CONTIGUOUS_PA=true".
  • Added position offsets to improve quality at BS > 1 with sequences of
    varying length.
  • BS > 1 may have accuracy issues on FW < 1.19.0 due to a limitation in
    softmax; resolved on FW >= 1.19.0.
  • NTT patch for GQA.

Co-authored-by: Tanner Voas [email protected]
Co-authored-by: Haihao Xiang [email protected]
Signed-off-by: Tanner Voas [email protected]
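
For illustration, here is a minimal sketch of how the cached prompt biases and the two environment variables described above could fit together. The function names (`alibi_slopes`, `build_prompt_alibi_bias`) are hypothetical, not the actual vllm-hpu-extension API; the slope formula is the standard one from the ALiBi paper:

```python
import math
import os

import torch


def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Standard ALiBi slopes: a geometric sequence per head, extended
    for head counts that are not a power of two."""
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-8.0 / closest_pow2)
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        # Interleave slopes drawn from the next power of two.
        extra_base = 2.0 ** (-4.0 / closest_pow2)
        num_extra = num_heads - closest_pow2
        slopes += [extra_base ** (2 * i + 1) for i in range(num_extra)]
    return torch.tensor(slopes)


def build_prompt_alibi_bias(num_heads: int, max_seq_len: int) -> torch.Tensor:
    # VLLM_PROMPT_ALIBI_MAX_SEQ_LEN caps the bias table so very large
    # models can still run, at the cost of restricted prompt lengths.
    max_seq_len = min(
        max_seq_len,
        int(os.environ.get("VLLM_PROMPT_ALIBI_MAX_SEQ_LEN", max_seq_len)),
    )
    # bf16 biases lose precision on long sequences, hence the fp32 opt-in.
    use_fp32 = os.environ.get("VLLM_ALIBI_USE_FLOAT32_BIASES", "0") == "1"
    dtype = torch.float32 if use_fp32 else torch.bfloat16

    # Relative-position matrix (j - i); only the causal lower triangle
    # is consumed by prompt attention.
    positions = torch.arange(max_seq_len)
    distances = (positions[None, :] - positions[:, None]).to(dtype)
    slopes = alibi_slopes(num_heads).to(dtype)
    # Built once at init and reused on every forward pass; the same
    # tensor can be shared across encoder/decoder layers.
    return slopes[:, None, None] * distances  # [num_heads, S, S]
```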

@tannervoas742 (Author)

@itaraban @madamczykhabana @kzawora-intel has anyone had a chance to review this PR and the associated one on vllm-hpu-extension? I just pushed a significant update that minimizes changes to non-ALiBi code sections; it also includes substantial accuracy and memory optimizations.

With the current changes, ALiBi is fully functional as long as FW >= 1.19.0 is used.

Please help review. Any feedback would be appreciated.

@tannervoas742 (Author)

@michalkuligowski @kwisniewski98 conflicts have been resolved. The yapf and ruff issues should also be resolved now.

@PatrykWo commented Feb 5, 2025

@kwisniewski98 please resolve conflicts before merge, plus some tests failed.

@tannervoas742 (Author)

> @kwisniewski98 please resolve conflicts before merge, plus some tests failed.

Resolved merge conflicts. Waiting for tests to re-run.

@kwisniewski98

@tannervoas742 Could you please switch the vllm-hpu-extension version to the branch you've added, for now, to check whether the tests pass?

@tannervoas742 (Author)

> @tannervoas742 Could you please switch the vllm-hpu-extension version to the branch you've added, for now, to check whether the tests pass?

Updated the reference. The cpu-test passed when I ran it locally.
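
For reference, pointing vLLM at a development branch of the extension is typically done by editing the git pin in the HPU requirements file (assuming the repo pins the extension via a pip git requirement; the branch name below is a placeholder):

```
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@<branch-name>
```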

@PatrykWo

@tannervoas742 please resolve the yapf issue:

yapf.....................................................................Failed

  • hook id: yapf
  • exit code: 1

@tannervoas742 (Author)

> @tannervoas742 please resolve the yapf issue:
>
> yapf.....................................................................Failed
>
>   • hook id: yapf
>   • exit code: 1

Fixed. I set up my pre-commit hooks locally and they all pass now, including yapf.

@PatrykWo

@tannervoas742 please rebase

@tannervoas742 (Author)

> @tannervoas742 please rebase

Rebased.

@tannervoas742 (Author)

Freshly rebased. ALiBi fixes still work. Pre-commit hooks all passed.

@tannervoas742 (Author)

@kwisniewski98 @PatrykWo @michalkuligowski any progress toward getting this merged? There are customers in the PRC who care about ALiBi support in vLLM.
