Resolved ALIBI bias regression due to porting flat PA #503
base: habana_main
Conversation
Force-pushed from 4a0674d to 3959126
Force-pushed from b339767 to 3c3e18a
Force-pushed from 3c3e18a to 6c19183
Force-pushed from 6c19183 to 3cb455d
@itaraban @madamczykhabana @kzawora-intel has anyone had a chance to review this PR and the associated one on vllm-hpu-extension? I just pushed a significant update that minimizes changes to non-ALiBi code sections and includes significant accuracy and memory optimizations. With the current changes, ALiBi is now fully functional as long as FW >= 1.19.0 is used. Please help review; any feedback would be appreciated.
Force-pushed from 49fcaaa to 64822b0
Force-pushed from 64822b0 to 684384e
Force-pushed from 214885e to d3fa482
Force-pushed from 787d66c to 1c63b12
Force-pushed from 1c63b12 to 2d7b0a3
@michalkuligowski @kwisniewski98 Conflicts have been resolved. The yapf and ruff issues should also be resolved now.
Force-pushed from 2d7b0a3 to ec99176
@kwisniewski98 please resolve the conflicts before merge; also, some tests failed.
Force-pushed from ec99176 to bd813f2
Resolved merge conflicts. Waiting for tests to re-run.
@tannervoas742 Could you please switch the vllm-hpu-extension version to the branch you've added, for now, to check whether the tests will go through?
Force-pushed from bd813f2 to fec0f83
Updated the reference. The cpu-test passed when I ran it locally.
Force-pushed from fec0f83 to fb5523c
@tannervoas742 please resolve the yapf issue: yapf.....................................................................Failed
Force-pushed from fb5523c to c627f44
Fixed. I set up my pre-commit hooks locally and they now pass, including yapf.
@tannervoas742 please rebase
Force-pushed from c627f44 to d15e759
Rebased.
Force-pushed from 847c3f7 to 6100fa0
Freshly rebased. ALiBi fixes still work. Pre-commit hooks all passed.
Changes:
- Added back ALiBi biases to the decode stage.
- Optimized ALiBi memory usage.
- Added environment variable "VLLM_PROMPT_ALIBI_MAX_SEQ_LEN" to allow large models to run with restricted prompt lengths.
- Prompt biases are instantiated once rather than on each forward.
- Prompt and decode biases are shared across encoder/decoder layers.
- Added environment variable "VLLM_ALIBI_USE_FLOAT32_BIASES" to resolve an accuracy issue on long sequences.
- Works in lazy and eager mode.
- ALiBi is restricted to "VLLM_PROMPT_USE_FUSEDSDPA=false" and "VLLM_CONTIGUOUS_PA=true".
- NTT patch for GQA.

Co-authored-by: Tanner Voas <[email protected]>
Co-authored-by: Haihao Xiang <[email protected]>
Signed-off-by: Tanner Voas <[email protected]>
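For readers who want to try the branch, the following is a minimal sketch of how these environment variables might be set when launching vLLM with an ALiBi model. The specific values, the 4096 cap, and the model name are placeholder assumptions for illustration, not settings taken from this PR.

```python
# Illustrative launch script; flag values and the model choice are assumptions, not from this PR.
import os

# Knobs and constraints named in the change list above (example values):
os.environ["VLLM_PROMPT_USE_FUSEDSDPA"] = "false"     # ALiBi path requires fused SDPA disabled
os.environ["VLLM_CONTIGUOUS_PA"] = "true"             # ALiBi path requires contiguous PA
os.environ["VLLM_PROMPT_ALIBI_MAX_SEQ_LEN"] = "4096"  # cap prompt bias length for large models (example value)
os.environ["VLLM_ALIBI_USE_FLOAT32_BIASES"] = "true"  # float32 biases for long-sequence accuracy

from vllm import LLM, SamplingParams

# mosaicml/mpt-7b is just one example of an ALiBi-based model.
llm = LLM(model="mosaicml/mpt-7b")
outputs = llm.generate(["ALiBi smoke test"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

Setting the variables before importing vLLM keeps them visible from process start; exporting them in the shell before launching works equally well.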
Force-pushed from 6100fa0 to c49239f
@kwisniewski98 @PatrykWo @michalkuligowski Any progress towards getting this merged? There are customers in the PRC who care about ALiBi support in vLLM.
Requires associated changes on the vllm-hpu-extension PR. Its changes are the simplest though.

Changes:
- Added back ALiBi biases to the decode stage.
- Optimized ALiBi memory usage: prompt biases are instantiated once rather than on each forward, and prompt and decode biases are shared across encoder/decoder layers.
- Added environment variable "VLLM_PROMPT_ALIBI_MAX_SEQ_LEN" to allow large models to run with restricted prompt lengths.
- Added environment variable "VLLM_ALIBI_USE_FLOAT32_BIASES" to resolve an accuracy issue on long sequences.
- Works in lazy and eager mode.
- ALiBi is restricted to "VLLM_PROMPT_USE_FUSEDSDPA=false" and "VLLM_CONTIGUOUS_PA=true".
- NTT patch for GQA.
- Known issue: accuracy on sequences of varying length is affected by a limitation in softmax. Resolved on FW >= 1.19.0.

Co-authored-by: Tanner Voas [email protected]
Co-authored-by: Haihao Xiang [email protected]
Signed-off-by: Tanner Voas [email protected]
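To make the bias-related bullets above concrete, here is a minimal, self-contained PyTorch sketch of how ALiBi slopes and a causal prompt bias table can be built once in float32 and then reused. It is illustrative only, not the implementation in this PR or in vllm-hpu-extension, and the head count and sequence length are arbitrary examples.

```python
# Minimal ALiBi sketch (illustrative only; not the code from this PR).
import math
import torch


def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Standard ALiBi slopes: a geometric sequence per head."""
    n = 2 ** math.floor(math.log2(num_heads))  # closest power of two <= num_heads
    base = 2.0 ** (-8.0 / n)
    slopes = [base ** (i + 1) for i in range(n)]
    if n < num_heads:
        # Interpolate extra slopes for non-power-of-two head counts.
        extra_base = 2.0 ** (-4.0 / n)
        slopes += [extra_base ** (2 * i + 1) for i in range(num_heads - n)]
    return torch.tensor(slopes, dtype=torch.float32)


def build_prompt_bias(num_heads: int, max_seq_len: int) -> torch.Tensor:
    """Build the (heads, seq, seq) additive bias once, in float32, for later reuse."""
    slopes = alibi_slopes(num_heads)                      # (H,)
    pos = torch.arange(max_seq_len, dtype=torch.float32)  # (S,)
    rel = pos[None, :] - pos[:, None]                     # (S, S), rel[i, j] = j - i
    bias = slopes[:, None, None] * rel[None, :, :]        # (H, S, S), negative below diagonal
    return torch.tril(bias)                               # zero out the non-causal part


# Example: instantiate once and slice per batch instead of rebuilding every forward.
bias = build_prompt_bias(num_heads=32, max_seq_len=1024)
print(bias.shape)  # torch.Size([32, 1024, 1024])
```

Building the full (heads, seq, seq) table once and slicing it per batch is why the float32 choice and a configurable maximum prompt length matter: at 32 heads and 4K tokens the table is already 2 GiB in float32, so capping the prompt bias length directly bounds that memory.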