
Regression in llama model inference due to #3005 #3282

Closed
Qubitium opened this issue Mar 8, 2024 · 1 comment

Qubitium (Contributor) commented Mar 8, 2024

Right now, running a llama model on vLLM tip/master results in the following error:

File "/root/python/github.com/vllm/vllm/model_executor/layers/attention/backends/flash_attn.py", line 100, in forward
    output = PagedAttentionImpl.forward_prefix(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: PagedAttentionImpl.forward_prefix() takes 7 positional arguments but 9 were given

We have isolated the bug to merged PR #3005.
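For context, this TypeError is what Python raises when a call site passes more positional arguments than the target function's signature accepts, i.e. the caller and the function were updated out of sync. A minimal sketch of that failure mode (hypothetical names, not vLLM's actual code):

    # Illustrative only: hypothetical names, not the real PagedAttentionImpl.
    # A method is defined with 7 positional parameters, but a caller written
    # against an older signature still passes 9 arguments.
    class PagedAttentionSketch:
        @staticmethod
        def forward_prefix(q, k, v, k_cache, v_cache, meta, scale):  # 7 params
            return q  # placeholder body for the sketch

    try:
        PagedAttentionSketch.forward_prefix(1, 2, 3, 4, 5, 6, 7, 8, 9)  # 9 args
    except TypeError as err:
        # prints: forward_prefix() takes 7 positional arguments but 9 were given
        print(err)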

@WoosukKwon

WoosukKwon (Collaborator) commented:

Fixed by #3286
