
Commit 3f4374a

Fixes assertion failure in prefix caching: the lora index mapping should respect prefix_len (vllm-project#2688)

Signed-off-by: Tao He <[email protected]>
sighingnow authored and jimpang committed Feb 20, 2024
1 parent a90d068 commit 3f4374a
Showing 1 changed file with 2 additions and 2 deletions.

vllm/worker/model_runner.py (2 additions & 2 deletions)

@@ -142,10 +142,10 @@ def _prepare_prompt(
             if lora_id > 0:
                 lora_requests.add(seq_group_metadata.lora_request)
 
-            lora_index_mapping.append([lora_id] * prompt_len)
+            lora_index_mapping.append([lora_id] * (prompt_len - prefix_len))
             lora_prompt_mapping.extend(
                 [lora_id] *
-                (prompt_len
+                (prompt_len - prefix_len
                  if seq_group_metadata.sampling_params.prompt_logprobs else 1))

             if seq_group_metadata.block_tables is None:
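
Why the fix matters: with prefix caching, the first prefix_len tokens of a prompt are served from the cache, so only the remaining prompt_len - prefix_len tokens go through the forward pass in this step. Since lora_index_mapping carries one LoRA index per processed token, sizing it by prompt_len overshoots the actual token count and trips the downstream length assertion. The sketch below is not vLLM's code; build_lora_index_mapping is a hypothetical helper that just illustrates the invariant the fix restores.

```python
# Minimal sketch (hypothetical helper, not vLLM's actual API) of the
# invariant: the per-token LoRA index mapping must have exactly one entry
# per token processed in this step, i.e. prompt_len - prefix_len entries
# when prefix_len tokens are served from the prefix cache.

def build_lora_index_mapping(lora_id: int, prompt_len: int,
                             prefix_len: int) -> list[int]:
    # Only the uncached suffix of the prompt enters the forward pass.
    num_processed_tokens = prompt_len - prefix_len
    return [lora_id] * num_processed_tokens

# A 10-token prompt with a 6-token cached prefix computes 4 tokens, so the
# mapping must have 4 entries. Using prompt_len (the old behavior) would
# produce 10 entries and fail the length check against the input tokens.
mapping = build_lora_index_mapping(lora_id=1, prompt_len=10, prefix_len=6)
assert len(mapping) == 10 - 6
```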
