[Bugfix] [Core] don't schedule prefill if freeing kv cache #5633

Closed
wants to merge 9 commits into from
19 changes: 11 additions & 8 deletions vllm/core/scheduler.py
@@ -1345,18 +1345,21 @@ def _schedule_chunked_prefill(self) -> SchedulerOutputs:
             partial_prefill_metadata=partial_prefill_metadata,
         )

-        # Schedule swapped out requests.
-        # If preemption happens, it means we don't have space for swap-in.
+        # If preemption happens, it means we don't have space for other
+        # requests.
         if len(running_scheduled.preempted) + len(
                 running_scheduled.swapped_out) == 0:
+            # Schedule swapped out requests.
             swapped_in = self._schedule_swapped(budget, curr_loras)

-        prefills = self._schedule_prefills(
-            budget,
-            curr_loras,
-            enable_chunking=True,
-            partial_prefill_metadata=partial_prefill_metadata,
-        )
+            # Schedule new prefills.
+            if len(self.swapped) == 0:
+                prefills = self._schedule_prefills(
+                    budget,
+                    curr_loras,
+                    enable_chunking=True,
+                    partial_prefill_metadata=partial_prefill_metadata,
+                )

         assert (budget.num_batched_tokens
                 <= self.scheduler_config.max_num_batched_tokens)
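The gating this hunk appears to introduce can be sketched in isolation. The sketch below is illustrative only and is not vLLM code: `plan_chunked_prefill_step`, its parameters, and the slice-based swap-in model are hypothetical stand-ins. It also assumes that the new `len(self.swapped) == 0` check sits inside the no-preemption branch, which is inferred from the PR title because the page does not preserve indentation.

```python
from typing import List, Tuple


def plan_chunked_prefill_step(
    num_preempted: int,
    num_swapped_out: int,
    swapped_queue: List[str],
    swap_in_capacity: int,
) -> Tuple[bool, bool, List[str]]:
    """Decide which admission phases run in one scheduling step.

    Hypothetical model of the gating in the diff, not the real scheduler.
    Returns (ran_swap_in, ran_prefill, remaining_swapped).
    """
    remaining = list(swapped_queue)
    ran_swap_in = False
    ran_prefill = False

    # If any running request was preempted or swapped out this step, KV cache
    # is being freed, so nothing else is admitted.
    if num_preempted + num_swapped_out == 0:
        ran_swap_in = True
        # Assumed swap-in model: bring back up to `swap_in_capacity` requests.
        remaining = remaining[swap_in_capacity:]

        # New prefills are attempted only once the swapped queue is empty.
        if len(remaining) == 0:
            ran_prefill = True

    return ran_swap_in, ran_prefill, remaining


if __name__ == "__main__":
    # A preemption this step blocks both swap-in and new prefills.
    print(plan_chunked_prefill_step(1, 0, [], swap_in_capacity=4))
    # No preemption, but one request stays swapped: prefills are skipped.
    print(plan_chunked_prefill_step(0, 0, ["a", "b"], swap_in_capacity=1))
```

Under these assumptions, swapped-out requests take priority over new prefills, and neither is admitted while the scheduler is evicting running requests.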