[core] simplify seq group code #9569
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Previously, we used n > 1 sequence groups to trigger swapping. Since we no longer have n > 1 sequence groups, the swapping-related tests can be removed.
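For context, a minimal sketch of the idea (the function and the sub-request naming scheme are hypothetical, not vLLM's actual API): a request asking for n samples is fanned out into n independent single-sample sub-requests, so no single sequence group ever forks into multiple sequences and the swap-in/swap-out machinery for forked groups goes unused.

```python
# Hypothetical sketch: fan a request with n > 1 samples out into n
# independent single-sample sub-requests. Each sub-request lives in its
# own sequence group, so no group ever holds forked sequences.
def fan_out_request(request_id: str, n: int) -> list[str]:
    """Return one sub-request id per requested sample (naming is made up)."""
    return [f"{request_id}_parallel_sample_{i}" for i in range(n)]

print(fan_out_request("req-0", 3))
# ['req-0_parallel_sample_0', 'req-0_parallel_sample_1', 'req-0_parallel_sample_2']
```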
if isinstance(params, SamplingParams) and params.n > 1:
    ParallelSampleSequenceGroup.add_request(
        request_id,
        self,
        params,
        processed_inputs=processed_inputs,
        arrival_time=arrival_time,
        lora_request=lora_request,
        trace_headers=trace_headers,
        prompt_adapter_request=prompt_adapter_request,
        priority=priority,
    )
    return None
Move the implementation from add_request to _add_processed_request, so that the async LLM engine can also enjoy the benefit.
# TODO: Add support for async for beam search
assert not is_async

# Process samples
samples = outputs.samples
Removed a large chunk of dead code.
# All sequences in the group should have the same prompt. | ||
# We use the prompt of an arbitrary sequence. | ||
return self.seqs[0].prompt |
Simplify the code: self.first_seq.prompt is faster than self.seqs[0].prompt. More optimizations of this kind will come later.
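A minimal sketch of the optimization (the class shapes are simplified stand-ins, not vLLM's actual Sequence/SequenceGroup): caching the first sequence as an attribute at construction replaces a list indexing operation on every access with a plain attribute load.

```python
class Sequence:
    def __init__(self, prompt: str):
        self.prompt = prompt


class SequenceGroup:
    def __init__(self, seqs):
        self.seqs = seqs
        # All sequences in the group share the same prompt, so cache the
        # first one; attribute access skips list __getitem__ on hot paths.
        self.first_seq = seqs[0]

    @property
    def prompt(self) -> str:
        return self.first_seq.prompt


group = SequenceGroup([Sequence("hello"), Sequence("hello")])
print(group.prompt)  # hello
```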
Approving to unblock future development. Left some small comments.
Co-authored-by: Zhuohan Li <[email protected]>
Many if-else conditions will be dead after #9302; we can prune the code then.