[V1][Help Wanted] Porting missing sampling parameters to V1 #13058
Comments
Seems like good first issues - @WoosukKwon I'd like to take a stab at it.
Is this to say that support for logits_processors is going to be permanently dropped in V1, or is it just out of scope for this particular GH issue?
@22quinn Thanks for volunteering! Could you please submit a PR by EoW?
We plan to re-design the API for that. We will probably not allow per-request logits processors (because they are too complex and slow). We are exploring other options. Please refer to this comment from @njhill: #12688 (comment)
We'll still have a logits processor plugin mechanism (not per-request), but the interface will be different. I think that @AlpinDale was working on a proposal for this. It might be nice for some of these sampling parameters to be implemented via that same abstraction, but assuming it won't be ready before we need them, we can plan to retrofit where it makes sense.
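For context, the per-request mechanism being discussed can be sketched roughly as follows. This is a simplified illustration, not vLLM's actual implementation: real vLLM processors operate on torch tensors, and `no_repeat_processor` is a hypothetical example.

```python
from typing import Callable, List

# Sketch of the V0-style per-request logits processor interface (simplified:
# plain Python lists stand in for torch tensors).
LogitsProcessor = Callable[[List[int], List[float]], List[float]]

def no_repeat_processor(token_ids: List[int], logits: List[float]) -> List[float]:
    # Hypothetical example processor: forbid re-emitting any token the
    # sequence has already generated by masking its logit to -inf.
    out = list(logits)
    for t in set(token_ids):
        out[t] = float("-inf")
    return out
```

Because each request can carry an arbitrary callable like this, the engine must invoke them one sequence at a time on the hot sampling path, which is the complexity/speed concern raised above; a global plugin would instead be registered once and applied batch-wide.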
@22quinn, are you still working on this task? If not, I'd be happy to take it up! Let me know what you think. cc @WoosukKwon
@maliknaik16 Please feel free to take it! @22quinn Let us know if you already have the PR.
@maliknaik16 @WoosukKwon I'm on it, PR will be up soon
@22quinn Oh great. Thanks!
To switch the engine from V0 to V1, we need to comprehensively support the sampling parameters defined in https://github.com/vllm-project/vllm/blob/main/vllm/sampling_params.py
While most of the key parameters are already supported, some are still missing:
TODO (help wanted):
- `n` (parallel sampling): [V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) #10980 (@afeldman-nm)
- `guided_decoding` (structured decoding): [V1][Core] Structured decoding #12388 (@aarnphm)
- `logit_bias`: Support logit_bias in v1 Sampler #13079 (@houseroad)
- `min_p`: [V1][Core] min_p sampling support #13191 (@AoyuQC)
- `bad_words` (originally implemented via logits processor): [V1] Support bad_words in sampler #13376 (@22quinn)
- `allowed_token_ids` (originally implemented via logits processor): [v1] Support allowed_token_ids in v1 Sampler #13210 (@houseroad)

Parameters that will not be supported in V1:
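To make the porting work concrete, here is a minimal single-sequence sketch of how several of the parameters above act on the logits before sampling. This is illustrative only: the function name and keyword arguments are hypothetical, vLLM's V1 sampler operates on batched torch tensors, and `bad_words` is simplified to single-token bad words (real bad-words handling must match multi-token sequences).

```python
import math

def apply_sampling_params(logits, logit_bias=None, min_p=0.0,
                          allowed_token_ids=None, bad_token_ids=None):
    """Hypothetical single-sequence logit post-processing sketch."""
    logits = list(logits)
    # allowed_token_ids: mask every token outside the allow-list.
    if allowed_token_ids is not None:
        allowed = set(allowed_token_ids)
        logits = [l if i in allowed else float("-inf")
                  for i, l in enumerate(logits)]
    # bad_words (simplified to single-token ids here): mask them out.
    if bad_token_ids:
        for t in bad_token_ids:
            logits[t] = float("-inf")
    # logit_bias: additive per-token-id bias.
    if logit_bias:
        for t, b in logit_bias.items():
            logits[t] += b
    # min_p: drop tokens whose probability is below min_p times the
    # probability of the most likely token.
    if min_p > 0.0:
        m = max(logits)
        rel_probs = [math.exp(l - m) for l in logits]  # max element == 1.0
        logits = [l if p >= min_p else float("-inf")
                  for l, p in zip(logits, rel_probs)]
    return logits
```

Note the ordering matters: masking (`allowed_token_ids`, `bad_words`) and biasing (`logit_bias`) are applied before the `min_p` filter so the filter sees the final distribution.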