[Usage]: Llama 3 8B Instruct Inference #4180
Comments
What you are doing here with the generation config supports multiple EOS tokens.
It does not appear that the
Thank you so much!
Based on the sample code on the HF page for LLaMA3 (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it is necessary to manually add a new EOS token as follows.
Therefore, I think this might not be an issue that the vLLM team needs to address, but rather something that requires manually adding this EOS token when using vLLM to generate with LLaMA3. Here's sample code for handling it in batch inference:
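A minimal sketch of that batch-inference setup, assuming vLLM's offline `LLM`/`SamplingParams` API and a Hugging Face tokenizer (the model path, prompts, and sampling values below are illustrative assumptions):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model path
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format a batch of user messages with the Llama 3 chat template.
questions = ["What is the capital of France?", "Write a haiku about GPUs."]
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for q in questions
]

# Stop on <|eot_id|> in addition to the default EOS token.
stop_token_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

llm = LLM(model=model_id)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=512,
    stop_token_ids=stop_token_ids,
)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```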
Yes, in the README file it says we need to add the terminators (`[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]`) manually to stop the inference. I tried it (but in the mlx-lm library) and it works well.
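For reference, the terminators pattern from the HF model card looks roughly like this with `transformers` (a sketch; the prompt and generation arguments are illustrative, and passing a list to `eos_token_id` requires a reasonably recent `transformers` version):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Both the regular EOS token and <|eot_id|> terminate a turn.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```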
How do I use this with vllm-openai? Please suggest.
@eav-solution To use it with the docker image, I think you need to wait: the respective PR #4182 has been merged, but it will only be released with 0.4.1. You can build a docker image and install vLLM from source, and keep your model locally too because you will need to change its config files. Or an alternative solution, also confirmed by @njhill, is to just change the `eos_token` in `tokenizer_config.json` to `<|eot_id|>`.
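A small sketch of that `eos_token` workaround, assuming the model files have already been downloaded locally (the path below is an illustrative placeholder):

```python
import json
from pathlib import Path

# Illustrative local path to the downloaded model files.
config_path = Path("/your/path/to/Meta-Llama-3-8B-Instruct/tokenizer_config.json")

config = json.loads(config_path.read_text())
config["eos_token"] = "<|eot_id|>"  # stop at the end-of-turn token instead of <|end_of_text|>
config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))
```

After editing the file, point vLLM at that local directory so it picks up the modified `tokenizer_config.json`.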
That's great! Thanks!
@aliozts Hi Ali, I'm doing the same thing, but I got unexpected answers containing `<|im_end|>` and `<|im_start|>`. I guess I'm using the wrong chat template. Can you share your chat template, if I may ask? Thanks a lot.
@ericg108 I'm not using a custom chat template so I wouldn't want to misinform about it.
vllm v0.3.0
This is fantastic, the code works perfectly with this. Do you know how I should modify it if I want to deploy the API using the OpenAI-compatible server?
@MoGuGuai-hzr You won't need to alter any lines for the server setup. Just add the stop_token_ids field to your request body:

```json
{
  "model": "/your/path/to/meta-llama/Meta-Llama-3-70B-Instruct",
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "stop_token_ids": [128001, 128009] // THIS LINE
}
```

Note that these two IDs correspond to `<|end_of_text|>` and `<|eot_id|>` respectively.
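If you call the server from the `openai` Python client instead of sending raw JSON, the same field can be passed through `extra_body` (a sketch; the base URL, API key, and model path are illustrative):

```python
from openai import OpenAI

# Point the client at the local vLLM OpenAI-compatible server (illustrative URL/key).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="/your/path/to/meta-llama/Meta-Llama-3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    # Non-standard sampling fields go through extra_body.
    extra_body={"stop_token_ids": [128001, 128009]},  # <|end_of_text|>, <|eot_id|>
)
print(response.choices[0].message.content)
```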
Nice, the perfect solution. Thank you so much.
Changing `chat_template` in `tokenizer_config.json` from `eot_id` to `end_of_text`: does not work (NO). Changing `eos_token` in `tokenizer_config.json` to `<|eot_id|>`: works (YES). It works!
A noob here. Will using openai chat.completions.create's
The new version of vLLM (https://github.com/vllm-project/vllm/releases/tag/v0.4.1) has been released, and it is now compatible with Llama 3's new end-of-turn stop token.
Hello, may I ask about this: the `eos_token` should be correct, since I am using the latest llama3-instruct configuration file, which is consistent with the modification described in this GitHub issue, but generation still does not stop correctly. What might be the cause? My vLLM version is 0.4.2. Here is my configuration file:
Your current environment
Using the latest version of vLLM on 2 L4 GPUs.
How would you like to use vllm
I was trying to use vLLM to deploy the meta-llama/Meta-Llama-3-8B-Instruct model behind the OpenAI-compatible server with the latest docker image. When I did, it kept generating for a long time when `max_tokens=None`. I saw that it generates the `<|eot_id|>` token, which is apparently its EOS token, but in their `tokenizer_config` and other configs the EOS token is `<|end_of_text|>`.

I can fix this by setting the `eos_token` parameter in `tokenizer_config.json` to `<|eot_id|>`, or by using `stop_token_ids` in my request, and I wanted to ask about the optimal way to solve this problem. There is an existing discussion/PR in their repo which updates `generation_config.json`, but unless I clone it myself, vLLM does not seem to use the `generation_config.json` file. I also tried with this revision, but it still did not stop generating after `<|eot_id|>`. Moreover, I tried with this revision as well, but it did not stop generating either.

tldr; The Llama-3-8B-Instruct model does not stop generation because of its EOS token. `generation_config.json` does not work, and `config.json` does not work either. `tokenizer_config.json` works, but it overwrites the existing `eos_token`. Is that problematic, or is there a more elegant way to solve this? May I ask the optimal way to solve this issue?