System Info
x86_64
v0.8.0 (docker build via make -C docker release_build CUDA_ARCHS="86-real")
r24.02 (docker from NGC)
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I followed the official examples for the Llama model: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.8.0/examples/llama
I was able to set everything up, and everything runs smoothly when using the ensemble model:
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is", "max_tokens": 1000}'
and the response is:
{"context_logits":0.0,"cum_log_probs":0.0,"generation_logits":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,...,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"the purpose of the meeting? What are the key issues to be discussed? What are the desired outcomes or decisions to be made?\n\n2. Identify the key stakeholders: Who are the key people that need to be involved in the meeting? What are their roles and responsibilities? What are their interests and perspectives?\n\n3. Determine the meeting format: Will the meeting be formal or informal? Will it be a presentation-style meeting or a discussion-style meeting? What is the appropriate level of formality and structure for the meeting?\n\n4. Choose a suitable location: Where will the meeting be held? Is the location easily accessible and comfortable for all attendees?\n\n5. Establish a clear agenda: What specific topics will be discussed during the meeting? What are the desired outcomes or decisions to be made? What are the key points to be covered?\n\n6. Set a time limit: How long will the meeting last? What is the appropriate length of time for the meeting?\n\n7. Identify any necessary materials: What materials or information will be needed during the meeting? Will any presentations or handouts be needed?\n\n8. Choose a suitable time: What is the best time for the meeting? Will all attendees be available at that time?\n\n9. Establish a clear communication plan: How will the meeting be conducted? Will it be in person, via video conference, or via phone? What is the appropriate communication method for the meeting?\n\n10. Identify any necessary follow-up actions: What actions need to be taken after the meeting? Who is responsible for taking these actions? What are the timelines for these actions?\n\nBy following these steps, you can ensure that your meetings are well-planned, productive, and effective."}
I'm also able to run the preprocessing model:
curl -X POST localhost:8000/v2/models/preprocessing/generate -d '{"QUERY": "What is", "REQUEST_OUTPUT_LEN": 1000}'
and the response is:
Then, when I try to query the tensorrt_llm model directly:
I get an error:
Triton is running with debug logging enabled, and the logs contain no further information:
I tried many different request versions, trying to wrap values in lists, etc. without any success.
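To make request variants easier to compare, the two working generate calls (ensemble and preprocessing) can be rebuilt in Python. This is only a sketch using the standard library: the helper names build_generate_request and send are hypothetical, and the host, port, model names, and payload fields are copied from the curl commands in this report.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl commands


def build_generate_request(model: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /v2/models/<model>/generate."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/models/{model}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs a running Triton server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# The two requests that succeed in this report.
ensemble_request = build_generate_request(
    "ensemble", {"text_input": "What is", "max_tokens": 1000}
)
preprocessing_request = build_generate_request(
    "preprocessing", {"QUERY": "What is", "REQUEST_OUTPUT_LEN": 1000}
)
```

Calling send(ensemble_request) against a running server should return the same JSON object that curl prints. Building the request separately from sending it makes it possible to inspect the exact body before it goes over the wire.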
What I found is that it works if input_ids is one element:
with response:
Moreover, I'm able to query the infer endpoint successfully, like:
with response:
I guess it's something simple and I'm querying the endpoint the wrong way, but I really can't find a solution. Any help would be appreciated.
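For comparison with the generate endpoint, here is a rough sketch of how a body for the infer endpoint is laid out under the KServe v2 inference protocol, where every tensor carries an explicit name, shape, and datatype. The helper build_infer_body is hypothetical, the token ids are made up, and the tensor names and datatypes (input_ids, input_lengths, request_output_len as INT32) are assumptions that should be checked against the model's config.pbtxt.

```python
import json


def build_infer_body(input_ids: list[int], output_len: int) -> dict:
    """Build a KServe v2 inference request body for a single prompt.

    Tensor names and datatypes are assumptions; verify them against
    the tensorrt_llm model's config.pbtxt before use.
    """

    def tensor(name: str, datatype: str, values: list[int]) -> dict:
        # A batch of one: the leading dimension is 1, the second is the length.
        return {
            "name": name,
            "shape": [1, len(values)],
            "datatype": datatype,
            "data": list(values),
        }

    return {
        "inputs": [
            tensor("input_ids", "INT32", input_ids),
            tensor("input_lengths", "INT32", [len(input_ids)]),
            tensor("request_output_len", "INT32", [output_len]),
        ]
    }


# Hypothetical token ids, for illustration only.
body = build_infer_body([1, 1724, 338], output_len=16)
print(json.dumps(body, indent=2))
```

The resulting JSON would be posted to /v2/models/tensorrt_llm/infer. Unlike the generate endpoint, which takes bare JSON fields, this form states the shape of each tensor explicitly, which may be relevant when input_ids has more than one element.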
Expected behavior
generate endpoint returns correct results. Error message is more meaningful.
actual behavior
generate endpoint throws an error.
additional notes