Checklist
Motivation
In vLLM, the framework can accept either an image or a precomputed image embedding. See vLLM [Feature] Add vision language model support. #3042 and the vllm-llava implementation.
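For concreteness, here is a rough sketch of what the two input paths could look like with a vLLM-style API. The dict-based call signature and especially the embedding-input schema below are assumptions for illustration, not a confirmed interface:

```python
# Sketch of a vLLM-style generate() call with either input form.
# The prompt-dict format and the embedding schema are assumptions for
# illustration; check the vLLM docs for the actual interface.
import torch
from PIL import Image
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")
prompt = "USER: <image>\nDescribe the image. ASSISTANT:"

# Path 1: hand the server a raw image; it runs the vision encoder itself.
image = Image.open("example.jpg").convert("RGB")
outputs = llm.generate({"prompt": prompt,
                        "multi_modal_data": {"image": image}})

# Path 2 (hypothetical): hand the server a precomputed image embedding,
# so the vision encoder can live in a separate, batched service.
image_embeds = torch.load("example_embeds.pt")  # e.g. (num_patches, hidden)
outputs = llm.generate({"prompt": prompt,
                        "multi_modal_data": {"image": image_embeds}})
```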
Embedding an image requires a fixed, predictable amount of compute, so it is easy to batch and to run in a separate framework (for instance, a TensorRT-based serving framework). See the discussion in vllm-project/vllm#307 (comment).
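To illustrate why that stage batches so well, here is a minimal sketch of standalone batched embedding with a Hugging Face CLIP vision tower. The model name, batch handling, and output shape are illustrative; a production setup could export this stage to a TensorRT engine precisely because the shapes are static:

```python
# Sketch: image embedding as a standalone, fixed-compute stage.
# Every batch has the same input shape, so per-batch compute is
# predictable and the model is a good candidate for TensorRT export.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

MODEL = "openai/clip-vit-large-patch14-336"  # illustrative choice
device = "cuda"
processor = CLIPImageProcessor.from_pretrained(MODEL)
vision = CLIPVisionModel.from_pretrained(MODEL).to(device).eval()

@torch.no_grad()
def embed_batch(paths: list[str]) -> torch.Tensor:
    images = [Image.open(p).convert("RGB") for p in paths]
    pixels = processor(images=images, return_tensors="pt").pixel_values
    # Fixed (batch, 3, 336, 336) input -> fixed compute per batch.
    return vision(pixel_values=pixels.to(device)).last_hidden_state
```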
Ideally, the framework would maintain the infrastructure to overlap image (GPU) preprocessing + vision-encoder inference with LLM inference within the same process, avoiding the need for NVIDIA MPS.
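A toy sketch of that in-process overlap using two CUDA streams follows; the `vision_encoder` and `llm_step` callables are placeholders for the real modules, not framework hooks:

```python
# Sketch: overlap (image preprocessing + vision-encoder inference) with
# LLM inference in one process using CUDA streams, rather than running
# two processes side by side under NVIDIA MPS. vision_encoder and
# llm_step are placeholder callables, not real framework APIs.
import torch

vision_stream = torch.cuda.Stream()
llm_stream = torch.cuda.Stream()

def pipelined_step(vision_encoder, llm_step, next_images, current_batch):
    with torch.cuda.stream(vision_stream):
        # Embed the *next* batch of images...
        next_embeds = vision_encoder(next_images)
    with torch.cuda.stream(llm_stream):
        # ...while the LLM runs a step on the *current* batch.
        output = llm_step(current_batch)
    # Ensure the embeddings are ready before the next LLM step consumes them.
    llm_stream.wait_stream(vision_stream)
    return next_embeds, output
```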
Related resources
No response