[Feature] Enhanced support/structure for Multi-modal models #2439

Closed

tp-nan opened this issue Dec 11, 2024 · 1 comment

tp-nan commented Dec 11, 2024

Motivation

In vLLM, the framework can accept either a raw image or a precomputed image embedding as input. See vllm-project/vllm#3042 ([Feature] Add vision language model support) and the vLLM LLaVA implementation.
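
For context, a minimal sketch of what that vLLM interface looks like (assuming the `multi_modal_data` API as of late 2024; the model name, prompt template, file paths, and embedding tensor shape are illustrative assumptions, not SGLang code):

```python
import torch
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="llava-hf/llava-1.5-7b-hf")
params = SamplingParams(max_tokens=64)
prompt = "USER: <image>\nWhat is in this picture? ASSISTANT:"

# Option A: pass the raw image; the vision tower runs inside vLLM.
image = Image.open("example.jpg")
out_a = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=params,
)

# Option B: pass precomputed image embeddings (e.g. produced by an external
# TensorRT-based vision encoder), skipping vLLM's vision tower. The shape
# (num_image_tokens, hidden_size) is a model-dependent assumption.
image_embeds = torch.load("example_image_embeds.pt")
out_b = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image_embeds}},
    sampling_params=params,
)
```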

Embedding an image requires fixed, predictable compute, and it is easy to batch and run in a separate framework (for instance, a TensorRT-based serving framework). See the discussion in vllm-project/vllm#307 (comment).

Ideally, the framework should maintain the infrastructure to overlap image (GPU) preprocessing + vision-encoder inference with LLM inference within the same process, avoiding the need for NVIDIA MPS.
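
To make the overlap idea concrete, here is a hedged sketch (not existing SGLang infrastructure) of running image preprocessing + vision-encoder inference on a side CUDA stream while the default stream keeps decoding, all in one process; `vision_encoder`, `llm_decode_step`, and the request objects are hypothetical placeholders:

```python
import torch

side_stream = torch.cuda.Stream()

def encode_async(vision_encoder, pixels):
    """Launch vision encoding on the side stream; return (event, embeddings)."""
    done = torch.cuda.Event()
    with torch.cuda.stream(side_stream):
        embeds = vision_encoder(pixels.to("cuda", non_blocking=True))
        done.record(side_stream)
    return done, embeds

def serve_loop(vision_encoder, llm_decode_step, incoming_requests):
    for req in incoming_requests:
        # Kick off this request's image encoding on the side stream ...
        done, embeds = encode_async(vision_encoder, req.pixels)
        # ... while the default stream keeps decoding already-admitted requests.
        llm_decode_step()
        # Before the new request joins the batch, make the LLM's stream wait
        # until its image embeddings are ready (no host-side sync, no MPS).
        torch.cuda.current_stream().wait_event(done)
        req.image_embeds = embeds
```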

Related resources

No response

tp-nan commented Dec 11, 2024

Sorry, this is a duplicate of #745.

tp-nan closed this as completed Dec 11, 2024