
[Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM #12069

Merged: 118 commits, merged Jan 29, 2025


@HwwwwwwwH (Contributor) commented Jan 15, 2025

This PR aims to support all the features of MiniCPM-V and MiniCPM-o in vLLM. It is designed to be compatible with multiple modalities (image, video, audio), different model versions (2.0, 2.5, 2.6, and o), and diverse input types (raw inputs and embeddings), while maintaining LoRA support; this may require significant effort.

Below is the roadmap for this PR:

  • Refactor the MiniCPM-V input processor for vLLM's MultiModalInputsV2.
    • Support for image and video inputs.
    • Support for image embeddings inputs.
    • Support for video embeddings inputs.
    • Preliminary support for MiniCPM-o.
    • Verify LoRA support.
  • Support the new features of MiniCPM-o.
    • Support for audio and audio embeddings inputs.
    • Support for interleaved image and audio inputs.
    • Support for audio outputs (using hidden states) [future work].
    • Streaming multimodal inputs (may be complex; consider starting a new PR for this feature in the future) [future work].
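The image, video, and audio inputs in the roadmap all depend on placeholders in the prompt lining up with the items actually supplied. A minimal sketch, assuming the placeholder strings used in vLLM's MiniCPM-V/MiniCPM-o examples (`(<image>./</image>)`, `(<video>./</video>)`, `(<audio>./</audio>)`) and assuming one placeholder is expected per supplied item. `check_placeholders` is a hypothetical helper, not part of vLLM, and it does not call vLLM.

```python
# Placeholder strings assumed from vLLM's MiniCPM-V / MiniCPM-o examples;
# check them against the vLLM version you actually run.
PLACEHOLDERS = {
    "image": "(<image>./</image>)",
    "video": "(<video>./</video>)",
    "audio": "(<audio>./</audio>)",
}


def count_items(value) -> int:
    """A list counts as several items; anything else (e.g. one image) counts as one."""
    return len(value) if isinstance(value, list) else 1


def check_placeholders(prompt: str, multi_modal_data: dict) -> dict:
    """Hypothetical helper: return {modality: (placeholders, items)} for every mismatch."""
    mismatches = {}
    for modality, tag in PLACEHOLDERS.items():
        n_tags = prompt.count(tag)
        n_items = count_items(multi_modal_data[modality]) if modality in multi_modal_data else 0
        if n_tags != n_items:
            mismatches[modality] = (n_tags, n_items)
    return mismatches


# Two image placeholders but only one image supplied, so it is reported.
print(check_placeholders("(<image>./</image>)(<image>./</image>)\nCompare them.", {"image": "img"}))
# {'image': (2, 1)}
```

Only lists are treated as multiple items because a single audio clip is typically passed to vLLM as an `(array, sample_rate)` tuple, which should count as one item.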

This PR is still in development. Once audio support is complete, I will request a merge. I'll get this work done ASAP.

Fixes #12162

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they run only the fastcheck CI, which covers a small, essential subset of CI tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR.
  • Enable auto-merge.

🚀

@ywang96 (Member) commented Jan 15, 2025

Really appreciate the effort you have planned for this PR!

Support for audio outputs (using hidden states).
Streaming multimodal inputs (may be complex; consider starting a new PR for this feature in the future).

It would be great if you could share some design decisions for these two items as an RFC (or two separate RFCs) before we proceed with implementation. We (the vLLM team) are also thinking about how to support multimodal output and a streaming/realtime API in vLLM, so this is probably the best time for us to discuss these items!

@HwwwwwwwH (Contributor Author)


Thank you for the suggestion! I'll start on these two RFCs tomorrow.

@HwwwwwwwH (Contributor Author)

@DarkLight1337 I might need some help verifying LoRA support. Do I need to make any changes for it?

@DarkLight1337 (Member)

@jeejeelee can help with this. Please keep in mind though that currently LoRA is only supported for the language part of multi-modal models.
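To illustrate what "LoRA only for the language part" means when choosing adapter targets, here is a minimal sketch. The module names and the `llm.` prefix for the language backbone are hypothetical, chosen for illustration (the vision/audio encoders are assumed to live under other prefixes); check the real parameter names of the vLLM model class before relying on them.

```python
# Hypothetical prefix of the language backbone; vision/audio encoders are
# assumed to live under other prefixes (e.g. "vpm.", "resampler.").
LANGUAGE_PREFIXES = ("llm.",)


def language_only_lora_targets(module_names, target_suffixes):
    """Keep modules whose last name component is a LoRA target AND that
    belong to the language backbone; encoder layers are dropped."""
    return [
        name
        for name in module_names
        if name.startswith(LANGUAGE_PREFIXES)
        and name.rsplit(".", 1)[-1] in target_suffixes
    ]


modules = [
    "llm.model.layers.0.self_attn.q_proj",
    "vpm.encoder.layers.0.self_attn.q_proj",
    "resampler.kv_proj",
    "llm.model.layers.0.mlp.down_proj",
]
print(language_only_lora_targets(modules, {"q_proj", "down_proj"}))
# ['llm.model.layers.0.self_attn.q_proj', 'llm.model.layers.0.mlp.down_proj']
```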

HwwwwwwwH and others added 19 commits January 22, 2025 14:28
…ect#11921)

Signed-off-by: shaochangxu.scx <[email protected]>
Co-authored-by: shaochangxu.scx <[email protected]>
Signed-off-by: hzh <[email protected]>
…roject#11100)

Signed-off-by: Akshat Tripathi <[email protected]>
Signed-off-by: Oleg Mosalov <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Oleg Mosalov <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: hzh <[email protected]>

mergify bot commented Jan 28, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @HwwwwwwwH.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot removed the needs-rebase label Jan 28, 2025

@ywang96 (Member) left a comment


Overall LGTM, and I have left a comment. I think this version is good to merge!


mergify bot commented Jan 29, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @HwwwwwwwH.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jan 29, 2025
mergify bot removed the needs-rebase label Jan 29, 2025
DarkLight1337 enabled auto-merge (squash) January 29, 2025 04:57
DarkLight1337 merged commit d93bf4d into vllm-project:main Jan 29, 2025
73 checks passed
@HwwwwwwwH (Contributor Author)

Finally got it merged. Thanks for the help, @ywang96 and @DarkLight1337! Next, I'll work on V1 support for MiniCPM-V and MiniCPM-o.

rasmith pushed a commit to rasmith/vllm that referenced this pull request Jan 30, 2025
…LM (vllm-project#12069)

Signed-off-by: hzh <[email protected]>
Signed-off-by: Sungjae Lee <[email protected]>
Signed-off-by: shaochangxu.scx <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: Rafael Vasquez <[email protected]>
Signed-off-by: Akshat Tripathi <[email protected]>
Signed-off-by: Oleg Mosalov <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Yida Wu <[email protected]>
Signed-off-by: Chenguang Li <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Shanshan Shen <[email protected]>
Signed-off-by: elijah <[email protected]>
Signed-off-by: Yikun <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Konrad Zawora <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: Rui Qiao <[email protected]>
Co-authored-by: Sungjae Lee <[email protected]>
Co-authored-by: shaochangxu <[email protected]>
Co-authored-by: shaochangxu.scx <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Nicolò Lucchesi <[email protected]>
Co-authored-by: sixgod <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Rafael Vasquez <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Akshat Tripathi <[email protected]>
Co-authored-by: Oleg Mosalov <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Co-authored-by: Avshalom Manevich <[email protected]>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <[email protected]>
Co-authored-by: Siyuan Li <[email protected]>
Co-authored-by: Concurrensee <[email protected]>
Co-authored-by: Chenguang Li <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Alex Brooks <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Shanshan Shen <[email protected]>
Co-authored-by: elijah <[email protected]>
Co-authored-by: Yikun Jiang <[email protected]>
Co-authored-by: Steve Luo <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Konrad Zawora <[email protected]>
Co-authored-by: TJian <[email protected]>
Co-authored-by: tjtanaa <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
Co-authored-by: maang-h <[email protected]>
Co-authored-by: Elfie Guo <[email protected]>
Co-authored-by: Rui Qiao <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Isotr0py added a commit to Isotr0py/vllm that referenced this pull request Feb 2, 2025
@jasstionzyf

@HwwwwwwwH "Cannot use FlashAttention-2 backend for head size 72" for MiniCPM-V 2.6 (#12656). Without FlashAttention-2, generation is much slower.
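For context, an attention layer's head size is `hidden_size / num_attention_heads`, and a backend that only accepts a fixed set of head sizes has to fall back to another implementation when the size is not in that set. A minimal sketch of that check; the supported set below is an assumption, intended to resemble the fixed list in vLLM's FlashAttention backend at the time (verify against your version), and the configurations are illustrative rather than read from MiniCPM-V 2.6's config.

```python
# Assumed set of head sizes accepted by the FlashAttention backend;
# verify against the vLLM version you run.
ASSUMED_FA_HEAD_SIZES = {32, 64, 96, 128, 160, 192, 224, 256}


def head_size(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension of a multi-head attention layer."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return hidden_size // num_heads


def can_use_flash_attn(hidden_size: int, num_heads: int,
                       supported=ASSUMED_FA_HEAD_SIZES) -> bool:
    """True if the layer's head size is in the (assumed) supported set."""
    return head_size(hidden_size, num_heads) in supported


# Illustrative configurations: 1152 / 16 = 72 (not in the set), 3584 / 28 = 128.
print(head_size(1152, 16), can_use_flash_attn(1152, 16))  # 72 False
print(head_size(3584, 28), can_use_flash_attn(3584, 28))  # 128 True
```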

NickLucche added a commit to NickLucche/vllm that referenced this pull request Feb 7, 2025
ShangmingCai pushed a commit to ShangmingCai/vllm that referenced this pull request Feb 10, 2025
GWS0428 pushed a commit to GWS0428/VARserve that referenced this pull request Feb 12, 2025
@wangyuanxiong-hub

The adaptation is timely, but the lack of support for streaming multimodal inputs currently results in slower vision inference speeds.

@wangyuanxiong-hub

Here is the Transformers implementation: https://github.com/thanhnienyeumeo/minicpm-o

panf2333 pushed a commit to yottalabsai/vllm that referenced this pull request Feb 18, 2025
@Jiltseb commented Feb 20, 2025

@HwwwwwwwH @ywang96

How is omni-mode support implemented in vLLM? Multimodal support is a crucial part of this model.

With Transformers, the snippet below implements it:

sys_msg = model.get_sys_prompt(mode='omni', language='en')
contents = get_video_chunk_content(video_path)  # 1-second chunks, each containing both audio and image content (TDM)
msg = {"role":"user", "content": contents}

@HwwwwwwwH (Contributor Author)


You need to split and concatenate the video frames and audio chunks yourself, and use a prompt like:

[(<video>./</video>)(<audio>./</audio>)] * length
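Reading `[(<video>./</video>)(<audio>./</audio>)] * length` as "one frame placeholder followed by one audio placeholder, repeated once per chunk", a minimal sketch of the preparation step might look like this. `split_audio` and `omni_placeholders` are hypothetical helpers; the 1-second chunk length follows the discussion above, and wrapping the result in the model's chat template is left out.

```python
FRAME_TAG = "(<video>./</video>)"
AUDIO_TAG = "(<audio>./</audio>)"


def split_audio(samples, sample_rate: int, seconds: float = 1.0):
    """Cut a 1-D sequence of samples (list or NumPy array) into fixed-length
    chunks; the last chunk may be shorter."""
    step = int(sample_rate * seconds)
    if step <= 0:
        raise ValueError("chunk length must be at least one sample")
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def omni_placeholders(num_chunks: int, frame_tag: str = FRAME_TAG,
                      audio_tag: str = AUDIO_TAG) -> str:
    """Expand [(frame)(audio)] * num_chunks into a single prompt fragment."""
    if num_chunks < 0:
        raise ValueError("num_chunks must be non-negative")
    return (frame_tag + audio_tag) * num_chunks


chunks = split_audio(list(range(5)), sample_rate=2)  # [[0, 1], [2, 3], [4]]
print(omni_placeholders(len(chunks)))
```

Under this reading, the number of frames and audio chunks passed in `multi_modal_data` should match the number of placeholder pairs, in the same order.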

@Jiltseb commented Feb 20, 2025

@HwwwwwwwH Great. In that case there will be corresponding multimodal data containing multiple images and audio clips (1 s each). I got it working for a single image/audio pair via:

input_data = {
    "prompt": prompt,
    "multi_modal_data": contents[0]
}

where contents = [{'image': Image obj, 'audio': audio_array}, {'image': Image obj, 'audio': audio_array}, {'image': Image obj, 'audio': audio_array}, ...].

How can I run a single generation with multiple image/audio pairs?
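One way to approach this question, sketched under assumptions rather than confirmed in this thread: turn the list of per-second dicts into one request whose `multi_modal_data` holds a list per modality, and emit one image placeholder and one audio placeholder per pair in the prompt, in the same order. The `(<image>./</image>)` placeholder is assumed from vLLM's MiniCPM-V examples, `pairs_to_request` is a hypothetical helper, and the prompt is not wrapped in the chat template here.

```python
from typing import Any

IMAGE_TAG = "(<image>./</image>)"  # assumed image placeholder
AUDIO_TAG = "(<audio>./</audio>)"


def pairs_to_request(contents: list[dict[str, Any]], question: str) -> dict[str, Any]:
    """Hypothetical helper: merge [{'image': ..., 'audio': ...}, ...] into a single
    vLLM-style prompt dict with one image + audio placeholder pair per chunk."""
    images = [chunk["image"] for chunk in contents]
    audios = [chunk["audio"] for chunk in contents]
    placeholders = (IMAGE_TAG + AUDIO_TAG) * len(contents)
    return {
        "prompt": placeholders + "\n" + question,
        "multi_modal_data": {"image": images, "audio": audios},
    }


request = pairs_to_request(
    [{"image": "frame0", "audio": "clip0"}, {"image": "frame1", "audio": "clip1"}],
    "Describe what happens.",
)
print(request["multi_modal_data"])
# {'image': ['frame0', 'frame1'], 'audio': ['clip0', 'clip1']}
```

If this matches your vLLM version, the engine would also need to allow that many items per prompt (for example via `limit_mm_per_prompt={"image": N, "audio": N}` when constructing `LLM`), and each audio entry should keep whatever format already worked for a single pair.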

@wangyuanxiong-hub

Same question.

kerthcet pushed a commit to kerthcet/vllm that referenced this pull request Feb 21, 2025
Labels: ci/build, documentation, frontend, ready

Successfully merging this pull request may close these issues:
[New Model]: openbmb/MiniCPM-o-2_6