[Model] Merged multimodal processor for Paligemma #13584

kylehh · 2025-02-20T03:40:05Z

Created merged MM processor for Paligemma
Issue to be solved:
- need newline at prompt (solved)
- paligemma 2 support (solved by adding token)

Signed-off-by: Kyle Huang <[email protected]>

github-actions · 2025-02-20T03:40:16Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify · 2025-02-20T03:40:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylehh.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

kylehh · 2025-02-22T05:42:40Z

Newline issue is solved by overriding _apply_prompt_replacements

kylehh · 2025-02-24T21:50:51Z

paligemma2 is not adding "<bos>" token according to tokenizer config ( compare to paligemma's config here)
So add bos token to adapt with prompt replacment

DarkLight1337 · 2025-02-25T03:42:49Z

vllm/model_executor/models/paligemma.py

+        tokenizer = self.info.get_tokenizer()
+
+        mm_token_matches = {
+            modality: find_token_matches(token_ids, prompt_repls)
+            for modality, prompt_repls in mm_prompt_repls.items()
+        }
+        mm_match_counts = {
+            modality: len(matches)
+            for modality, matches in mm_token_matches.items()
+        }
+
+        # If the search text does not represent a special token,
+        # it may have different token IDs in the prompt, because
+        # the tokens may go across the boundaries of the search text.
+        # ----
+        # e.g. when searching for "foo" in "food", if "food" itself makes
+        # up a token, then the token ID of "foo" will not appear at all
+        # ----
+        # Since it is inefficient to search for all possible tokenizations
+        # of the search text in the prompt, we instead perform string
+        # replacement on the decoded token IDs, then encode them back.
+        if all(
+            mm_match_counts.get(modality, 0) >= item_count
+            for modality, item_count in mm_item_counts.items()
+        ):  # yapf: disable
+            token_ids = replace_token_matches(
+                token_ids,
+                mm_token_matches,
+                mm_item_counts,
+            )

-class PaliGemmaMultiModalProjector(nn.Module):
+            text = decode_tokens(tokenizer, token_ids)
+            matched_repls = {
+                modality: [match.prompt_repl for match in token_matches]
+                for modality, token_matches in mm_token_matches.items()
+            }
+        else:
+            text = decode_tokens(tokenizer, token_ids)
+
+            mm_text_matches = {
+                modality: find_text_matches(text, prompt_repls)
+                for modality, prompt_repls in mm_prompt_repls.items()
+            }
+            text = replace_text_matches(
+                text,
+                mm_text_matches,
+                mm_item_counts,
+            )

-    def __init__(self, vision_hidden_size: int, projection_dim: int):
-        super().__init__()
+            token_ids = encode_tokens(tokenizer,
+                                      text,
+                                      add_special_tokens=False)
+            matched_repls = {
+                modality: [match.prompt_repl for match in token_matches]
+                for modality, token_matches in mm_text_matches.items()
+            }
+
+        placeholders = self._find_mm_placeholders(
+            matched_repls,
+            token_ids,
+            mm_item_counts,
+        )


Can we simply call super()._apply_prompt_replacements?

mergify · 2025-02-25T03:43:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylehh.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

DarkLight1337 · 2025-02-25T03:43:42Z

Please remember to test this by adding the model to tests/models/multimodal/processing/test_common.py

kylehh added 3 commits February 19, 2025 13:59

init commit

aebfb23

test

4f192aa

code clean

689cca6

Signed-off-by: Kyle Huang <[email protected]>

mergify bot added the needs-rebase label Feb 20, 2025

This was referenced Feb 20, 2025

[RFC]: Multi-modality Support on vLLM #4194

Open

[RFC]: Merge input processor and input mapper for multi-modal models #10114

Open

kylehh added 2 commits February 21, 2025 21:37

apply newline at the end of the prompt

2c1218a

Merge branch 'main' into pali-mm-processor

6281c25

mergify bot removed the needs-rebase label Feb 22, 2025

DarkLight1337 self-assigned this Feb 22, 2025

adding <bos> for paligemma2

0df67c6

DarkLight1337 reviewed Feb 25, 2025

View reviewed changes

mergify bot added the needs-rebase label Feb 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Model] Merged multimodal processor for Paligemma #13584

[Model] Merged multimodal processor for Paligemma #13584

kylehh commented Feb 20, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Feb 20, 2025

mergify bot commented Feb 20, 2025

kylehh commented Feb 22, 2025

kylehh commented Feb 24, 2025

DarkLight1337 Feb 25, 2025

mergify bot commented Feb 25, 2025

DarkLight1337 commented Feb 25, 2025

[Model] Merged multimodal processor for Paligemma #13584

Are you sure you want to change the base?

[Model] Merged multimodal processor for Paligemma #13584

Conversation

kylehh commented Feb 20, 2025 • edited by github-actions bot Loading

github-actions bot commented Feb 20, 2025

mergify bot commented Feb 20, 2025

kylehh commented Feb 22, 2025

kylehh commented Feb 24, 2025

DarkLight1337 Feb 25, 2025

Choose a reason for hiding this comment

mergify bot commented Feb 25, 2025

DarkLight1337 commented Feb 25, 2025

kylehh commented Feb 20, 2025 •

edited by github-actions bot

Loading