[Feature] Add vision language model support. #3042
`.buildkite/download-images.sh` (new file):

```bash
@@ -0,0 +1,18 @@
#!/bin/bash

set -ex
set -o pipefail

(which wget && which curl) || (apt-get update && apt-get install -y wget curl)

# aws s3 sync s3://air-example-data-2/vllm_opensource_llava/ images/
mkdir -p images
cd images
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/stop_sign_pixel_values.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/stop_sign_image_features.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/cherry_blossom_pixel_values.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/cherry_blossom_image_features.pt
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/stop_sign.jpg
wget https://air-example-data-2.s3.us-west-2.amazonaws.com/vllm_opensource_llava/cherry_blossom.jpg

cd -
```
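As a quick sanity check after running the script, the downloaded tensors can be inspected. A minimal sketch — the expected shapes are taken from the `image_input_shape` arguments in this PR's example script, not from anything the download script itself asserts:

```python
# Hedged sanity check for the downloaded assets. Expected shapes come from
# the image_input_shape values used elsewhere in this PR.
import torch

pixel_values = torch.load("images/stop_sign_pixel_values.pt")
image_features = torch.load("images/stop_sign_image_features.pt")

print(pixel_values.shape)    # expected: torch.Size([1, 3, 336, 336])
print(image_features.shape)  # expected: torch.Size([1, 576, 1024])
```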
Buildkite test pipeline config (the filename is not shown in this capture; the `steps:` hunk header identifies it as the CI pipeline):

```diff
@@ -39,9 +39,15 @@ steps:
 
 - label: Models Test
   commands:
-    - pytest -v -s models --forked
+    - bash ../.buildkite/download-images.sh
+    - pytest -v -s models --ignore=models/test_llava.py --forked
+  soft_fail: true
+
+- label: Llava Test
+  commands:
+    - bash ../.buildkite/download-images.sh
+    - pytest -v -s models/test_llava.py
 
 - label: Prefix Caching Test
   commands:
     - pytest -v -s prefix_caching
```

Review thread on `soft_fail: true`:

- "oof, this is a bit tough because it is soft-failed. Do you think the test can run on a single L4 (with fp16)? If so, maybe we can create another job for the tests that are currently passing."
- "@simon-mo is Llava support coming?"
- "Could you say more about this point? I think the current CI should pass. It's still failing on the same Hugging Face error; I am confused about that."
New example script demonstrating the LLaVA API (new file, `@@ -0,0 +1,84 @@`; the path is not shown in this capture, likely `examples/llava_example.py`):

```python
import argparse
import os
import subprocess

import torch

from vllm import LLM
from vllm.sequence import MultiModalData

# The assets are located at `s3://air-example-data-2/vllm_opensource_llava/`.


def run_llava_pixel_values():
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="pixel_values",
        image_token_id=32000,
        image_input_shape="1,3,336,336",
        image_feature_size=576,
    )

    prompt = "<image>" * 576 + (
        "\nUSER: What is the content of this image?\nASSISTANT:")

    # This should be provided by another online or offline component.
    images = torch.load("images/stop_sign_pixel_values.pt")

    outputs = llm.generate(prompt,
                           multi_modal_data=MultiModalData(
                               type=MultiModalData.Type.IMAGE, data=images))
    for o in outputs:
        generated_text = o.outputs[0].text
        print(generated_text)


def run_llava_image_features():
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="image_features",
        image_token_id=32000,
        image_input_shape="1,576,1024",
        image_feature_size=576,
    )

    prompt = "<image>" * 576 + (
        "\nUSER: What is the content of this image?\nASSISTANT:")

    # This should be provided by another online or offline component.
    images = torch.load("images/stop_sign_image_features.pt")

    outputs = llm.generate(prompt,
                           multi_modal_data=MultiModalData(
                               type=MultiModalData.Type.IMAGE, data=images))
    for o in outputs:
        generated_text = o.outputs[0].text
        print(generated_text)


def main(args):
    if args.type == "pixel_values":
        run_llava_pixel_values()
    else:
        run_llava_image_features()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Demo on Llava")
    parser.add_argument("--type",
                        type=str,
                        choices=["pixel_values", "image_features"],
                        default="pixel_values",
                        help="image input type")
    args = parser.parse_args()

    # Download from s3
    s3_bucket_path = "s3://air-example-data-2/vllm_opensource_llava/"
    local_directory = "images"

    # Make sure the local directory exists or create it
    os.makedirs(local_directory, exist_ok=True)

    # Use AWS CLI to sync the directory
    subprocess.check_call(
        ["aws", "s3", "sync", s3_bucket_path, local_directory])
    main(args)
```
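The two entry points differ only in how much vision preprocessing has already happened: `pixel_values` feeds the raw normalized image `(1, 3, 336, 336)` through the in-engine vision tower, while `image_features` skips the tower and supplies the `(1, 576, 1024)` patch embeddings directly. The sketch below condenses the shared pattern; the engine arguments are copied from the diff, and the reading of 576 as one placeholder token per image patch (a 24×24 CLIP grid) is our interpretation, not something stated in the PR:

```python
# Condensed sketch of the shared pattern in both entry points above.
# Engine arguments are copied from the diff; the comments are our reading.
import torch
from vllm import LLM
from vllm.sequence import MultiModalData

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",
    image_input_type="pixel_values",  # or "image_features" to skip the vision tower
    image_token_id=32000,             # vocab id of the "<image>" placeholder token
    image_input_shape="1,3,336,336",  # "1,576,1024" when passing image features
    image_feature_size=576,           # 24x24 patch grid -> 576 placeholder slots
)

# One "<image>" placeholder per image feature; the engine substitutes the
# projected vision embeddings at these 576 token positions.
prompt = "<image>" * 576 + "\nUSER: What is the content of this image?\nASSISTANT:"

data = torch.load("images/stop_sign_pixel_values.pt")
outputs = llm.generate(
    prompt,
    multi_modal_data=MultiModalData(type=MultiModalData.Type.IMAGE, data=data),
)
print(outputs[0].outputs[0].text)
```

The script itself syncs the S3 assets before calling `main`, so invoking it with `--type pixel_values` (the default) or `--type image_features` is all that is needed, given AWS CLI access and a GPU that fits the 7B model.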
Requirements file (likely `requirements-dev.txt`; the filename is not shown in this capture):

```diff
@@ -24,6 +24,10 @@ openai
 requests
 ray
 peft
+awscli
 
 # Benchmarking
 aiohttp
+
+# Multimodal
+pillow
```
Review comment: "thanks for doing this!"
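`awscli` backs the `aws s3 sync` call in the example script, and `pillow` covers loading the raw `.jpg` assets. For completeness, a hedged sketch of how the precomputed `*_pixel_values.pt` tensors could be reproduced from those JPEGs — the processor checkpoint is an assumption based on the CLIP tower LLaVA-1.5 is known to use, not something this diff specifies:

```python
# Hypothetical reproduction of the precomputed pixel-value tensors.
# The CLIPImageProcessor checkpoint is an assumption (LLaVA-1.5 uses a
# CLIP ViT-L/14 tower at 336px); this PR only ships the .pt files.
from PIL import Image
from transformers import CLIPImageProcessor

processor = CLIPImageProcessor.from_pretrained(
    "openai/clip-vit-large-patch14-336")
image = Image.open("images/stop_sign.jpg")
pixel_values = processor(image, return_tensors="pt").pixel_values
print(pixel_values.shape)  # torch.Size([1, 3, 336, 336])
```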