
ValueError: Unexpected weight while inferencing custom Qwen2-VL GPTQ 4bit using vLLM #503

Open
bhavyajoshi-mahindra opened this issue Oct 31, 2024 · 0 comments

bhavyajoshi-mahindra commented Oct 31, 2024

I tried to run inference on my custom Qwen2-VL GPTQ 4-bit model with vLLM, using the script below.

from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info

MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"

llm = LLM(
    model=MODEL_PATH,
    limit_mm_per_prompt={"image": 10, "video": 10},
)

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.001,
    repetition_penalty=1.05,
    max_tokens=256,
    stop_token_ids=[],
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "/content/drive/MyDrive/LLM/test/Vin_2023-12-22_14-47-37.jpg",
                "min_pixels": 224 * 224,
                "max_pixels": 1280 * 28 * 28,
            },
            {"type": "text", "text":
                                    '''
                                    Please extract the Vehicle Sr No, Engine No, and Model from this image.
                                    Response only json format nothing else.
                                    Analyze the font and double check for similar letters such as "V":"U", "8":"S":"0", "R":"P".
                                    '''
             },
        ],
    },
]

processor = AutoProcessor.from_pretrained(MODEL_PATH)
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(messages)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

llm_inputs = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text

print(generated_text)

But I got this error:

INFO 10-31 10:18:23 llm_engine.py:243] Initializing an LLM engine (v0.6.3.post2.dev174+g5608e611.d20241031) with config: model='/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit', speculative_config=None, tokenizer='/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
INFO 10-31 10:18:25 model_runner.py:1056] Starting to load model /content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit...
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-3-38eb218e62d2>](https://localhost:8080/#) in <cell line: 7>()
      5 MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"
      6 
----> 7 llm = LLM(
      8     model=MODEL_PATH,
      9     limit_mm_per_prompt={"image": 10, "video": 10},

10 frames
/content/drive/MyDrive/LLM/vllm/vllm/model_executor/models/qwen2_vl.py in load_weights(self, weights)
   1213                     param = params_dict[name]
   1214                 except KeyError:
-> 1215                     raise ValueError(f"Unexpected weight: {name}") from None
   1216 
   1217                 weight_loader = getattr(param, "weight_loader",

ValueError: Unexpected weight: model.layers.0.mlp.down_proj.g_idx
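
For reference, the offending g_idx tensors really are in the quantized checkpoint (the loader read the name straight from it). They can be listed with a small diagnostic script (a minimal sketch; it assumes the weights sit in a single model.safetensors shard under MODEL_PATH, which matches the "0/1 shards" progress line above):

import os
from safetensors import safe_open

MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"

# GPTQ checkpoints store qweight/qzeros/scales per linear layer, plus a
# g_idx tensor (used for act-order quantization).
with safe_open(os.path.join(MODEL_PATH, "model.safetensors"), framework="pt") as f:
    for name in f.keys():
        if name.endswith(("g_idx", "qweight", "qzeros", "scales")):
            print(name)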

I have also reported the same issue in the vLLM repo:
vllm-project/vllm#9832
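
One detail from the engine log above: it reports quantization=None, so vLLM appears to have built the model as unquantized and then tripped over the GPTQ-only g_idx tensor. As a sketch (not a confirmed fix), the quantization method can be passed explicitly instead of relying on auto-detection; "gptq" is vLLM's built-in GPTQ backend name:

from vllm import LLM

MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"

# Force the GPTQ backend instead of relying on auto-detection from
# config.json (the log above shows quantization=None). Sketch only,
# not a confirmed fix for this issue.
llm = LLM(
    model=MODEL_PATH,
    quantization="gptq",
    limit_mm_per_prompt={"image": 10, "video": 10},
)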
