
ValueError: Unexpected weight while inferencing custom Qwen2-VL GPTQ 4bit using vLLM #503

Open
bhavyajoshi-mahindra opened this issue Oct 31, 2024 · 0 comments

bhavyajoshi-mahindra commented Oct 31, 2024

I tried to run inference on my custom Qwen2-VL GPTQ 4-bit model with vLLM, using the script below.

from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info

MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"

llm = LLM(
    model=MODEL_PATH,
    limit_mm_per_prompt={"image": 10, "video": 10},
)

sampling_params = SamplingParams(
    temperature=0.1,
    top_p=0.001,
    repetition_penalty=1.05,
    max_tokens=256,
    stop_token_ids=[],
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "/content/drive/MyDrive/LLM/test/Vin_2023-12-22_14-47-37.jpg",
                "min_pixels": 224 * 224,
                "max_pixels": 1280 * 28 * 28,
            },
            {"type": "text", "text":
                                    '''
                                    Please extract the Vehicle Sr No, Engine No, and Model from this image.
                                    Response only json format nothing else.
                                    Analyze the font and double check for similar letters such as "V":"U", "8":"S":"0", "R":"P".
                                    '''
             },
        ],
    },
]

processor = AutoProcessor.from_pretrained(MODEL_PATH)
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(messages)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

llm_inputs = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text

print(generated_text)

But I got this error:

INFO 10-31 10:18:23 llm_engine.py:243] Initializing an LLM engine (v0.6.3.post2.dev174+g5608e611.d20241031) with config: model='/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit', speculative_config=None, tokenizer='/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
INFO 10-31 10:18:25 model_runner.py:1056] Starting to load model /content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit...
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-3-38eb218e62d2>](https://localhost:8080/#) in <cell line: 7>()
      5 MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"
      6 
----> 7 llm = LLM(
      8     model=MODEL_PATH,
      9     limit_mm_per_prompt={"image": 10, "video": 10},

10 frames
/content/drive/MyDrive/LLM/vllm/vllm/model_executor/models/qwen2_vl.py in load_weights(self, weights)
   1213                     param = params_dict[name]
   1214                 except KeyError:
-> 1215                     raise ValueError(f"Unexpected weight: {name}") from None
   1216 
   1217                 weight_loader = getattr(param, "weight_loader",

ValueError: Unexpected weight: model.layers.0.mlp.down_proj.g_idx
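
For reference, the offending g_idx tensors really are in the quantized checkpoint (the loader read the name straight from it). They can be listed with a small diagnostic script (a minimal sketch; it assumes the weights sit in a single model.safetensors shard under MODEL_PATH, which matches the "0/1 shards" progress line above):

import os
from safetensors import safe_open

MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"

# GPTQ checkpoints store qweight/qzeros/scales per linear layer, plus a
# g_idx tensor (used for act-order quantization).
with safe_open(os.path.join(MODEL_PATH, "model.safetensors"), framework="pt") as f:
    for name in f.keys():
        if name.endswith(("g_idx", "qweight", "qzeros", "scales")):
            print(name)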

I have also reported the same issue in the vLLM repo:
vllm-project/vllm#9832
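
One detail from the engine log above: it reports quantization=None, so vLLM appears to have built the model as unquantized and then tripped over the GPTQ-only g_idx tensor. As a sketch (not a confirmed fix), the quantization method can be passed explicitly instead of relying on auto-detection; "gptq" is vLLM's built-in GPTQ backend name:

from vllm import LLM

MODEL_PATH = "/content/drive/MyDrive/LLM/vinplate2-gwen2-vl-gptq-4bit"

# Force the GPTQ backend instead of relying on auto-detection from
# config.json (the log above shows quantization=None). Sketch only,
# not a confirmed fix for this issue.
llm = LLM(
    model=MODEL_PATH,
    quantization="gptq",
    limit_mm_per_prompt={"image": 10, "video": 10},
)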
