
[Bug]: vLLM with OpenVINO is throwing error with ^0.7.0 upgrade #12786

Closed
gavrissh opened this issue Feb 5, 2025 · 5 comments
Labels: bug (Something isn't working)

Comments

gavrissh commented Feb 5, 2025

Your current environment

The output of `python collect_env.py`:
No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 02-05 18:37:22 __init__.py:183] Automatically detected platform openvino.
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.5.0-6ubuntu2) 9.5.0
Clang version: Could not collect
CMake version: version 3.31.4
Libc version: glibc-2.39

Python version: 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-48-generic-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               24
On-line CPU(s) list:                  0-23
Vendor ID:                            AuthenticAMD
Model name:                           AMD EPYC-Rome-v4 Processor (no XSAVES)
CPU family:                           23
Model:                                49
Thread(s) per core:                   1
Core(s) per socket:                   1
Socket(s):                            24
Stepping:                             0
BogoMIPS:                             4992.49
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat umip rdpid
Hypervisor vendor:                    KVM
Virtualization type:                  full
L1d cache:                            768 KiB (24 instances)
L1i cache:                            768 KiB (24 instances)
L2 cache:                             12 MiB (24 instances)
L3 cache:                             384 MiB (24 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-23
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec rstack overflow:   Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] flake8==7.1.1
[pip3] mypy==0.991
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.570.86
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] onnx==1.17.0
[pip3] pyzmq==26.2.1
[pip3] torch==2.5.1
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.48.2
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A (dev)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

LD_LIBRARY_PATH=/home/lib/python3.12/site-packages/cv2/../../lib64:
NCCL_CUMEM_ENABLE=0
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

After upgrading to vLLM 0.7.1, serving a model on the OpenVINO backend fails during model load:

vllm serve meta-llama/Llama-3.2-1B-Instruct
INFO 02-05 18:41:30 __init__.py:183] Automatically detected platform openvino.
INFO 02-05 18:41:31 api_server.py:838] vLLM API server version 0.7.1
INFO 02-05 18:41:31 api_server.py:839] args: Namespace(subparser='serve', model_tag='meta-llama/Llama-3.2-1B-Instruct', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key='123', lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='meta-llama/Llama-3.2-1B-Instruct', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, dispatch_function=<function serve at 0x71c1d786d620>)
INFO 02-05 18:41:31 api_server.py:204] Started engine process with PID 2295232
INFO 02-05 18:41:34 __init__.py:183] Automatically detected platform openvino.
INFO 02-05 18:41:39 config.py:526] This model supports multiple tasks: {'classify', 'generate', 'score', 'reward', 'embed'}. Defaulting to 'generate'.
WARNING 02-05 18:41:39 arg_utils.py:1129] The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
WARNING 02-05 18:41:39 config.py:662] Async output processing is not supported on the current platform type openvino.
WARNING 02-05 18:41:39 openvino.py:80] Only float32 dtype is supported on OpenVINO, casting from torch.bfloat16.
WARNING 02-05 18:41:39 openvino.py:85] CUDA graph is not supported on OpenVINO backend, fallback to the eager mode.
INFO 02-05 18:41:39 openvino.py:119] OpenVINO CPU optimal block size is 32, overriding currently set 16
WARNING 02-05 18:41:39 openvino.py:134] Environment variable VLLM_OPENVINO_KVCACHE_SPACE (GB) for OpenVINO backend is not set, using 4 by default.
INFO 02-05 18:41:43 config.py:526] This model supports multiple tasks: {'score', 'classify', 'generate', 'reward', 'embed'}. Defaulting to 'generate'.
WARNING 02-05 18:41:43 arg_utils.py:1129] The model has a long context length (131072). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
WARNING 02-05 18:41:43 config.py:662] Async output processing is not supported on the current platform type openvino.
WARNING 02-05 18:41:43 openvino.py:80] Only float32 dtype is supported on OpenVINO, casting from torch.bfloat16.
WARNING 02-05 18:41:43 openvino.py:85] CUDA graph is not supported on OpenVINO backend, fallback to the eager mode.
INFO 02-05 18:41:43 openvino.py:119] OpenVINO CPU optimal block size is 32, overriding currently set 16
WARNING 02-05 18:41:43 openvino.py:134] Environment variable VLLM_OPENVINO_KVCACHE_SPACE (GB) for OpenVINO backend is not set, using 4 by default.
INFO 02-05 18:41:43 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) with config: model='meta-llama/Llama-3.2-1B-Instruct', speculative_config=None, tokenizer='meta-llama/Llama-3.2-1B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float32, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=<Type: 'float16'>,  device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=meta-llama/Llama-3.2-1B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True, 
No CUDA runtime is found, using CUDA_HOME='/usr'
INFO 02-05 18:41:45 openvino.py:36] Cannot use None backend on OpenVINO.
INFO 02-05 18:41:45 openvino.py:37] Using OpenVINO Attention backend.
WARNING 02-05 18:41:45 _custom_ops.py:19] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 02-05 18:41:45 config.py:3414] Current VLLM config is not set.
ERROR 02-05 18:41:45 engine.py:387] 'NoneType' object has no attribute 'dtype'
ERROR 02-05 18:41:45 engine.py:387] Traceback (most recent call last):
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 378, in run_mp_engine
ERROR 02-05 18:41:45 engine.py:387]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-05 18:41:45 engine.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 121, in from_engine_args
ERROR 02-05 18:41:45 engine.py:387]     return cls(ipc_path=ipc_path,
ERROR 02-05 18:41:45 engine.py:387]            ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 73, in __init__
ERROR 02-05 18:41:45 engine.py:387]     self.engine = LLMEngine(*args, **kwargs)
ERROR 02-05 18:41:45 engine.py:387]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 271, in __init__
ERROR 02-05 18:41:45 engine.py:387]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 02-05 18:41:45 engine.py:387]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 49, in __init__
ERROR 02-05 18:41:45 engine.py:387]     self._init_executor()
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 40, in _init_executor
ERROR 02-05 18:41:45 engine.py:387]     self.collective_rpc("load_model")
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in collective_rpc
ERROR 02-05 18:41:45 engine.py:387]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 02-05 18:41:45 engine.py:387]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/utils.py", line 2208, in run_method
ERROR 02-05 18:41:45 engine.py:387]     return func(*args, **kwargs)
ERROR 02-05 18:41:45 engine.py:387]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/worker/openvino_worker.py", line 253, in load_model
ERROR 02-05 18:41:45 engine.py:387]     self.model_runner.load_model()
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/worker/openvino_model_runner.py", line 82, in load_model
ERROR 02-05 18:41:45 engine.py:387]     self.model = get_model(model_config=self.model_config,
ERROR 02-05 18:41:45 engine.py:387]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/model_executor/model_loader/openvino.py", line 202, in get_model
ERROR 02-05 18:41:45 engine.py:387]     return OpenVINOCausalLM(ov_core, model_config, device_config,
ERROR 02-05 18:41:45 engine.py:387]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/model_executor/model_loader/openvino.py", line 108, in __init__
ERROR 02-05 18:41:45 engine.py:387]     self.logits_processor = LogitsProcessor(
ERROR 02-05 18:41:45 engine.py:387]                             ^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/model_executor/layers/logits_processor.py", line 48, in __init__
ERROR 02-05 18:41:45 engine.py:387]     parallel_config = get_current_vllm_config().parallel_config
ERROR 02-05 18:41:45 engine.py:387]                       ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/config.py", line 3416, in get_current_vllm_config
ERROR 02-05 18:41:45 engine.py:387]     return VllmConfig()
ERROR 02-05 18:41:45 engine.py:387]            ^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387]   File "<string>", line 19, in __init__
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/config.py", line 3253, in __post_init__
ERROR 02-05 18:41:45 engine.py:387]     current_platform.check_and_update_config(self)
ERROR 02-05 18:41:45 engine.py:387]   File "/home/lib/python3.12/site-packages/vllm/platforms/openvino.py", line 79, in check_and_update_config
ERROR 02-05 18:41:45 engine.py:387]     if model_config.dtype != torch.float32:
ERROR 02-05 18:41:45 engine.py:387]        ^^^^^^^^^^^^^^^^^^
ERROR 02-05 18:41:45 engine.py:387] AttributeError: 'NoneType' object has no attribute 'dtype'
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 389, in run_mp_engine
    raise e
  File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 378, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 121, in from_engine_args
    return cls(ipc_path=ipc_path,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 73, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 271, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 49, in __init__
    self._init_executor()
  File "/home/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 40, in _init_executor
    self.collective_rpc("load_model")
  File "/home/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 49, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/utils.py", line 2208, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/worker/openvino_worker.py", line 253, in load_model
    self.model_runner.load_model()
  File "/home/lib/python3.12/site-packages/vllm/worker/openvino_model_runner.py", line 82, in load_model
    self.model = get_model(model_config=self.model_config,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/model_executor/model_loader/openvino.py", line 202, in get_model
    return OpenVINOCausalLM(ov_core, model_config, device_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/model_executor/model_loader/openvino.py", line 108, in __init__
    self.logits_processor = LogitsProcessor(
                            ^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/model_executor/layers/logits_processor.py", line 48, in __init__
    parallel_config = get_current_vllm_config().parallel_config
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/config.py", line 3416, in get_current_vllm_config
    return VllmConfig()
           ^^^^^^^^^^^^
  File "<string>", line 19, in __init__
  File "/home/lib/python3.12/site-packages/vllm/config.py", line 3253, in __post_init__
    current_platform.check_and_update_config(self)
  File "/home/lib/python3.12/site-packages/vllm/platforms/openvino.py", line 79, in check_and_update_config
    if model_config.dtype != torch.float32:
       ^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'dtype'
Traceback (most recent call last):
  File "/home/bin/vllm", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/scripts.py", line 202, in main
    args.dispatch_function(args)
  File "/home/lib/python3.12/site-packages/vllm/scripts.py", line 42, in serve
    uvloop.run(run_server(args))
  File "/home/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 873, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 134, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 228, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
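For context on what the traceback is showing: `LogitsProcessor.__init__` calls `get_current_vllm_config()` while no config context is active, so vLLM falls back to constructing a bare `VllmConfig()`; its `model_config` is `None`, and the OpenVINO platform hook then dereferences `model_config.dtype` and raises. The snippet below is only a minimal, self-contained sketch of that failure mode plus the kind of `None` guard that would avoid it; the names mirror the traceback, but it is not the actual vLLM source and not necessarily what #12750 changes.

```python
import torch


class ModelConfig:
    """Stand-in for vllm's model config; only carries a dtype."""
    def __init__(self, dtype=torch.bfloat16):
        self.dtype = dtype


def check_and_update_config(model_config):
    # Unguarded access, as at vllm/platforms/openvino.py:79 in the traceback,
    # raises AttributeError when model_config is None:
    #     if model_config.dtype != torch.float32: ...
    #
    # A defensive guard (an assumed fix, not the actual patch) tolerates a
    # bare config whose model_config was never populated:
    if model_config is not None and model_config.dtype != torch.float32:
        model_config.dtype = torch.float32  # OpenVINO only supports float32


check_and_update_config(None)           # previously crashed, now a no-op
check_and_update_config(ModelConfig())  # casts bfloat16 -> float32
```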

gavrissh added the bug label on Feb 5, 2025
gavrissh changed the title from "[Bug]: vLLM with OpenVINO is throwing error with 0.7.0 upgrade" to "[Bug]: vLLM with OpenVINO is throwing error with ^0.7.0 upgrade" on Feb 5, 2025
mgoin (Member) commented Feb 5, 2025

cc @ilya-lavrenov @helena-intel

ilya-lavrenov (Contributor) commented
I suppose it's fixed by #12750 ?

gavrissh (Author) commented Feb 6, 2025

Will it be part of the 0.7.2 release? Do we know the timeline?

ilya-lavrenov (Contributor) commented
According to the v0.7.1...v0.7.2 diff, yes, it will be part of 0.7.2.

hmellor (Collaborator) commented Feb 11, 2025

Closing as solved, as v0.7.2 is now released!
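A quick way to confirm that the installed build already includes the fix is to check the reported version (assuming the package was installed as a release wheel where `vllm.__version__` resolves, unlike the dev checkout in the environment dump above):

```python
# Print the installed vLLM version; expect 0.7.2 or newer for the fix.
import vllm
print(vllm.__version__)
```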

hmellor closed this as completed on Feb 11, 2025