Your current environment
The output of `python collect_env.py`
root@vllm-worker-6dd66f997f-pj6g4:~# ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 8253291
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
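Note that `max user processes` is unlimited here, but in a Kubernetes pod the pids cgroup controller can still cap thread creation and make `pthread_create` fail with EAGAIN, and that limit does not appear in `ulimit -a`. A minimal sketch to read the pod's PID limit, assuming Linux with cgroup v2 or v1 mounted at the common paths (`cgroup_pids_max` is an illustrative helper, not part of vLLM or Ray):

```python
from pathlib import Path

def cgroup_pids_max():
    """Return the cgroup pids limit as a string ('max' means unlimited),
    or None if no pids controller file is found at the usual v2/v1 paths."""
    candidates = (
        Path("/sys/fs/cgroup/pids.max"),       # cgroup v2
        Path("/sys/fs/cgroup/pids/pids.max"),  # cgroup v1
    )
    for path in candidates:
        try:
            return path.read_text().strip()
        except OSError:
            continue  # controller not mounted at this path
    return None

if __name__ == "__main__":
    print("cgroup pids limit:", cgroup_pids_max())
```

If this prints a small number (for example a kubelet `podPidsLimit`), raising that limit on the pod or node is the usual fix for EAGAIN thread-creation failures inside containers.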
🐛 Describe the bug
root@vllm-worker-6dd66f997f-pj6g4:~# ulimit -a
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 8253291
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
root@vllm-worker-6dd66f997f-pj6g4:/vllm-workspace# vllm serve /vllm-workspace/deepseek-r1 --served-model-name deepseek-r1 --enable-prefix-caching --max-model-len 4096 --gpu-memory-utilization 0.95 --tensor-parallel-size 8 --pipeline-parallel-size 2 --enable-chunked-prefill --trust-remote-code --port 8000
INFO 02-16 02:29:16 __init__.py:183] Automatically detected platform cuda.
INFO 02-16 02:29:16 api_server.py:838] vLLM API server version 0.7.1
INFO 02-16 02:29:16 api_server.py:839] args: Namespace(subparser='serve', model_tag='/vllm-workspace/deepseek-r1', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='/vllm-workspace/deepseek-r1', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=4096, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend=None, pipeline_parallel_size=2, tensor_parallel_size=8, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=True, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, 
enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=True, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['deepseek-r1'], qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, dispatch_function=<function serve at 0x7f1254b0d3a0>)
INFO 02-16 02:29:16 config.py:135] Replacing legacy 'type' key with 'rope_type'
INFO 02-16 02:29:20 config.py:526] This model supports multiple tasks: {'score', 'reward', 'generate', 'classify', 'embed'}. Defaulting to 'generate'.
INFO 02-16 02:29:21 config.py:1383] Defaulting to use ray for distributed inference
INFO 02-16 02:29:21 config.py:1538] Chunked prefill is enabled with max_num_batched_tokens=2048.
WARNING 02-16 02:29:21 config.py:653] Async output processing can not be enabled with pipeline parallel
WARNING 02-16 02:29:21 fp8.py:50] Detected fp8 checkpoint. Please note that the format is experimental and subject to change.
INFO 02-16 02:29:21 config.py:3257] MLA is enabled; forcing chunked prefill and prefix caching to be disabled.
INFO 02-16 02:29:21 llm_engine.py:232] Initializing a V0 LLM engine (v0.7.1) with config: model='/vllm-workspace/deepseek-r1', speculative_config=None, tokenizer='/vllm-workspace/deepseek-r1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=2, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=deepseek-r1, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
2025-02-16 02:29:21,888 INFO worker.py:1654 -- Connecting to existing Ray cluster at address: vllm-head-service.maas.svc.cluster.local:6379...
2025-02-16 02:29:21,931 INFO worker.py:1832 -- Connected to Ray cluster. View the dashboard at http://10.233.92.44:8265/
(bundle_reservation_check_func pid=630) [2025-02-16 02:29:23,228 E 630 630] logging.cc:108: Unhandled exception: N5boost10wrapexceptINS_6system12system_errorEEE. what(): thread: Resource temporarily unavailable [system:11]
(bundle_reservation_check_func pid=630) [2025-02-16 02:29:23,258 E 630 630] logging.cc:115: Stack trace:
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x11f785a) [0x7fcfdbab985a] ray::operator<<()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x11fadf2) [0x7fcfdbabcdf2] ray::TerminateHandler()
(bundle_reservation_check_func pid=630) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7fcfda73a20c]
(bundle_reservation_check_func pid=630) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7fcfda73a277]
(bundle_reservation_check_func pid=630) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7fcfda73a4d8]
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x6b74c0) [0x7fcfdaf794c0] boost::throw_exception<>()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x1271c2b) [0x7fcfdbb33c2b] boost::asio::detail::do_throw_error()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x127264b) [0x7fcfdbb3464b] boost::asio::detail::posix_thread::start_thread()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x1272aac) [0x7fcfdbb34aac] boost::asio::thread_pool::thread_pool()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0xc87a14) [0x7fcfdb549a14] ray::rpc::(anonymous namespace)::_GetServerCallExecutor()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray3rpc21GetServerCallExecutorEv+0x9) [0x7fcfdb549aa9] ray::rpc::GetServerCallExecutor()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZNSt17_Function_handlerIFvN3ray6StatusESt8functionIFvvEES4_EZNS0_3rpc14ServerCallImplINS6_24CoreWorkerServiceHandlerENS6_15PushTaskRequestENS6_13PushTaskReplyELNS6_8AuthTypeE0EE17HandleRequestImplEbEUlS1_S4_S4_E0_E9_M_invokeERKSt9_Any_dataOS1_OS4_SJ+0x12b) [0x7fcfdb1e141b] std::_Function_handler<>::_M_invoke()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x967e36) [0x7fcfdb229e36] ray::core::TaskReceiver::HandleTask()::{lambda()#1}::operator()()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x968e5a) [0x7fcfdb22ae5a] std::_Function_handler<>::_M_invoke()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x970192) [0x7fcfdb232192] ray::core::InboundRequest::Accept()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x98c7cd) [0x7fcfdb24e7cd] ray::core::NormalSchedulingQueue::ScheduleRequests()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0xc9e728) [0x7fcfdb560728] EventTracker::RecordExecution()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0xc996fe) [0x7fcfdb55b6fe] std::_Function_handler<>::_M_invoke()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0xc99b76) [0x7fcfdb55bb76] boost::asio::detail::completion_handler<>::do_complete()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x126f2bb) [0x7fcfdbb312bb] boost::asio::detail::scheduler::do_run_one()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x1270c39) [0x7fcfdbb32c39] boost::asio::detail::scheduler::run()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x1271342) [0x7fcfdbb33342] boost::asio::io_context::run()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker20RunTaskExecutionLoopEv+0x117) [0x7fcfdb171407] ray::core::CoreWorker::RunTaskExecutionLoop()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImpl26RunWorkerTaskExecutionLoopEv+0x41) [0x7fcfdb22e971] ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray4core17CoreWorkerProcess20RunTaskExecutionLoopEv+0x1d) [0x7fcfdb22eb8d] ray::core::CoreWorkerProcess::RunTaskExecutionLoop()
(bundle_reservation_check_func pid=630) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0x725467) [0x7fcfdafe7467] __pyx_pw_3ray_7_raylet_10CoreWorker_5run_task_loop()
(bundle_reservation_check_func pid=630) ray::IDLE() [0x582f0c]
(bundle_reservation_check_func pid=630) ray::IDLE(PyObject_Vectorcall+0x36) [0x56ca46] PyObject_Vectorcall
(bundle_reservation_check_func pid=630) ray::IDLE(_PyEval_EvalFrameDefault+0x705) [0x553785] _PyEval_EvalFrameDefault
(bundle_reservation_check_func pid=630) ray::IDLE(PyEval_EvalCode+0x99) [0x6261d9] PyEval_EvalCode
(bundle_reservation_check_func pid=630) ray::IDLE() [0x64c93b]
(bundle_reservation_check_func pid=630) ray::IDLE() [0x647bb6]
(bundle_reservation_check_func pid=630) ray::IDLE() [0x65fdf5]
(bundle_reservation_check_func pid=630) ray::IDLE(_PyRun_SimpleFileObject+0x1a5) [0x65f3c5] _PyRun_SimpleFileObject
(bundle_reservation_check_func pid=630) ray::IDLE(_PyRun_AnyFileObject+0x47) [0x65f057] _PyRun_AnyFileObject
(bundle_reservation_check_func pid=630) ray::IDLE(Py_RunMain+0x2e8) [0x658138] Py_RunMain
(bundle_reservation_check_func pid=630) ray::IDLE(Py_BytesMain+0x2d) [0x611aad] Py_BytesMain
(bundle_reservation_check_func pid=630) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fcfdc71fd90]
(bundle_reservation_check_func pid=630) /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fcfdc71fe40] __libc_start_main
(bundle_reservation_check_func pid=630) ray::IDLE(_start+0x25) [0x611925] _start
(bundle_reservation_check_func pid=630)
(bundle_reservation_check_func pid=630) *** SIGABRT received at time=1739701763 on cpu 167 ***
(bundle_reservation_check_func pid=630) PC: @ 0x7fcfdc78c9fc (unknown) pthread_kill
(bundle_reservation_check_func pid=630) @ 0x7fcfdc738520 (unknown) (unknown)
(bundle_reservation_check_func pid=630) [2025-02-16 02:29:23,259 E 630 630] logging.cc:460: *** SIGABRT received at time=1739701763 on cpu 167 ***
(bundle_reservation_check_func pid=630) [2025-02-16 02:29:23,259 E 630 630] logging.cc:460: PC: @ 0x7fcfdc78c9fc (unknown) pthread_kill
(bundle_reservation_check_func pid=630) [2025-02-16 02:29:23,259 E 630 630] logging.cc:460: @ 0x7fcfdc738520 (unknown) (unknown)
(bundle_reservation_check_func pid=630) Fatal Python error: Aborted
(bundle_reservation_check_func pid=630)
(bundle_reservation_check_func pid=630) Stack (most recent call first):
(bundle_reservation_check_func pid=630) File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 935 in main_loop
(bundle_reservation_check_func pid=630) File "/usr/local/lib/python3.12/dist-packages/ray/_private/workers/default_worker.py", line 297 in <module>
(bundle_reservation_check_func pid=630)
(bundle_reservation_check_func pid=630) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, uvloop.loop, ray._raylet (total: 11)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: b75636ea82a1234d33406a784cbd4744f757ab5801000000 Worker ID: 672b001daae17f75e8772ea31befb22d0ec222eb8347b05a1d6491d9 Node ID: b39138d229dbb2aa8d7b088f3f59c5b16b59bb38b390e93e25b0d481 Worker IP address: 10.233.96.40 Worker port: 10003 Worker PID: 630 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
(raylet) E0216 02:29:24.852307304 158 thd.cc:157] pthread_create failed: Resource temporarily unavailable
(pid=663) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7fa1b2fac20c]
(pid=663) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7fa1b2fac277]
(pid=663) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7fa1b2fac4d8]
(pid=663) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker24HandleGetCoreWorkerStatsENS_3rpc25GetCoreWorkerStatsRequestEPNS2_23GetCoreWorkerStatsReplyESt8functionIFvNS_6StatusES6_IFvvEES9_EE+0x8a9) [0x7fa1b3a99729] ray::core::CoreWorker::HandleGetCoreWorkerStats()
(pid=663) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray3rpc14ServerCallImplINS0_24CoreWorkerServiceHandlerENS0_25GetCoreWorkerStatsRequestENS0_23GetCoreWorkerStatsReplyELNS0_8AuthTypeE0EE17HandleRequestImplEb+0x104) [0x7fa1b3a80814] ray::rpc::ServerCallImpl<>::HandleRequestImpl()
(pid=663) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(_ZN3ray4core10CoreWorker12RunIOServiceEv+0x91) [0x7fa1b39c9101] ray::core::CoreWorker::RunIOService()
(pid=663) /usr/local/lib/python3.12/dist-packages/ray/_raylet.so(+0xd4fe90) [0x7fa1b3e83e90] thread_proxy
(pid=663) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fa1b4ffcac3]
(pid=663) /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7fa1b508dbf4] __clone
(pid=663)
(pid=663)
(pid=663)
(pid=624) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7ff704ebd20c]
(pid=624) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7ff704ebd277]
(pid=624) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7ff704ebd4d8]
(pid=624) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7ff706f0dac3]
(pid=624)
(pid=624)
(pid=624)
(pid=634) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7f49a14cf20c]
(pid=634) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7f49a14cf277]
(pid=634) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8) [0x7f49a14cf4d8]
(pid=634) /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f49a351fac3]
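The repeated `thread: Resource temporarily unavailable [system:11]` and `pthread_create failed: Resource temporarily unavailable` messages in the log above are EAGAIN from `pthread_create`: the container ran out of threads/PIDs, not memory. A minimal probe, assuming CPython on Linux (`probe_thread_limit` is an illustrative helper, not from vLLM or Ray), to count how many threads the environment allows before the same failure appears:

```python
import threading

def probe_thread_limit(cap=2000):
    """Try to start up to `cap` idle threads; return how many succeeded.
    When pthread_create fails with EAGAIN, CPython raises RuntimeError
    ("can't start new thread"), which ends the probe early."""
    stop = threading.Event()
    threads = []
    try:
        for _ in range(cap):
            t = threading.Thread(target=stop.wait, daemon=True)
            t.start()
            threads.append(t)
    except RuntimeError:
        pass  # hit the thread/PID limit before reaching `cap`
    finally:
        stop.set()           # release all probe threads
        for t in threads:
            t.join(timeout=1)
    return len(threads)

if __name__ == "__main__":
    print("threads created before failure (or cap):", probe_thread_limit())
```

Running this inside the failing pod should stop well short of the cap if a cgroup PID limit is the culprit; on an unconstrained host it simply reaches the cap.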