b'S3 not supported' when using RunAI Model Streamer with S3 in vLLM #34

Closed
purp1e-ace opened this issue Feb 8, 2025 · 11 comments

@purp1e-ace
I am encountering a persistent issue when attempting to serve a model from an S3 bucket using the vllm serve command with the --load-format runai_streamer option. Despite having proper access to the S3 bucket and all required files being present, the process fails with a b'S3 not supported' error. Below are the details of the issue:

Command Used:
AWS_ACCESS_KEY_ID=my_ak AWS_SECRET_ACCESS_KEY=my_sk python -m vllm.entrypoints.openai.api_server --model s3://ai-peta-model-storage-bucket/model/qwen/Qwen2.5-1.5B-Instruct/main --load-format runai_streamer --dtype half

Error Message:

Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/python3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 909, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/python3/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/python3/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 873, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/local/python3/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 134, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/local/python3/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 228, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

Environment Details:
vLLM version: 0.7.1
Python version: 3.10
RunAI Model Streamer version: 0.12.0

@omer-dayan
Collaborator

Hey @purp1e-ace!

Although the error doesn't say exactly what caused the engine process to fail, we have had an issue with libz in version 0.12.0 (#32).

We have removed that version from PyPI, so the latest version is now 0.11.2.

Please try installing it.
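For example, one way to pin both streamer packages to that version (the two package names match the pip freeze output later in this thread):

pip install "runai-model-streamer==0.11.2" "runai-model-streamer-s3==0.11.2"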

@purp1e-ace
Author

I found that this happened because the dynamic linker failed to load libstreamers3.so: zlib, which libstreamers3.so depends on, was not installed. I solved it by installing zlib (a sketch of the install command is at the end of this comment). But after that, I got another error similar to #28.
Here is my command:
AWS_ACCESS_KEY_ID=my_ak AWS_SECRET_ACCESS_KEY=my_sk AWS_ENDPOINT_URL=my_endpoint RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true python -m vllm.entrypoints.openai.api_server --model s3://ai-peta-model-storage-bucket/model/qwen/Qwen2.5-1.5B-Instruct/main --load-format runai_streamer --dtype half
Here is the error:

INFO 02-10 14:36:47 model_runner.py:1111] Starting to load model /tmp/tmpvtpqhlbs...
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 389, in run_mp_engine
    raise e
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 378, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 121, in from_engine_args
    return cls(ipc_path=ipc_path,
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 73, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 271, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 49, in __init__
    self._init_executor()
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 40, in _init_executor
    self.collective_rpc("load_model")
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 49, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/utils.py", line 2208, in run_method
    return func(*args, **kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/worker/worker.py", line 182, in load_model
    self.model_runner.load_model()
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1113, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 1376, in load_model
    model.load_weights(
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 513, in load_weights
    return loader.load_weights(weights)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 233, in load_weights
    autoloaded_weights = set(self._load_module("", self.module, weights))
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 185, in _load_module
    for child_prefix, child_weights in self._groupby_prefix(weights):
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 99, in _groupby_prefix
    for prefix, group in itertools.groupby(weights_by_parts,
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 96, in <genexpr>
    weights_by_parts = ((weight_name.split(".", 1), weight_data)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 436, in runai_safetensors_weights_iterator
    streamer.stream_file(st_file)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/safetensors_streamer/safetensors_streamer.py", line 37, in stream_file
    safetensors_pytorch.prepare_request(self.file_streamer, path)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 101, in prepare_request
    safetensors_metadata = SafetensorsMetadata.from_file(fs, path)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 56, in from_file
    header_size_buffer = fs.read_file(filename, 0, SAFETENSORS_HEADER_BUFFER_SIZE)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/file_streamer/file_streamer.py", line 38, in read_file
    runai_read(self.streamer, path, offset, len, dst_buffer)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/libstreamer/libstreamer.py", line 39, in runai_read
    raise Exception(
Exception: Could not send runai_request to libstreamer due to: b'File access error'
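As an aside, the zlib fix mentioned above was just a system package install; a minimal sketch, assuming a Debian/Ubuntu base image (on RHEL-based systems the package is named zlib):

apt-get update && apt-get install -y zlib1g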

@purp1e-ace
Author

purp1e-ace commented Feb 10, 2025

I got it. The vLLM documentation page "Loading models with Run:ai Model Streamer" is incorrect here: when running a model from an S3-compatible object store, you should set the environment variable RUNAI_STREAMER_S3_ENDPOINT instead of AWS_ENDPOINT_URL as stated in the vLLM documentation.
Problem solved.

@noa-neria
Collaborator

noa-neria commented Feb 10, 2025

@purp1e-ace thanks for the update!

Are you reading from an AWS S3 bucket, or from another object storage provider such as GCS (Google Cloud Storage) or MinIO?

If reading from AWS S3, there is no need to specify any of the flags AWS_ENDPOINT_URL=my_endpoint RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true.
Those flags are needed only for non-AWS storage, e.g. the endpoint for Google Cloud Storage is https://storage.googleapis.com.

The flag RUNAI_STREAMER_MEMORY_LIMIT is an optional limit on the number of bytes allocated by the Streamer. If it is not specified, the Streamer tries to allocate a RAM buffer the total size of the model weights.
RUNAI_STREAMER_MEMORY_LIMIT is a general flag of the Streamer, not related to the object storage. If you don't have sufficient RAM or the model is too big, configuring this limit should solve the problem.

You can both read from S3-compatible storage like GCS with the appropriate flags and use the memory limit,
e.g. by running

RUNAI_STREAMER_MEMORY_LIMIT=8388608000 RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true AWS_ENDPOINT_URL=https://storage.googleapis.com vllm serve s3://core-llm/Llama-3-8b --load-format runai_streamer

or by running

RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true AWS_ENDPOINT_URL=https://storage.googleapis.com vllm serve s3://core-llm/Llama-3-8b --load-format runai_streamer --model-loader-extra-config '{"memory_limit":8388608000}'

Setting the environment variable RUNAI_STREAMER_MEMORY_LIMIT and setting vLLM's extra config memory_limit are equivalent.

@purp1e-ace
Author

purp1e-ace commented Feb 10, 2025

@noa-neria
Sorry, there was a typo in my previous comment: I meant RUNAI_STREAMER_S3_ENDPOINT, not RUNAI_STREAMER_MEMORY_LIMIT.
I am reading from my own company's object storage provider.
I just want to mention that when reading from a non-AWS source, you need to set both AWS_ENDPOINT_URL=my_endpoint and RUNAI_STREAMER_S3_ENDPOINT=my_endpoint. It seems that AWS_ENDPOINT_URL is used when searching for the S3 files, and RUNAI_STREAMER_S3_ENDPOINT is used when streaming the model. A full command with both variables set is sketched below.
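For example, a minimal sketch of the full serve command with both endpoint variables (my_ak, my_sk, and my_endpoint are placeholders for your own credentials and endpoint):

AWS_ACCESS_KEY_ID=my_ak AWS_SECRET_ACCESS_KEY=my_sk AWS_ENDPOINT_URL=my_endpoint RUNAI_STREAMER_S3_ENDPOINT=my_endpoint RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true python -m vllm.entrypoints.openai.api_server --model s3://ai-peta-model-storage-bucket/model/qwen/Qwen2.5-1.5B-Instruct/main --load-format runai_streamer --dtype half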

@noa-neria
Collaborator

Thank you for noting this!

In fact, currently both flags are needed:
RUNAI_STREAMER_S3_ENDPOINT for streaming the safetensors files
AWS_ENDPOINT_URL for downloading the model metadata files with boto3 in the vLLM integration code

We will fix this so that a single flag is enough. Sorry for the inconvenience.

@DellCurry

Hello @noa-neria! Sorry to disturb you on this closed issue, but I am still confused about the same problem.

I tried setting both RUNAI_STREAMER_S3_ENDPOINT and AWS_ENDPOINT_URL in vLLM, loading models from MinIO, which is an S3-compatible storage. I have tried many times, but it still shows Exception: Could not send runai_request to libstreamer due to: b'S3 not supported'.

Here are some screenshots of my environment and commands:

[two screenshots attached]

Hoping for your advice!

@noa-neria
Collaborator

noa-neria commented Feb 24, 2025

Can you please try with the latest vLLM (and runai-model-streamer) version?
A compilation issue in a previous version might be the cause.

Also, in the latest version RUNAI_STREAMER_S3_ENDPOINT was replaced by AWS_ENDPOINT_URL, so there is no need to define both.

@DellCurry

Hi @noa-neria, thanks for the quick reply! I am using runai-model-streamer version 0.11.2 and the latest main branch of vLLM. Could that be the issue (maybe I need to install runai-model-streamer == 0.12.0)?

On the second point, I have noticed that in the latest version RUNAI_STREAMER_S3_ENDPOINT is automatically filled with the same value as AWS_ENDPOINT_URL. Thanks for the notice!

@noa-neria
Collaborator

noa-neria commented Feb 24, 2025

Can you please run pip freeze | grep runai?
The output should show two packages:

runai-model-streamer==0.11.2
runai-model-streamer-s3==0.11.2

If both exist, please run with logs enabled: RUNAI_STREAMER_LOG_TO_STDERR=1 RUNAI_STREAMER_LOG_LEVEL=DEBUG
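For example, a sketch of a debug-logging run (the bucket path is just the placeholder from the earlier example):

RUNAI_STREAMER_LOG_TO_STDERR=1 RUNAI_STREAMER_LOG_LEVEL=DEBUG vllm serve s3://core-llm/Llama-3-8b --load-format runai_streamer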

@DellCurry

Thanks for the advice! I ran with the logs and found that I did not satisfy the glibc version requirement for libstreamers3.so (2.29 or higher is required). I rebuilt the kernel and it finally worked. Thanks a lot!
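For anyone else checking this requirement, a quick way to print the installed glibc version (either command works on glibc-based systems):

ldd --version | head -n 1
getconf GNU_LIBC_VERSION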
