b'S3 not supported' when using RunAI Model Streamer with S3 in vLLM #34

Closed
purp1e-ace opened this issue Feb 8, 2025 · 11 comments

@purp1e-ace
I am encountering a persistent issue when attempting to serve a model from an S3 bucket using the vllm serve command with the --load-format runai_streamer option. Despite having proper access to the S3 bucket and all required files being present, the process fails with a b'S3 not supported' error. Below are the details of the issue:

Command Used:
AWS_ACCESS_KEY_ID=my_ak AWS_SECRET_ACCESS_KEY=my_sk python -m vllm.entrypoints.openai.api_server --model s3://ai-peta-model-storage-bucket/model/qwen/Qwen2.5-1.5B-Instruct/main --load-format runai_streamer --dtype half

Error Message:

Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/python3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 909, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/python3/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/python3/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 873, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/local/python3/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 134, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/local/python3/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 228, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

Environment Details:
vLLM version: 0.7.1
Python version: 3.10
RunAI Model Streamer version: 0.12.0

@omer-dayan
Collaborator

Hey @purp1e-ace!

Although the error doesn't say exactly what caused the engine process to fail, we have had an issue with libz in version 0.12.0 (#32).

We have removed that version from PyPI, so the latest version is now 0.11.2.

Please try installing it.
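For example, one way to pin both streamer packages to that version (the two package names match the pip freeze output later in this thread):

pip install "runai-model-streamer==0.11.2" "runai-model-streamer-s3==0.11.2"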

@purp1e-ace
Author

I found that this happened because the dynamic linker failed to load libstreamers3.so: zlib, which libstreamers3.so depends on, was not installed. I solved it by installing zlib (a sketch of the install command is at the end of this comment). But after that, I got another error similar to #28.
Here is my command:
AWS_ACCESS_KEY_ID=my_ak AWS_SECRET_ACCESS_KEY=my_sk AWS_ENDPOINT_URL=my_endpoint RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true python -m vllm.entrypoints.openai.api_server --model s3://ai-peta-model-storage-bucket/model/qwen/Qwen2.5-1.5B-Instruct/main --load-format runai_streamer --dtype half
Here is the error:

INFO 02-10 14:36:47 model_runner.py:1111] Starting to load model /tmp/tmpvtpqhlbs...
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 389, in run_mp_engine
    raise e
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 378, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 121, in from_engine_args
    return cls(ipc_path=ipc_path,
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 73, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 271, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 49, in __init__
    self._init_executor()
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 40, in _init_executor
    self.collective_rpc("load_model")
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 49, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/utils.py", line 2208, in run_method
    return func(*args, **kwargs)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/worker/worker.py", line 182, in load_model
    self.model_runner.load_model()
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1113, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 12, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 1376, in load_model
    model.load_weights(
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 513, in load_weights
    return loader.load_weights(weights)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 233, in load_weights
    autoloaded_weights = set(self._load_module("", self.module, weights))
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 185, in _load_module
    for child_prefix, child_weights in self._groupby_prefix(weights):
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 99, in _groupby_prefix
    for prefix, group in itertools.groupby(weights_by_parts,
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 96, in <genexpr>
    weights_by_parts = ((weight_name.split(".", 1), weight_data)
  File "/usr/local/python3/lib/python3.10/site-packages/vllm/model_executor/model_loader/weight_utils.py", line 436, in runai_safetensors_weights_iterator
    streamer.stream_file(st_file)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/safetensors_streamer/safetensors_streamer.py", line 37, in stream_file
    safetensors_pytorch.prepare_request(self.file_streamer, path)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 101, in prepare_request
    safetensors_metadata = SafetensorsMetadata.from_file(fs, path)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/safetensors_streamer/safetensors_pytorch.py", line 56, in from_file
    header_size_buffer = fs.read_file(filename, 0, SAFETENSORS_HEADER_BUFFER_SIZE)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/file_streamer/file_streamer.py", line 38, in read_file
    runai_read(self.streamer, path, offset, len, dst_buffer)
  File "/usr/local/python3/lib/python3.10/site-packages/runai_model_streamer/libstreamer/libstreamer.py", line 39, in runai_read
    raise Exception(
Exception: Could not send runai_request to libstreamer due to: b'File access error'
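As an aside, the zlib fix mentioned above was just a system package install; a minimal sketch, assuming a Debian/Ubuntu base image (on RHEL-based systems the package is named zlib):

apt-get update && apt-get install -y zlib1g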

@purp1e-ace
Author

purp1e-ace commented Feb 10, 2025

I got it. The vLLM documentation page "Loading models with Run:ai Model Streamer" is incorrect here: when running a model from an S3-compatible object store, you should set the environment variable RUNAI_STREAMER_S3_ENDPOINT instead of AWS_ENDPOINT_URL as stated in the vLLM documentation.
Problem solved.

@noa-neria
Collaborator

noa-neria commented Feb 10, 2025

@purp1e-ace thanks for the update!

Are you reading from an AWS S3 bucket, or from another object storage provider such as GCS (Google Cloud Storage) or MinIO?

If reading from AWS S3, there is no need to specify any of the flags AWS_ENDPOINT_URL=my_endpoint RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true.
Those flags are needed only for non-AWS storage, e.g. the endpoint for Google Cloud Storage is https://storage.googleapis.com.

The flag RUNAI_STREAMER_MEMORY_LIMIT is an optional limit on the number of bytes allocated by the Streamer. If it is not specified, the Streamer tries to allocate a RAM buffer the total size of the model weights.
RUNAI_STREAMER_MEMORY_LIMIT is a general flag of the Streamer, not related to the object storage. If you don't have sufficient RAM or the model is too big, configuring this limit should solve the problem.

You can both read from S3-compatible storage like GCS with the appropriate flags and use the memory limit,
e.g. by running

RUNAI_STREAMER_MEMORY_LIMIT=8388608000 RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true AWS_ENDPOINT_URL=https://storage.googleapis.com vllm serve s3://core-llm/Llama-3-8b --load-format runai_streamer

or by running

RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true AWS_ENDPOINT_URL=https://storage.googleapis.com vllm serve s3://core-llm/Llama-3-8b --load-format runai_streamer --model-loader-extra-config '{"memory_limit":8388608000}'

Setting the environment variable RUNAI_STREAMER_MEMORY_LIMIT and setting vLLM's extra config memory_limit are equivalent.

@purp1e-ace
Author

purp1e-ace commented Feb 10, 2025

@noa-neria
Sorry, there was a typo in my previous comment: I meant RUNAI_STREAMER_S3_ENDPOINT, not RUNAI_STREAMER_MEMORY_LIMIT.
I am reading from my own company's object storage provider.
I just want to mention that when reading from a non-AWS source, you need to set both AWS_ENDPOINT_URL=my_endpoint and RUNAI_STREAMER_S3_ENDPOINT=my_endpoint. It seems that AWS_ENDPOINT_URL is used when searching for the S3 files, and RUNAI_STREAMER_S3_ENDPOINT is used when streaming the model. A full command with both variables set is sketched below.
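For example, a minimal sketch of the full serve command with both endpoint variables (my_ak, my_sk, and my_endpoint are placeholders for your own credentials and endpoint):

AWS_ACCESS_KEY_ID=my_ak AWS_SECRET_ACCESS_KEY=my_sk AWS_ENDPOINT_URL=my_endpoint RUNAI_STREAMER_S3_ENDPOINT=my_endpoint RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING=0 AWS_EC2_METADATA_DISABLED=true python -m vllm.entrypoints.openai.api_server --model s3://ai-peta-model-storage-bucket/model/qwen/Qwen2.5-1.5B-Instruct/main --load-format runai_streamer --dtype half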

@noa-neria
Collaborator

Thank you for noting this!

In fact, currently both flags are needed:
RUNAI_STREAMER_S3_ENDPOINT for streaming the safetensors files
AWS_ENDPOINT_URL for downloading the model metadata files with boto3 in the vLLM integration code

We will fix this so that a single flag is enough. Sorry for the inconvenience.

@DellCurry

Hello @noa-neria! Sorry to disturb you on this closed issue, but I am still confused about the same problem.

I tried setting both RUNAI_STREAMER_S3_ENDPOINT and AWS_ENDPOINT_URL in vLLM, loading models from MinIO, which is an S3-compatible storage. I have tried many times, but it still shows Exception: Could not send runai_request to libstreamer due to: b'S3 not supported'.

Here are some screenshots of my environment and commands:

[two screenshots attached]

Hoping for your advice!

@noa-neria
Collaborator

noa-neria commented Feb 24, 2025

Can you please try with the latest vLLM (and runai-model-streamer) version?
A compilation issue in a previous version might be the cause.

Also, in the latest version RUNAI_STREAMER_S3_ENDPOINT was replaced by AWS_ENDPOINT_URL, so there is no need to define both.

@DellCurry

Hi @noa-neria, thanks for the quick reply! I am using runai-model-streamer version 0.11.2 and the latest main branch of vLLM. Could that be the issue (maybe I need to install runai-model-streamer == 0.12.0)?

On the second point, I have noticed that in the latest version RUNAI_STREAMER_S3_ENDPOINT is automatically filled with the same value as AWS_ENDPOINT_URL. Thanks for the notice!

@noa-neria
Collaborator

noa-neria commented Feb 24, 2025

Can you please run pip freeze | grep runai?
The output should show two packages:

runai-model-streamer==0.11.2
runai-model-streamer-s3==0.11.2

If both exist, please run with logs enabled: RUNAI_STREAMER_LOG_TO_STDERR=1 RUNAI_STREAMER_LOG_LEVEL=DEBUG
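For example, a sketch of a debug-logging run (the bucket path is just the placeholder from the earlier example):

RUNAI_STREAMER_LOG_TO_STDERR=1 RUNAI_STREAMER_LOG_LEVEL=DEBUG vllm serve s3://core-llm/Llama-3-8b --load-format runai_streamer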

@DellCurry

Thanks for the advice! I ran with the logs and found that I did not satisfy the glibc version requirement for libstreamers3.so (2.29 or higher is required). I rebuilt the kernel and it finally worked. Thanks a lot!
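For anyone else checking this requirement, a quick way to print the installed glibc version (either command works on glibc-based systems):

ldd --version | head -n 1
getconf GNU_LIBC_VERSION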
