common: llama_load_model_from_url split support #6192

Merged · 15 commits · Mar 23, 2024

Changes from 1 commit:
server: tests: add split tests, and HF options params
phymbert committed Mar 23, 2024
commit b4a2ed85853b081cc553181475af69fe4c8e90e3
3 changes: 2 additions & 1 deletion in examples/server/tests/features/parallel.feature

@@ -4,7 +4,8 @@ Feature: Parallel

   Background: Server startup
     Given a server listening on localhost:8080
-    And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
+    And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
+    And a model file test-model-00001-of-00003.gguf
     And 42 as server seed
     And 128 as batch size
     And 256 KV cache size
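For orientation only, not part of the diff: given this Background and the steps.py changes further down, the test harness would end up passing roughly the following model-related arguments to the server, assuming the plain "a model file ..." step simply overrides context.model_file set by the HF step:

    # Illustrative sketch of the expected model arguments for the split-model Background above.
    server_args = [
        '--model', 'test-model-00001-of-00003.gguf',                     # local target file name
        '--hf-repo', 'ggml-org/models',                                  # from the HF repo step
        '--hf-file', 'tinyllamas/split/stories15M-00001-of-00003.gguf',  # first split to fetch
    ]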
4 changes: 2 additions & 2 deletions in examples/server/tests/features/server.feature

@@ -4,8 +4,8 @@ Feature: llama.cpp server

   Background: Server startup
     Given a server listening on localhost:8080
-    And a model url https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf
-    And a model file stories260K.gguf
+    And a model file tinyllamas/stories260K.gguf from HF repo ggml-org/models
+    And a model file test-model.gguf
     And a model alias tinyllama-2
     And 42 as server seed
     # KV Cache corresponds to the total amount of tokens
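The removed "model url" line and the new "HF repo" step refer to the same artifact; as a rough illustration (URL scheme copied from the deleted line above):

    # Illustration only: how the HF repo/file pair relates to the direct download URL
    # that was removed from this Background.
    hf_repo = "ggml-org/models"
    hf_file = "tinyllamas/stories260K.gguf"
    model_url = f"https://huggingface.co/{hf_repo}/resolve/main/{hf_file}"
    # -> "https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf"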
13 changes: 9 additions & 4 deletions in examples/server/tests/features/steps/steps.py

@@ -16,7 +16,6 @@
 import openai
 from behave import step
 from behave.api.async_step import async_run_until_complete
-from huggingface_hub import hf_hub_download
 from prometheus_client import parser
@@ -39,6 +38,8 @@ def step_server_config(context, server_fqdn, server_port):

     context.model_alias = None
     context.model_file = None
+    context.model_hf_repo = None
+    context.model_hf_file = None
     context.model_url = None
     context.n_batch = None
     context.n_ubatch = None
@@ -68,9 +69,9 @@ def step_server_config(context, server_fqdn, server_port):

 @step('a model file {hf_file} from HF repo {hf_repo}')
 def step_download_hf_model(context, hf_file, hf_repo):
-    context.model_file = hf_hub_download(repo_id=hf_repo, filename=hf_file)
-    if context.debug:
-        print(f"model file: {context.model_file}")
+    context.model_hf_repo = hf_repo
+    context.model_hf_file = hf_file
+    context.model_file = os.path.basename(hf_file)


 @step('a model file {model_file}')
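A note on the new step body (illustrative values taken from the parallel.feature Background above): the step no longer downloads anything itself; it records the repo and file for the server to fetch, and os.path.basename() strips the repo-relative directories so the default local model name is just the file name, unless a later "a model file ..." step overrides it:

    import os

    hf_repo = "ggml-org/models"
    hf_file = "tinyllamas/split/stories15M-00001-of-00003.gguf"
    os.path.basename(hf_file)  # -> 'stories15M-00001-of-00003.gguf'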
@@ -1079,6 +1080,10 @@ def start_server_background(context):
         server_args.extend(['--model', context.model_file])
     if context.model_url:
         server_args.extend(['--model-url', context.model_url])
+    if context.model_hf_repo:
+        server_args.extend(['--hf-repo', context.model_hf_repo])
+    if context.model_hf_file:
+        server_args.extend(['--hf-file', context.model_hf_file])
     if context.n_batch:
         server_args.extend(['--batch-size', context.n_batch])
     if context.n_ubatch:
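For clarity, a minimal standalone sketch of the conditional-argument pattern used in start_server_background; the function and parameter names here are assumptions for illustration, not part of the test code:

    def build_model_args(model_file=None, model_url=None, hf_repo=None, hf_file=None):
        # Each option is only appended when the corresponding scenario step set it.
        args = []
        if model_file:
            args.extend(['--model', model_file])
        if model_url:
            args.extend(['--model-url', model_url])
        if hf_repo:
            args.extend(['--hf-repo', hf_repo])
        if hf_file:
            args.extend(['--hf-file', hf_file])
        return args

    # Example, matching the server.feature Background above:
    # build_model_args(model_file='test-model.gguf',
    #                  hf_repo='ggml-org/models',
    #                  hf_file='tinyllamas/stories260K.gguf')
    # -> ['--model', 'test-model.gguf',
    #     '--hf-repo', 'ggml-org/models',
    #     '--hf-file', 'tinyllamas/stories260K.gguf']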