Add Support for vllm model deployment from UI #1531

Merged: 3 commits, Jun 14, 2024
@@ -16,7 +16,7 @@
${FLAN_STORAGE_URI}=    s3://${S3.BUCKET_3.NAME}/${FLAN_MODEL_S3_DIR}/
${TGIS_RUNTIME_NAME}=    tgis-runtime
@{SEARCH_METRICS}=    tgi_    istio_
${VLLM_RUNTIME_NAME}=    vllm-runtime

*** Test Cases ***
Verify Non Admin Can Serve And Query A Model Using The UI    # robocop: disable
@@ -78,6 +78,27 @@
    ...    namespace=${test_namespace}    protocol=grpc    validate_response=${FALSE}
    Delete Model Via UI    ${isvc__name}

Verify Model Can Be Served And Queried On A GPU Node Using The UI For VLLM    # robocop: disable
    [Documentation]    Basic test for preparing, deploying and querying an LLM model on a GPU node
    ...    using the Single-model serving platform with the vLLM runtime.
    [Tags]    Sanity    Tier1    RHOAIENG-6344    Resources-GPU
    ${test_namespace}=    Set Variable    ${TEST_NS}
    ${isvc__name}=    Set Variable    gpt2-gpu
    ${model_name}=    Set Variable    gpt2
    ${requests}=    Create Dictionary    nvidia.com/gpu=1
    ${limits}=    Create Dictionary    nvidia.com/gpu=1
    Deploy Kserve Model Via UI    model_name=${isvc__name}    serving_runtime=vLLM ServingRuntime for KServe
    ...    data_connection=kserve-connection    model_framework=vLLM    path=${model_name}
    ...    no_gpus=${1}
    Wait For Model KServe Deployment To Be Ready    label_selector=serving.kserve.io/inferenceservice=${isvc__name}
    ...    namespace=${test_namespace}    runtime=${VLLM_RUNTIME_NAME}    timeout=1200s
    Container Hardware Resources Should Match Expected    container_name=kserve-container
    ...    pod_label_selector=serving.kserve.io/inferenceservice=${isvc__name}
    ...    namespace=${test_namespace}    exp_requests=${requests}    exp_limits=${limits}
    Query Model Multiple Times    model_name=${isvc__name}    isvc_name=${isvc__name}
    ...    runtime=${VLLM_RUNTIME_NAME}    protocol=http
    ...    inference_type=chat-completions    n_times=3    query_idx=8
    ...    namespace=${test_namespace}    string_check_only=${TRUE}    validate_response=${FALSE}
    Delete Model Via UI    ${isvc__name}

*** Keywords ***
Non-Admin Setup Kserve UI Test
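
For context, the query step in the new test exercises vLLM's OpenAI-compatible chat-completions endpoint (the test passes inference_type=chat-completions to Query Model Multiple Times). A minimal standalone sketch of such a request, assuming RequestsLibrary is available and using a placeholder ${VLLM_ENDPOINT} URL plus the gpt2 model name from the test above (the keyword internals in this PR may differ), might look like:

*** Settings ***
Library    RequestsLibrary

*** Test Cases ***
Query VLLM Chat Completions Endpoint    # hypothetical sketch, not part of this PR
    ${message}=    Create Dictionary    role=user    content=What is KServe?
    ${messages}=    Create List    ${message}
    # model must match the served model name; gpt2 mirrors the test above
    ${payload}=    Create Dictionary    model=gpt2    messages=${messages}    max_tokens=${50}
    # ${VLLM_ENDPOINT} is a placeholder for the InferenceService route URL
    ${response}=    POST    ${VLLM_ENDPOINT}/v1/chat/completions    json=${payload}    verify=${FALSE}
    Should Be Equal As Integers    ${response.status_code}    200
    Log    ${response.json()}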