Add support for embedded endpoint for vllm #1511
Conversation
Signed-off-by: Tarun Kumar <[email protected]>
@@ -593,6 +593,47 @@
    ...    AND
    ...    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"    Terminate Process    llm-query-process    kill=true

Verify User Can Serve And Query A intfloat/e5-mistral-7b-instruct Model

Check warning
Code scanning / Robocop
Test case '{{ test_name }}' is too long ({{ test_length }}/{{ allowed_length }}) Warning test

Check warning
Code scanning / Robocop
Test case '{{ test_name }}' has too many keywords inside ({{ keyword_count }}/{{ max_allowed_count }}) Warning test
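Purely as a hedged sketch of how those two warnings are often addressed (not part of this PR): the deployment steps can be pulled into a reusable keyword so the test case body stays short. The keyword name below is hypothetical; the arguments come from the diff.

*** Keywords ***
Deploy Embedding Model    # hypothetical helper keyword, not in this PR
    [Arguments]    ${model_name}    ${model_path}
    Set Project And Runtime    runtime=${RUNTIME_NAME}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${model_name}    protocol=${PROTOCOL}
    ...    storage_size=40Gi    model_path=${model_path}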
    Set Project And Runtime    runtime=${RUNTIME_NAME}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${model_name}    protocol=${PROTOCOL}
    ...    storage_size=40Gi    model_path=${model_path}
    ${requests}=    Create Dictionary    memory=20Gi

Check notice
Code scanning / Robocop
{{ create_keyword }} can be replaced with VAR Note test

    IF    "${OVERLAY}" != "${EMPTY}"
        ${overlays}=    Create List    ${OVERLAY}

Check notice
Code scanning / Robocop
{{ create_keyword }} can be replaced with VAR Note test

    ELSE
        ${overlays}=    Create List

Check notice
Code scanning / Robocop
{{ create_keyword }} can be replaced with VAR Note test
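For reference, a hedged sketch of the replacement Robocop suggests here, using the VAR syntax introduced in Robot Framework 7.0, applied to the three flagged lines:

    # VAR creates the dictionary and lists in place of Create Dictionary / Create List
    VAR    &{requests}    memory=20Gi
    IF    "${OVERLAY}" != "${EMPTY}"
        VAR    @{overlays}    ${OVERLAY}
    ELSE
        VAR    @{overlays}
    END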
    ...    namespace=${test_namespace}
    Wait For Model KServe Deployment To Be Ready    label_selector=serving.kserve.io/inferenceservice=${model_name}
    ...    namespace=${test_namespace}    runtime=${RUNTIME_NAME}    timeout=900s
    ${pod_name}=    Get Pod Name    namespace=${test_namespace}    label_selector=serving.kserve.io/inferenceservice=${model_name}

Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }}) Warning test
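A hedged sketch of one way to clear the line-length warning, splitting the flagged Get Pod Name call across continuation rows:

    ${pod_name}=    Get Pod Name    namespace=${test_namespace}
    ...    label_selector=serving.kserve.io/inferenceservice=${model_name}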
    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"

Check warning
Code scanning / Robocop
'{{ statement_name }}' is deprecated since Robot Framework version {{ version }}, use '{{ alternative }}' instead Warning test
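A hedged sketch of the native IF block the warning points toward, replacing the deprecated Run Keyword If call flagged above:

    IF    "${KSERVE_MODE}" == "RawDeployment"
        Start Port-forwarding    namespace=${test_namespace}    pod_name=${pod_name}
    END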
    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"
    ...    Start Port-forwarding    namespace=${test_namespace}    pod_name=${pod_name}
    IF    "${RUNTIME_NAME}" == "tgis-runtime" or "${KSERVE_MODE}" == "RawDeployment"
        Skip    msg=Embedding endpoint is not supported for tgis as well as model architectures with "XXModel"
What's the point of doing this? Wouldn't it make more sense to skip the test entirely if it's RawDeployment or using tgis?
If I understand the goal of this correctly, it's to test the embeddings endpoint, so IMHO there's no benefit to deploying the model regardless and then skipping the query if it doesn't support the endpoint under test.
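For illustration only, a hedged sketch of that suggestion: Skip If is a BuiltIn keyword, and the condition mirrors the one already used in the diff, so the test would bail out before deploying anything:

    Skip If    "${RUNTIME_NAME}" == "tgis-runtime" or "${KSERVE_MODE}" == "RawDeployment"
    ...    msg=Embeddings endpoint is not supported for tgis or RawDeployment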
I included this step to ensure smooth model deployment, whether it's in raw mode or otherwise. That's why I incorporated the skip part into the query section rather than at the initial stage.
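As a hedged illustration of what the query section exercises (not code from this PR): vLLM's OpenAI-compatible server serves embeddings at /v1/embeddings, so a raw query could look roughly like the sketch below. ${model_url} is a hypothetical variable standing in for the inference service URL, and the Process library is assumed to be imported by the suite.

    # Hypothetical raw query against the embeddings endpoint under test
    ${result}=    Run Process
    ...    curl -sk ${model_url}/v1/embeddings -H "Content-Type: application/json" -d '{"model": "${model_name}", "input": "The quick brown fox"}'
    ...    shell=yes
    Should Contain    ${result.stdout}    embedding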
Signed-off-by: Tarun Kumar <[email protected]>
...i/tests/Tests/400__ods_dashboard/420__model_serving/LLMs/422__model_serving_llm_models.robot
Outdated
Signed-off-by: Tarun Kumar <[email protected]>
Quality Gate failed
Failed conditions
Models that support text generation are not supported for embeddings. If a model's architecture is XXXForCausalLM, CausalLM means it generates text, so it should not be used for embeddings. If the architecture is XXModel, the model only produces embeddings, so it should be used with the embeddings endpoint (for example, intfloat/e5-mistral-7b-instruct lists MistralModel as its architecture, which is why it is the model served in this test).