Add support for embedded endpoint for vllm #1511
Conversation
Signed-off-by: Tarun Kumar <[email protected]>
@@ -593,6 +593,47 @@
    ...    AND
    ...    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"    Terminate Process    llm-query-process    kill=true

Verify User Can Serve And Query A intfloat/e5-mistral-7b-instruct Model

Check warning
Code scanning / Robocop
Test case '{{ test_name }}' is too long ({{ test_length }}/{{ allowed_length }}) Warning test

Check warning
Code scanning / Robocop
Test case '{{ test_name }}' has too many keywords inside ({{ keyword_count }}/{{ max_allowed_count }}) Warning test
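Purely as a hedged sketch of how those two warnings are often addressed (not part of this PR): the deployment steps can be pulled into a reusable keyword so the test case body stays short. The keyword name below is hypothetical; the arguments come from the diff.

*** Keywords ***
Deploy Embedding Model    # hypothetical helper keyword, not in this PR
    [Arguments]    ${model_name}    ${model_path}
    Set Project And Runtime    runtime=${RUNTIME_NAME}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${model_name}    protocol=${PROTOCOL}
    ...    storage_size=40Gi    model_path=${model_path}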
    Set Project And Runtime    runtime=${RUNTIME_NAME}    namespace=${test_namespace}
    ...    download_in_pvc=${DOWNLOAD_IN_PVC}    model_name=${model_name}    protocol=${PROTOCOL}
    ...    storage_size=40Gi    model_path=${model_path}
    ${requests}=    Create Dictionary    memory=20Gi

Check notice
Code scanning / Robocop
{{ create_keyword }} can be replaced with VAR Note test

    IF    "${OVERLAY}" != "${EMPTY}"
        ${overlays}=    Create List    ${OVERLAY}

Check notice
Code scanning / Robocop
{{ create_keyword }} can be replaced with VAR Note test

    ELSE
        ${overlays}=    Create List

Check notice
Code scanning / Robocop
{{ create_keyword }} can be replaced with VAR Note test
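For reference, a hedged sketch of the replacement Robocop suggests here, using the VAR syntax introduced in Robot Framework 7.0, applied to the three flagged lines:

    # VAR creates the dictionary and lists in place of Create Dictionary / Create List
    VAR    &{requests}    memory=20Gi
    IF    "${OVERLAY}" != "${EMPTY}"
        VAR    @{overlays}    ${OVERLAY}
    ELSE
        VAR    @{overlays}
    END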
    ...    namespace=${test_namespace}
    Wait For Model KServe Deployment To Be Ready    label_selector=serving.kserve.io/inferenceservice=${model_name}
    ...    namespace=${test_namespace}    runtime=${RUNTIME_NAME}    timeout=900s
    ${pod_name}=    Get Pod Name    namespace=${test_namespace}    label_selector=serving.kserve.io/inferenceservice=${model_name}

Check warning
Code scanning / Robocop
Line is too long ({{ line_length }}/{{ allowed_length }}) Warning test
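A hedged sketch of one way to clear the line-length warning, splitting the flagged Get Pod Name call across continuation rows:

    ${pod_name}=    Get Pod Name    namespace=${test_namespace}
    ...    label_selector=serving.kserve.io/inferenceservice=${model_name}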
    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"

Check warning
Code scanning / Robocop
'{{ statement_name }}' is deprecated since Robot Framework version {{ version }}, use '{{ alternative }}' instead Warning test
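A hedged sketch of the native IF block the warning points toward, replacing the deprecated Run Keyword If call flagged above:

    IF    "${KSERVE_MODE}" == "RawDeployment"
        Start Port-forwarding    namespace=${test_namespace}    pod_name=${pod_name}
    END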
    Run Keyword If    "${KSERVE_MODE}"=="RawDeployment"
    ...    Start Port-forwarding    namespace=${test_namespace}    pod_name=${pod_name}
    IF    "${RUNTIME_NAME}" == "tgis-runtime" or "${KSERVE_MODE}" == "RawDeployment"
        Skip    msg=Embedding endpoint is not supported for tgis as well as model architectures with "XXModel"
What's the point of doing this? Wouldn't it make more sense to skip the test entirely if it's RawDeployment or using tgis?
If I understand the goal of this correctly, it's to test the embeddings endpoint, so IMHO there's no benefit to deploying the model regardless and then skipping the query if it doesn't support the endpoint under test.
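For illustration only, a hedged sketch of that suggestion: Skip If is a BuiltIn keyword, and the condition mirrors the one already used in the diff, so the test would bail out before deploying anything:

    Skip If    "${RUNTIME_NAME}" == "tgis-runtime" or "${KSERVE_MODE}" == "RawDeployment"
    ...    msg=Embeddings endpoint is not supported for tgis or RawDeployment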
I included this step to ensure smooth model deployment, whether it's in raw mode or otherwise. That's why I incorporated the skip part into the query section rather than at the initial stage.
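As a hedged illustration of what the query section exercises (not code from this PR): vLLM's OpenAI-compatible server serves embeddings at /v1/embeddings, so a raw query could look roughly like the sketch below. ${model_url} is a hypothetical variable standing in for the inference service URL, and the Process library is assumed to be imported by the suite.

    # Hypothetical raw query against the embeddings endpoint under test
    ${result}=    Run Process
    ...    curl -sk ${model_url}/v1/embeddings -H "Content-Type: application/json" -d '{"model": "${model_name}", "input": "The quick brown fox"}'
    ...    shell=yes
    Should Contain    ${result.stdout}    embedding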
Signed-off-by: Tarun Kumar <[email protected]>
...i/tests/Tests/400__ods_dashboard/420__model_serving/LLMs/422__model_serving_llm_models.robot
Outdated
Signed-off-by: Tarun Kumar <[email protected]>
Quality Gate failed
Failed conditions
Models that support text generation are not supported for embeddings. If a model's architecture is XXXForCausalLM, CausalLM means it generates text, so it should not be used for embeddings. If the architecture is XXModel, the model only produces embeddings, so it should be used with the embeddings endpoint (for example, intfloat/e5-mistral-7b-instruct lists MistralModel as its architecture, which is why it is the model served in this test).