Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for embedded endpoint for vllm #1511

Merged
merged 6 commits into from
Jun 11, 2024
Merged

Add support for embedded endpoint for vllm #1511

merged 6 commits into from
Jun 11, 2024

Conversation

tarukumar
Copy link
Contributor

@tarukumar tarukumar commented Jun 7, 2024

The model which support generation text are not supported for embedding. If model architectures is a XXXForCausalLM. CausalLM means that it generates text. It should not be used for embedding. If model architectures is a XXModel. This means that the model just generates embeddings. It should be used for embeddings.

@tarukumar tarukumar self-assigned this Jun 7, 2024
@tarukumar tarukumar added verified This PR has been tested with Jenkins new test New test(s) added (PR will be listed in release-notes) labels Jun 7, 2024
@@ -593,6 +593,47 @@
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process llm-query-process kill=true

Verify User Can Serve And Query A intfloat/e5-mistral-7b-instruct Model

Check warning

Code scanning / Robocop

Test case '{{ test_name }}' is too long ({{ test_length }}/{{ allowed_length }}) Warning test

Test case 'Verify User Can Serve And Query A intfloat/e5-mistral-7b-instruct Model' is too long (40/20)
@@ -593,6 +593,47 @@
... AND
... Run Keyword If "${KSERVE_MODE}"=="RawDeployment" Terminate Process llm-query-process kill=true

Verify User Can Serve And Query A intfloat/e5-mistral-7b-instruct Model

Check warning

Code scanning / Robocop

Test case '{{ test_name }}' has too many keywords inside ({{ keyword_count }}/{{ max_allowed_count }}) Warning test

Test case 'Verify User Can Serve And Query A intfloat/e5-mistral-7b-instruct Model' has too many keywords inside (12/10)
Set Project And Runtime runtime=${RUNTIME_NAME} namespace=${test_namespace}
... download_in_pvc=${DOWNLOAD_IN_PVC} model_name=${model_name} protocol=${PROTOCOL}
... storage_size=40Gi model_path=${model_path}
${requests}= Create Dictionary memory=20Gi

Check notice

Code scanning / Robocop

{{ create_keyword }} can be replaced with VAR Note test

Create Dictionary can be replaced with VAR
... storage_size=40Gi model_path=${model_path}
${requests}= Create Dictionary memory=20Gi
IF "${OVERLAY}" != "${EMPTY}"
${overlays}= Create List ${OVERLAY}

Check notice

Code scanning / Robocop

{{ create_keyword }} can be replaced with VAR Note test

Create List can be replaced with VAR
IF "${OVERLAY}" != "${EMPTY}"
${overlays}= Create List ${OVERLAY}
ELSE
${overlays}= Create List

Check notice

Code scanning / Robocop

{{ create_keyword }} can be replaced with VAR Note test

Create List can be replaced with VAR
... namespace=${test_namespace}
Wait For Model KServe Deployment To Be Ready label_selector=serving.kserve.io/inferenceservice=${model_name}
... namespace=${test_namespace} runtime=${RUNTIME_NAME} timeout=900s
${pod_name}= Get Pod Name namespace=${test_namespace} label_selector=serving.kserve.io/inferenceservice=${model_name}

Check warning

Code scanning / Robocop

Line is too long ({{ line_length }}/{{ allowed_length }}) Warning test

Line is too long (128/120)
Wait For Model KServe Deployment To Be Ready label_selector=serving.kserve.io/inferenceservice=${model_name}
... namespace=${test_namespace} runtime=${RUNTIME_NAME} timeout=900s
${pod_name}= Get Pod Name namespace=${test_namespace} label_selector=serving.kserve.io/inferenceservice=${model_name}
Run Keyword If "${KSERVE_MODE}"=="RawDeployment"

Check warning

Code scanning / Robocop

'{{ statement_name }}' is deprecated since Robot Framework version {{ version }}, use '{{ alternative }}' instead Warning test

'Run Keyword If' is deprecated since Robot Framework version 5.*, use 'IF' instead
Copy link
Contributor

github-actions bot commented Jun 7, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass %
475 0 0 475 100

Comment on lines +622 to +625
Run Keyword If "${KSERVE_MODE}"=="RawDeployment"
... Start Port-forwarding namespace=${test_namespace} pod_name=${pod_name}
IF "${RUNTIME_NAME}" == "tgis-runtime" or "${KSERVE_MODE}" == "RawDeployment"
Skip msg=Embedding endpoint is not supported for tgis as well as model architectures with "XXModel"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of doing this? Wouldn't it make more sense to skip the test entirely if it's RawDeployment or using tgis?
If I understand the goal of this correctly it's to test the embeddings endpoint, so IMHO there's no benefit to deploying the model regardless and then skipping the query if it doesn't support the endpoint under test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included this step to ensure smooth model deployment, whether it's in raw mode or otherwise. That's why I incorporated the skip part into the query section rather than at the initial stage.

Signed-off-by: Tarun Kumar <[email protected]>
Signed-off-by: Tarun Kumar <[email protected]>
@tarukumar tarukumar requested review from lugi0 and bdattoma June 7, 2024 09:19
bdattoma
bdattoma previously approved these changes Jun 7, 2024
@tarukumar tarukumar merged commit 6dfb73a into red-hat-data-services:master Jun 11, 2024
5 of 6 checks passed
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
9.6% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

jgarciao pushed a commit to jgarciao/ods-ci that referenced this pull request Jun 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new test New test(s) added (PR will be listed in release-notes) verified This PR has been tested with Jenkins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants