Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken sanity tests + Add skip wait option in test teardown in model serving #1157

Merged
merged 23 commits into from
Jan 26, 2024

Conversation

bdattoma
Copy link
Contributor

Fixing:

  • Verify RHODS Admins Can Import A Custom Serving Runtime Template For Each Serving Platform
  • Verify RHODS Users Can Deploy A Model Using A Custom Serving Runtime
  • Verify Model Upgrade Using Canaray Rollout
  • Verify User Can Autoscale Using Concurrency

In addition, the PR is adding the option to skip waiting for project to be deleted in test teardown. The reason is that the project deletion takes very much time in model serving test (OCP takes from 3 to 6 minutes to delete it)


Verify User Can Autoscale Using Concurrency
[Documentation] Checks if model successfully scale up based on concurrency metrics (KPA)
[Tags] Sanity Tier1 ODS-2377
[Setup] Set Project And Runtime namespace=autoscale-con
${test_namespace}= Set Variable autoscale-con
${flan_model_name}= Set Variable flan-t5-small-caikit
${model_name}= Create List ${flan_model_name}
${models_names}= Create List ${flan_model_name}

Check notice

Code scanning / Robocop

{{ create_keyword }} can be replaced with VAR Note test

Create List can be replaced with VAR
@@ -292,6 +292,7 @@
[Documentation] Group together the test steps for preparing, deploying
... and querying a model
[Arguments] ${model_storage_uri} ${model_name} ${isvc_name}=${model_name}
... ${runtime}=caikit-tgis-runtime ${protocol}=grpc ${inference_type}=all-tokens

Check notice

Code scanning / Robocop

There is too many arguments per continuation line ({{ arguments_count }} / {{ max_arguments_count }}) Note test

There is too many arguments per continuation line (3 / 1)
ods_ci/tests/Resources/CLI/ModelServing/llm.resource Fixed Show resolved Hide resolved
ods_ci/tests/Resources/CLI/ModelServing/llm.resource Fixed Show resolved Hide resolved
ods_ci/tests/Resources/CLI/ModelServing/llm.resource Dismissed Show dismissed Hide dismissed
@bdattoma
Copy link
Contributor Author

bdattoma commented Jan 25, 2024

PR validation:

  1. CustomServingRuntime tests + some of Model Serving Sanity rhods-ci-pr-test/2386 PASS
  2. Model Serving Sanity: rhods-ci-pr-test/2394 - 2 partial failures* - think they happen because the model is not fully loaded in memory yet. Needs further investigation (out of this PR)
  3. TrustyAI: rhods-ci-pr-test/2399/ 2 failures related to product build issue. The code patched by this PR ran OK in the second failed test

Dry-run failures are not due to changes in this PR

'* partial failure to investigate:

@bdattoma bdattoma self-assigned this Jan 25, 2024
@bdattoma bdattoma added needs testing Needs to be tested in Jenkins enhancements Bugfixes, enhancements, refactoring, ... in tests or libraries (PR will be listed in release-notes) labels Jan 25, 2024
jgarciao
jgarciao previously approved these changes Jan 26, 2024
ods_ci/tests/Resources/CLI/ModelServing/llm.resource Outdated Show resolved Hide resolved
@@ -580,5 +579,9 @@ Clean Up Test Project
... namespace=${test_ns}
${rc} ${out}= Run And Return Rc And Output oc delete project ${test_ns}
Should Be Equal As Integers ${rc} ${0}
${rc} ${out}= Run And Return Rc And Output oc wait --for=delete namespace ${test_ns} --timeout=300s
Should Be Equal As Integers ${rc} ${0}
IF ${wait_prj_deletion}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of keywords deleting projects, for example Delete Data Science Project From CLI in ods_ci/tests/Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/Projects.resource

Could you consider enhancing the existing one for your purposes in another PR?

Copy link
Contributor Author

@bdattoma bdattoma Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that while reviewing the code. I think the main difference is that the first one you mention handle DS Projects, while in this case we're handling basic OCP projects.

I didn't apply enhancements for now, need to thinki a bit more about how to implement it, but I agree

@@ -276,6 +274,24 @@
Fail msg=comparison between expected and actual failed, ${list}
END

Verify Model Inference With Retries

Check warning

Code scanning / Robocop

Keyword '{{ keyword_name }}' has too many arguments ({{ arguments_count }}/{{ max_allowed_count }}) Warning test

Keyword 'Verify Model Inference With Retries' has too many arguments (10/5)
... timing: model not ready to reply yet, despite the pod is up and running and the
... endpoint exposed.
... This is a temporary mitigation meanwhile we find a better way to check the model
[Arguments] ${model_name} ${inference_input} ${expected_inference_output}

Check notice

Code scanning / Robocop

There is too many arguments per continuation line ({{ arguments_count }} / {{ max_arguments_count }}) Note test

There is too many arguments per continuation line (3 / 1)
Copy link
Contributor

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass %
405 0 0 405 100

... endpoint exposed.
... This is a temporary mitigation meanwhile we find a better way to check the model
[Arguments] ${model_name} ${inference_input} ${expected_inference_output}
... ${token_auth}=${FALSE} ${project_title}=${NONE} ${retries}=${5}

Check notice

Code scanning / Robocop

There is too many arguments per continuation line ({{ arguments_count }} / {{ max_arguments_count }}) Note test

There is too many arguments per continuation line (3 / 1)
Copy link
Member

@jstourac jstourac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, there are some small things to be fixed. Otherwise LGTM from my restricted knowledge point of view.

@bdattoma bdattoma requested review from jstourac and jgarciao January 26, 2024 15:16
Copy link
Member

@jstourac jstourac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment, please let me know if you want to touch it (note it needs a change few lines above too). Otherwise I'll approve.

@bdattoma bdattoma added verified This PR has been tested with Jenkins and removed needs testing Needs to be tested in Jenkins labels Jan 26, 2024
@bdattoma bdattoma requested a review from jstourac January 26, 2024 15:23

Verify User Can Validate Scale To Zero
[Documentation] Checks if model successfully scale down to 0 if there's no traffic
[Tags] Sanity Tier1 ODS-2379
[Setup] Set Project And Runtime namespace=autoscale-zero
${flan_model_name}= Set Variable flan-t5-small-caikit
${model_name}= Create List ${flan_model_name}
${models_names}= Create List ${flan_model_name}

Check notice

Code scanning / Robocop

{{ create_keyword }} can be replaced with VAR Note test

Create List can be replaced with VAR
Copy link

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@jgarciao jgarciao merged commit 29b646f into red-hat-data-services:master Jan 26, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancements Bugfixes, enhancements, refactoring, ... in tests or libraries (PR will be listed in release-notes) verified This PR has been tested with Jenkins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants