Fix broken sanity tests + Add skip wait option in test teardown in model serving #1157

Merged · 23 commits · Jan 26, 2024
Changes from 20 commits
22 changes: 13 additions & 9 deletions ods_ci/tests/Resources/CLI/ModelServing/llm.resource
@@ -16,8 +16,6 @@
${BUCKET_SECRET_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/bucket_secret.yaml
${BUCKET_SA_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/bucket_sa.yaml
${USE_BUCKET_HTTPS}= "1"
${UWM_ENABLE_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/uwm_cm_enable.yaml
${UWM_CONFIG_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/uwm_cm_conf.yaml
${MODELS_BUCKET}= ${S3.BUCKET_3}
${SERVICEMESH_CR_NS}= istio-system
&{RUNTIME_FLIEPATHS}= caikit-tgis-runtime=${LLM_RESOURCES_DIRPATH}/serving_runtimes/caikit_tgis_servingruntime_{{protocol}}.yaml
@@ -62,8 +60,8 @@
... region=${MODELS_BUCKET.REGION} namespace=${namespace}
Deploy Serving Runtime namespace=${namespace} runtime=${runtime} protocol=${protocol}
IF ${enable_metrics} == ${TRUE}
Oc Apply kind=ConfigMap src=${UWM_CONFIG_FILEPATH}
Oc Apply kind=ConfigMap src=${UWM_ENABLE_FILEPATH}
Enable User Workload Monitoring
Configure User Workload Monitoring
ELSE
Log message=Skipping UserWorkloadMonitoring enablement.
END
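
(The bodies of the two new keywords are not shown in this hunk. A minimal sketch of what they presumably do, assuming they simply wrap the ConfigMap applies they replace; the filepath variables are the ones deleted above:)

Enable User Workload Monitoring
    [Documentation]    Applies the ConfigMap that turns on user workload monitoring.
    Oc Apply    kind=ConfigMap    src=${UWM_ENABLE_FILEPATH}

Configure User Workload Monitoring
    [Documentation]    Applies the ConfigMap carrying the user workload monitoring configuration.
    Oc Apply    kind=ConfigMap    src=${UWM_CONFIG_FILEPATH}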
@@ -292,6 +290,7 @@
[Documentation] Group together the test steps for preparing, deploying
... and querying a model
[Arguments] ${model_storage_uri} ${model_name} ${isvc_name}=${model_name}
... ${runtime}=caikit-tgis-runtime ${protocol}=grpc ${inference_type}=all-tokens
... ${canaryTrafficPercent}=${EMPTY} ${namespace}=${TEST_NS} ${sa_name}=${DEFAULT_BUCKET_SA_NAME}
... ${n_queries}=${1} ${query_idx}=${0} ${validate_response}=${TRUE}

Check notice (Code scanning / Robocop): Too many arguments per continuation line (3 / 1)
github-advanced-security[bot] marked this conversation as resolved. Fixed.

Compile Inference Service YAML isvc_name=${isvc_name}
@@ -303,8 +302,9 @@
Wait For Pods To Be Ready label_selector=serving.kserve.io/inferenceservice=${isvc_name}
... namespace=${namespace}
Query Model Multiple Times isvc_name=${isvc_name} model_name=${model_name}
... endpoint=${CAIKIT_ALLTOKENS_ENDPOINT} n_times=${n_queries} streamed_response=${FALSE}
... namespace=${namespace} query_idx=${query_idx} validate_response=${validate_response}
... n_times=${n_queries} namespace=${namespace} query_idx=${query_idx}
... validate_response=${validate_response} protocol=${protocol}
... runtime=${runtime} inference_type=${inference_type}

Upgrade Caikit Runtime Image
[Documentation] Replaces the image URL of the Caikit Runtime with the given
@@ -567,7 +567,7 @@
Clean Up Test Project
[Documentation] Deletes the given InferenceServices, checks that the NS gets removed from the ServiceMeshMemberRoll
... and deletes the DS Project
[Arguments] ${test_ns} ${isvc_names} ${isvc_delete}=${TRUE}
[Arguments] ${test_ns} ${isvc_names} ${isvc_delete}=${TRUE} ${wait_prj_deletion}=${TRUE}
IF ${isvc_delete} == ${TRUE}
FOR ${index} ${isvc_name} IN ENUMERATE @{isvc_names}
Log Deleting ${isvc_name}
@@ -580,5 +580,9 @@
... namespace=${test_ns}
${rc} ${out}= Run And Return Rc And Output oc delete project ${test_ns}
Should Be Equal As Integers ${rc} ${0}
${rc} ${out}= Run And Return Rc And Output oc wait --for=delete namespace ${test_ns} --timeout=300s
Should Be Equal As Integers ${rc} ${0}
IF ${wait_prj_deletion}

Contributor:
There are a lot of keywords deleting projects, for example Delete Data Science Project From CLI in ods_ci/tests/Resources/Page/ODH/ODHDashboard/ODHDataScienceProject/Projects.resource

Could you consider enhancing the existing one for your purposes in another PR?


Contributor Author (@bdattoma), Jan 26, 2024:
I thought about that while reviewing the code. I think the main difference is that the first one you mention handles DS Projects, while in this case we're handling basic OCP projects.

I didn't apply enhancements for now; I need to think a bit more about how to implement it, but I agree.

${rc} ${out}= Run And Return Rc And Output oc wait --for=delete namespace ${test_ns} --timeout=300s
Should Be Equal As Integers ${rc} ${0}
ELSE
Log Project deletion started, but won't wait for it to finish...
END
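
(For illustration, a hypothetical teardown call that uses the new skip-wait option; the namespace and InferenceService names are made up:)

@{isvc_names}=    Create List    flan-t5-caikit
Clean Up Test Project    test_ns=watsonx-e2e-ns    isvc_names=${isvc_names}    wait_prj_deletion=${FALSE}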
@@ -9,9 +9,9 @@
${SUBMIT_RUNTIME_BTN_XP}= //button[text()="Create"]
${UPLOAD_RUNTIME_BTN_XP}= //button[text()="Upload files"]
${SCRATCH_RUNTIME_BTN_XP}= //button[text()="Start from scratch"]
${EDITOR_RUNTIME_BTN_XP}= //div[contains(@class, "odh-dashboard__code-editor")]
&{PLATFORM_NAMES_MAPPING}= single=Single model serving platform multi=Multi-model serving platform
... both=Both single and multi-model serving platforms
&{PLATFORM_LABELS_MAPPING}= single=Single model multi=Multi-model
&{PLATFORM_NAMES_MAPPING}= single=Single-model serving platform multi=Multi-model serving platform
... both=Single-model and multi-model serving platforms
&{PLATFORM_LABELS_MAPPING}= single=Single-model multi=Multi-model


*** Keywords ***
@@ -26,8 +26,6 @@
${KSERVE_MODAL_HEADER}= //header[@class="pf-v5-c-modal-box__header"]/h1[.="Deploy model"]
${KSERVE_RUNTIME_DROPDOWN}= //span[.="Serving runtime"]/../../..//div[@id="serving-runtime-template-selection"]
${LLM_RESOURCES_DIRPATH}= ods_ci/tests/Resources/Files/llm
${UWM_ENABLE_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/uwm_cm_enable.yaml
${UWM_CONFIG_FILEPATH}= ${LLM_RESOURCES_DIRPATH}/uwm_cm_conf.yaml


*** Keywords ***
@@ -56,8 +54,8 @@
[Arguments] ${project_name} ${model_name} ${framework} ${data_connection_name} ${model_path}
... ${existing_data_connection}=${TRUE} ${model_server}=Model Serving Test
# TODO: Does not work if there's already a model deployed
SeleniumLibrary.Wait Until Page Does Not Contain Element //article[@id="multi-serving-platform-card"]
SeleniumLibrary.Wait Until Page Does Not Contain Element //article[@id="single-serving-platform-card"]
SeleniumLibrary.Wait Until Page Does Not Contain Element //div[@id="multi-serving-platform-card"]
SeleniumLibrary.Wait Until Page Does Not Contain Element //div[@id="single-serving-platform-card"]
SeleniumLibrary.Wait Until Page Contains Deploy model
SeleniumLibrary.Click Button Deploy model
SeleniumLibrary.Wait Until Page Contains Element xpath://h1[.="Deploy model"]
@@ -248,7 +246,7 @@
... ${project_title}=${NONE}
${self_managed} = Is RHODS Self-Managed
${url}= Get Model Route via UI ${model_name}
${curl_cmd}= Set Variable curl -s ${url} -d @${inference_input}
${curl_cmd}= Set Variable curl -s ${url} -d ${inference_input}
IF ${token_auth}
IF "${project_title}" == "${NONE}"
${project_title}= Get Model Project ${model_name}
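
(Context for the curl change above: curl's -d flag treats a leading "@" as "read the request body from this file", so moving the "@" out of the keyword and into ${inference_input} lets callers pass either an inline JSON string or an @-prefixed file path. A hypothetical sketch of the two call styles; the variable values are made up:)

${json_input}=    Set Variable    {"instances": [[0.1, 0.2]]}
${file_input}=    Set Variable    @ods_ci/tests/Resources/Files/some_input.json
Verify Model Inference    ${MODEL_NAME}    ${json_input}    ${EXPECTED_OUTPUT}    token_auth=${FALSE}
Verify Model Inference    ${MODEL_NAME}    ${file_input}    ${EXPECTED_OUTPUT}    token_auth=${FALSE}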
@@ -276,6 +274,33 @@
Fail msg=comparison between expected and actual failed, ${list}
END

Verify Model Inference With Retries

Check warning (Code scanning / Robocop): Keyword 'Verify Model Inference With Retries' has too many arguments (10/5)
[Documentation] We see the inference failing often in the tests. One possible cause might be
... timing: the model is not ready to reply yet, despite the pod being up and running and the
... endpoint exposed.
... This is a temporary mitigation while we find a better way to check the model.
[Arguments] ${model_name} ${inference_input} ${expected_inference_output}
... ${token_auth}=${FALSE} ${project_title}=${NONE} ${retries}=${5}

Check notice (Code scanning / Robocop): Too many arguments per continuation line (3 / 1). Fixed.

${status}= Run Keyword And Return Status Verify Model Inference
... ${model_name} ${inference_input} ${expected_inference_output} ${token_auth} ${project_title}
IF not ${status}
${retry}= Set Variable ${0}
WHILE ${retry} < ${retries}
IF ${retry} > 0
Log message=Modelmesh inference call failed ${retry}/${retries}.
... level=WARN
END
${status}= Run Keyword And Return Status Verify Model Inference
... ${model_name} ${inference_input} ${expected_inference_output} ${token_auth}
... ${project_title}
IF ${status}
BREAK
END
${retry}= Evaluate ${retry} + 1
Sleep 5s
END
END
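
(A hypothetical invocation with the retry budget overridden; the argument values are made up:)

Verify Model Inference With Retries    ${MODEL_NAME}    ${INFERENCE_INPUT}    ${EXPECTED_INFERENCE_OUTPUT}
...    token_auth=${TRUE}    project_title=${PRJ_TITLE}    retries=${3}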

Clean Up Model Serving Page
[Documentation] Deletes all currently deployed models, if any are present.
# Returns an empty list if no matching elements found
Expand Down Expand Up @@ -491,8 +516,8 @@
... aws_bucket_name=${S3.BUCKET_3.NAME} aws_s3_endpoint=${S3.BUCKET_3.ENDPOINT}
... aws_region=${S3.BUCKET_3.REGION}
IF ${enable_metrics}
Oc Apply kind=ConfigMap src=${UWM_ENABLE_FILEPATH}
Oc Apply kind=ConfigMap src=${UWM_CONFIG_FILEPATH}
Enable User Workload Monitoring
Configure User Workload Monitoring
ELSE
Log message=Skipping UserWorkloadMonitoring enablement.
END
@@ -63,7 +63,9 @@
[Documentation] Test the inference result after having deployed a model that requires Token Authentication
[Tags] Sanity Tier1
... ODS-1920
Run Keyword And Continue On Failure Verify Model Inference ${MODEL_NAME} ${INFERENCE_INPUT} ${EXPECTED_INFERENCE_OUTPUT} token_auth=${TRUE}
# Run Keyword And Continue On Failure Verify Model Inference ${MODEL_NAME} ${INFERENCE_INPUT} ${EXPECTED_INFERENCE_OUTPUT} token_auth=${TRUE} # robocop: ignore
Run Keyword And Continue On Failure Verify Model Inference With Retries
... ${MODEL_NAME} ${INFERENCE_INPUT} ${EXPECTED_INFERENCE_OUTPUT} token_auth=${TRUE}
# Testing the same endpoint without token auth, should receive login page
Open Model Serving Home Page
${out}= Get Model Inference ${MODEL_NAME} ${INFERENCE_INPUT} token_auth=${FALSE}
@@ -80,12 +80,14 @@ Verify RHODS Users Can Deploy A Model Using A Custom Serving Runtime
... aws_access_key=${S3.AWS_ACCESS_KEY_ID} aws_secret_access=${S3.AWS_SECRET_ACCESS_KEY}
... aws_bucket_name=ods-ci-s3
Create Model Server server_name=${MODEL_SERVER_NAME} runtime=${UPLOADED_OVMS_DISPLAYED_NAME}
Serve Model project_name=${PRJ_TITLE} model_name=${model_name} framework=onnx existing_data_connection=${TRUE}
Serve Model project_name=${PRJ_TITLE} model_name=${model_name} framework=onnx
... existing_data_connection=${TRUE}
... data_connection_name=model-serving-connection model_path=mnist-8.onnx
Wait Until Runtime Pod Is Running server_name=${MODEL_SERVER_NAME}
... project_title=${PRJ_TITLE} timeout=15s
... project_title=${PRJ_TITLE} timeout=40s
Verify Model Status ${model_name} success
Verify Model Inference ${model_name} ${inference_input} ${exp_inference_output} token_auth=${TRUE}
Verify Model Inference With Retries ${model_name} ${inference_input} ${exp_inference_output}
... token_auth=${TRUE}
... project_title=${PRJ_TITLE}


@@ -94,7 +96,6 @@ Custom Serving Runtime Suite Setup
[Documentation] Suite setup steps for testing DSG. It creates some test variables
... and runs RHOSi setup
Set Library Search Order SeleniumLibrary
Launch Data Science Project Main Page username=${TEST_USER_3.USERNAME}
RHOSi Setup
Fetch CA Certificate If RHODS Is Self-Managed

@@ -148,7 +148,7 @@
[Documentation] Send Batch Inference data to the already deployed model using Curl commands
[Arguments] ${model_name} ${project_name} ${lower_range}=1 ${upper_range}=5
FOR ${counter} IN RANGE ${lower_range} ${upper_range}
${inference_input}= Set Variable ods_ci/tests/Resources/Files/TrustyAI/loan_default_batched/batch_${counter}.json
${inference_input}= Set Variable @ods_ci/tests/Resources/Files/TrustyAI/loan_default_batched/batch_${counter}.json
bdattoma marked this conversation as resolved.
${inference_output} = Get Model Inference ${model_name} ${inference_input} token_auth=${FALSE}
... project_title=${project_name}
Should Contain ${inference_output} model_name