
Add OVMS on Kserve GPU test #1212

Merged: 13 commits into red-hat-data-services:master on Feb 20, 2024

Conversation

@lugi0 (Contributor) commented Feb 16, 2024

No description provided.

Signed-off-by: Luca Giorgi <[email protected]>
@lugi0 lugi0 added needs testing Needs to be tested in Jenkins new test New test(s) added (PR will be listed in release-notes) labels Feb 16, 2024
@lugi0 lugi0 self-assigned this Feb 16, 2024
github-actions bot commented Feb 16, 2024

Robot Results

Passed: 407, Failed: 0, Skipped: 0, Total: 407, Pass rate: 100%

@lugi0 (Contributor, Author) commented Feb 16, 2024

Verified in rhods-ci-pr-test/2512, but I need to figure out why the NVIDIA Prometheus query is returning 0.
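One way to debug a zero result is to run the same expression against the Prometheus HTTP API directly and inspect the raw response. A minimal sketch, assuming the cluster exposes NVIDIA DCGM exporter metrics (e.g. `DCGM_FI_DEV_GPU_UTIL`); the route and the label selector are hypothetical placeholders, not values from this PR:

```python
# Sketch: build a Prometheus HTTP API /api/v1/query URL so the GPU metric
# can be checked by hand (e.g. with curl plus a bearer token).
# Assumptions: DCGM exporter metric name and the pod label selector are
# illustrative; the Prometheus route is a placeholder.
from urllib.parse import urlencode


def build_prom_query_url(base_url: str, promql: str) -> str:
    """Return the /api/v1/query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


url = build_prom_query_url(
    "https://prometheus.example.cluster",                # hypothetical route
    'DCGM_FI_DEV_GPU_UTIL{exported_pod=~"ovms-.*"}',     # assumed metric/labels
)
print(url)
```

If the query returns an empty result set, the metric name or the label selector is likely wrong for this cluster, which would explain a constant 0 in the test.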

@lugi0 lugi0 added verified This PR has been tested with Jenkins do not merge Do not merge this yet please and removed needs testing Needs to be tested in Jenkins labels Feb 16, 2024
@jstourac (Member) left a comment


I left one comment; otherwise LGTM, though I don't have much background in this area 🙂

${node}= Get Node Pod Is Running On namespace=${PRJ_TITLE_GPU}
... label=serving.kserve.io/inferenceservice=${MODEL_NAME_GPU}
${type}= Get Instance Type Of Node ${node}
Should Be Equal As Strings ${type} "g4dn.xlarge"
Contributor


Is there a way to check that the node has a GPU without pinning to the instance type? We may run on different GPU types.

Contributor Author


The only other option I can think of is looking at the labels, but that depends on the NFD operator / NVIDIA labels, which wouldn't work for other accelerator types.

Contributor


We'll need to make the test compatible with different GPU node types. I think it can be done in a follow-up PR once we understand how.
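One GPU-type-agnostic approach (a sketch of the idea discussed here, not what this PR implements) is to inspect the node's allocatable resources for any vendor accelerator resource advertised by a device plugin, instead of asserting a specific instance type. The vendor prefixes below are assumptions:

```python
# Sketch: detect accelerator resources on a node from its
# status.allocatable map (as returned by `oc get node -o json`),
# without pinning the test to an instance type like g4dn.xlarge.
# The set of vendor resource prefixes is an assumption.
ACCELERATOR_PREFIXES = ("nvidia.com/", "amd.com/", "habana.ai/", "intel.com/")


def node_accelerators(allocatable: dict) -> dict:
    """Return accelerator resources (name -> count) exposed by a node."""
    return {
        name: int(count)
        for name, count in allocatable.items()
        # prefix check runs first, so non-numeric values like "64Gi" are skipped
        if name.startswith(ACCELERATOR_PREFIXES) and int(count) > 0
    }


# Example allocatable payload for a node with one NVIDIA GPU
allocatable = {"cpu": "16", "memory": "64Gi", "nvidia.com/gpu": "1"}
print(node_accelerators(allocatable))  # → {'nvidia.com/gpu': 1}
```

This relies only on the device plugin advertising the resource, not on NFD labels or a specific cloud instance name, so it would also cover non-NVIDIA accelerators.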

@lugi0 lugi0 requested a review from jstourac February 19, 2024 16:24
@lugi0 lugi0 requested a review from bdattoma February 19, 2024 16:24

Quality Gate passed

Issues: 0 new issues
Security Hotspots: 0
Coverage: no data
Duplication on new code: 0.0%

See analysis details on SonarCloud

@lugi0 (Contributor, Author) commented Feb 19, 2024

Validated again with rhods-ci-pr-test/2514; the TC fails, but it's tagged with its own product bug ID.

@lugi0 lugi0 removed the do not merge Do not merge this yet please label Feb 19, 2024
${node}= Get Node Pod Is Running On namespace=${PRJ_TITLE_GPU}
... label=serving.kserve.io/inferenceservice=${MODEL_NAME_GPU}
${type}= Get Instance Type Of Node ${node}
Should Be Equal As Strings ${type} "g4dn.xlarge"
Member


If we use any other GPU, this test will fail, correct?

Contributor Author


Yes. As I said in another comment above, the only other option I can think of is looking at the labels, but that depends on the NFD operator / NVIDIA labels, which wouldn't work for other accelerator types.

@lugi0 lugi0 merged commit 1b9ea7a into red-hat-data-services:master Feb 20, 2024
11 checks passed
Labels
new test New test(s) added (PR will be listed in release-notes) verified This PR has been tested with Jenkins

5 participants