-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Refactor cpu metrics tests to make L0_metrics more stable #7476
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TmLg! Great work Ryan! Look forward to seeing the greener pipelines 🚀
And thanks for fixing things up to be Windows CI friendly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work!
The previous implementation of tests for the CPU/RAM related metrics were:
nv_cpu_utilization
because there was no work going on in the background, making duplicate values appear fairly frequently.This PR:
pytest
style that generates nice reports on the gitlab side for easier regression detection at the unit test level. I also noticed the use ofcheck_test_results
without using thetu.ResultCollector
for the pinned memory metrics tests section - so it wasn't actually checking the right test results file anyway, since none was being generated. I moved this one to the pytest style to be consistent with the similar cpu metrics test.kill_server
andTRITONSERVER_IPADDR
.Average results of the PR from testing so far typically makes about 20 observations (1 per second for 20 second test), and finds no duplicate values. However, keeping the duplicate tolerance will help stabilize in the event we occasionally find some: