Tests for Top Level Request Caching for Ensemble Models #7074
Conversation
qa/L0_response_cache/test.sh (outdated diff)
MODEL_LOAD_TYPE="${3}"
EXTRA_ARGS="--model-control-mode=${MODEL_CONTROL_MODE} --load-model=${MODEL_LOAD_TYPE}"
SERVER_ARGS="--model-repository=${MODEL_REPOSITORY} --cache-config local,size=1048576000 ${EXTRA_ARGS}"
source ../common/util.sh
This should be moved higher up and outside of the function - shouldn't need to keep re-sourcing the file in each function call.
Actually looks like it's already included higher up, so this can just be removed.
I removed the SERVER_ARGS defined outside the function since it is not used. SERVER_ARGS should be defined inside the function because ${MODEL_REPOSITORY} changes between the decoupled_cache test case and the ensemble cache test case, and the other test cases use different SERVER_ARGS.
…omposing models. 2. Removed duplicate SERVER_ARGS in L0_response_cache/test.sh
def _run_inference_and_validate(self, model):
    # Utility function to call run_ensemble for inference and validate expected baseline output and stats
    self.triton_client.load_model(self.ensemble_model)
Should the passed argument be model and not self.ensemble_model?
The parameter for _run_inference_and_validate should be model because it can be either the ensemble model or a composing model, and the passed model's stats are validated according to the test case.
In the third test case, which has the response cache enabled only in the composing model, the ensemble model's stats have empty fields for the cache-related metrics. That is why I pass the model parameter separately, to define which model's stats should be verified.
My question was mostly about the load_model argument. It was slightly confusing and not clear from the start why we do that, which is why I asked for the test plan to be clarified.
I think this PR is in a very good state. Nice work on refactoring since the original commit! Just a couple of small additions and we'll be able to merge it.
model_stats = self.triton_client.get_inference_statistics(
    model_name=model, as_json=True
)
return model_stats["model_stats"][1]["inference_stats"]
Add a comment on why [1] here -> which model is it? If it's for a specific model, I would add a small assert that model_stats["model_stats"][1]["name"] equals the expected model you're checking for.
The model has two versions, version 1 and version 3. Version 1 stats are at index 0 and version 3 stats are at index 1.
Version 3 is the one loaded, so to access its stats the index is set to 1.
I added a comment in the test file too.
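For illustration, a minimal sketch of the guard suggested above, written as a standalone helper rather than the PR's actual method; the function name and the default version string are assumptions here:

```python
def get_inference_stats(triton_client, model_name, expected_version="3"):
    # Fetch per-model statistics; with as_json=True the result is a plain dict.
    model_stats = triton_client.get_inference_statistics(
        model_name=model_name, as_json=True
    )
    # Per the reply above, version 1 sits at index 0 and version 3 at index 1;
    # assert that the hard-coded index really points at the loaded model/version.
    entry = model_stats["model_stats"][1]
    assert entry["name"] == model_name, f"unexpected model in stats: {entry['name']}"
    assert (
        entry["version"] == expected_version
    ), f"unexpected version in stats: {entry['version']}"
    return entry["inference_stats"]
```

An alternative would be to search model_stats["model_stats"] for the entry whose version field matches, instead of relying on list ordering.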
self._update_config(
    self.composing_config_file, RESPONSE_CACHE_PATTERN, RESPONSE_CACHE_CONFIG
)
self._run_inference_and_validate(self.composing_model)
Should we be running inference on the ensemble model here, validating that the ensemble did inference but has no cache stats, and that the composing model does have correct cache stats? It looks like we're doing inference on the composing model directly here, so we're not actually testing the ensemble flow.
We are running inference on the ensemble model only. The model parameter is only used to verify baseline stats in _run_inference_and_validate. For this test case the ensemble model's cache stats are going to be empty, so I pass model as a parameter to verify the corresponding model's stats correctly; for this test case that is the composing model.
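To illustrate the distinction, here is a rough sketch, not the PR's actual test code: it assumes a gRPC client on localhost:8001, that identical requests have already been sent through the ensemble, and that simple_graphdef_float32_float32_float32 is the ensemble wrapping graphdef_float32_float32_float32 (the names used elsewhere in this review).

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")


def cache_hit_count(model_name):
    # Sum the cache hit counter across all loaded versions of a model;
    # the field is simply absent when the response cache never recorded a hit.
    stats = client.get_inference_statistics(model_name=model_name, as_json=True)
    return sum(
        int(entry["inference_stats"].get("cache_hit", {}).get("count", 0))
        for entry in stats["model_stats"]
    )


# With the cache enabled only on the composing model, the composing model
# should report cache hits while the top-level ensemble's cache metrics
# stay empty.
assert cache_hit_count("graphdef_float32_float32_float32") > 0
assert cache_hit_count("simple_graphdef_float32_float32_float32") == 0
```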
TEST_RESULT_FILE='test_results.txt'
SERVER_LOG=./inference_server.log
RESET_CONFIG_FUNCTION="_reset_config_files"
CACHE_SIZE=10840
Add a comment on the size choice - why this number? What's the requirement/goal?
I used an arbitrary number that worked for me.
For the test_cache_insertion test case I used CACHE_SIZE=200 because the data to be inserted is larger than 200 bytes; I found that value through calculation and testing.
Looking really good! Only minor comments
Helper function that takes model as a parameter to verify the corresponding model's stats.
The passed model is the composing model for test case `test_ensemble_composing_model_cache_enabled`.
For other test cases, the top-level ensemble model stats are verified.
* loads the simple_graphdef_float32_float32_float32 and graphdef_float32_float32_float32
nit: clarify which of these models is an ensemble and which is not
Great work Harshini! 🚀
Can you add a link on Slack to a CI pipeline running with all your latest changes, to make sure everything looks good?
Related PR: triton-inference-server/core#338
Added 4 new tests in L0_response_cache to test top-level request caching for ensemble models:
Test 1: Cache and decoupled mode enabled in the ensemble model config: Error.
Test 2: Cache enabled in the ensemble model config and decoupled mode enabled in a composing model: Error.
Test 3: Cache enabled only in the ensemble model: Fail if the cache hit count is 0 or doesn't exist. Expected: non-zero cache hit count.
Test 4: Cache enabled in all models: Fail if the cache hit count is 0 or greater than the inference count. Expected: non-zero cache hit count, and cache hit count < successful inference count (a rough sketch of this check follows below).
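As an end-to-end illustration of the Test 4 expectation (not the PR's actual test code), the sketch below assumes a tritonserver already running with the response cache enabled in the relevant model configs and started with --cache-config, a gRPC endpoint on localhost:8001, and the qa model name, INPUT0/INPUT1 names, and [1, 16] FP32 shape; all of these are assumptions.

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")
ensemble = "simple_graphdef_float32_float32_float32"  # assumed ensemble name

# Build two identical requests so the second one can be served from the cache.
inputs = []
for name in ("INPUT0", "INPUT1"):
    inp = grpcclient.InferInput(name, [1, 16], "FP32")
    inp.set_data_from_numpy(np.ones((1, 16), dtype=np.float32))
    inputs.append(inp)

for _ in range(2):
    client.infer(ensemble, inputs)

stats = client.get_inference_statistics(model_name=ensemble, as_json=True)

# Aggregate across loaded versions; proto3 JSON omits zero-valued fields,
# hence the defensive .get() on the cache counter.
cache_hits = sum(
    int(e["inference_stats"].get("cache_hit", {}).get("count", 0))
    for e in stats["model_stats"]
)
successes = sum(
    int(e["inference_stats"].get("success", {}).get("count", 0))
    for e in stats["model_stats"]
)

# Test 4's expectation: some requests hit the cache, but never more than the
# number of successful inferences.
assert 0 < cache_hits < successes, (cache_hits, successes)
```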
No functional modifications to the L0_perf_analyzer_report test; only removed `wait $SERVER_PID`, as it was not returning/exiting with 0 because the Triton server takes a long time to shut down.