Tests for Top Level Request Caching for Ensemble Models #7074
Conversation
qa/L0_response_cache/test.sh (outdated diff)
MODEL_LOAD_TYPE="${3}"
EXTRA_ARGS="--model-control-mode=${MODEL_CONTROL_MODE} --load-model=${MODEL_LOAD_TYPE}"
SERVER_ARGS="--model-repository=${MODEL_REPOSITORY} --cache-config local,size=1048576000 ${EXTRA_ARGS}"
source ../common/util.sh
This should be moved higher up and outside of the function - shouldn't need to keep re-sourcing the file in each function call.
Actually looks like it's already included higher up, so this can just be removed.
I removed the SERVER_ARGS defined outside the function since it is not used. SERVER_ARGS should be defined inside the function because ${MODEL_REPOSITORY} changes between the decoupled_cache test case and the ensemble cache test case, and the other test cases use different SERVER_ARGS.
…omposing models. 2. Removed duplicate SERVER_ARGS in L0_response_cache/test.sh
def _run_inference_and_validate(self, model):
    # Utility function to call run_ensemble for inference and validate expected baseline output and stats
    self.triton_client.load_model(self.ensemble_model)
Should the passed argument be model and not self.ensemble_model?
The parameter for _run_inference_and_validate should be model because it can be either the ensemble model or a composing model, and the passed model's stats are validated according to the test case.
In the third test case, which has the response cache enabled only in the composing model, the ensemble model's stats have empty fields for the cache-related metrics. That is why I pass the model parameter separately, to define which model's stats should be verified.
My question was mostly about the load_model argument. It was slightly confusing and not clear from the start why we do that, which is why I asked for the test plan to be clarified.
I think this PR is in a very good state. Nice work on refactoring since the original commit! Just a couple of small additions and we'll be able to merge it.
model_stats = self.triton_client.get_inference_statistics(
    model_name=model, as_json=True
)
return model_stats["model_stats"][1]["inference_stats"]
Add a comment on why [1] here -> which model is it? If it's for a specific model, I would add a small assert that model_stats["model_stats"][1]["name"] equals the expected model you're checking for.
The model has two versions, version 1 and version 3. Version 1 stats are at index 0 and version 3 stats are at index 1.
Version 3 is the one loaded, so to access its stats the index is set to 1.
I added a comment in the test file too.
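For illustration, a minimal sketch of the guard suggested above, written as a standalone helper rather than the PR's actual method; the function name and the default version string are assumptions here:

```python
def get_inference_stats(triton_client, model_name, expected_version="3"):
    # Fetch per-model statistics; with as_json=True the result is a plain dict.
    model_stats = triton_client.get_inference_statistics(
        model_name=model_name, as_json=True
    )
    # Per the reply above, version 1 sits at index 0 and version 3 at index 1;
    # assert that the hard-coded index really points at the loaded model/version.
    entry = model_stats["model_stats"][1]
    assert entry["name"] == model_name, f"unexpected model in stats: {entry['name']}"
    assert (
        entry["version"] == expected_version
    ), f"unexpected version in stats: {entry['version']}"
    return entry["inference_stats"]
```

An alternative would be to search model_stats["model_stats"] for the entry whose version field matches, instead of relying on list ordering.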
self._update_config(
    self.composing_config_file, RESPONSE_CACHE_PATTERN, RESPONSE_CACHE_CONFIG
)
self._run_inference_and_validate(self.composing_model)
Should we be running inference on the ensemble model here, validating that the ensemble did inference but has no cache stats, and that the composing model does have correct cache stats? It looks like we're doing inference on the composing model directly here, so we're not actually testing the ensemble flow.
We are running inference on the ensemble model only. The model parameter is only used to verify baseline stats in _run_inference_and_validate. For this test case the ensemble model's cache stats are going to be empty, so I pass model as a parameter to verify the corresponding model's stats correctly; for this test case that is the composing model.
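To illustrate the distinction, here is a rough sketch, not the PR's actual test code: it assumes a gRPC client on localhost:8001, that identical requests have already been sent through the ensemble, and that simple_graphdef_float32_float32_float32 is the ensemble wrapping graphdef_float32_float32_float32 (the names used elsewhere in this review).

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")


def cache_hit_count(model_name):
    # Sum the cache hit counter across all loaded versions of a model;
    # the field is simply absent when the response cache never recorded a hit.
    stats = client.get_inference_statistics(model_name=model_name, as_json=True)
    return sum(
        int(entry["inference_stats"].get("cache_hit", {}).get("count", 0))
        for entry in stats["model_stats"]
    )


# With the cache enabled only on the composing model, the composing model
# should report cache hits while the top-level ensemble's cache metrics
# stay empty.
assert cache_hit_count("graphdef_float32_float32_float32") > 0
assert cache_hit_count("simple_graphdef_float32_float32_float32") == 0
```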
TEST_RESULT_FILE='test_results.txt'
SERVER_LOG=./inference_server.log
RESET_CONFIG_FUNCTION="_reset_config_files"
CACHE_SIZE=10840
Add a comment on the size choice - why this number? What's the requirement/goal?
I used an arbitrary number that worked for me.
For the test_cache_insertion test case I used CACHE_SIZE=200 because the data to be inserted is larger than 200 bytes; I found that value through calculation and testing.
Looking really good! Only minor comments
Helper function that takes model as a parameter to verify the corresponding model's stats.
The passed model is the composing model for test case `test_ensemble_composing_model_cache_enabled`.
For other test cases, the top-level ensemble model stats are verified.
* loads the simple_graphdef_float32_float32_float32 and graphdef_float32_float32_float32
nit: clarify which of these models is an ensemble and which is not
Great work Harshini! 🚀
Can you add a link on Slack to a CI pipeline running with all your latest changes, to make sure everything looks good?
Related PR: triton-inference-server/core#338
Added 4 new tests in L0_response_cache to test top-level request caching for ensemble models:
Test 1: Cache and decoupled mode enabled in the ensemble model config: Error.
Test 2: Cache enabled in the ensemble model config and decoupled mode enabled in a composing model: Error.
Test 3: Cache enabled only in the ensemble model: Fail if the cache hit count is 0 or doesn't exist. Expected: non-zero cache hit count.
Test 4: Cache enabled in all models: Fail if the cache hit count is 0 or greater than the inference count. Expected: non-zero cache hit count, and cache hit count < successful inference count (a rough sketch of this check follows below).
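As an end-to-end illustration of the Test 4 expectation (not the PR's actual test code), the sketch below assumes a tritonserver already running with the response cache enabled in the relevant model configs and started with --cache-config, a gRPC endpoint on localhost:8001, and the qa model name, INPUT0/INPUT1 names, and [1, 16] FP32 shape; all of these are assumptions.

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")
ensemble = "simple_graphdef_float32_float32_float32"  # assumed ensemble name

# Build two identical requests so the second one can be served from the cache.
inputs = []
for name in ("INPUT0", "INPUT1"):
    inp = grpcclient.InferInput(name, [1, 16], "FP32")
    inp.set_data_from_numpy(np.ones((1, 16), dtype=np.float32))
    inputs.append(inp)

for _ in range(2):
    client.infer(ensemble, inputs)

stats = client.get_inference_statistics(model_name=ensemble, as_json=True)

# Aggregate across loaded versions; proto3 JSON omits zero-valued fields,
# hence the defensive .get() on the cache counter.
cache_hits = sum(
    int(e["inference_stats"].get("cache_hit", {}).get("count", 0))
    for e in stats["model_stats"]
)
successes = sum(
    int(e["inference_stats"].get("success", {}).get("count", 0))
    for e in stats["model_stats"]
)

# Test 4's expectation: some requests hit the cache, but never more than the
# number of successful inferences.
assert 0 < cache_hits < successes, (cache_hits, successes)
```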
No functional modifications to the L0_perf_analyzer_report test; only removed `wait $SERVER_PID`, as it was not returning/exiting with 0 because the Triton server takes a long time to shut down.