This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

add more models, new num_logprobs #285

Merged

Conversation

derekk-nm

Adding the microsoft/phi-2, google/gemma-1.1-2b-it, and HuggingFaceH4/zephyr-7b-gemma-v0.1 models to test_basic_server_correctness.py. This required increasing the number of logprobs included in the evaluation to avoid unexpected failures for a few prompts with these models. The change did not negatively impact the other models.

Ran the test locally multiple times; each run passed, like this:

/root/pyvenv/nmv3119a/bin/python3 /root/.local/share/JetBrains/IntelliJIdea2023.3/python/helpers/pycharm/_jb_pytest_runner.py --target test_basic_server_correctness.py::test_models_on_server -- --forked 
Testing started at 2:24 PM ...
Launching pytest with arguments --forked test_basic_server_correctness.py::test_models_on_server --no-header --no-summary -q in /network/derekk/testdev1/nm-vllm/tests/basic_correctness

============================= test session starts ==============================
collecting ... collected 7 items
Running 7 items in this shard: tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-mistralai/Mistral-7B-Instruct-v0.2-4096-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-4096-sparse_w16a16-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-NousResearch/Llama-2-7b-chat-hf-4096-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-4096-sparse_w16a16-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-microsoft/phi-2-2048-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-google/gemma-1.1-2b-it-2056-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-HuggingFaceH4/zephyr-7b-gemma-v0.1-4096-None-None]

test_basic_server_correctness.py::test_models_on_server[None-5-32-mistralai/Mistral-7B-Instruct-v0.2-4096-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-4096-sparse_w16a16-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-NousResearch/Llama-2-7b-chat-hf-4096-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-4096-sparse_w16a16-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-microsoft/phi-2-2048-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-google/gemma-1.1-2b-it-2056-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-HuggingFaceH4/zephyr-7b-gemma-v0.1-4096-None-None] 

======================== 7 passed in 1332.51s (0:22:12) ========================
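The bracketed test IDs in the log above encode each case's parameters. The following sketch reconstructs how such an ID could be composed; the field names and ordering are assumptions inferred from the log, not the actual test code.

```python
# Hypothetical reconstruction of the parametrized case IDs seen in the log,
# e.g. "None-5-32-microsoft/phi-2-2048-None-None".
cases = [
    # (model, max_model_len, sparsity)
    ("microsoft/phi-2", 2048, None),
    ("google/gemma-1.1-2b-it", 2056, None),
    ("HuggingFaceH4/zephyr-7b-gemma-v0.1", 4096, None),
]

def case_id(model, max_model_len, sparsity, num_logprobs=5, max_tokens=32):
    # Assumed field order: <?>-<num_logprobs>-<max_tokens>-<model>-<len>-<sparsity>-<?>
    return f"None-{num_logprobs}-{max_tokens}-{model}-{max_model_len}-{sparsity}-None"

print(case_id(*cases[0]))  # None-5-32-microsoft/phi-2-2048-None-None
```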

@andy-neuma (Member) left a comment:


yeah

@derekk-nm (Author):

@andy-neuma, the single failing test is the one with known issues, test_gptq_marlin.py. Would you please squash and merge?

@robertgshaw2-redhat robertgshaw2-redhat merged commit 87571b8 into main Jun 6, 2024
12 checks passed
@robertgshaw2-redhat robertgshaw2-redhat deleted the new_models_for_test_basic_server_correctness branch June 6, 2024 20:15