This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

add more models, new num_logprobs #285

Merged

Conversation

derekk-nm

Adding the microsoft/phi-2, google/gemma-1.1-2b-it, and HuggingFaceH4/zephyr-7b-gemma-v0.1 models to test_basic_server_correctness.py. This required increasing the number of logprobs included in the evaluation to avoid unexpected failures for a few prompts with these models. The change did not negatively impact the other models.

Ran the test locally multiple times; each run passed, like this:

/root/pyvenv/nmv3119a/bin/python3 /root/.local/share/JetBrains/IntelliJIdea2023.3/python/helpers/pycharm/_jb_pytest_runner.py --target test_basic_server_correctness.py::test_models_on_server -- --forked 
Testing started at 2:24 PM ...
Launching pytest with arguments --forked test_basic_server_correctness.py::test_models_on_server --no-header --no-summary -q in /network/derekk/testdev1/nm-vllm/tests/basic_correctness

============================= test session starts ==============================
collecting ... collected 7 items
Running 7 items in this shard: tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-mistralai/Mistral-7B-Instruct-v0.2-4096-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-4096-sparse_w16a16-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-NousResearch/Llama-2-7b-chat-hf-4096-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-4096-sparse_w16a16-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-microsoft/phi-2-2048-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-google/gemma-1.1-2b-it-2056-None-None], tests/basic_correctness/test_basic_server_correctness.py::test_models_on_server[None-5-32-HuggingFaceH4/zephyr-7b-gemma-v0.1-4096-None-None]

test_basic_server_correctness.py::test_models_on_server[None-5-32-mistralai/Mistral-7B-Instruct-v0.2-4096-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/OpenHermes-2.5-Mistral-7B-pruned50-4096-sparse_w16a16-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-NousResearch/Llama-2-7b-chat-hf-4096-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-neuralmagic/Llama-2-7b-pruned70-retrained-ultrachat-4096-sparse_w16a16-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-microsoft/phi-2-2048-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-google/gemma-1.1-2b-it-2056-None-None] 
test_basic_server_correctness.py::test_models_on_server[None-5-32-HuggingFaceH4/zephyr-7b-gemma-v0.1-4096-None-None] 

======================== 7 passed in 1332.51s (0:22:12) ========================
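The bracketed test IDs in the log above encode each case's parameters. The following sketch reconstructs how such an ID could be composed; the field names and ordering are assumptions inferred from the log, not the actual test code.

```python
# Hypothetical reconstruction of the parametrized case IDs seen in the log,
# e.g. "None-5-32-microsoft/phi-2-2048-None-None".
cases = [
    # (model, max_model_len, sparsity)
    ("microsoft/phi-2", 2048, None),
    ("google/gemma-1.1-2b-it", 2056, None),
    ("HuggingFaceH4/zephyr-7b-gemma-v0.1", 4096, None),
]

def case_id(model, max_model_len, sparsity, num_logprobs=5, max_tokens=32):
    # Assumed field order: <?>-<num_logprobs>-<max_tokens>-<model>-<len>-<sparsity>-<?>
    return f"None-{num_logprobs}-{max_tokens}-{model}-{max_model_len}-{sparsity}-None"

print(case_id(*cases[0]))  # None-5-32-microsoft/phi-2-2048-None-None
```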

@andy-neuma (Member) left a comment:


yeah

@derekk-nm (Author):

@andy-neuma, the single failing test is the one with known issues, test_gptq_marlin.py. Would you please squash and merge?

@robertgshaw2-redhat robertgshaw2-redhat merged commit 87571b8 into main Jun 6, 2024
12 checks passed
@robertgshaw2-redhat robertgshaw2-redhat deleted the new_models_for_test_basic_server_correctness branch June 6, 2024 20:15