This document provides instructions on how to run the end-to-end tests.
The end-to-end tests validate Gateway API Inference Extension functionality. They run against a Kubernetes cluster and use the Ginkgo testing framework to ensure the extension behaves as expected.
- Go installed on your machine.
- Make installed to run the end-to-end test target.
- A Hugging Face Hub token with access to the meta-llama/Llama-2-7b-hf model.
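The prerequisites above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names `missing_commands`, `has_hf_token`, and `check_prereqs` are hypothetical, and the check only confirms that `go` and `make` are on the `PATH` and that `HF_TOKEN` is non-empty. It cannot verify that the token actually has access to the model.

```shell
# Hypothetical helper: print each given command name that is not on PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Hypothetical helper: succeed only if HF_TOKEN is set and non-empty.
has_hf_token() {
  [ -n "${HF_TOKEN:-}" ]
}

# Hypothetical helper: report the first unmet prerequisite, if any.
check_prereqs() {
  local missing
  missing="$(missing_commands go make | tr '\n' ' ')"
  if [ -n "$missing" ]; then
    echo "missing commands: $missing" >&2
    return 1
  fi
  if ! has_hf_token; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi
  echo "prerequisites look OK"
}
```

After sourcing these definitions, `check_prereqs && make test-e2e` would run the tests only when the basic checks pass.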
Follow these steps to run the end-to-end tests:
- Clone the Repository: Clone the gateway-api-inference-extension repository:

      git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension.git && cd gateway-api-inference-extension
- Export Your Hugging Face Hub Token: The token is required to run the test model server:

      export HF_TOKEN=<MY_HF_TOKEN>
- Run the Tests: Run the test-e2e target:

      make test-e2e
The test suite prints details for each step. Note that the vllm-llama2-7b-pool model server deployment may take several minutes to report an Available=True status due to the time required for bootstrapping.
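To watch for that status from a second terminal, `kubectl wait` can block until the deployment becomes available. This is a sketch under assumptions: the helper name `wait_for_available` is hypothetical, the namespace and the `10m` default timeout are placeholders (the namespace the tests use is not stated here), and the `KUBECTL` override exists only so the command can be previewed without a cluster.

```shell
# Hypothetical helper: block until a Deployment reports Available=True.
# Arguments: deployment name, namespace, optional timeout (assumed default: 10m).
# KUBECTL may be overridden (for example with "echo") to preview the command.
wait_for_available() {
  local deployment namespace timeout
  deployment="$1"
  namespace="$2"
  timeout="${3:-10m}"
  "${KUBECTL:-kubectl}" wait --for=condition=Available \
    "deployment/${deployment}" \
    --namespace "${namespace}" \
    --timeout="${timeout}"
}
```

For example, `wait_for_available vllm-llama2-7b-pool default` waits up to ten minutes, assuming the deployment lives in the `default` namespace; substitute the namespace your test run actually uses.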