The Gateway API Inference Extension came out of wg-serving and is sponsored by SIG Network. This repo contains the load-balancing algorithm, the ext-proc code, the CRDs, and the controllers that make up the extension.
This extension is intended to provide value to multiplexed LLM services on a shared pool of compute. See the proposal for more details.
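To give a rough flavor of the API, the sketch below pairs the two core CRDs: an InferencePool that groups a set of model-server Pods into a shared pool, and an InferenceModel that routes a named model (for example, a LoRA adapter) onto that pool. The resource names, labels, and port here are illustrative, and the API group/version and field names may evolve; treat the documentation linked below as authoritative.

```yaml
# A pool of model-server Pods sharing compute (names/labels are illustrative).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-pool
spec:
  # Pods matching this selector serve traffic for the pool.
  selector:
    app: vllm-llama3
  targetPortNumber: 8000
  # The endpoint-picker (ext-proc) extension that load-balances this pool.
  extensionRef:
    name: vllm-llama3-epp
---
# A named model multiplexed onto the shared pool above.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: chat-adapter
spec:
  modelName: chat-adapter
  criticality: Standard
  poolRef:
    name: vllm-llama3-pool
```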
This project is currently in development.
Follow this README to get the Inference Extension up and running on your cluster!
Follow this README to run the Inference Extension end-to-end test suite on your cluster.
Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/
Our community meeting is held weekly on Thursdays at 10 AM PDT (Zoom, Meeting Notes).
We currently use the #wg-serving Slack channel for communications.
Contributions are welcome; follow the dev guide to get started!
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.