The Gateway API Inference Extension came out of wg-serving and is sponsored by SIG Network. This repo contains the load-balancing algorithm, the ext-proc code, the CRDs, and the controllers that make up the extension.
This extension is intended to provide value to multiplexed LLM services on a shared pool of compute. See the proposal for more details.
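To give a rough flavor of the API, the sketch below pairs the two core CRDs: an InferencePool that groups a set of model-server Pods into a shared pool, and an InferenceModel that routes a named model (for example, a LoRA adapter) onto that pool. The resource names, labels, and port here are illustrative, and the API group/version and field names may evolve; treat the documentation linked below as authoritative.

```yaml
# A pool of model-server Pods sharing compute (names/labels are illustrative).
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-pool
spec:
  # Pods matching this selector serve traffic for the pool.
  selector:
    app: vllm-llama3
  targetPortNumber: 8000
  # The endpoint-picker (ext-proc) extension that load-balances this pool.
  extensionRef:
    name: vllm-llama3-epp
---
# A named model multiplexed onto the shared pool above.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: chat-adapter
spec:
  modelName: chat-adapter
  criticality: Standard
  poolRef:
    name: vllm-llama3-pool
```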
This project is currently in development.
Follow this README to get the Inference Extension up and running on your cluster!
Follow this README to run the Inference Extension end-to-end test suite on your cluster.
Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/
Our community meeting is held weekly on Thursdays at 10 AM PDT (Zoom, Meeting Notes).
We currently use the #wg-serving Slack channel for communications.
Contributions are welcome; follow the dev guide to get started!
Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.