Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Envoy update #18

Merged
merged 8 commits into from
Oct 16, 2024
Merged

Envoy update #18

merged 8 commits into from
Oct 16, 2024

Conversation

kfswain
Copy link
Collaborator

@kfswain kfswain commented Oct 10, 2024

This modifies our Ext-Proc integration to use Envoy best practices, via EnvoyExtensionPolicy and EnvoyPatchPolicy.

Here we create the ext-proc integration in EnvoyExtensionPolicy. We then create an ORIGINAL_DST cluster in envoy, and create a listener to attach our ext-proc deployment to this newly created cluster.

Also update the readme to reflect these changes.

fixes #10

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 10, 2024
@liu-cong
Copy link
Contributor

liu-cong commented Oct 10, 2024

Should we just move this out of example/poc, now it's more "official"?

EDIT: nvm, I think we can do this as a follow up, in which we should release the new ext proc images.

@terrytangyuan
Copy link
Member

Should we just move this out of example/poc, now it's more "official"?

+1.

@kfswain
Copy link
Collaborator Author

kfswain commented Oct 10, 2024

I agree with this being top level. I was thinking about that when I was writing this and was waiting for feedback from others.

What do we think about just a duplication? Similar to how we have done with the ext-proc image refactor.

@liu-cong
Copy link
Contributor

I agree with this being top level. I was thinking about that when I was writing this and was waiting for feedback from others.

What do we think about just a duplication? Similar to how we have done with the ext-proc image refactor.

Sounds good to me. There are a couple of things: 1. We should build new ext proc from pkg/ext-proc instead of the poc one. 2. The "pods" (still need "podIPs") and "ensureFairness" flags are removed, need to update the yaml accordingly.

Feel free to do this in a follow up though.

# Specifically the field `original_dst_lb_config` allows us to enable
# `use_http_header` and `http_header_name`.
# Source: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto
- type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

…fests to the top level Ext-Proc implementation
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 14, 2024
@kfswain
Copy link
Collaborator Author

kfswain commented Oct 14, 2024

/label tide/merge-method-squash

A custom GatewayClass `llm-gateway` which is configured with the llm routing ext proc will be installed into the `llm-gateway` namespace. It's configured to listen on port 8081 for traffic through ext-proc (in addition to the default 8080), see the `EnvoyProxy` configuration in `installation.yaml`. When you create Gateways, make sure the `llm-gateway` GatewayClass is used.
1. **Update Envoy Gateway Config to enable Patch Policy**

Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have instructions for deploying the Envoy gateway controller?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! The quickstart on line 10 points them there

- name: instance-gateway-ext-proc
image: ghcr.io/tomatillo-and-multiverse/ext-proc:demo
args:
#TODO: specify label selector and dynamically update pods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should actually pass the name of the LLMServerPool

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, totally agree, I plan to have ext-proc pull the selection from the config on the LSP so there is always a single source of truth (once we have that set up)

@@ -0,0 +1,26 @@
apiVersion: v1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and the other yamls under pkg/manifests seem duplicate to the ones in the examples/poc folder, do we need both?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're absolutely right. I think eventually we just want to delete the poc directory. But while we are still using forks of vLLM to operate. I think keeping the poc folder and a way to set it up there kind of implies the WIP nature of our project.

Or we can always fast-follow this PR with removing the poc dir

LMKWYT!

@ahg-g
Copy link
Contributor

ahg-g commented Oct 16, 2024

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kfswain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit fcad109 into kubernetes-sigs:main Oct 16, 2024
2 checks passed
@kfswain kfswain deleted the envoy-update branch October 28, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standardized usage for Envoy Gateway
7 participants