fix: Fix bug when targeting the TRT-LLM backend ensemble #7700

blongnv · 2024-10-14T21:45:34Z

What does the PR do?

If a user follows the example code and tries to use the ensemble instead of tensorrt_llm_bls, they will get this error:

curl -s http://localhost:9000/v1/chat/completions -H 'Content-Type: application/json' -d '{"model": "ensemble", "messages": [{"role": "user", "content": "Say this is a test!"}]}' | jq
INFO:     127.0.0.1:40064 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
{
  "detail": "Unknown backend"
}

After the PR fix, the response will be correct:

curl -s http://localhost:9000/v1/chat/completions -H 'Content-Type: application/json' -d '{"model": "ensemble", "messages": [{"role": "user", "content": "Say this is a test!"}]}' | jq
INFO:     127.0.0.1:47946 - "POST /v1/chat/completions HTTP/1.1" 200 OK
{
  "id": "cmpl-fefc9e2a-8a74-11ef-8e2a-0242c0a80102",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "This is a test!",
        "tool_calls": null,
        "role": "assistant",
        "function_call": null
      },
      "logprobs": null
    }
  ],
  "created": 1728942057,
  "model": "ensemble",
  "system_fingerprint": null,
  "object": "chat.completion",
  "usage": null
}
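
For anyone reproducing this from Python rather than curl, the request and response above can be sketched with the standard library only. This is a minimal sketch, not part of the PR: the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and the base URL is simply the host and port from the curl command.

```python
import json
import urllib.request

# Host and port copied from the curl command above; adjust for your deployment.
BASE_URL = "http://localhost:9000"


def build_chat_request(
    model: str, prompt: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: str) -> str:
    """Return the assistant text from a chat.completion response body."""
    response = json.loads(body)
    return response["choices"][0]["message"]["content"]


def chat(model: str, prompt: str) -> str:
    """Send the request and return the reply text.

    urllib raises urllib.error.HTTPError for a 400 response such as
    the "Unknown backend" error shown before the fix.
    """
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return extract_reply(resp.read().decode("utf-8"))
```

Calling `chat("ensemble", "Say this is a test!")` against a running server would exercise the ensemble path described here; `build_chat_request` and `extract_reply` can be checked without a server.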

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

Related PRs:

Where should the reviewer start?

Test plan:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

rmccorm4 · 2024-10-14T22:00:41Z

ref: DLIS-7451 to follow up with a test for the ensemble path, to maintain support and catch regressions

Co-authored-by: Ryan McCormick <[email protected]>

rmccorm4 · 2024-10-14T22:35:04Z

Pipeline: 19345468 ✅

nnshah1 · 2024-10-14T23:15:22Z

python/openai/openai_frontend/engine/triton_engine.py

@@ -250,6 +250,9 @@ def _get_model_metadata(self) -> Dict[str, TritonModelMetadata]:
        for name, _ in self.server.models().keys():
            model = self.server.model(name)
            backend = model.config()["backend"]
+            if not backend:
+                # Check platform field as a backup, this will support 'ensemble' models
+                backend = model.config()["platform"]


@rmccorm4 should we check against a list of supported platform types here to rule out any accidents?

Sure, we can be strict here just in case.

I'm not aware of any current use case where platform is used to specify a backend (ex: platform: tensorrt_plan) that has no backend equivalent, other than ensemble. We also generally encourage using backend for pretty much everything except ensembles, so I think it's fine to be strict for now.

@blongnv could you modify it to be something like this?

    # Explicitly handle ensembles to avoid any runtime validation errors
    if not backend and model.config()["platform"] == "ensemble":
        backend = "ensemble"

Implemented, please check
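
The strict check agreed on in this thread can be exercised in isolation. The following is a minimal sketch under stated assumptions: `resolve_backend` is a hypothetical standalone helper (in the PR the logic lives inline in `_get_model_metadata`), and it takes a plain dict in place of `model.config()`, using `.get` so that a missing key behaves like an empty one.

```python
from typing import Any, Dict


def resolve_backend(config: Dict[str, Any]) -> str:
    """Return the backend name for a model config dict (hypothetical helper).

    Only the 'ensemble' platform is accepted as a fallback; any other
    platform value is deliberately ignored, per the review discussion.
    """
    backend = config.get("backend", "")
    # Explicitly handle ensembles: they leave 'backend' empty and set
    # 'platform' instead, so this is the one case where platform is trusted.
    if not backend and config.get("platform") == "ensemble":
        backend = "ensemble"
    return backend
```

With this shape, a config that has an empty backend and a non-ensemble platform still resolves to an empty string, so whatever validation sits downstream continues to reject it rather than silently accepting an unexpected platform.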

Fix bug when targeting the TRT-LLM backend ensemble

36a55e9

rmccorm4 reviewed Oct 14, 2024

rmccorm4 changed the title ~~Fix bug when targeting the TRT-LLM backend ensemble~~ fix: Fix bug when targeting the TRT-LLM backend ensemble Oct 14, 2024

rmccorm4 added the "PR: fix" (A bug fix) label Oct 14, 2024

Update python/openai/openai_frontend/engine/triton_engine.py

fbd009c

Co-authored-by: Ryan McCormick <[email protected]>

rmccorm4 previously approved these changes Oct 14, 2024

nnshah1 reviewed Oct 14, 2024

Implement PR feedback

319bd2a

blongnv dismissed rmccorm4’s stale review via 319bd2a October 15, 2024 17:24

rmccorm4 approved these changes Oct 16, 2024

rmccorm4 merged commit 479486f into triton-inference-server:main Oct 16, 2024
3 checks passed
