
Serve models with vLLM #740

Open
miguelgrc opened this issue Feb 3, 2025 · 0 comments

Description

We should use vLLM to serve the main model that is queried from inspire. This has some advantages over Ollama, such as higher throughput and out-of-the-box metrics.
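
On the metrics point: vLLM's OpenAI-compatible server exposes Prometheus-format metrics (running/waiting requests, token throughput, and so on) at `/metrics`, which Ollama does not provide out of the box. A minimal sketch for checking them, assuming the server listens on port 8000 on the AI machine (host and port are placeholders):

```python
import requests

# Assumed endpoint; adjust to wherever vLLM ends up being served on the AI machine.
metrics = requests.get("http://localhost:8000/metrics", timeout=5).text

# vLLM-specific counters and gauges are prefixed with "vllm:".
for line in metrics.splitlines():
    if line.startswith("vllm"):
        print(line)
```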

Work involved

  • vLLM is already installed on the AI machine
  • Serve the preferred model via vLLM
  • Modify the AI backend to query this new port and model (see the sketch after this list)
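
A rough sketch of the serving/backend steps, assuming vLLM is launched with its OpenAI-compatible server (for example `vllm serve <model> --port 8000` on the AI machine); the base URL and model name below are placeholders, not decisions:

```python
from openai import OpenAI

# Placeholder endpoint and model name; the real values depend on how vLLM is
# launched on the AI machine and which model ends up being preferred.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask_model(question: str) -> str:
    """Send a single user message to the vLLM-served model and return the reply."""
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask_model("Hello from the AI backend"))
```

Since vLLM speaks the OpenAI API, the backend change should mostly amount to pointing the existing client at the new base URL, port, and model name.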

Acceptance criteria

  • The model queried from inspire search is running in vLLM (a possible verification sketch follows)
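
One possible way to check this criterion, assuming the backend's configured base URL points at the vLLM server: the OpenAI-compatible `/v1/models` route lists the model vLLM is serving.

```python
from openai import OpenAI

# Placeholder endpoint; use whatever base URL the AI backend is configured with.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# vLLM's OpenAI-compatible server reports the model(s) it has loaded here.
for model in client.models.list().data:
    print(model.id)
```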