We should serve the main model queried from inspire with vLLM. This has some advantages over Ollama, such as higher throughput and out-of-the-box metrics.
Work involved
- vLLM is already installed on the AI machine
- Serve the preferred model via vLLM
- Modify the AI backend to query this new port and model (see the sketch after this list)
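A minimal sketch of what the backend change could look like, assuming the model is started with vLLM's OpenAI-compatible server (e.g. `vllm serve <model>`) on the default port 8000. The model name, host, and port below are placeholders, not the final configuration:

```python
# Sketch of querying a vLLM OpenAI-compatible endpoint from the AI backend.
# Assumes the server was started with something like:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
# Model name, host, and port are hypothetical placeholders.
from openai import OpenAI

VLLM_BASE_URL = "http://localhost:8000/v1"  # assumed host/port
MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model

# vLLM does not require an API key by default, but the client expects one.
client = OpenAI(base_url=VLLM_BASE_URL, api_key="EMPTY")

def query_model(prompt: str) -> str:
    """Send a chat completion request to the vLLM server and return the reply text."""
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(query_model("What is INSPIRE?"))
```

Because vLLM speaks the OpenAI API, the backend change should mostly amount to swapping the base URL and model name rather than rewriting the request logic.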
Acceptance criteria
The model queried from inspire search is running in vLLM
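One way to verify the criterion, assuming the same default port as above: the OpenAI-compatible server lists what it is serving under `/v1/models`, and vLLM exposes Prometheus metrics at `/metrics` (the out-of-the-box metrics mentioned in the description). A hedged verification sketch:

```python
# Quick check that the model is actually served by vLLM (port 8000 assumed).
import requests

BASE = "http://localhost:8000"  # assumed host/port

# The OpenAI-compatible endpoint lists the models the server is serving.
served = requests.get(f"{BASE}/v1/models", timeout=10).json()
print("Served models:", [m["id"] for m in served["data"]])

# vLLM exposes Prometheus metrics on the same server; their names start with "vllm".
metrics = requests.get(f"{BASE}/metrics", timeout=10).text
print("vLLM metric lines:",
      sum(1 for line in metrics.splitlines() if line.startswith("vllm")))
```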