Simply use vLLM in your Haystack pipeline to utilize fast, self-hosted LLMs.
Install the wrapper via pip: `pip install vllm-haystack`
This integration provides two invocation layers:
- `vLLMInvocationLayer`: To use models hosted on a vLLM server (or any other OpenAI-compatible server)
- `vLLMLocalInvocationLayer`: To use locally hosted vLLM models
To utilize the wrapper, the `vLLMInvocationLayer` has to be used. Here is a simple example of how a `PromptNode` can be created with the wrapper:
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer

API = "http://localhost:8000/v1"  # Replace this with your API URL (the server's OpenAI-compatible endpoint)

# model_name_or_path is left empty because the model is inferred from the one served by the vLLM server
model = PromptModel(model_name_or_path="", invocation_layer_class=vLLMInvocationLayer, max_length=256, api_key="EMPTY", model_kwargs={
    "api_base": API,
    "maximum_context_length": 2048,
})

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
The model will be inferred from the model served on the vLLM server. For more configuration examples, take a look at the unit tests.
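Once created, the `prompt_node` can be queried directly; a minimal sketch (the prompt text is just an example, and the request is forwarded to your vLLM server):

```python
# Send a prompt to the PromptNode; generation happens on the vLLM server.
result = prompt_node("What is the capital of Germany?")
print(result)  # A list of generated strings, e.g. ["Berlin"]
```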
To create an OpenAI-compatible server via vLLM, you can follow the steps in the Quickstart section of their documentation.
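For reference, launching such a server usually looks like the command below; the model name is only a placeholder, and the exact flags may differ between vLLM versions, so check the vLLM documentation:

```bash
# Launch vLLM's OpenAI-compatible API server (serves at http://localhost:8000/v1 by default)
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```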
To run vLLM locally, you need to have `vllm` installed and a supported GPU.
If you don't want to use an API server, this wrapper also provides a `vLLMLocalInvocationLayer`, which runs vLLM on the same node Haystack is running on.
Here is a simple example of how a `PromptNode` can be created with the `vLLMLocalInvocationLayer`:
```python
from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

MODEL = "facebook/opt-125m"  # Replace with any vLLM-supported Hugging Face model ID

model = PromptModel(model_name_or_path=MODEL, invocation_layer_class=vLLMLocalInvocationLayer, max_length=256, model_kwargs={
    "maximum_context_length": 2048,
})

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
```
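The resulting `prompt_node` can be used standalone or plugged into a Haystack pipeline like any other node. A minimal sketch of a plain query pipeline (the query string is just an example):

```python
from haystack.pipelines import Pipeline

# Build a minimal pipeline with the PromptNode as its only component.
pipeline = Pipeline()
pipeline.add_node(component=prompt_node, name="prompt_node", inputs=["Query"])

# The query is passed to the PromptNode as the prompt.
result = pipeline.run(query="What is the capital of Germany?")
print(result)
```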