LMDeploy abstracts the complex inference process of vision-language models (VLMs) into an easy-to-use pipeline, whose usage closely mirrors the large language model (LLM) inference pipeline.
Currently, the VLM pipeline supports the following models:
- Qwen-VL-Chat
- LLaVA series: v1.5, v1.6
- Yi-VL
We warmly invite the community to add support for more VLM models in LMDeploy.
This article takes the liuhaotian/llava-v1.6-vicuna-7b model as an example to demonstrate how to use the VLM pipeline. You will first learn its most basic usage, then see how adjusting engine parameters and generation conditions gradually unlocks more advanced features, such as tensor parallelism, context-window resizing, random sampling, and chat template customization.
In addition, we provide hands-on inference examples for scenarios such as multi-image inputs and batched prompts.
```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
If an ImportError is raised when running this example, please install the missing dependency packages as prompted.
In the example above, the inference prompt is a (prompt, image) tuple. Besides this structure, the pipeline also accepts prompts in the OpenAI format:
```python
from lmdeploy import pipeline

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}}
        ]
    }
]
response = pipe(prompts)
print(response)
```
Setting the engine parameter `tp` enables multi-GPU tensor parallelism:
```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(tp=2))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
When creating the pipeline, you can customize the maximum context-window length by setting the engine parameter `session_len`:
```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
The default sampling parameters of the pipeline's generation interface can be overridden by passing in a `GenerationConfig`:
```python
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(tp=2, session_len=8192))
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.6)

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image), gen_config=gen_config)
print(response)
```
During inference, LMDeploy matches a built-in chat template based on the model path and applies it to the input prompt. However, for a VLM like llava-v1.5-7b, the chat template it uses is vicuna, and this template name cannot be deduced from the model path, so the user has to specify it explicitly:
```python
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.5-7b',
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
For how to customize a chat template, please refer to the guide here.
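Beyond picking a built-in template by name, individual fields of the template can also be overridden when constructing the config. The following is only a hedged sketch: it assumes `ChatTemplateConfig` exposes override fields such as `meta_instruction` and `stop_words`, which may differ across LMDeploy versions, so please verify against the chat template guide.

```python
from lmdeploy import pipeline, ChatTemplateConfig

# Hypothetical customization (field names are assumptions; check your version):
# start from the built-in 'vicuna' template and override its system prompt
# and stop words.
chat_template = ChatTemplateConfig(
    model_name='vicuna',
    meta_instruction='You are an assistant that describes images precisely.',
    stop_words=['</s>'])

pipe = pipeline('liuhaotian/llava-v1.5-7b',
                chat_template_config=chat_template)
```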
For multi-image scenarios, simply put all the images in a list at inference time. Keep in mind that multiple images mean more input tokens, so the inference context length usually needs to be enlarged:
```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]

images = [load_image(img_url) for img_url in image_urls]
response = pipe(('describe these images', images))
print(response)
```
Batched-prompt inference is just as straightforward: put the prompts in a list:
```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]

prompts = [('describe this image', load_image(img_url)) for img_url in image_urls]
response = pipe(prompts)
print(response)
```
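A batched call returns one response per prompt, in the same order as the inputs. As a small usage sketch (assuming each element carries the same `text` field that the chat example below reads from `sess.response`):

```python
# One response object per input prompt, in input order
for i, res in enumerate(response):
    print(f'prompt {i}: {res.text}')
```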
There are two ways to run multi-turn conversations with the pipeline: constructing the messages in the OpenAI format, or using the `pipeline.chat` interface.
```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.6)
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)
sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
print(sess.response.text)
```
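For the OpenAI-format alternative mentioned above, the conversation history is carried explicitly in the messages list: append the assistant's reply after each turn, then add the next user question. Below is a hedged sketch under the assumption that a single conversation passed this way returns one response object whose `text` field holds the reply:

```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.6)

# Turn 1: a user message with text + image, in the OpenAI format shown earlier
messages = [{
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'describe this image'},
        {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg'}}
    ]
}]
response = pipe(messages, gen_config=gen_config)
print(response.text)

# Turn 2: append the assistant reply to the history, then ask a follow-up
messages.append({'role': 'assistant', 'content': response.text})
messages.append({'role': 'user', 'content': 'What is the woman doing?'})
response = pipe(messages, gen_config=gen_config)
print(response.text)
```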