Frontend Improvements #47
Comments
For the OpenAI API server, you can learn from FastChat's implementation (https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md), which supports all major OpenAI features such as completion, chat, and embedding. It can work with existing apps (e.g., LangChain) without code modifications. Another option is to directly import FastChat and extend the existing cacheflow integration in FastChat.
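To illustrate why an OpenAI-compatible endpoint lets existing apps work without code changes, here is a minimal sketch that points the official `openai` Python client at a locally hosted server. The base URL, port, and model name are assumptions for the example, not values from this thread.

```python
# Minimal sketch: reuse an existing OpenAI client against a local
# OpenAI-compatible server (e.g., one following FastChat's API layout).
# The URL, port, and model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="EMPTY",                      # local servers typically ignore the key
)

completion = client.completions.create(
    model="facebook/opt-125m",            # placeholder model name
    prompt="The capital of France is",
    max_tokens=16,
)
print(completion.choices[0].text)
```

The only thing that changes relative to calling the hosted OpenAI API is the base URL, which is what allows frameworks like LangChain to be pointed at such a server unmodified.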
@merrymercy Thanks for the suggestion! @zhuohan123 is working on the first direction.
Thanks for the suggestion! We implemented an OpenAI API server in #116, following FastChat's implementation. It currently supports the completion API. In the future, we plan to import FastChat to implement the chat completion API.
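Since only the completion endpoint is described as implemented here, a completion-style request can also be made with plain HTTP. This is a sketch only; the host, port, and model name are assumptions for illustration.

```python
# Minimal sketch of a raw HTTP call to an OpenAI-style /v1/completions
# endpoint. Host, port, and model name are illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",   # placeholder model name
        "prompt": "Hello, my name is",
        "max_tokens": 16,
        "temperature": 0.7,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```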
* support quark
* using torch/all.h
* loading weight from quark output
* support both ammo and quark
* Update doc
* fix load ammo
* fix linter
* fix isort
remove expert_max hard code (vllm-project#47)

* vLLM-Ext: Full enabling of ALiBi (vllm-project#34)
* Add version inference via setuptools-scm (vllm-project#58)
* Revert "vLLM-Ext: Full enabling of ALiBi (vllm-project#34)" (vllm-project#59)
* Remove punica_hpu.py from vllm_hpu_extension (vllm-project#66)
* Removed previous (not-pipelined) pa implementation (vllm-project#72)
* Add flag to enable running softmax in fp32 (vllm-project#71)
* Update calibration readme link (vllm-project#73)
* allow lm_head quantization in calibration process (vllm-project#65)
* Pad to bmin if value is less (vllm-project#67)
* Update pyproject.toml (HabanaAI#75)

Co-authored-by: Michał Kuligowski <[email protected]>