Use dtype from model config & Add Dolly V2 #63

WoosukKwon · 2023-05-03T23:28:20Z

This PR adds a `default` option for dtype, which selects FP16 for FP16 and FP32 models and BF16 for BF16 models. This option is used unless the user specifies otherwise; users can still set the data type explicitly, for example to run an FP32 model in BF16.
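The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `resolve_dtype`, the alias table, and the use of plain strings instead of framework dtype objects are all assumptions made for the example.

```python
# Hypothetical sketch of the dtype selection rule described in this PR.
# The function name, alias table, and string-based dtypes are assumptions
# for illustration only; the actual implementation may differ.

_ALIASES = {
    "half": "float16",
    "float16": "float16",
    "bfloat16": "bfloat16",
    "float": "float32",
    "float32": "float32",
}


def resolve_dtype(config_dtype: str, requested: str = "default") -> str:
    """Return the dtype to load the model in.

    config_dtype: the dtype recorded in the model config.
    requested:    the user-supplied option; "default" defers to the config.
    """
    if requested != "default":
        # An explicit user choice always wins (e.g. BF16 for an FP32 model).
        if requested not in _ALIASES:
            raise ValueError(f"Unknown dtype: {requested}")
        return _ALIASES[requested]

    if config_dtype == "bfloat16":
        # BF16 models stay in BF16.
        return "bfloat16"
    if config_dtype in ("float16", "float32"):
        # FP16 and FP32 models both run in FP16 by default.
        return "float16"
    raise ValueError(f"Unsupported config dtype: {config_dtype}")
```

With this sketch, `resolve_dtype("float32")` falls back to FP16, while `resolve_dtype("float32", "bfloat16")` honors the explicit override.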

In addition, the PR integrates Dolly V2, a recent LLM with the GPT-NeoX architecture. The model is trained and saved in BF16.

zhuohan123

LGTM!

* Add more detailed event names to profiler
* Add more profiler stats
* separate prompt and decode batch utilization
* Add more metrics
* revert engine/metrics.py changes
* un-singletonify (what a funny word) habana profiler
* formatting
* add batch block utilization metric
* fix division by zero
* fix batch_block_utilization formula
* minor refactors

Add default option to dtype & Add Dolly V2

7e10fac

WoosukKwon requested a review from zhuohan123 May 3, 2023 23:28

zhuohan123 approved these changes May 4, 2023

WoosukKwon added 2 commits May 4, 2023 10:01

Merge branch 'main' into dolly-v2

6eda50c

Minor

1d23ee9

WoosukKwon merged commit 189ae23 into main May 4, 2023

WoosukKwon deleted the dolly-v2 branch May 4, 2023 10:05

junior-zsy mentioned this pull request Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725

Closed

hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024

Use dtype from model config & Add Dolly V2 (vllm-project#63)

f1bccfa

yuhuixu1993 mentioned this pull request Jun 2, 2024

[Bug]: loading squeezellm model #5190

Closed

yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024

delete NOTICE.txt (vllm-project#63)

1a59725

SUMMARY: * delete NOTICE.txt file TEST PLAN: none Co-authored-by: andy-neuma <[email protected]>

ZHJ19970917 mentioned this pull request Jul 14, 2024

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred visits.”vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.“ #6421

Closed

dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024

fix for oob LDS fill in wvSpltK slm version (vllm-project#63)

c455e9c

dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Jul 30, 2024

Merge pull request vllm-project#63 from dtrifiro/sync-release-with-main

7cc6a9b

sync release with main @ v0.5.0.post1-99-g8720c92e

alixiaodi mentioned this pull request Aug 2, 2024

[Bug]: #7072

Closed
