Add support for GPT-NeoX (Pythia) #50
Conversation
This PR adds support for the GPT-NeoX (Pythia) model, which is the backbone of many popular models including Dolly V2, Stable-LM, and Open Assistant.

NOTE: Dolly V2 is not supported by this PR, because it uses bfloat16, which some of our kernels do not support. It will be added in another PR.
Reviewer: LGTM! See comments for more details.
cacheflow/models/gpt_neox.py (Outdated)

    def initialize_dummy_weights(self) -> None:
        for param in self.state_dict().values():
            param.data.uniform_(-0.1, 0.1)
Reviewer: Nit: The U(-0.1, 0.1) initialization will lead to many out-of-range values and NaNs during model execution. Maybe use a smaller range like U(-1e-5, 1e-5)?
Author: The U(-0.1, 0.1) initialization actually works. However, to be cautious, I changed the range to U(-1e-3, 1e-3).
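For clarity, a minimal sketch of the initializer after this change, assuming the agreed U(-1e-3, 1e-3) range (the method itself is the snippet quoted above):

```python
def initialize_dummy_weights(self) -> None:
    # Fill every parameter with small uniform noise; a narrow range keeps
    # dummy forward passes numerically stable (no overflows or NaNs).
    for param in self.state_dict().values():
        param.data.uniform_(-1e-3, 1e-3)
```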
    self.max_position = 8192
    self.tie_word_embeddings = config.tie_word_embeddings

    def get_param_size(self) -> int:
Reviewer: Can we get the parameter size by counting the actual parameters after the model gets initialized? Use some code like the following:

    mem_params = sum(param.nelement() * param.element_size() for param in model.parameters())
Author: Good idea. Let's do that in another PR.
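As context, a self-contained sketch of the suggested measurement; the Hugging Face import and the Pythia checkpoint name are illustrative assumptions, not part of this PR:

```python
from transformers import GPTNeoXForCausalLM

# Load a small Pythia checkpoint and count the memory its parameters
# occupy, following the one-liner suggested in the review.
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")
mem_params = sum(p.nelement() * p.element_size() for p in model.parameters())
print(f"Parameter memory: {mem_params / 1024**2:.1f} MiB")
```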
    dtype_size = get_dtype_size(self.dtype)
    return dtype_size * total

    def get_max_num_gpu_blocks(
    def get_max_act_size(
Reviewer: Similarly, can we profile the actual max activation size by running the model once without any KV cache?
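A rough sketch of what such profiling could look like, assuming a PyTorch model already placed on GPU; the helper name, `model`, and the input shape are placeholders, not APIs from this PR:

```python
import torch

def profile_max_act_size(model: torch.nn.Module, batch_size: int, seq_len: int) -> int:
    # Hypothetical helper: run a single forward pass without any KV cache
    # and report the peak memory used beyond the resident parameters.
    torch.cuda.reset_peak_memory_stats()
    base = torch.cuda.memory_allocated()
    input_ids = torch.zeros(batch_size, seq_len, dtype=torch.long, device="cuda")
    with torch.no_grad():
        model(input_ids)
    return torch.cuda.max_memory_allocated() - base
```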