
Add support for GPT-NeoX (Pythia) #50

Merged: 5 commits into main from gpt-neox on Apr 28, 2023

Conversation

WoosukKwon (Collaborator)

This PR adds support for the GPT-NeoX (Pythia) model, which is the backbone of many popular models including Dolly V2, Stable-LM, and Open Assistant.
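For context, a minimal sketch of how a GPT-NeoX/Pythia checkpoint can be served once this support lands. It assumes the present-day `vllm.LLM` entry point and an illustrative Hugging Face model name, neither of which is part of this PR's diff:

```python
# Sketch only: assumes the current vllm API (LLM / SamplingParams);
# the entry point at the time of this PR may have looked different.
from vllm import LLM, SamplingParams

llm = LLM(model="EleutherAI/pythia-1.4b")  # any GPT-NeoX-based checkpoint
sampling = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Deep learning is"], sampling)
print(outputs[0].outputs[0].text)
```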

@WoosukKwon (Collaborator, Author)

NOTE: Dolly V2 is not supported by this PR because it uses bfloat16, which some of our kernels do not support. It will be added in another PR.

@WoosukKwon requested a review from zhuohan123 on Apr 26, 2023, 11:03
@WoosukKwon linked an issue on Apr 26, 2023 that may be closed by this pull request
@zhuohan123 (Member) left a comment


LGTM! See comments for more details.


def initialize_dummy_weights(self) -> None:
    for param in self.state_dict().values():
        param.data.uniform_(-0.1, 0.1)
zhuohan123 (Member)

Nit: The U(-0.1, 0.1) initialization will lead to many out-of-range values and NaNs during model execution. Maybe use a smaller range like U(-1e-5, 1e-5)?

WoosukKwon (Collaborator, Author)

The (-0.1, 0.1) initialization actually works. However, to be cautious, I changed the range to (-1e-3, 1e-3).
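For reference, a sketch of what the resolved initializer looks like with the narrower range (the authoritative version is in the PR's commits):

```python
def initialize_dummy_weights(self) -> None:
    # Fill every parameter with small random values; dummy weights are only
    # used for profiling, so they just need to be numerically safe.
    for param in self.state_dict().values():
        param.data.uniform_(-1e-3, 1e-3)
```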

    self.max_position = 8192
    self.tie_word_embeddings = config.tie_word_embeddings

def get_param_size(self) -> int:
zhuohan123 (Member)

Can we get the parameter size by counting the actual parameters after the model is initialized? Use some code like the following:

mem_params = sum([param.nelement()*param.element_size() for param in model.parameters()])

WoosukKwon (Collaborator, Author)

Good idea. Let's do that in another PR.
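For illustration, the reviewer's suggestion wrapped into a small helper; this is a sketch assuming an instantiated `torch.nn.Module`, not code from this PR:

```python
import torch.nn as nn

def get_param_size_bytes(model: nn.Module) -> int:
    # Count the memory of the materialized parameters directly, instead of
    # deriving the size analytically from the model config.
    return sum(p.nelement() * p.element_size() for p in model.parameters())
```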

    dtype_size = get_dtype_size(self.dtype)
    return dtype_size * total

def get_max_num_gpu_blocks(
def get_max_act_size(
zhuohan123 (Member)

Similarly, can we profile the actual max activation size by running the model once without any KV cache?
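A sketch of how that profiling could look, assuming a CUDA model and a representative dummy batch; the helper name and forward signature are illustrative, not from this PR:

```python
import torch

def profile_peak_activation_bytes(model, dummy_input_ids) -> int:
    # Run a single forward pass without allocating any KV cache and measure
    # the peak CUDA memory used on top of the already-resident parameters.
    torch.cuda.reset_peak_memory_stats()
    baseline = torch.cuda.memory_allocated()
    with torch.inference_mode():
        model(dummy_input_ids)
    return torch.cuda.max_memory_allocated() - baseline
```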

@WoosukKwon merged commit a96d63c into main on Apr 28, 2023
@WoosukKwon deleted the gpt-neox branch on Apr 28, 2023, 08:00
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
yukavio pushed a commit to yukavio/vllm that referenced this pull request Jul 3, 2024
SUMMARY:
* "remote push" job for multi-gpu runner.
* "remote push" job for single-gpu runner.
* patches for re-initialization of "ray": other places in `vllm` already
pass `ignore_reinit_error=True`, so it looks like a couple of places were
simply missed (see the sketch after this commit message).
* patch the "find" command to only find *.py files starting with "test_".


TEST PLAN:
runs on remote push

---------

Co-authored-by: andy-neuma <[email protected]>
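For clarity, the re-initialization pattern referred to above is Ray's standard guard against double initialization; a minimal example:

```python
import ray

# If Ray has already been initialized in this process (e.g. by an earlier
# test), reuse the existing session instead of raising an error.
ray.init(ignore_reinit_error=True)
```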
dllehr-amd pushed a commit to dllehr-amd/vllm that referenced this pull request Jul 22, 2024
* update quark quantizer command

* typo

* Using scaled_mm for untuned gemm

* remove comment

* fix yapf
JHLEE17 pushed a commit to JHLEE17/vllm that referenced this pull request Aug 1, 2024
@alixiaodi mentioned this pull request on Aug 2, 2024

Successfully merging this pull request may close these issues.

Add support for Stable-LM and OpenAssistant