Release 3.0.0 #89
Merged
- added 3 spaces. A very small PR, but adds readability
- Support for larger models at a future block.
- Update README.md
- Eval loop adjustments.
- Add a new tokenizer for 7B.
ghost approved these changes on Mar 28, 2024.
This release contains the code that prepares for the move to 7B parameters, concentrates rewards on fewer models, and improves the speed at which validators pick up new best models.
In addition to the previously announced change from 8k to 4k sequence length, we have also adjusted the future tokenizer from gpt3_5 to gpt4. To compensate, the block at which these changes take effect has been moved out one week, to April 15, 2024 at ~8:00 AM, at block 2,786,061.
To reiterate, the final set of changes that will occur at that block is:
- The parameter limit will be raised to 6.9 billion.
- The size limit for the model's Hugging Face repo will be raised to 15 gigabytes.
- **New:** The tokenizer used for evaluation will become https://huggingface.co/Xenova/gpt-4.
- **New:** The sequence length used for inference will be 4096.
- When loading the pretrained model for inference, the torch_dtype will be bfloat16 and the attn_implementation will be flash_attention_2 (see the loading sketch after this list).
- **New:** The allowed model types have been adjusted to include new model types (Phi and Gemma) and to remove those that do not support flash attention.
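For concreteness, here is a minimal sketch of what loading a model under these settings might look like with the transformers library. The repo name my-org/my-6.9b-model is a hypothetical placeholder, and the exact loading and evaluation code used by validators may differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Evaluation tokenizer named above (Hugging Face port of the GPT-4 tokenizer).
tokenizer = AutoTokenizer.from_pretrained("Xenova/gpt-4")

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-6.9b-model",                   # hypothetical placeholder repo
    torch_dtype=torch.bfloat16,               # dtype listed above
    attn_implementation="flash_attention_2",  # requires the flash-attn package and a supported GPU
).to("cuda")

# Sequence length listed above: inputs are truncated to 4096 tokens.
batch = tokenizer(
    "example text",
    truncation=True,
    max_length=4096,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    loss = model(**batch, labels=batch["input_ids"]).loss
```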
Validators: You should upgrade immediately to align your weight distributions to the new model. Additionally, you may need to upgrade your machine by April 15, 2024 to support the following requirement changes (a rough hardware-check sketch follows this list):
- You must have a GPU with at least 48 gigabytes of memory that can sustain at least 38 TFLOPS of half-precision (bfloat16) throughput.
- You must have a GPU that supports flash attention 2 and bfloat16: https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features
- **New:** You must have at least 1 TB of disk space.
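As a rough pre-flight check, something like the following could be run on a validator machine. This is a sketch only: the cache path "/" is a placeholder, and the 38 TFLOPS requirement cannot be verified directly here, so compute capability >= 8.0 is used as a proxy for bfloat16 and flash-attention-2 support:

```python
import shutil
import torch

def check_validator_hardware(min_gpu_gb=48, min_disk_tb=1.0, cache_path="/"):
    """Rough check against the requirements listed above (assumes a single-GPU machine)."""
    props = torch.cuda.get_device_properties(0)

    # GPU memory requirement.
    gpu_gb = props.total_memory / 1e9
    assert gpu_gb >= min_gpu_gb, f"Need >= {min_gpu_gb} GB GPU memory, found {gpu_gb:.1f} GB"

    # bfloat16 and flash-attention-2 support (compute capability >= 8.0, i.e. Ampere or newer).
    assert torch.cuda.is_bf16_supported(), "GPU must support bfloat16"
    assert props.major >= 8, "flash-attention 2 requires compute capability >= 8.0"

    # Disk space requirement on the model cache path.
    free_tb = shutil.disk_usage(cache_path).free / 1e12
    assert free_tb >= min_disk_tb, f"Need >= {min_disk_tb} TB free disk, found {free_tb:.2f} TB"

check_validator_hardware()
```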