Release 3.0.0 #89

Merged: 32 commits, merged Mar 28, 2024

Commits (32):
4e5cc6d  Update README.md (dougsillars, Feb 16, 2024)
5409309  Update model size on downloads based on block. (Mar 17, 2024)
8c13811  Use optimizations at new block for inference. (Mar 18, 2024)
c8cb2b8  Limit model types based on block. (Mar 18, 2024)
e8206a7  Run inference with sequence length based on block. (Mar 18, 2024)
9f8ae23  Doc updates. (Mar 18, 2024)
3f5748c  Adjust temperature to prioritize top 1 model. (Mar 19, 2024)
bd7501c  Adjust to only keep 10 best models + eval up to 15 new per loop. (Mar 19, 2024)
90b870f  Check for updates to models with incentive first. (Mar 19, 2024)
28916ff  Remove notebook and update cadence for check. (Mar 19, 2024)
ea91667  Update to only 6 min, 14 max models by default. (Mar 19, 2024)
c83a787  Fix docs + increase time for eval + adjust sample model parameters. (Mar 19, 2024)
4957e80  Refactor to use ModelParameters + pass sequence length. (Mar 20, 2024)
e520ff1  Rename to Model Criteria for clarity. (Mar 20, 2024)
d8af206  Update docs to point to correct line for ModelCriteria. (Mar 20, 2024)
99afe25  Update to use 6.9B params, 8192 sequence length, and block 2735661. (Mar 23, 2024)
56a3713  Update to 24 pages and clarify TFLOPs required. (Mar 23, 2024)
0f26862  Update documentation on vali requirements and flash-attn requirements. (Mar 24, 2024)
fadbe82  Merge branch 'dev' into next_milestone (Mar 24, 2024)
7ae6d0c  Merge pull request #83 from RaoFoundation/next_milestone (Mar 24, 2024)
5213654  Merge branch 'dev' into eval_loop_adjustments (Mar 24, 2024)
3c1c44a  Merge pull request #76 from dougsillars/main (Mar 24, 2024)
99e0588  Merge pull request #84 from RaoFoundation/eval_loop_adjustments (Mar 24, 2024)
fd4681c  Add a new tokenizer for 7B (Mar 21, 2024)
cd9819a  Bump to 6 minute timeouts and go back to random iterator start. (Mar 24, 2024)
fe2a0c3  Update to 4k seq length + lower pages + adjust tokenizer. (Mar 24, 2024)
fca0dd4  Pass pad token id to avoid instantiating new tokenizer every loss com… (Mar 24, 2024; sketched below this list)
732f904  Add Model Criteria for block 0 and improve logging. (Mar 24, 2024)
4309982  Calculate average loss correctly in log_step. (Mar 25, 2024)
e8bfe81  Move to GPT4 tokenizer instead of GPT3_5. (Mar 27, 2024)
0771aaa  Push switchover block out by a week. (Mar 27, 2024)
c0cf96c  Merge pull request #88 from RaoFoundation/update_tokenizer (Mar 28, 2024)
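Commit fca0dd4 above describes a small performance fix: the caller passes the pad token id into the loss computation once, instead of constructing a tokenizer on every call. A minimal sketch of that pattern under assumed names (compute_loss is illustrative, not the repo's actual function):

```python
import torch
import torch.nn.functional as F

def compute_loss(
    logits: torch.Tensor, labels: torch.Tensor, pad_token_id: int
) -> torch.Tensor:
    """Next-token cross-entropy that skips padded positions.

    pad_token_id is passed in by the caller, so no tokenizer is
    re-instantiated inside the loss computation.
    """
    shift_logits = logits[:, :-1, :].contiguous()  # predict token t+1 from t
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=pad_token_id,  # padding contributes nothing to the loss
    )
```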
Changes from 1 commit:

Update to 4k seq length + lower pages + adjust tokenizer.
Sid committed Mar 24, 2024
commit fe2a0c39c8b27bc3edceb6cbcc4df65e7da81322
constants/__init__.py (4 additions, 4 deletions)

@@ -31,10 +31,10 @@
 SUBNET_UID = 9
 # The root directory of this project.
 ROOT_DIR = Path(__file__).parent.parent
-# Block at which 7b models, 8192 sequence lengths, new tokenizer, bfloat16, and flash attention are used.
+# Block at which 7b models, 4096 sequence lengths, new tokenizer, bfloat16, and flash attention are used.
 BLOCK_7B = 2_735_661
 SEQUENCE_LENGTH_1 = 1024
-SEQUENCE_LENGTH_2 = 8192
+SEQUENCE_LENGTH_2 = 4096
 # A mapping of block numbers to the supported model types as of that block.
 ALLOWED_MODEL_TYPES_1 = {
     GPT2LMHeadModel,
@@ -75,7 +75,7 @@
         max_model_bytes=15 * 1024 * 1024 * 1024,
         max_model_parameters=6_900_000_000,
         allowed_model_types=ALLOWED_MODEL_TYPES_2,
-        tokenizer_identifier=TokenizerIdentifier.GPT3_5_TURBO_16K,
+        tokenizer_identifier=TokenizerIdentifier.GPT3_5_TURBO,
     ),
 ),
 ]
@@ -97,7 +97,7 @@
 # validator score boosting for earlier models.
 timestamp_epsilon = 0.005
 # validators number of pages to eval over miners on each step.
-n_eval_pages = 24
+n_eval_pages = 12
 # validator eval batch size.
 batch_size = 1
 # validator eval batch min to keep for next loop.
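The file above maps block boundaries to ModelCriteria. A minimal sketch of how such a boundary list is typically resolved at runtime; get_model_criteria and the list shape are assumptions for illustration, not code from this PR:

```python
from typing import List, Tuple

def get_model_criteria(
    block: int, criteria_by_block: List[Tuple[int, "ModelCriteria"]]
) -> "ModelCriteria":
    """Return the criteria for the highest boundary at or below `block`.

    `criteria_by_block` is assumed sorted ascending by block number,
    e.g. [(0, MODEL_CRITERIA_772M), (2_735_661, MODEL_CRITERIA_7B)].
    """
    matched = None
    for boundary, criteria in criteria_by_block:
        if block >= boundary:
            matched = criteria  # keep overwriting: the latest boundary passed wins
    if matched is None:
        raise ValueError(f"no model criteria defined at or below block {block}")
    return matched
```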
model/data.py (1 addition, 1 deletion)

@@ -75,7 +75,7 @@ class TokenizerIdentifier(IntEnum):
     """Identifies the tokenizer to use. This may mean different tokenizers or different implementations."""

     DISTILGPT_2 = 1
-    GPT3_5_TURBO_16K = 2
+    GPT3_5_TURBO = 2
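For context, one plausible way this enum is consumed: dispatching each identifier to the matching tokenizer factory from pretrain/model.py. The dispatch table below is an assumption about usage, not code from this PR:

```python
from model.data import TokenizerIdentifier
from pretrain.model import get_old_tokenizer, get_tokenizer

# Map each identifier to the factory that builds the corresponding tokenizer.
TOKENIZER_FACTORIES = {
    TokenizerIdentifier.DISTILGPT_2: get_old_tokenizer,
    TokenizerIdentifier.GPT3_5_TURBO: get_tokenizer,
}

def load_tokenizer(identifier: TokenizerIdentifier, cache_dir: str = None):
    """Instantiate the tokenizer named by a ModelCriteria's identifier."""
    return TOKENIZER_FACTORIES[identifier](cache_dir=cache_dir)
```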
pretrain/model.py (1 addition, 1 deletion)

@@ -49,7 +49,7 @@ def get_old_tokenizer(cache_dir: str = None):
 def get_tokenizer(cache_dir: str = None):
     """Returns the tokenizer used by the latest models."""
     tokenizer = GPT2TokenizerFast.from_pretrained(
-        "Xenova/gpt-3.5-turbo-16k", cache_dir=cache_dir
+        "Xenova/gpt-3.5-turbo", cache_dir=cache_dir
     )
     tokenizer.pad_token = tokenizer.eos_token
     return tokenizer
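A quick usage sketch for the updated tokenizer (assumes the transformers package and Hugging Face Hub access; the padding and truncation settings are illustrative, chosen to match the new 4096 sequence length):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("Xenova/gpt-3.5-turbo")
tokenizer.pad_token = tokenizer.eos_token  # same pad setup as get_tokenizer()

# Encode a small batch, padding to the longest sample and truncating at 4096.
batch = tokenizer(
    ["a short sample", "a somewhat longer second sample"],
    padding=True,
    truncation=True,
    max_length=4096,
)
print([len(ids) for ids in batch["input_ids"]])  # both padded to equal length
```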
tests/model/test_model_utils.py (2 additions, 2 deletions)

@@ -14,12 +14,12 @@ class TestModelUtils(unittest.TestCase):
         tokenizer_identifier=TokenizerIdentifier.DISTILGPT_2,
     )
     MODEL_CRITERIA_7B = ModelCriteria(
-        sequence_length=8192,
+        sequence_length=4096,
         optimized=True,
         max_model_bytes=15 * 1024 * 1024 * 1024,
         max_model_parameters=6_900_000_000,
         allowed_model_types=ALLOWED_MODEL_TYPES_2,
-        tokenizer_identifier=TokenizerIdentifier.GPT3_5_TURBO_16K,
+        tokenizer_identifier=TokenizerIdentifier.GPT3_5_TURBO,
     )
     model_criteria_cases = [
         (2_405_920, MODEL_CRITERIA_772M),
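A hedged sketch of how these (block, criteria) cases might be exercised inside TestModelUtils; the lookup under test, get_model_criteria, is an assumed name rather than a confirmed import from this repo:

```python
# Inside TestModelUtils; get_model_criteria is an assumed import, e.g.
# from model.utils import get_model_criteria.
def test_get_model_criteria_by_block(self):
    for block, expected in self.model_criteria_cases:
        with self.subTest(block=block):
            self.assertEqual(get_model_criteria(block), expected)
```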