Steffen/cleanup #1

Open

wants to merge 386 commits into base: main

Changes from all commits (386 commits)
ec23196
Implement cleanup loop in validator and associated local model store …
Dec 27, 2023
8b59645
Add implementations for storing/retrieving data on chain and in Huggi…
Dec 27, 2023
5730065
Format all files for consistency. (#3)
Dec 27, 2023
beeb0c1
Refactor to use hotkeys not uids for miner identification. (#4)
Dec 27, 2023
f4a0ad3
Adds the Perf Monitor
Dec 28, 2023
3bb2289
Merge pull request #5 from RaoFoundation/perf-tracker
Dec 28, 2023
1b180c2
Merge branch 'dev' into miner_tracker
Dec 28, 2023
cac0feb
Improve model tracker comments and logging.
Dec 27, 2023
bcb801d
Delete .vscode/settings.json which is now in the ..gitignore.
Dec 29, 2023
b8a3193
Merge pull request #6 from RaoFoundation/miner_tracker
Dec 29, 2023
07eaedb
Merge pull request #7 from RaoFoundation/model_cleaner
Dec 30, 2023
fac0eb0
Add helper to get hash of directory.
Dec 30, 2023
71e789f
Add logic to redownload and get hash in upload_model.
Dec 30, 2023
9768cb7
Update to only store model for hash in a tmp folder.
Dec 30, 2023
762792d
Address PR feedback.
Dec 30, 2023
6d674ae
Merge pull request #8 from RaoFoundation/dirHash
Dec 30, 2023
3f90a28
Update the Miner
Dec 30, 2023
56e8137
Address feedback
Dec 30, 2023
b6e4e47
More PR feedback
Dec 30, 2023
2755cf9
Merge pull request #9 from RaoFoundation/miner-updates
Dec 30, 2023
f78cfd3
Update model tracker to track metadata.
Dec 30, 2023
c891add
Update validator eval loop to use new stores.
Dec 30, 2023
7f17118
Miner fixes
Dec 30, 2023
6d656fa
Merge pull request #11 from RaoFoundation/miner-fixes
Dec 30, 2023
d33d4fb
Use AutoModelForCausalLM.
Dec 30, 2023
8c1b7ac
Also update mining test to use same model type.
Dec 30, 2023
c8855a5
Merge pull request #12 from RaoFoundation/autoModelLM
Dec 30, 2023
29ae1a0
Pass netuid to the chain store
Dec 31, 2023
c94f391
Handle exceptions calculating miner losses.
Dec 31, 2023
b4d2325
Support loading a non hugging face saved model
Dec 31, 2023
d3d2f3d
Make a new wandb run for the validator if logging there.
Dec 31, 2023
38a11a5
Merge pull request #14 from RaoFoundation/miner-fixes2
Dec 31, 2023
44a3e73
Address PR fixes.
Dec 31, 2023
67574c9
Merge branch 'dev' into valEval
Dec 31, 2023
ced19a6
Add size check before downloading from hugging face.
Dec 30, 2023
cd0e717
Merge pull request #10 from RaoFoundation/valEval
Dec 31, 2023
4f773e8
Merge pull request #13 from RaoFoundation/checkRepoSize
Dec 31, 2023
7f66a39
Add checks in Model Updater for bad models.
Dec 31, 2023
b0fc67d
Merge pull request #15 from RaoFoundation/exceptOnBadModels
Dec 31, 2023
d6f2904
Improve test logging.
Dec 31, 2023
db56c27
Collected fixes.
Dec 31, 2023
de6edac
Exception handling improvements.
Dec 31, 2023
3004f06
Fix update loop sleep logic when revisiting recently.
Dec 31, 2023
928b61f
Uid state handling fixes.
Dec 31, 2023
a8e8a2f
Sleep in run step for readability.
Dec 31, 2023
3d22ac1
Align local and remote directory pathing.
Dec 31, 2023
81edb9d
Compute_losses on the pt_model not the Model.
Dec 31, 2023
695e738
Validator wandb run logging fixes.
Dec 31, 2023
359fa9c
Update comments on expected directory structure.
Dec 31, 2023
d5b7825
Merge pull request #16 from RaoFoundation/vali-fixes
Dec 31, 2023
adaf416
Add a new tool to upload a trained model
Dec 31, 2023
a3e073f
Merge pull request #17 from RaoFoundation/miner-push-only
Dec 31, 2023
88c8216
Clean-up
Dec 31, 2023
7d80098
Create a new validator wandb run every 100 run steps.
Dec 31, 2023
87d4f88
Merge pull request #18 from RaoFoundation/clean-up
Dec 31, 2023
8fdb630
Add auto-update script
Dec 31, 2023
28e5769
Fix directory hash after downloading models.
Dec 31, 2023
4b08bdf
Merge pull request #20 from RaoFoundation/auto-update
Dec 31, 2023
da72955
Merge pull request #21 from RaoFoundation/hash_location_fix
Dec 31, 2023
ee0b22e
Merge pull request #19 from RaoFoundation/new_wandb_runs
Dec 31, 2023
3b27b56
Remove unused import
Dec 31, 2023
c57f2ef
Merge pull request #22 from RaoFoundation/logs
Dec 31, 2023
9da0a5c
Split out miner/vali docs and update.
Dec 31, 2023
0278385
Improve Miner docs.
Jan 1, 2024
175a58c
Merge pull request #23 from RaoFoundation/docs
Jan 1, 2024
ae42a47
Update scoring temperature to 0.04.
Jan 3, 2024
ef67494
Merge pull request #24 from RaoFoundation/temp_update
surcyf123 Jan 3, 2024
0ddead7
Update validator score boosting of earlier models.
Jan 3, 2024
43d6a6a
Merge pull request #25 from RaoFoundation/epsilon_update
surcyf123 Jan 3, 2024
972950a
Merge pull request #26 from RaoFoundation/dev
Jan 3, 2024
93a1d98
Formatting fixes for miner docs
Jan 3, 2024
8e91a9f
Merge pull request #27 from RaoFoundation/doc-format
Jan 3, 2024
2aac764
Merge pull request #28 from RaoFoundation/dev
Jan 3, 2024
4c9f60f
Fix for pending uids to eval in next loop.
Jan 5, 2024
3d65475
Merge pull request #29 from RaoFoundation/updatedEvalCheck
Jan 5, 2024
5e2aaa9
Also update to a new uids file.
Jan 5, 2024
9958cd4
Merge pull request #30 from RaoFoundation/updatedEvalCheck
Jan 5, 2024
9cb69dc
Merge pull request #31 from RaoFoundation/dev
Jan 5, 2024
edeac8d
Realize symlinks on download from remote store.
Jan 8, 2024
0ac4c65
Update to improve error logging around failures to parse the metadata…
Jan 9, 2024
bf1dc9d
Model_id locality fix.
Jan 9, 2024
b3dde1a
Merge pull request #32 from RaoFoundation/log_improvements
Jan 9, 2024
173ad38
Merge pull request #33 from RaoFoundation/remove_symlink
Jan 9, 2024
88ee418
Merge pull request #34 from RaoFoundation/dev
Jan 9, 2024
a8abc8b
Add a notebook to check latest vali perf
Jan 13, 2024
62ca7e9
Clear all outputs
Jan 13, 2024
143f0cd
Merge pull request #35 from RaoFoundation/vali-perf
Jan 14, 2024
014531d
Increase max model size to 186M
Jan 15, 2024
960163e
Perform a full eval after vali upgrade
Jan 15, 2024
560d8e6
Make the clean loop delay larger
Jan 15, 2024
2eea7a5
Update the miner docs
Jan 15, 2024
8b46e81
Keep losses to math.inf when failing to evaluate model.
Jan 15, 2024
6647082
Merge pull request #38 from RaoFoundation/model_loss_none_fix
Jan 15, 2024
ddc0e58
Merge pull request #36 from RaoFoundation/vali-updates
Jan 15, 2024
d1d4b50
Include repo_id in error messages
Jan 15, 2024
6ed4577
Merge pull request #39 from RaoFoundation/improve-errors
Jan 15, 2024
25b91ed
Read back the metadata commit after writing
Jan 15, 2024
2300785
Merge pull request #40 from RaoFoundation/dev
Jan 15, 2024
ae65103
Merge pull request #41 from RaoFoundation/read-metadata
Jan 15, 2024
b1a0bdd
Update setup.py to point to new version location.
Jan 16, 2024
714dff7
Correct the docs
Jan 17, 2024
56b4a52
Merge pull request #37 from RaoFoundation/model-increase
Jan 17, 2024
27fa33b
Merge pull request #42 from RaoFoundation/setup_fix
Jan 17, 2024
edf58fb
Bump version
Jan 17, 2024
9c25951
Merge pull request #43 from RaoFoundation/bump-version
Jan 17, 2024
06eecdd
Merge pull request #44 from RaoFoundation/dev
Jan 17, 2024
c9ec6bc
Simplify the mining API
Jan 20, 2024
5d45fc7
Merge pull request #45 from RaoFoundation/api
Jan 20, 2024
ac31bb6
Run each eval in a subprocess to avoid a bad model being able to corr…
Feb 2, 2024
bdae9e6
Merge pull request #46 from RaoFoundation/debug
Feb 2, 2024
198e103
Remove model with inf loss
Feb 2, 2024
1f96e89
Fix dict .get()
Feb 2, 2024
45595cc
Merge pull request #47 from RaoFoundation/remove-bad-miners
Feb 2, 2024
65b29aa
Clean-up accidental test code
Feb 2, 2024
563dfdb
Merge pull request #48 from RaoFoundation/clean-up2
Feb 2, 2024
4402b91
Merge pull request #49 from RaoFoundation/dev
Feb 2, 2024
4d09328
Correctly call is_dir() method.
Feb 2, 2024
9a6695d
Add test for is_dir() behavior.
Feb 3, 2024
c563c26
Log but do not throw for expected model sync failures.
Feb 3, 2024
3ab91cd
Only keep hotkeys to be evaluated in storage.
Feb 3, 2024
c2c8f6a
Only allow at most 10 new models to be pending eval.
Feb 3, 2024
eb6b471
Merge pull request #50 from RaoFoundation/is_dir_fix
Feb 3, 2024
34e08e0
Merge pull request #51 from RaoFoundation/downgrade_model_size_log
Feb 3, 2024
7b1e494
Add lock around metagraph for sub threads and remove grace period on …
Feb 3, 2024
f877806
Merge pull request #52 from RaoFoundation/limit_stored_models
Feb 3, 2024
8dba6f3
Merge pull request #53 from RaoFoundation/limit_pending_models
Feb 3, 2024
69c2749
Only filter out uids with weights at 0 in addition to inf loss.
Feb 4, 2024
c496bf2
Merge pull request #54 from RaoFoundation/inf_and_weight_check
Feb 4, 2024
45d9bc1
Move state file to the model dir
Feb 4, 2024
bcb696e
Merge pull request #55 from RaoFoundation/perplexity
Feb 4, 2024
1f7345d
Revert "Only allow at most 10 new models to be pending eval."
Feb 4, 2024
172e4e3
Merge pull request #56 from RaoFoundation/revert-53-limit_pending_models
Feb 4, 2024
c8a9eba
Only allow at most 20 new models to be pending eval.
Feb 3, 2024
47a444c
PR Feedback.
Feb 4, 2024
c247220
Handle shutil.rmtree FIleNotFoundError.
Feb 4, 2024
a89a67f
Merge pull request #58 from RaoFoundation/shutil_exception
Feb 4, 2024
4c313ce
Merge pull request #57 from RaoFoundation/limit_pending_models
Feb 4, 2024
2d86ecd
Catch all exceptions from shutil rmtree.
Feb 4, 2024
613fe76
Merge pull request #59 from RaoFoundation/catch_all_rmtree
Feb 4, 2024
c952148
Reapply grace period of 300s.
Feb 4, 2024
56e1665
Catch exceptions in the clean-up loop.
Feb 4, 2024
47e166d
Add handling around computation of file timestamps if the file no lon…
Feb 4, 2024
f6206de
Merge pull request #60 from RaoFoundation/grace_reapply
Feb 4, 2024
7702da1
Merge pull request #61 from RaoFoundation/catch-cleanup
Feb 4, 2024
78864de
Update docs to point to the leaderboard
Feb 4, 2024
4321f85
Fix get_newest_datetime_under_path to get newest not oldest.
Feb 4, 2024
fbbd159
Merge pull request #63 from RaoFoundation/get_latest_under_path_fix
Feb 5, 2024
40f31f8
Standardize the loss function
Feb 5, 2024
5a4ebd0
Bump version
Feb 5, 2024
fb44be8
Merge pull request #66 from RaoFoundation/loss
Feb 5, 2024
71dd311
Merge pull request #65 from RaoFoundation/bump_version
Feb 5, 2024
1dffefc
Merge pull request #62 from RaoFoundation/update-docs
Feb 5, 2024
7e3b2c4
Merge pull request #67 from RaoFoundation/dev
Feb 5, 2024
430cb5a
Require models have max_position_embeddings=1024.
Feb 11, 2024
ccba669
Also reduce severity of logs when failing to download model.
Feb 11, 2024
3b2d967
Update spec version to 2.2.1 to ensure validators get new state.
Feb 11, 2024
b341ed6
Restrict model types.
Feb 11, 2024
cdb622d
Move list of allowed models to constants.
Feb 11, 2024
c8573f9
Merge pull request #69 from RaoFoundation/restrict_model_types
Feb 11, 2024
bd1f026
Merge pull request #70 from RaoFoundation/dev
Feb 11, 2024
a8da485
Update docs for allowed model types.
Feb 11, 2024
d06c77e
Merge pull request #71 from RaoFoundation/doc_update
Feb 11, 2024
e18fdd4
Add tool for running a benchmark
Feb 13, 2024
4cf9e0b
Remove test notebook
Feb 13, 2024
f94fc93
Merge pull request #72 from RaoFoundation/benchmarks
Feb 13, 2024
28a1afe
Allow larger models after a defined block
Feb 14, 2024
81c8b78
Increase max repo size
Feb 14, 2024
d8a7bdc
Add gpt2-large to benchmark
Feb 16, 2024
2234478
Merge pull request #73 from RaoFoundation/block-max
Feb 16, 2024
937afae
Merge pull request #74 from RaoFoundation/add-gpt2-large
Feb 16, 2024
7f1ec1e
Merge pull request #75 from RaoFoundation/dev
Feb 16, 2024
4e5cc6d
Update README.md
dougsillars Feb 16, 2024
7140510
Load model in the subprocess to avoid pickling
Feb 21, 2024
0f237c2
Fix missing method
Feb 21, 2024
725365f
Bump ttl to 150 seconds
Feb 21, 2024
3ab1102
Bump tranformers version
Feb 21, 2024
dd26bcc
Merge pull request #78 from RaoFoundation/bump-transformers
Feb 21, 2024
abb7496
Track total eval perf
Feb 21, 2024
6122352
Don't bump spec version
Feb 21, 2024
7c5fe35
Clean-up vali-perf notebook
Feb 21, 2024
4091575
Merge pull request #77 from RaoFoundation/qol
Feb 21, 2024
98a21b5
Revert "Merge pull request #77 from RaoFoundation/qol"
Feb 22, 2024
4fccab7
Merge pull request #80 from RaoFoundation/undo-77
Feb 22, 2024
c935923
Increase alpha. Log weight failures
Feb 22, 2024
c06992f
Merge pull request #81 from RaoFoundation/alpha
Feb 22, 2024
d2faaec
Merge pull request #79 from RaoFoundation/dev
Feb 22, 2024
5409309
Update model size on downloads based on block.
Mar 17, 2024
8c13811
Use optimizations at new block for inference.
Mar 18, 2024
c8cb2b8
Limit model types based on block.
Mar 18, 2024
e8206a7
Run inference with sequence length based on block.
Mar 18, 2024
9f8ae23
Doc updates.
Mar 18, 2024
3f5748c
Adjust temperature to prioritize top 1 model.
Mar 19, 2024
bd7501c
Adjust to only keep 10 best models + eval up to 15 new per loop.
Mar 19, 2024
90b870f
Check for updates to models with incentive first.
Mar 19, 2024
28916ff
Remove notebook and update cadence for check.
Mar 19, 2024
ea91667
Update to only 6 min, 14 max models by default.
Mar 19, 2024
c83a787
Fix docs + increase time for eval + adjust sample model parameters.
Mar 19, 2024
4957e80
Refactor to use ModelParameters + pass sequence length.
Mar 20, 2024
e520ff1
Rename to Model Criteria for clarity.
Mar 20, 2024
d8af206
Update docs to point to correct line for ModelCriteria.
Mar 20, 2024
2e9d6dd
Check generated outputs before calculating losses.
Mar 22, 2024
82e74a3
Send inputs to the same device as the model.
Mar 22, 2024
7eb4b4e
Refactor check out to a helper function.
Mar 22, 2024
1177610
Bump spec version to force reload of models.
Mar 22, 2024
6160d49
Pass tokenizer eos token id to remove warning message.
Mar 22, 2024
d80f965
Start iterator at 200 for fresh start.
Mar 22, 2024
b4d1207
Merge pull request #86 from RaoFoundation/disallow_attn
Mar 22, 2024
706f659
Merge pull request #87 from RaoFoundation/dev
Mar 22, 2024
99afe25
Update to use 6.9 params, 8192 seqeuence length, and block 2735661.
Mar 23, 2024
56a3713
Update to 24 pages and add clarify TFLOPs required.
Mar 23, 2024
0f26862
Update documentation on vali requirements and flash-attn requirements.
Mar 24, 2024
fadbe82
Merge branch 'dev' into next_milestone
Mar 24, 2024
7ae6d0c
Merge pull request #83 from RaoFoundation/next_milestone
Mar 24, 2024
5213654
Merge branch 'dev' into eval_loop_adjustments
Mar 24, 2024
3c1c44a
Merge pull request #76 from dougsillars/main
Mar 24, 2024
99e0588
Merge pull request #84 from RaoFoundation/eval_loop_adjustments
Mar 24, 2024
fd4681c
Add a new tokenizer for 7B
Mar 21, 2024
cd9819a
Bump to 6 minute timeouts and go back to random iterator start.
Mar 24, 2024
fe2a0c3
Update to 4k seq length + lower pages + adjust tokenizer.
Mar 24, 2024
fca0dd4
Pass pad token id to avoid instantiating new tokenizer every loss com…
Mar 24, 2024
732f904
Add Model Criteria for block 0 and improve logging.
Mar 24, 2024
4309982
Calculate average loss correctly in log_step.
Mar 25, 2024
e8bfe81
Move to GPT4 tokenizer instead of GPT3_5.
Mar 27, 2024
0771aaa
Push switchover block out by a week.
Mar 27, 2024
c0cf96c
Merge pull request #88 from RaoFoundation/update_tokenizer
Mar 28, 2024
18f0056
Merge pull request #89 from RaoFoundation/dev
Mar 28, 2024
d9fe3a1
Raise threshhold for unreasonable output and keep models with weights.
Mar 28, 2024
ae1fd35
Also prioritize keeping higher weights when filtering.
Mar 28, 2024
8b1e8bb
Adjust output lengths and check reptitiveness for all outputs.
Mar 29, 2024
bf5cc6e
Handle failures to load tracker state gracefully.
Mar 29, 2024
43b2428
Also test redownloading works as expected.
Mar 29, 2024
24f4b76
Merge pull request #91 from RaoFoundation/handle_corrupt_state
Mar 29, 2024
e72efee
Refactor model prioritization for clarity + correctness.
Mar 29, 2024
63271e1
Handle failures to load uids to eval state gracefully.
Mar 29, 2024
2b71a5d
Wipe tracker state in case of no uids to eval.
Mar 29, 2024
8a73df8
Also wipe the state in case of multiple bad restarts.
Mar 29, 2024
63bd73e
Merge pull request #90 from RaoFoundation/improve_model_check
Apr 1, 2024
d0716fd
Merge pull request #93 from RaoFoundation/eval_state
Apr 1, 2024
fee6b41
Retry evaluation for discarded models with incentive periodically.
Apr 1, 2024
9347268
Merge pull request #94 from RaoFoundation/retry_incentive
Apr 1, 2024
2c377cf
Merge pull request #95 from RaoFoundation/dev
Apr 1, 2024
9a6e0c0
Initialize uids_to_eval as set().
Apr 2, 2024
5a36e47
Fix docstring
steffencruz Apr 3, 2024
feb620c
Enable uploading a model with bfloat 16.
Apr 12, 2024
1e6e6ef
Add 7b models to the benchmark script
Apr 12, 2024
7b8e7b5
Default to upload with b16 for manual upload.
Apr 12, 2024
b1247e8
Merge pull request #96 from RaoFoundation/type_fix
Apr 12, 2024
6948e7a
Merge pull request #100 from RaoFoundation/benchmark-7b
Apr 12, 2024
57e5f82
Merge pull request #98 from RaoFoundation/upload_arg_opt
Apr 12, 2024
2477d4a
Merge branch 'dev' of github.com:RaoFoundation/pretraining into steff…
steffencruz Apr 13, 2024

11 changes: 10 additions & 1 deletion .gitignore
@@ -1,4 +1,13 @@
# VS Code
.vscode/

test-models/

# Exclude the Miner's directory for saving the models.
local-models/
neurons/pretraining_model/


# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -9,7 +18,6 @@ neurons/wandb/

# C extensions
*.so
**.ipynb

# Distribution / packaging
.Python
@@ -82,6 +90,7 @@ target/

# Jupyter Notebook
.ipynb_checkpoints
*.ipynb

# IPython
profile_default/
222 changes: 21 additions & 201 deletions README.md
@@ -1,237 +1,57 @@
<div align="center">

# **Bittensor Training Subnet** <!-- omit in toc -->
# **Bittensor Pretrain Subnet** <!-- omit in toc -->
[![Discord Chat](https://img.shields.io/discord/308323056592486420.svg)](https://discord.gg/bittensor)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

## Bittensor Incentivized Pretraining<!-- omit in toc -->

[Discord](https://discord.gg/bittensor) • [Network](https://taostats.io/) • [Research](https://bittensor.com/whitepaper)
[Leaderboard](https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard) • [Discord](https://discord.gg/bittensor) • [Network](https://taostats.io/subnets/netuid-9/) • [Research](https://bittensor.com/whitepaper)
</div>

---

# Introduction

Bittensor subnet 9 rewards miners for producing pretrained models with the GPT2 architecture on the Falcon Refined Web dataset. It acts like a continuous benchmark where miners are paid out for attaining the best losses on randomly sampled pages of that dataset. The reward mechanism works as follows:

1. Miners train and periodically host their model weights on a wandb account which is linked to their miner through the neurons/miner.py code.
2. Validators periodically check and pull the latest hosted models.
3. Validators run a continuous eval on the pulled models and apply the validation system outlined in neurons/validator.py.

## Validation

Miners are evaluated based on the number of times their loss on a random batch during a 360-block epoch is lower than that of all other miners.
To perform well, miners must attain the lowest loss on the largest number of random batches sampled from the 900M-page, 3T-token Falcon Refined Web dataset.

All models are open and accessible via a wandb [project](https://wandb.ai/opentensor-dev/openpretraining), and this repo contains tools for downloading them and then
serving them on your own miner. The drive to find the best model as early as possible is ensured by having validators record the best global miner per epoch and apply
an ```epsilon``` reduction to that miner's loss when calculating wins per batch.

Pseudocode for the algorithm can be read below:
```python
epsilon = 0.03 # best miner epsilon reduction.
while True:
    wins = {} # Count of wins per batch per miner

    # Run continuous scoring until the epoch is over.
    while epoch_not_over( block ):
The following documentation assumes you are familiar with basic Bittensor concepts: Miners, Validators, and incentives. If you need a primer, please check out https://docs.bittensor.com/learn/bittensor-building-blocks.

        # Fetch random sample of batches to evaluate models on
        batches = get_random_sample_of_batches_from_falcon()

        # Fetch and or update models during this step.
        models = get_and_update_models_from_miners()
Bittensor subnet 9 rewards miners for producing pretrained Foundation-Models on the Falcon Refined Web dataset. It acts like a continuous benchmark whereby miners are rewarded for attaining the best losses on randomly sampled pages of Falcon given a consistent model architecture. The reward mechanism works as follows:

        # Compute losses for each batch on subset and count wins per miner
        for batch in batches:
1. Miners train and periodically publish models to hugging face and commit the metadata for that model to the Bittensor chain.
2. Validators download the models from hugging face for each miner based on the Bittensor chain metadata and continuously evaluate them, setting weights based on the performance of each model against the Falcon dataset. They also log results to [wandb](https://wandb.ai/opentensor-dev/pretraining-subnet).
3. The Bittensor chain aggregates weights from all active validators using Yuma Consensus to determine the proportion of TAO emission rewarded to miners and validators.

            # Find the miner with the lowest loss on the batch.
            best_uid, best_loss = None, float('inf')
            for miner_uid, model in enumerate( models ):
                loss = get_loss_for_model_on_batch( model, batch )
                if miner_uid == epoch_global_min_uid: loss *= epsilon
                if loss < best_loss:
                    best_uid = miner_uid
                    best_loss = loss

            # Increment the number of wins for the miner with the lowest loss on this batch.
            wins[ best_uid ] += 1

        # Assign epoch_global_min_uid to the miner uid with the lowest loss across all epoch batches.
        # This miner now attains a single-epoch advantage for attaining the lower loss first.
        epoch_global_min_uid = get_miner_with_lowest_loss_on_all_epoch_batches()

    # End epoch.
    # Weights are computed based on the ratio of wins a model attains during the epoch.
    weights = zeros()
    for miner_uid in wins.keys():
        # Adds a communistic +1 score for all active miners.
        weights[ miner_uid ] = (wins[ miner_uid ] + 1) / sum(wins.values())

    # Set weights on the chain.
    set_weights( weights )
```
See the [Miner](docs/miner.md) and [Validator](docs/validator.md) docs for more information about how they work, as well as setup instructions.
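
As a rough illustration of step 2 in the list above, the snippet below sketches how a validator might download one miner's model from Hugging Face at the exact revision committed on chain. This is a minimal sketch under stated assumptions, not this repository's actual code; the `repo_id`, `revision`, and `local_dir` values are assumed to come from the already-parsed chain metadata and local configuration.

```python
# Minimal sketch (not the repo's real implementation): fetch a miner's model
# at the exact Hugging Face revision announced in its on-chain metadata.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM

def load_miner_model(repo_id: str, revision: str, local_dir: str):
    # Pinning the revision ensures the evaluated weights match the commit
    # the miner registered on chain.
    path = snapshot_download(repo_id=repo_id, revision=revision, local_dir=local_dir)
    return AutoModelForCausalLM.from_pretrained(path)
```

In practice the validator also applies the repository-size and model-type checks mentioned in the commit history above before loading anything.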

---

## Installing
## Incentive Mechanism

Before continuing, make sure you have at least Python 3.8. If you have issues installing Python on your machine, we recommend using conda as explained [here](https://bittensor.com/documentation/getting-started/installation). Once Python is installed, install *this* repository as below:
```bash
# Installs this local repository using python.
git clone https://github.com/unconst/pretrain-subnet.git
cd pretrain-subnet
python -m pip install -e .
```

---
Bittensor hosts multiple incentive mechanisms through which miners are evaluated by validators for performing actions well. Validators perform the process of evaluation and 'set weights', which are transactions into Bittensor's blockchain. Each incentive mechanism in Bittensor is called a 'subnet' and has an identifier (this particular mechanism has subnet uid 9). Weights and the amount of TAO held by the validators become inputs to Bittensor's consensus mechanism, called Yuma Consensus. YC drives validators towards a consensus: agreement about the value of the work done by miners. The miners with the highest agreed-upon scores are minted TAO, the network's digital currency.

## Subtensor

Your node will run better if you are connecting to a local Bittensor chain entrypoint rather than using Opentensor's.
We recommend running a local node as follows and passing the ```--subtensor.network local``` flag to all following commands, i.e. for miners + validators.
To install and run a local subtensor node, follow the commands below with [Docker and Docker Compose](https://docs.docker.com/engine/install/) already installed.
```bash
# Installs your local subtensor chain endpoint and runs it on your machine.
git clone https://github.com/opentensor/subtensor.git
cd subtensor
docker compose up --detach
```
Miners within this subnet are evaluated based on the number of times the model they have hosted has a lower loss than another model on the network when randomly sampling from the near infinite Falcon Refined Web pretraining dataset. To perform well, miners must attain the lowest loss on the largest number of random batches. Finding the best model and delta at the earliest block ensures the most incentive.
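
For intuition, the sketch below shows one hypothetical way the win-counting described above could be turned into weights: every sampled batch is 'won' by the model with the lowest loss, the incumbent best model gets a small epsilon advantage (0.03 in the removed pseudocode above, written here as a (1 - epsilon) multiplier), and weights are proportional to each model's share of wins. It is not the shipped validator code.

```python
# Hypothetical sketch of win-based weighting; not the actual validator implementation.
# `losses[uid][b]` holds miner `uid`'s loss on sampled batch `b`; `best_uid_so_far`
# is the earliest/best model that receives the epsilon advantage.
EPSILON = 0.03

def compute_weights(losses: dict[int, list[float]], best_uid_so_far: int) -> dict[int, float]:
    num_batches = len(next(iter(losses.values())))
    wins = {uid: 0 for uid in losses}

    for b in range(num_batches):
        def adjusted(uid: int) -> float:
            loss = losses[uid][b]
            # The incumbent's loss is discounted slightly, so a challenger must
            # beat it by a clear margin to take the win.
            return loss * (1 - EPSILON) if uid == best_uid_so_far else loss

        winner = min(losses, key=adjusted)
        wins[winner] += 1

    total = sum(wins.values()) or 1
    # Weights are proportional to the share of batches each model wins.
    return {uid: wins[uid] / total for uid in wins}
```

For example, `compute_weights({1: [2.0, 2.0], 2: [1.99, 2.2]}, best_uid_so_far=1)` returns `{1: 1.0, 2: 0.0}`: the challenger's marginally lower loss on the first batch is not enough to overcome the incumbent's advantage.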

---

## Registration
## Getting Started

Miners + validators require a Bittensor coldkey and hotkey pair registered to netuid 9 before they can participate in the network.
To create a wallet for either your validator or miner, run the following command in your terminal. Make sure to save the mnemonics for
both keys and store them in a safe place.
```bash
# Creates your miner/validator cold + hotkey keys.
btcli w create --wallet.name ... --wallet.hotkey ...
btcli w list # to view your created keys.
```
TL;DR:
1. [Chat](https://discord.gg/bittensor)
2. [Leaderboard](https://huggingface.co/spaces/RaoFoundation/pretraining-leaderboard)

Registering a miner or a validator on subnet 9 requires the participant to `recycle` TAO to pay for entrance. To register your key, run the
following command. Before continuing, make sure you have enough TAO to register.
```bash
# Registers your cold and associated hotkey to netuid 9.
btcli s register --wallet.name ... --wallet.hotkey ... --netuid 9
```
---
This repo's main conversation is carried out in the Bittensor [Discord](https://discord.gg/bittensor). Visit the 'pretraining' channel to ask questions and get real-time feedback. You can view the ongoing operation of the incentive mechanism, the best miners (see 'incentive'), and the most in-consensus validators (see 'vtrust') using this [taostats link](https://taostats.io/subnets/netuid-9/). The table shows all 256 participant UIDs with corresponding YC stats and earnings.

## Wandb
See [Miner Setup](docs/miner.md#getting-started) to learn how to set up a Miner.

Miners and validators make **heavy use** of Weights & Biases (wandb) in order to share model state and validation information. Both miners and validators must obtain
a wandb account from [wandb](https://wandb.ai/home) along with their wandb API key, which can be found by following the instructions [here](https://docs.wandb.ai/quickstart).

Models hosted by miners and corresponding validator information for runs can be found in this open wandb [project](https://wandb.ai/opentensor-dev/openpretraining). You can get access to all valid, signed, and recent miner runs from other participants on the network as follows:

```python
>>> import pretrain
>>> import bittensor as bt
>>> meta = bt.subtensor(network = 'local' ).metagraph(9)
# Get all valid runs.
>>> miner_runs = pretrain.get_miner_runs( meta )
{
238: {
'uid': 238,
'hotkey': '5CchHAvd95HtTaxfviiC36wt1HFXU73Xq9Aom7NDZJnAiG8v',
'emission': 0.02,
'run': <Run opentensor-dev/openpretraining/63j2ps12 (finished)>,
'model_artifact': <File model.pth () 312.5MiB>,
'timestamp': 1699448922
},
239: {
'uid': 239,
'hotkey': '5CSczy1dp4EpvLARaVbgvq8DST6oJgqmSTTQJZ8iXhJpKwdZ',
'emission': 0.01,
'run': <Run opentensor-dev/openpretraining/qp0w790l (finished)>,
'model_artifact': <File model.pth () 312.5MiB>, 'timestamp': 1699448504
}
...
# Download the model from one of the runs above, e.g. uid 238.
>>> model = pretrain.model.get_model()
>>> miner_runs[238]['model_artifact'].download( replace=True, root=<path to model> )
>>> model_weights = torch.load( <path to model> )
>>> model.load_state_dict( model_weights )
```

You can download all validation data from wandb which can be used to evaluate how miners are performing on each individual page of the Falcon Refined Web dataset.
```python
>>> import pretrain
>>> import bittensor as bt
>>> meta = bt.subtensor(network = 'local' ).metagraph(9)
# Get all valid runs.
>>> vali_runs = pretrain.get_validator_runs( meta )
{
240: {
'uid': 238,
'hotkey': '5CchHAvd95HtTaxfviiC36wt1HFXU73Xq9Aom7NDZJnAiG8v',
'stake': 123121,
'run': <Run opentensor-dev/openpretraining/63j2ps12 (finished)>,
},
...
}
dataframe = vali_runs[240]['run'].history()
...
```

---

## Mining

The mining script can be run in multiple ways. In all cases, it uploads a model to wandb which will be evaluated by validators.

By default, the miner trains from scratch and posts the model periodically as its loss decreases on Falcon.
```bash
python neurons/miner.py --wallet.name ... --wallet.hotkey ... --num_epochs 10 --pages_per_epoch 5
```

Alternatively, you can scrape a model from an already performing miner on the network by passing its run id. This starts the training process from the checkpoint of another
miner. See this [page](https://wandb.ai/opentensor-dev/openpretraining) for the run_ids of other miners or use the above tools.
```bash
python neurons/miner.py --wallet.name ... --wallet.hotkey ... --num_epochs 10 --pages_per_epoch 5 --load_run_id ...
```

The miner can automatically search wandb for runs which perform well, using the best-scored model as its initial checkpoint. The pretraining
subnet is *PRO* model sharing. We recommend miners scrape other participants' models, and do so often.
```bash
python neurons/miner.py --wallet.name ... --wallet.hotkey ... --num_epochs 10 --pages_per_epoch 5 --load_best
```

Passing the ```--device``` option allows you to select which GPU to run on. You can also use ```--continue_id``` to continue from a training run you have already started.
The model you train will be hosted on wandb. You can always view this model and others by visiting https://wandb.ai/opentensor-dev/openpretraining/runs/ where all participant
models are shared.

We strongly recommend you read, understand, and adapt the miner code in ```neurons/miner.py``` to your needs. For any serious attempt to earn emission on this subnet you will
likely NEED to do this.
See [Validator Setup](docs/validator.md#getting-started) to learn how to set up a Validator.

---

## Validating
## Feedback

Validators can be run as follows. Pass your wallet hotkey and coldkey to the script. Note that validation requires a working GPU.
In version release/2.0.1 you need a GPU with at least 20GB of RAM.
We welcome feedback!

```bash
python neurons/validator.py
--wallet.name YOUR_WALLET_NAME
--wallet.hotkey YOUR_WALLET_HOTKEY
--device YOUR_CUDA_DEVICE
--wandb.on
```
---

# Auto-update PM2 + CRON

```bash
echo '* * * * * git -C <path to pretrain-subnet repo> pull' >> /etc/crontab
pm2 start neurons/validator.py --name sn9_validator --interpreter python3 --watch -- --wallet.name my_wallet ...
pm2 start neurons/miner.py --name sn9_miner_1 --interpreter python3 --watch -- --wallet.name my_wallet ...
pm2 start neurons/miner.py --name sn9_miner_2 --interpreter python3 --watch -- --wallet.name my_wallet ...
```
If you have a suggestion, please reach out on the Discord channel, or file an Issue.

---
