
Releases: macrocosm-os/data-universe

Announcing Release (Hotfix) 1.6.3

23 Sep 18:39
8ff6740

A few miners reported a division-by-zero error when uploading data into HF using the default script. This issue has now been fixed.
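The patch itself isn't shown in the release, but the failure mode is the classic unguarded division. A minimal illustrative guard (hypothetical names, not the actual fix) looks like:

```python
def safe_ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator / denominator, falling back to `default` when the
    denominator is zero (e.g. an empty batch during an HF upload)."""
    return numerator / denominator if denominator else default
```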

Additionally, we have created new documentation files to provide a clearer understanding of the current scoring system and future updates.

DatasetCard | HF validation process

19 Sep 20:52
f2712bd

Changes for Miners:

  • Miners must create a dataset card (a short description of the dataset on HuggingFace)
  • We have made performance improvements to the upload system
  • Miners can upload datasets in offline mode again (to test the scripts)
  • Miners now store the encoding keys for their datasets in HFMetadata for easy access by validators

Changes for Validators:

  • Validators can query the HFMetaData for dataset validation
  • Validators validate HF datasets once per 55k blocks (i.e. once per upload)
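The once-per-55k-blocks cadence amounts to a simple block-distance check. A sketch with hypothetical names (not the subnet's actual validator code):

```python
UPLOAD_INTERVAL_BLOCKS = 55_000  # validators re-check HF data once per upload

def hf_validation_due(current_block: int, last_validated_block: int) -> bool:
    """True once at least one full upload interval has elapsed since the
    last HF validation of this miner."""
    return current_block - last_validated_block >= UPLOAD_INTERVAL_BLOCKS
```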

Fixed issues related to uploading data into HF

06 Sep 16:35
cb3bbaa

Updates:

Fixed issues reported by miners who upload data into HF (thanks, everyone).
Added a test case that validates the X dataset (it also includes my encoding key, for transparency).

Coming soon:
Finish the validation process for HF (for X it's about 90% done; same for Reddit).
Update the HF upload script to upload data in non-chronological order and avoid data duplication.
Finalization of the scoring system for validators (full math and rules coming soon).
A new data source is being developed for image datasets that can be used in vision models. We need to determine how legal it is to scrape/share images from the new source.
DynamicDesirability (details coming soon).
Utility scripts.
Long-term solution for Twitter validation (I'll share the results of the R&D on the official API).

1.6.0

29 Aug 12:54
99d1bdc

Changes for Miners:

Dataset UID is now generated using the miner's hotkey.
If a miner is deregistered and registers again with the same hotkey, they can use previously uploaded datasets associated with that hotkey.
Miners will encode the URL and username to adhere to data sharing policies, and provide keys to validators to decode these for dataset validation.
Miners upload data in 10 chunks (1 chunk = 1 million rows from the database) to save disk space on their machines.
Miners now store the last few rows of the data uploaded to HF. In future uploads, they will only upload newly scraped data (the previous data is still saved in the HF dataset).
It's been proven that our miners can use our previous script to upload 400 million rows of data into HF without any issues. For the initial (first) upload, the script selects 400 million rows; in the future, miners will upload only data scraped since the previous upload.
HF uploads for offline mining are not allowed, to prevent data duplication in HF.
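The chunked upload described above can be sketched as row-range slicing. This is a hypothetical helper (the real script reads these ranges out of the miner's database), but it shows why chunking saves disk space: each chunk can be exported, uploaded, and deleted before the next is read.

```python
CHUNK_ROWS = 1_000_000  # 1 chunk = 1 million rows, per the release notes

def chunk_row_ranges(total_rows: int, chunk_rows: int = CHUNK_ROWS):
    """Yield (start, end) half-open row ranges covering the whole table,
    so each chunk can be processed and freed before the next one."""
    for start in range(0, total_rows, chunk_rows):
        yield start, min(start + chunk_rows, total_rows)
```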

Encoding/Decoding URLs and Usernames:
Miners will create an encoding key. By default it's stored in the hf_utils folder, but you can choose another location. Make sure not to change the key, as validators will query it to decode the data. Note that only validators are able to do this; otherwise the data is "dehydrated", so it can't be stolen and reused in this subnet.
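To illustrate the round trip only (a toy XOR-plus-base64 scheme, not the cipher the miner code actually uses), encoding a field with a shared key and decoding it on the validator side might look like:

```python
import base64
import itertools

def encode_field(plaintext: str, key: bytes) -> str:
    """Reversibly obfuscate a URL or username with a shared key (toy scheme)."""
    data = plaintext.encode("utf-8")
    xored = bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))
    return base64.urlsafe_b64encode(xored).decode("ascii")

def decode_field(token: str, key: bytes) -> str:
    """Invert encode_field; only holders of the key (validators) can do this."""
    xored = base64.urlsafe_b64decode(token.encode("ascii"))
    return bytes(b ^ k for b, k in zip(xored, itertools.cycle(key))).decode("utf-8")
```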
Changes for Validators:
While full validation functionality hasn't been developed yet, the test cases show how this "simplified" validation works:
Validators query the encoding key and the HuggingFace metadata once per upload (50k blocks).
Optimal query frequency is being tested and subject to change.
Validators select 10 completely random files from the entire miner dataset.
If Hugging Face validation is successful, the miner's credibility will increase; if not, it will decrease.
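The sampling-and-scoring loop above can be sketched as follows. The helper names and the 0.05 step size are illustrative assumptions, not the subnet's real credibility update:

```python
import random

def sample_files(file_names, k=10, seed=None):
    """Pick k files uniformly at random from the miner's HF dataset listing."""
    rng = random.Random(seed)
    return rng.sample(file_names, min(k, len(file_names)))

def update_credibility(credibility, passed, step=0.05):
    """Nudge credibility up on a successful HF validation, down otherwise,
    clamped to the [0, 1] range. The step size here is a placeholder."""
    return min(1.0, max(0.0, credibility + (step if passed else -step)))
```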

Coming Soon:
Finalization of the scoring system for validators (full math and rules coming soon).
A new data source is being developed for image datasets that can be used in vision models. We need to determine how legal it is to scrape/share images from the new source.
DynamicDesirability (more details coming soon).
Utility scripts.
Long-term solution for Twitter validation.

Bump bittensor version to 6.9.4. Add WandB Validator Scraper Tag

20 Aug 18:29
2738f19

Bump bittensor version to 6.9.4.
Add WandB Validator Scraper Tag

Switch actor to apidojo.

20 Aug 11:31
9eba003

We switched the actor from microworlds to apidojo.

Switch the apify actor

18 Aug 14:58
1ac58ab

We switched the apify actor from apidojo to microworlds, which is functional.

Release(hotfix) v1.5.9

31 Jul 22:44
d9e4748

  1. Fixed the bug with content bytes validation for X data (the "The claimed bytes are too big compared to the actual tweet." error)

Release v1.5.8

31 Jul 16:07
094cd33

  • Fixed the issue with hashtag validation of X tweets.
    Sometimes the apify actor couldn't fetch all hashtags from tweets; this is fixed now.
    Validators with autoupdate will update their code within 15 minutes. We recommend restarting validators with the newest version.
    Happy scraping!

v1.5.7: Merge pull request #266 from macrocosm-os/dev

17 Jul 14:30
92734e6

As Macrocosmos continues to grow, we need to start hosting our own infrastructure. As such, we are moving to our own WandB account! https://wandb.ai/macrocosmos/projects