Releases: macrocosm-os/data-universe
v1.5.6: Merge pull request #264 from macrocosm-os/dev
Announcing Release 1.5.6: Enhanced HuggingFace Integration and Validator Updates
Hey everyone! 🚀 Excited to share our latest improvements in Release 1.5.6. Here’s what’s new:
What’s New:
• HFMetaData Querying: Validators can now query HFMetaData directly from miners
• NOTE: Validator HFMetaData querying will NOT affect validation scoring or rewards for now. Its effect on scoring will be introduced gradually in future sprints.
• Miner Updates: Enhanced to support validator queries from HFMetaData table
• Logging Updates: Added more logs for better visibility into HF upload operations.
• Enhanced HuggingFace Storage: New unique identifier system for multiple miners
• Dataset names will be displayed on the databox dashboard (soon).
Coming Soon:
• Revamped reward system for public data uploads
• Improve Validation Test Cases
• Exploring new data sources
• Aiming to become the top open-source data-collecting community
• Developing utility scripts for researchers
Notes:
Validators will start querying miners on July 22nd, so everyone has time to update. If you don't use HF, this change will not affect you at all.
Happy mining, everyone! ⛏️ Let’s keep pushing our data universe forward! 🌠
HuggingFace Integration!
Hello, everyone!
We're excited to announce a new feature that integrates HuggingFace functionality into our network! Starting today, you can upload your scraped data directly to HuggingFace datasets by passing the --huggingface flag when starting your script. This addition is aimed at enhancing our data handling capabilities and giving you more flexibility in managing your data.
Getting Started
To make use of this new feature, please refer to our detailed setup documentation here: https://github.com/macrocosm-os/data-universe/blob/main/docs/validator.md. This guide will walk you through the process step-by-step, ensuring a smooth transition.
No Impact on Current Validation
Please note, the introduction of this feature will not impact current validation processes. All existing data validation protocols remain in effect without any changes.
Additional Requirements
The new HuggingFace functionality requires a small increase in RAM (up to 100 megabytes).
Additional storage: Parquet files are typically about one-tenth the size of the source SQL database (a compression ratio of roughly 10:1). We therefore recommend keeping additional free disk space equal to about 10% of the size of your current database.
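The storage recommendation above is simple arithmetic, and a rough check can be scripted. The sketch below is illustrative only: the function names and the database path are hypothetical and not part of the data-universe codebase, and the 10% figure is taken directly from the recommendation above.

```python
import os
import shutil


def extra_space_needed(db_size_bytes: int, percent: int = 10) -> int:
    """Bytes of extra free space suggested for parquet exports.

    `percent` defaults to the 10% recommendation from the release notes.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return db_size_bytes * percent // 100


def has_enough_space(db_path: str, percent: int = 10) -> bool:
    """Check whether the disk holding `db_path` has the suggested headroom.

    `db_path` is a hypothetical path to the miner's SQL database file.
    """
    db_size = os.path.getsize(db_path)
    folder = os.path.dirname(os.path.abspath(db_path))
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= extra_space_needed(db_size, percent)
```

For example, a 50 GiB database would call for roughly 5 GiB of additional free space under this rule of thumb.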
Upcoming Rewards
While this release does not immediately affect validation protocols, we are pleased to inform you that in the coming weeks, we will introduce additional rewards for miners who upload their data to HuggingFace. We highly recommend taking a look at the new system to maximize your benefits once these rewards are available.
We believe this new functionality will greatly benefit our network and its participants.
Hot fix, skip is_retweet field from XContent
Skip is_retweet field from XContent
Remove is_retweet field from XContent
Remove is_retweet field from XContent. Add a new flag that lets validators skip setting weights.
Validators will cease to validate retweets created after 10th June 2024
We have decided that, effective Monday, validators will cease to validate retweets created after 10th June 2024
Reason for Change: Throughout the operation of this subnet, some miners have scraped and stored retweets. Personally, I do not consider a retweet a valid type of data for our network. However, we recognize that dropping retweets immediately would be unfair to miners who stored this data while complying with all network rules.
To Ensure Fairness: Retweets will continue to be validated for an additional 30 days, until July 6th. This will provide all miners time to adapt to the new regulations.
v1.5.1: Merge pull request #251 from macrocosm-os/dev
- Enhanced Logging: Added the version of the validator to Weights & Biases logs, improving traceability and monitoring.
- Bug Fixes: Addressed issues with hashtag processing to ensure accuracy in data handling.
- Scoring Adjustment: Retweets will be scored normally for the next 30 days.
Release 1.5
New Actor: ApiDojo. ApiDojo has been introduced as a replacement for the previous Microworlds actor (https://console.apify.com/actors/61RPP7dywgiy0JPD0).
The Microworlds actor remains available. Should Apify resolve existing issues with it, you may continue to use Microworlds as well.
New miners dashboard: https://shorturl.at/Ca5uu
Release 1.3.9
Subnet improvements:
- Improves the robustness of the miner score computation. Miner scores no longer use an exponential moving average and credibility is adjusted for index size increases.
Release 1.3.8.1
Address edge case for miners returning 128MB buckets.
Release 1.3.8
This release fixes an exploit where fake unique data was not punished harshly enough.
Although this fix prevents new abuse and would eventually remove the bad actors, we have also explicitly blacklisted them to ensure we get back to a healthy subnet state as soon as possible.
We have also decided to increase the duration of the immunity period to help new miners stay registered. This will come with an increase in minimum registration cost as well.
Validators:
- Buckets are chosen for validation based on uniqueness weighted sizes instead of raw sizes.
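To illustrate the idea of choosing buckets by uniqueness-weighted size rather than raw size, here is a minimal sketch. The release notes do not give the actual formula, so everything here is an assumption: the `Bucket` type, the `uniqueness` field (1.0 meaning no other miner holds the data), and the weight `size_bytes * uniqueness` are hypothetical stand-ins, not the validator's real implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    label: str
    size_bytes: int
    uniqueness: float  # hypothetical: fraction in [0, 1], 1.0 = fully unique


def pick_bucket(buckets: List[Bucket], rng: Optional[random.Random] = None) -> Bucket:
    """Sample one bucket with probability proportional to size * uniqueness."""
    rng = rng or random.Random()
    weights = [b.size_bytes * b.uniqueness for b in buckets]
    if sum(weights) <= 0:
        raise ValueError("no bucket has a positive uniqueness-weighted size")
    # random.choices performs weighted sampling with replacement.
    return rng.choices(buckets, weights=weights, k=1)[0]
```

Under this weighting, a large bucket of widely duplicated data is sampled less often than its raw size alone would suggest, and a bucket with zero uniqueness is never sampled.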
Subnet:
- Increased weight of credibility in determining scores.
- Fixed data desirability lookup for Reddit labels.
- Immunity_period increased from 9000 blocks to 12000 blocks (~30 hours -> ~40 hours).
- Min_burn increased from 0.1 to 0.2 Tao.
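The block-to-hours figures for the immunity period can be checked with a short calculation. This sketch assumes a block time of about 12 seconds, which is not stated in the release notes but is consistent with the approximate hours quoted; the function name is illustrative.

```python
SECONDS_PER_BLOCK = 12  # assumption: approximate block time, not from the notes


def blocks_to_hours(blocks: int, seconds_per_block: int = SECONDS_PER_BLOCK) -> float:
    """Convert a number of blocks to wall-clock hours."""
    return blocks * seconds_per_block / 3600


old_hours = blocks_to_hours(9000)   # previous immunity period
new_hours = blocks_to_hours(12000)  # new immunity period
print(f"{old_hours:.0f} hours -> {new_hours:.0f} hours")
```

Under that assumption, 9000 blocks comes to 30 hours and 12000 blocks to 40 hours, matching the values above.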