Add audioset (WIP) #2331
base: maeb
Thanks for the PR. The description of the task is missing.
Please remove the irrelevant parts (the model checklist) from the message. I would also really like to see that a model has actually been run on the task to confirm that it works.
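One way to make that run reproducible is to assemble the command from the PR template, `mteb -m {model_name} -t {task_name}`, in a small script. This is only a sketch: `build_mteb_command` is a hypothetical helper, the model name is a placeholder, and the exact CLI form may differ between mteb versions.

```python
import shlex


def build_mteb_command(model_name: str, task_name: str) -> list[str]:
    """Build the argument list for the template's `mteb -m ... -t ...` command.

    Hypothetical helper for illustration; it only builds the arguments
    and does not invoke mteb.
    """
    return ["mteb", "-m", model_name, "-t", task_name]


# "AudioSet" is the task name used in this PR's TaskMetadata;
# the model name is a placeholder, not a real checkpoint.
cmd = build_mteb_command("your-org/your-audio-model", "AudioSet")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once mteb and a suitable audio model are installed.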
descriptive_stats={
    "n_samples": {"test": 8961},  # Need to change
},
Suggested change:
- descriptive_stats={
-     "n_samples": {"test": 8961},  # Need to change
- },
task_subtypes=[
    "Environment Sound Classification"
],  # Since this dataset has sounds of ALL types, this seems to be the best option
Suggested change:
- task_subtypes=[
-     "Environment Sound Classification"
- ],  # Since this dataset has sounds of ALL types, this seems to be the best option
+ task_subtypes=[],
Hmm not sure about this one
class AudioSetMultilingualClassification(AbsTaskAudioMultilabelClassification):
    metadata = TaskMetadata(
        name="AudioSet",
        description="Multilabel Audio Classification.",
Hard to know anything about the dataset from this
dataset={
    "path": "agkphysics/AudioSet",
    "revision": "5a2fa42a1506470d275a47ff8e1fdac5b364e6ef",
},  # this is actually used to download the data
Suggested change:
- },  # this is actually used to download the data
+ },
date=(
    "2020-01-01",
    "2020-01-30",
),  # Estimated date when this dataset was committed, what should be the second tuple?
This is the estimated period during which the data was created (sounds produced, tweets posted, images taken): the first date is the start of that period and the second is the end.
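As a minimal sketch of that convention, the pair can be read as a (start, end) range in ISO format. The helper `check_date_range` below is hypothetical and not part of mteb; the dates are the placeholder values from this PR, not verified creation dates for AudioSet.

```python
from datetime import date


def check_date_range(date_range: tuple[str, str]) -> tuple[date, date]:
    """Parse a (start, end) pair of ISO dates and check that start <= end.

    Hypothetical helper for illustration only.
    """
    start_str, end_str = date_range
    start = date.fromisoformat(start_str)
    end = date.fromisoformat(end_str)
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


# The two entries bound the period in which the data was created.
start, end = check_date_range(("2020-01-01", "2020-01-30"))
```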
    "2020-01-01",
    "2020-01-30",
),  # Estimated date when this dataset was committed, what should be the second tuple?
domains=["Web"],  # obtained from Freesound - online collaborative platform
Suggested change:
- domains=["Web"],  # obtained from Freesound - online collaborative platform
+ domains=[],
I'm not sure about this one - it's hard to say with the description, though.
Yeah, makes sense. It's currently a draft PR; I haven't run an actual model yet to get the numbers. Again, this dataset is bigger than the others, so it will take some time.
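Since dataset size is the bottleneck here, the PR template's hint about `self.stratified_subsampling()` is relevant. The sketch below illustrates the general idea only and is not mteb's implementation: `stratified_subsample` is a hypothetical helper that assumes one label per example, so multilabel data such as AudioSet would need a different strategy.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed=42):
    """Return sorted indices of a subsample that roughly keeps label proportions.

    Hypothetical single-label sketch. Because of rounding and the
    one-per-label minimum, the result size is only approximately n_samples.
    """
    total = len(labels)
    if n_samples >= total:
        return list(range(total))

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    chosen = []
    for indices in by_label.values():
        # Proportional share of the budget, with at least one example per label.
        k = max(1, round(n_samples * len(indices) / total))
        k = min(k, len(indices))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


labels = ["speech"] * 80 + ["music"] * 20
subset = stratified_subsample(labels, n_samples=10)
```

A fixed seed keeps the subsample reproducible across runs, which matters when reported scores are compared between models.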
Adds the AudioSet dataset as part of #2319. Also addresses #2049.
Code Quality
make lint to maintain consistent style.
Documentation
Testing
make test-with-coverage.
make test or make test-with-coverage to ensure no existing functionality is broken.
Adding datasets checklist
Reason for dataset addition: ...
mteb -m {model_name} -t {task_name} command.
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
intfloat/multilingual-e5-small
self.stratified_subsampling() under dataset_transform()
make test.
make lint.
Adding a model checklist
mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision)