Create Audio Event Detection Task #2338
base: maeb
Conversation
def __init__(
    self,
    num_samples: int,
    total_duration: float,
    min_duration: float,
    avg_duration: float,
    max_duration: float,
    sample_rate: int,
    min_events_per_sample: int,
    avg_events_per_sample: float,
    max_events_per_sample: int,
    unique_event_labels: int,
    event_label_distribution: dict[str, int],
    min_event_duration: float,
    avg_event_duration: float,
    max_event_duration: float,
):
    self.num_samples = num_samples
    self.total_duration = total_duration
    self.min_duration = min_duration
    self.avg_duration = avg_duration
    self.max_duration = max_duration
    self.sample_rate = sample_rate
    self.min_events_per_sample = min_events_per_sample
    self.avg_events_per_sample = avg_events_per_sample
    self.max_events_per_sample = max_events_per_sample
    self.unique_event_labels = unique_event_labels
    self.event_label_distribution = event_label_distribution
    self.min_event_duration = min_event_duration
    self.avg_event_duration = avg_event_duration
    self.max_event_duration = max_event_duration
This is a TypedDict; you don't need `__init__`.
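For reference, a minimal sketch of what that would look like: the fields are taken from the diff above, while the class name here is only a placeholder. With a TypedDict, instances are created as plain dicts, so no `__init__` is needed.

```python
from typing import TypedDict


class AudioEventDetectionDescriptiveStatistics(TypedDict):
    # Fields are declared as annotations; construction is just
    # AudioEventDetectionDescriptiveStatistics(num_samples=..., ...)
    num_samples: int
    total_duration: float
    min_duration: float
    avg_duration: float
    max_duration: float
    sample_rate: int
    min_events_per_sample: int
    avg_events_per_sample: float
    max_events_per_sample: int
    unique_event_labels: int
    event_label_distribution: dict[str, int]
    min_event_duration: float
    avg_event_duration: float
    max_event_duration: float
```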
def evaluate(
    self,
    model: AudioEncoder,
    eval_split: str = "test",
    *,
    encode_kwargs: dict[str, Any] = {},
    **kwargs: Any,
) -> dict[HFSubset, ScoresDict]:
    if not self.data_loaded:
        self.load_data()
    scores = {}
    hf_subsets = self.hf_subsets

    for hf_subset in hf_subsets:
        logger.info(
            f"\nTask: {self.metadata.name}, split: {eval_split}, subset: {hf_subset}. Running..."
        )

        if hf_subset not in self.dataset and hf_subset == "default":
            ds = self.dataset
        else:
            ds = self.dataset[hf_subset]
        scores[hf_subset] = self._evaluate_subset(
            model,
            ds,
            eval_split,
            encode_kwargs=encode_kwargs,
            **kwargs,
        )
        self._add_main_score(scores[hf_subset])

    return scores
It seems that `evaluate` is the same as in `AbsTask`.
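If so, the override could probably be dropped so the inherited implementation is used. A rough sketch of what that might look like, assuming the task class only needs to supply the subset-level logic; the class name mirrors the diff and the import path is a guess:

```python
from typing import Any

from mteb.abstasks.AbsTask import AbsTask  # import path is an assumption


class AbsTaskAudioEventDetection(AbsTask):
    # No `evaluate` override: the inherited AbsTask.evaluate already iterates
    # over hf_subsets, calls _evaluate_subset, and adds the main score.
    def _evaluate_subset(
        self,
        model,
        dataset,
        eval_split: str,
        *,
        encode_kwargs: dict[str, Any],
        **kwargs: Any,
    ):
        # Task-specific evaluation for a single subset/split goes here.
        ...
```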
def fit(self, X_train: list[np.ndarray], y_train: list[list[dict]]):
    """Train frame-level classifier on audio embeddings"""
    all_embeddings, all_labels = self._process_training_data(X_train, y_train)
    self._init_model(input_dim=all_embeddings.shape[1])
    X_tensor = torch.tensor(all_embeddings, dtype=torch.float32).to(self.device)
    y_tensor = torch.tensor(all_labels, dtype=torch.float32).to(self.device)
    optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-3)
    criterion = nn.BCELoss()

    # Training loop
    self.model.train()
    for epoch in range(10):
        optimizer.zero_grad()
        outputs = self.model(X_tensor)
        loss = criterion(outputs, y_tensor)
        loss.backward()
        optimizer.step()

def _init_model(self, input_dim):
    self.model = nn.Sequential(
        nn.Linear(input_dim, 256),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(256, len(self.classes_)),
        nn.Sigmoid(),
    ).to(self.device)
Can we use logreg or something simple instead of NN?
The HEAR benchmark uses an NN to get per-frame predictions, and I thought we wanted the evaluation to be similar to theirs. I would prefer having a simpler evaluator too.
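For comparison, a minimal sketch of the simpler evaluator being discussed: one-vs-rest logistic regression over the frame embeddings instead of the NN. The shapes assumed here (frame embeddings and a multi-hot label matrix) match what `_process_training_data` appears to produce in the diff above; everything else is an assumption, not part of this PR.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class LogRegFrameClassifier:
    """Frame-level multi-label classifier: one logistic regression per event label."""

    def __init__(self, max_iter: int = 1000):
        self.clf = OneVsRestClassifier(LogisticRegression(max_iter=max_iter))

    def fit(self, embeddings: np.ndarray, labels: np.ndarray) -> None:
        # embeddings: (n_frames, embed_dim); labels: (n_frames, n_classes) multi-hot
        self.clf.fit(embeddings, labels)

    def predict_proba(self, embeddings: np.ndarray) -> np.ndarray:
        # Per-class probabilities, analogous to the NN's sigmoid outputs
        return self.clf.predict_proba(embeddings)
```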
#2246
#2332
Waiting for tests to finish
Code Quality
- Format the code using `make lint` to maintain consistent style.

Documentation

Testing
- Run `make test` or `make test-with-coverage` to ensure no existing functionality is broken.

Adding datasets checklist
Reason for dataset addition: ...
- Run the following models on the task using the `mteb -m {model_name} -t {task_name}` command (see the sketch after this list):
  - `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
  - `intfloat/multilingual-e5-small`
- If the dataset is large, consider using `self.stratified_subsampling()` under `dataset_transform()`.
- Run tests locally using `make test`.
- Run the formatter using `make lint`.

Adding a model checklist
- Ensure the model can be loaded using `mteb.get_model(model_name, revision)` and `mteb.get_model_meta(model_name, revision)`.
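As an alternative to the CLI command above, the checklist models can also be run through the Python API. A quick sketch, where the task name "AudioEventDetection" is only a placeholder for whatever name this PR actually registers:

```python
import mteb

# Task name below is a placeholder; use the name registered by this PR.
tasks = mteb.get_tasks(tasks=["AudioEventDetection"])
model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```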