Create Audio Event Detection Task #2338

Open · anime-sh wants to merge 2 commits into maeb

Conversation

@anime-sh commented Mar 12, 2025

#2246
#2332
Waiting for tests to finish

Code Quality

  • Code Formatted: Format the code using make lint to maintain consistent style.

Documentation

  • Updated Documentation: Add or update documentation to reflect the changes introduced in this PR.

Testing

  • New Tests Added: Write tests to cover new functionality. Validate with make test-with-coverage.
  • Tests Passed: Run tests locally using make test or make test-with-coverage to ensure no existing functionality is broken.

Adding datasets checklist

Reason for dataset addition: ...

  • I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
    • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    • intfloat/multilingual-e5-small
  • I have checked that the performance is neither trivial (both models gain close to perfect scores) nor random (both models gain close to random scores).
  • If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform() (see the sketch after this list)
  • I have filled out the metadata object in the dataset file (find documentation on it here).
  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.
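
A minimal sketch of the subsampling pattern mentioned above, assuming the usual AbsTask helper; the exact keyword arguments (splits, seed handling) may need adjusting for this task:

```python
def dataset_transform(self):
    # Downsample large splits to keep evaluation tractable; follows the
    # common mteb pattern of stratified subsampling on the test split.
    self.dataset = self.stratified_subsampling(
        self.dataset,
        seed=self.seed,
        splits=["test"],
    )
```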

Adding a model checklist

  • I have filled out the ModelMeta object to the extent possible
  • I have ensured that my model can be loaded using
    • mteb.get_model(model_name, revision) and
    • mteb.get_model_meta(model_name, revision)
  • I have tested the implementation works on a representative set of tasks.

Comment on lines +23 to +53
```python
    def __init__(
        self,
        num_samples: int,
        total_duration: float,
        min_duration: float,
        avg_duration: float,
        max_duration: float,
        sample_rate: int,
        min_events_per_sample: int,
        avg_events_per_sample: float,
        max_events_per_sample: int,
        unique_event_labels: int,
        event_label_distribution: dict[str, int],
        min_event_duration: float,
        avg_event_duration: float,
        max_event_duration: float,
    ):
        self.num_samples = num_samples
        self.total_duration = total_duration
        self.min_duration = min_duration
        self.avg_duration = avg_duration
        self.max_duration = max_duration
        self.sample_rate = sample_rate
        self.min_events_per_sample = min_events_per_sample
        self.avg_events_per_sample = avg_events_per_sample
        self.max_events_per_sample = max_events_per_sample
        self.unique_event_labels = unique_event_labels
        self.event_label_distribution = event_label_distribution
        self.min_event_duration = min_event_duration
        self.avg_event_duration = avg_event_duration
        self.max_event_duration = max_event_duration
```
Member
This is a typed dict; you don't need __init__.
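
A minimal sketch of that suggestion, assuming the stats container subclasses typing.TypedDict; the class name here is illustrative, and the field set is taken from the diff above:

```python
from typing import TypedDict


class AudioEventDetectionDescriptiveStatistics(TypedDict):  # illustrative name
    """Descriptive statistics for audio event detection; no __init__ needed."""

    num_samples: int
    total_duration: float
    min_duration: float
    avg_duration: float
    max_duration: float
    sample_rate: int
    min_events_per_sample: int
    avg_events_per_sample: float
    max_events_per_sample: int
    unique_event_labels: int
    event_label_distribution: dict[str, int]
    min_event_duration: float
    avg_event_duration: float
    max_event_duration: float
```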

Comment on lines +125 to +156
```python
    def evaluate(
        self,
        model: AudioEncoder,
        eval_split: str = "test",
        *,
        encode_kwargs: dict[str, Any] = {},
        **kwargs: Any,
    ) -> dict[HFSubset, ScoresDict]:
        if not self.data_loaded:
            self.load_data()
        scores = {}
        hf_subsets = self.hf_subsets

        for hf_subset in hf_subsets:
            logger.info(
                f"\nTask: {self.metadata.name}, split: {eval_split}, subset: {hf_subset}. Running..."
            )

            if hf_subset not in self.dataset and hf_subset == "default":
                ds = self.dataset
            else:
                ds = self.dataset[hf_subset]
            scores[hf_subset] = self._evaluate_subset(
                model,
                ds,
                eval_split,
                encode_kwargs=encode_kwargs,
                **kwargs,
            )
            self._add_main_score(scores[hf_subset])

        return scores
```
Member
It seems that evaluate is the same as in AbsTask
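
If that is the case, a minimal sketch of the simplification, assuming the task inherits from AbsTask (the import path, class name, and _evaluate_subset signature here are illustrative and should match the base class):

```python
from mteb.abstasks import AbsTask  # import path is an assumption


class AbsTaskAudioEventDetection(AbsTask):  # illustrative name
    # No evaluate() override: the inherited AbsTask.evaluate() already loads
    # the data, iterates over hf_subsets, calls _evaluate_subset(), and adds
    # the main score, so only the subset-level evaluation needs defining.
    def _evaluate_subset(self, model, dataset, **kwargs):
        ...
```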

Comment on lines +148 to +173
```python
    def fit(self, X_train: list[np.ndarray], y_train: list[list[dict]]):
        """Train frame-level classifier on audio embeddings"""
        all_embeddings, all_labels = self._process_training_data(X_train, y_train)
        self._init_model(input_dim=all_embeddings.shape[1])
        X_tensor = torch.tensor(all_embeddings, dtype=torch.float32).to(self.device)
        y_tensor = torch.tensor(all_labels, dtype=torch.float32).to(self.device)
        optimizer = torch.optim.Adam(self.model.parameters(), lr=1e-3)
        criterion = nn.BCELoss()

        # Training loop
        self.model.train()
        for epoch in range(10):
            optimizer.zero_grad()
            outputs = self.model(X_tensor)
            loss = criterion(outputs, y_tensor)
            loss.backward()
            optimizer.step()

    def _init_model(self, input_dim):
        self.model = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, len(self.classes_)),
            nn.Sigmoid(),
        ).to(self.device)
```
Member
Can we use logreg or something simple instead of NN?
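
A minimal sketch of what a simpler evaluator could look like, assuming scikit-learn is available and reusing the PR's existing _process_training_data helper (which, per the fit() above, returns (n_frames, dim) embeddings and (n_frames, n_classes) multi-hot labels); the class and method names here are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


class FrameLevelEvaluatorSketch:  # stands in for the PR's evaluator class
    def fit(self, X_train: list[np.ndarray], y_train: list[list[dict]]):
        """Fit one-vs-rest logistic regression on frame-level embeddings."""
        all_embeddings, all_labels = self._process_training_data(X_train, y_train)
        self.model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        self.model.fit(all_embeddings, all_labels)

    def predict_proba(self, frame_embeddings: np.ndarray) -> np.ndarray:
        # Per-frame, per-class probabilities, analogous to the sigmoid
        # outputs of the current nn.Sequential head.
        return self.model.predict_proba(frame_embeddings)
```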

Author
So the HEAR benchmark uses an NN to get per-frame predictions, and I thought we wanted evaluation similar to theirs; I would prefer having a simpler evaluator too.

cc: @Muennighoff @silky1708 @RahulSChand

@RahulSChand added the maeb Audio extension label Mar 12, 2025