Bias Correction with Pre-trained Audio Embeddings

Implementation of the bias correction methods for pre-trained audio embeddings proposed in the following paper:

Changhong Wang, Gaël Richard, and Brian McFee. "Transfer Learning and Bias Correction with Pre-trained Audio Embeddings". Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference, 2023.

(P.S. "deem" in deem.py stands for "debiasing embeddings".)

Installation

We recommend using a Conda environment:

git clone https://github.com/changhongw/audio-embedding-bias.git
cd audio-embedding-bias
conda env create -f environment.yml
conda activate embedding-bias

Datasets

Download the IRMAS and OpenMIC datasets and save them in the directories data/irmas and data/openmic-2018, respectively.
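Before running the notebooks, it can help to confirm that both dataset directories are where the code expects them. The following is a minimal sketch, not part of the repository: the helper name missing_dataset_dirs is hypothetical, and only the two directory names come from the text above.

```python
from pathlib import Path

# Directory names are taken from the README; the helper itself is hypothetical.
EXPECTED_DIRS = ("data/irmas", "data/openmic-2018")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All dataset directories found.")
```

Run it from the repository root; an empty result means both directories are present.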

Pre-trained embeddings

Extract VGGish, OpenL3, and YAMNet embeddings for both datasets, or use our extracted pre-trained embeddings directly.

Bias correction

Run the notebooks in the notebooks directory:

  • 0_data_distribution.ipynb: investigate each dataset's genre distribution and number of samples per class
  • 1_debias_linear.ipynb: linear bias correction (original, LDA, mLDA)
  • 2_debias_nonlinear.ipynb: nonlinear bias correction (K, KLDA, mKLDA)
  • 3_cosine_similarity.ipynb: calculate the cosine similarity between dataset separation and instrument classification; check the matrix rank in the case of multiple bias correction
  • 4_result_summary.ipynb: summarize results from all bias correction methods
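To give a rough feel for what the linear notebooks do, here is a minimal NumPy sketch of one way to remove a dataset-identity direction from embeddings. It is an illustration under stated assumptions, not the code in deem.py: it uses a two-class Fisher LDA direction on synthetic data, and the function names (fisher_direction, project_out, cosine_similarity), the ridge term, and the toy data are all hypothetical.

```python
import numpy as np


def fisher_direction(X, d, ridge=1e-6):
    """Unit vector that best separates two groups (two-class Fisher LDA).

    X: (n_samples, n_features) embeddings; d: 0/1 dataset label per sample.
    The small ridge term is an assumption added for numerical stability.
    """
    X0, X1 = X[d == 0], X[d == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)


def project_out(X, w):
    """Remove the component of every row of X along direction w."""
    w = w / np.linalg.norm(w)
    return X - np.outer(X @ w, w)


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for embeddings from two datasets (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
d = np.repeat([0, 1], 100)
X[d == 1, 0] += 3.0  # inject a dataset-specific offset along feature 0

w = fisher_direction(X, d)
X_debiased = project_out(X, w)

# Rank drops by one because a single direction has been projected out.
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_debiased))
```

After projection, the embeddings carry no component along w, so a linear model can no longer use that direction to tell the datasets apart. The same cosine_similarity helper could compare w with the weight vector of an instrument classifier, and the rank check shows why removing several directions at once reduces the dimensionality of the usable embedding space.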

Note

Thanks to Jayeon Yi, we noticed two typos in the paper: the dimensionalities of $W$ and $U$ in Equation (3) are wrong. The corrections are as follows:

  • $W\in\mathbb{R}^{D\times G}$ -> $W\in\mathbb{R}^{G\times D}$
  • $U\in\mathbb{R}^{D\times G}$ -> $U\in\mathbb{R}^{G\times G}$

Contact

For any questions, support, or inquiries, please feel free to contact [email protected].

Cite

Please cite the following paper if you use the code provided in this repository.

Changhong Wang, Gaël Richard, and Brian McFee. "Transfer Learning and Bias Correction with Pre-trained Audio Embeddings". Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference, 2023.

@inproceedings{wang2023transfer,
    author = {Changhong Wang and Gaël Richard and Brian McFee},
    title = {Transfer Learning and Bias Correction with Pre-trained Audio Embeddings},
    booktitle = {Proceedings of the International Society for Music Information Retrieval (ISMIR) Conference},
    year = 2023,
}