FOA-MEIR dataset: multi-environment impulse response recordings with a first-order ambisonic microphone

nttrd-mdlab/seld-foa-meir

FOA-MEIR dataset

FOA-MEIR is an impulse response (IR) dataset recorded in over 100 environments for use in sound event localization and detection (SELD) tasks. The dataset is designed for developing SELD systems that remain robust in unknown environments: the IRs for the inference (test) environments were recorded at locations different from those of the training data. The dataset also contains dry source recordings that can be combined with the IR recordings to generate audio clips for training SELD models.

Download

You can download the dataset here.

Details of dataset

The dataset has the following folder structure:

FOA_MEIR_Dataset
├── placeinfo.csv
├── DrySounds
│   ├── eventsources
│   ├── soundeventinfo.csv
│   └── swept_sine_20ms.wav
├── IRrecordings
│   ├── anechoic
│   ├── echo
│   │   ├── anechoic
│   │   ├── reverb-s
│   │   └── test
│   ├── reverb-c
│   ├── reverb-s
│   └── test
└── Noise
    ├── reverb-c
    ├── reverb-s
    └── test

IRrecordings

This folder contains the IR recordings. All IRs were recorded with a 4-channel FOA microphone (A-Format), the Sennheiser AMBEO VR MIC, at a sampling rate of 48 kHz. Each IR was measured by synchronous addition of two repetitions of a time-stretched pulse (TSP) signal 131,072 samples long. The ambient noise level was monitored during every recording to ensure an SNR above 30 dB.

The naming convention for the IR recordings is as follows.

imp_d[distance]_azi[azimuth angle]_ele[elevation angle]_pid[place id].bin

Here, distance, azimuth angle, and elevation angle give the position of the sound source relative to the microphone. The place id is the index of the recording position. Please refer to placeinfo.csv to see what kind of environment each place id refers to.
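
As an illustration of the naming convention, the following minimal sketch splits an IR file name into its fields. The helper name `parse_ir_filename` and the `IRName` tuple are hypothetical and not part of this repository. It assumes that distances are in centimeters (as in the subset table), that angles are integers in degrees, and that negative angles are written with a leading minus sign.

```python
import re
from typing import NamedTuple


class IRName(NamedTuple):
    distance_cm: int
    azimuth_deg: int
    elevation_deg: int
    place_id: int


# Assumption: angles are integer degrees and may carry a leading minus sign.
_IR_PATTERN = re.compile(
    r"imp_d(?P<d>\d+)_azi(?P<azi>-?\d+)_ele(?P<ele>-?\d+)_pid(?P<pid>\d+)\.bin$"
)


def parse_ir_filename(name: str) -> IRName:
    """Split an IR file name (or path) into distance, azimuth, elevation, place id."""
    m = _IR_PATTERN.search(name)
    if m is None:
        raise ValueError(f"not an IR recording name: {name!r}")
    return IRName(int(m["d"]), int(m["azi"]), int(m["ele"]), int(m["pid"]))


info = parse_ir_filename("imp_d150_azi100_ele0_pid0046.bin")
print(info.distance_cm, info.azimuth_deg, info.elevation_deg, info.place_id)
```

The returned place id can then be looked up in placeinfo.csv.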

The IR recordings consist of the five subsets shown in the table below. See Sec. 4.1 in [1] for details on each subset.

| Subset | Anechoic | Reverb-S | Test | Echo | Reverb-C |
| --- | --- | --- | --- | --- | --- |
| # of environments | 1 | 96 | 5 | 102 | 2 |
| # of IRs / environment | 216 | 3 | 216 | 1 | 216 |
| Azimuth range | $[-\pi,\pi)$ | $[-\pi,\pi)$ | $[-\pi,\pi)$ | $0$ | $[-\pi,\pi)$ |
| Azimuth interval | $10^{\circ}$ | random | $10^{\circ}$ | - | $10^{\circ}$ |
| Elevation range | $[-\frac{\pi}{2},\frac{\pi}{2})$ | $[-\frac{\pi}{2},\frac{\pi}{2})$ | $[-\frac{\pi}{2},\frac{\pi}{2})$ | $0$ | $[-\frac{\pi}{2},\frac{\pi}{2})$ |
| Elevation interval | $20^{\circ}$ | random | $20^{\circ}$ | - | $20^{\circ}$ |
| Distance [cm] | 75, 150 | 75, 150 | 75, 150 | 150 | 75, 150 |
| Noise / environment | - | 2.5 min | 15 min | - | 15 min |

Noise

This folder contains ambient noise recorded at the same positions as the Reverb-S, Test, and Reverb-C IRs, using the 4-channel FOA microphone (A-Format). The ambient noise includes air conditioning, walking, talking, etc.

The naming convention for the ambient noise recordings is as follows.

BG_pid[place id]_ch[channel of microphone].wav

Here, the place id is the index of the recording position (the same as for the IR recordings). The channel of microphone is the channel number (1 to 4) of the FOA microphone. For reverb-c there are two types of noise recordings, white noise and walking noise, so the naming convention is as follows:

BG_pid[place id]_whitenoise_ch[channel of microphone].wav
BG_pid[place id]_walknoise_ch[channel of microphone].wav
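
The file names for one recording position can be assembled as in the sketch below. The helper `noise_paths` is hypothetical. It assumes that place ids are zero-padded to four digits, as in the IR example name `pid0046`; adjust the format string if the noise files are named differently.

```python
from pathlib import Path
from typing import List, Optional


def noise_paths(noise_dir: str, place_id: int, kind: Optional[str] = None) -> List[Path]:
    """Return the four per-channel noise file paths for one place id.

    kind is None for reverb-s and test, or "whitenoise" / "walknoise" for reverb-c.
    """
    if kind not in (None, "whitenoise", "walknoise"):
        raise ValueError(f"unknown noise kind: {kind!r}")
    middle = "" if kind is None else f"_{kind}"
    # Assumption: place ids are zero-padded to four digits, as in "pid0046".
    return [
        Path(noise_dir) / f"BG_pid{place_id:04d}{middle}_ch{ch}.wav"
        for ch in range(1, 5)  # microphone channels 1 to 4
    ]


for path in noise_paths("FOA_MEIR_Dataset/Noise/reverb-c", 46, kind="walknoise"):
    print(path.name)
```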

DrySounds

To synthesize a dataset for SELD using the above IR and ambient noise recordings, dry sounds were recorded in an anechoic room using a monaural microphone. These dry sounds contain 12 different sound event classes (see soundeventinfo.csv), and each class has 20 variations of sound.

The naming convention for the dry sound recordings is as follows.

sample[sound event index]_[sample index].wav

Here, sound event index is the index of the sound event defined in soundeventinfo.csv (1 to 12), and sample index is the index of the variation within each sound event (0 to 19).
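
When building a training set it is often convenient to group the dry sounds by class. The sketch below does this from a list of file names; `group_by_event` is a hypothetical helper, and the pattern accepts indices with or without zero padding because the padding is not specified here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches names of the form sample[sound event index]_[sample index].wav
_DRY_PATTERN = re.compile(r"sample(?P<event>\d+)_(?P<sample>\d+)\.wav$")


def group_by_event(names: Iterable[str]) -> Dict[int, List[str]]:
    """Group dry-sound file names by sound event index, skipping other files."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for name in names:
        m = _DRY_PATTERN.search(name)
        if m:
            groups[int(m["event"])].append(name)
    return dict(groups)


names = ["sample1_0.wav", "sample1_19.wav", "sample12_3.wav", "soundeventinfo.csv"]
print(group_by_event(names))
```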

Usage

Sample code for synthesizing SELD datasets from these impulse responses, ambient noise recordings, and dry sound recordings will be available soon.
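
In the meantime, the basic idea can be sketched generically: convolve a mono dry sound with each channel of an IR, then add ambient noise at a chosen SNR. This is not the authors' synthesis code. The function names `spatialize` and `add_noise` are hypothetical, and random arrays stand in for the dry sound, IR, and noise, which in practice would be loaded from the dataset.

```python
import numpy as np


def spatialize(dry: np.ndarray, irs: np.ndarray) -> np.ndarray:
    """Convolve a mono signal (T,) with multichannel IRs (C, L) -> (C, T + L - 1)."""
    # np.convolve keeps the sketch dependency-free; an FFT-based
    # convolution is faster for IRs that are tens of thousands of samples long.
    return np.stack([np.convolve(dry, ir, mode="full") for ir in irs])


def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale noise (C, N_noise) to the requested SNR and add it to clean (C, N)."""
    n = clean.shape[1]
    if noise.shape[1] < n:
        raise ValueError("noise recording is shorter than the clean signal")
    noise = noise[:, :n]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


rng = np.random.default_rng(0)
dry = rng.standard_normal(4800)          # stand-in for a dry sound
irs = rng.standard_normal((4, 480))      # stand-in for a 4-channel IR
noise = rng.standard_normal((4, 10000))  # stand-in for ambient noise
mixture = add_noise(spatialize(dry, irs), noise, snr_db=20.0)
print(mixture.shape)
```

Because convolution and mixing are linear, the result stays in A-Format and can be converted to B-Format afterwards.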

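Extract the IR data from a binary file:

import utils

# Path to one IR recording (reverb-s subset, place id 0046)
filepath = "./FOA_MEIR_Dataset/IRrecordings/reverb-s/imp_d150_azi100_ele0_pid0046.bin"
n_mic = 4  # number of microphone channels (A-Format)
irs = utils.fetch_imp(filepath, n_mic)
# irs.shape => (4, 48000)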

Convert FOA A-format to FOA B-format:

import utils

# A.shape => (4, T): FOA A-Format signal
B = utils.convertAB(A)
# B.shape => (4, T): FOA B-Format signal
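
For orientation, the textbook A-to-B conversion for a tetrahedral microphone is a fixed sum-and-difference matrix, sketched below. This is not necessarily what `utils.convertAB` does: the sketch assumes the capsule order FLU, FRD, BLD, BRU (front-left-up, front-right-down, back-left-down, back-right-up), and it omits any gain normalization and capsule-spacing correction filters. The function name `a_to_b_simple` is hypothetical.

```python
import numpy as np

# Assumption: rows of the A-Format signal are ordered FLU, FRD, BLD, BRU.
A_TO_B = np.array(
    [
        [1.0, 1.0, 1.0, 1.0],    # W: omnidirectional sum
        [1.0, 1.0, -1.0, -1.0],  # X: front minus back
        [1.0, -1.0, 1.0, -1.0],  # Y: left minus right
        [1.0, -1.0, -1.0, 1.0],  # Z: up minus down
    ]
)


def a_to_b_simple(a: np.ndarray) -> np.ndarray:
    """Apply the unnormalized sum-and-difference matrix to a (4, T) A-Format signal."""
    if a.ndim != 2 or a.shape[0] != 4:
        raise ValueError("expected an array of shape (4, T)")
    return A_TO_B @ a


a = np.ones((4, 3))  # identical signal on all four capsules
print(a_to_b_simple(a))
```

With an identical signal on all four capsules, only the W row is nonzero, which is a quick sanity check for the sign pattern.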

License

See this license file.

Authors and Contact

Citing this work

If you'd like to cite this work, you may use the following.

Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, “Echo-aware Adaptation of Sound Event Localization and Detection in Unknown Environments,” in IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2022.

Link

Paper: arXiv
