A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
PyTorch code for the following paper:
- Title: HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods
- Accepted at ICASSP 2024: https://arxiv.org/pdf/2309.08208.pdf
- Authors: Hyun-seo Shin*, Jungwoo Heo*, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu
Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, may exist either locally or globally in the input features. The Conformer, which combines Transformers and CNNs, has a suitable structure for capturing both. However, since the Conformer was designed for sequence-to-sequence tasks, its direct application to ADD tasks may be sub-optimal. To tackle this limitation, we propose HM-Conformer, which adopts two components: (1) a hierarchical pooling method that progressively reduces the sequence length to eliminate duplicated information, and (2) a multi-level classification token aggregation method that utilizes classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. On the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems.
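As a rough illustration of these two components, here is a minimal PyTorch sketch. It is hypothetical, not the modules from this repository: plain Transformer layers stand in for the Conformer blocks, and the dimensions, block count, and pooling factor are illustrative only.
import torch
import torch.nn as nn

class HierarchicalCLSSketch(nn.Module):
    """Toy illustration of the two ideas above: pooling between encoder
    blocks and a classification (CLS) token read out at every level."""
    def __init__(self, dim=144, n_blocks=3):
        super().__init__()
        # plain Transformer layers stand in for Conformer blocks here
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_blocks))
        self.pool = nn.MaxPool1d(2, 2)   # halve the frame sequence per level
        self.cls = nn.Parameter(torch.randn(1, 1, dim))
        self.head = nn.Linear(dim * n_blocks, 2)  # bona-fide vs. spoof

    def forward(self, x):                # x: (batch, frames, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        cls_per_level = []
        for block in self.blocks:
            out = block(torch.cat([cls, x], dim=1))
            cls, x = out[:, :1], out[:, 1:]        # split CLS from frames
            cls_per_level.append(cls.squeeze(1))   # keep this level's CLS
            x = self.pool(x.transpose(1, 2)).transpose(1, 2)  # shorten sequence
        # aggregate the CLS tokens from all levels for the final decision
        return self.head(torch.cat(cls_per_level, dim=-1))

print(HierarchicalCLSSketch()(torch.randn(2, 400, 144)).shape)  # (2, 2)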
We used the ASVspoof 2019 LA and ASVspoof 2021 DF datasets to train and evaluate our proposed method.
Once you have the datasets ready, run the scripts below to augment the data.
- Write your paths in `data_prepare.py` and `make_metadata.py`
# data_prepare.py line 117
# you need to write your path to ASVspoof 2019
YOUR_ASVspoof2019_PATH = {YOUR_ASVspoof2019_PATH} # '/ASVspoof2019'
path_train = YOUR_ASVspoof2019_PATH + '/LA/ASVspoof2019_LA_train'
# make_metadata.py line 51
YOUR_ASVspoof2019_PATH = {YOUR_ASVspoof2019_PATH} # '/ASVspoof2019'
path_meta = YOUR_ASVspoof2019_PATH + '/LA/ASVspoof2019_LA_cm_protocols/ASVspoof2019.LA.cm.dev.trl.txt'
- Run `data_prepare.py` and `make_metadata.py` in a terminal
# One by one
python data_prepare.py
python make_metadata.py
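The augmentation itself happens inside data_prepare.py. Purely as a hypothetical illustration (not the repository's actual pipeline), codec-style augmentation with ffmpeg, which is listed in the requirements below, could look like this sketch:
import subprocess

# hypothetical sketch: re-encode a wav through MP3 to add compression
# artifacts, similar in spirit to the codec conditions of ASVspoof 2021 DF
def codec_augment(src_wav, dst_wav, bitrate='64k', tmp='/tmp/codec_tmp.mp3'):
    subprocess.run(['ffmpeg', '-y', '-i', src_wav, '-b:a', bitrate, tmp], check=True)
    subprocess.run(['ffmpeg', '-y', '-i', tmp, dst_wav], check=True)

codec_augment('input.wav', 'input_mp3.wav')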
- We used the 'nvcr.io/nvidia/pytorch:22.08-py3' image from Nvidia GPU Cloud for conducting our experiments.
- Python 3.8.12
- PyTorch 1.13.0+cu117
- Torchaudio 0.13.0+cu117
- ffmpeg
- Build the Docker image
# build docker image
# run from the repository root, e.g. ~/HM-Conformer
./docker/build.sh
- Run the Docker image
sudo docker run --gpus all -it --rm --ipc=host \
    -v {PATH_DB}:/data \
    -v {PATH_HM-Conformer}/env202305:/environment \
    -v {PATH_HM-Conformer}/env202305/results:/results \
    -v {PATH_HM-Conformer}/exp_lib:/exp_lib \
    -v {PATH_HM-Conformer}:/code \
    env202305:latest
# CAUTION! You need to write your path
# PATH_DB
# |- ASVspoof2019
# | |- LA
# |- ASVspoof2021_DF
# |- ASVspoof2021_DF_eval
# |- keys
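A quick way to confirm that the mounts inside the container match this layout (a hypothetical helper script, not part of the repository):
import os

# check that the /data mount matches the directory tree shown above
for path in ['/data/ASVspoof2019/LA',
             '/data/ASVspoof2021_DF/ASVspoof2021_DF_eval',
             '/data/ASVspoof2021_DF/keys']:
    print('ok' if os.path.isdir(path) else 'MISSING', path)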
First, you need to set the system arguments. You can set them in `arguments.py`. Here is the list of system arguments to set.
1. 'usable_gpu' : {Available_GPUs}
'usable_gpu' lists the indices of the GPUs you want to use.
Input type is str # ex) '0,1'
CAUTION! You need to use 2 or more GPUs.
2. 'TEST' : True or False
'TEST' determines whether to run inference or training.
Set it to True if you only want to run inference, or False if you want to train.
Input type is bool
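For example, a minimal configuration (values are illustrative) might look like this in the "system_args" dictionary of arguments.py:
# illustrative values for the two system arguments described above
system_args = {
    'usable_gpu': '0,1',   # str: comma-separated GPU indices, 2 or more
    'TEST': False,         # bool: False = train, True = inference only
}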
We have a basic logger that stores information locally. However, if you would like to use an additional online logger (wandb or neptune):
- In `arguments.py`
# Wandb: add 'wandb_group', 'wandb_entity', and 'wandb_api_key'
# Neptune: add 'neptune_user' and 'neptune_token'
# put these arguments in the "system_args" dictionary:
# for example
'wandb_group' : 'group',
'wandb_entity' : 'user-name',
'wandb_api_key' : 'WANDB_TOKEN',
'neptune_user' : 'user-name',
'neptune_token' : 'NEPTUNE_TOKEN'
- In `main.py`
# just uncomment the online logger(s) you want to use
# logger
builder = egg_exp.log.LoggerList.Builder(args['name'], args['project'], args['tags'],
args['description'], args['path_scripts'], args)
builder.use_local_logger(args['path_log'])
# builder.use_neptune_logger(args['neptune_user'], args['neptune_token'])
# builder.use_wandb_logger(args['wandb_entity'], args['wandb_api_key'],
# args['wandb_group'])
logger = builder.build()
logger.log_arguments(experiment_args)
Run `main.py` in the Docker container.
python /code/hm_conformer/main.py
Please cite this paper if you make use of the code.
# add later...