Python bindings

Tatiana Likhomanenko edited this page Dec 13, 2019 · 9 revisions

Featurization

The featurization module provides classes for standard feature extraction from audio data: Ceplifter, Dct, Derivatives, Dither, Mfcc, Mfsc, PowerSpectrum, PreEmphasis, TriFilterbank, and Windowing.

Each class has an apply method that transforms the input data. For example:

# imports
from wav2letter.feature import FeatureParams, Mfcc
import itertools

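# read the wave (the loading code is not shown on this page)
# assumption: wavinput is a flat sequence of float audio samples
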
# create params struct
params = FeatureParams()
params.sampling_freq = 16000
params.low_freq_filterbank = 0
params.high_freq_filterbank = 8000
params.num_filterbank_chans = 20
params.num_cepstral_coeffs = 13
params.use_energy = False
params.zero_mean_frame = False
params.use_power = False

# define the transformation and apply it to the wave
mfcc = Mfcc(params)
features = mfcc.apply(wavinput)
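
The example above uses wavinput without showing how it is loaded. As a minimal sketch, assuming a 16-bit mono PCM WAV file, the samples could be read with Python's standard library alone. The helper name read_wave_as_floats is hypothetical and not part of wav2letter, and scaling the samples to [-1, 1) is an assumption, since this page does not state the range that apply expects.

```python
import array
import sys
import wave


def read_wave_as_floats(path):
    """Return (samples, sample_rate) for a 16-bit mono PCM WAV file.

    Hypothetical helper for illustration; not part of wav2letter.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Assumption: scale to [-1, 1); the expected range is not stated here.
    return [s / 32768.0 for s in samples], sample_rate
```

The returned sample rate can be compared with params.sampling_freq before the samples are passed on as wavinput, so that a mismatch is caught early.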

ASG Loss

ASG loss is a PyTorch module (nn.Module) that supports CPU and CUDA backends. It can be defined as:

from wav2letter.criterion import ASGLoss
asg_loss = ASGLoss(ntokens, scale_mode).to(device)

where ntokens is the number of tokens predicted for each frame (the number of classes) and scale_mode selects how the loss is scaled. It can be one of:

NONE = 0, # no scaling
INPUT_SZ = 1, # scale to the input size
INPUT_SZ_SQRT = 2, # scale to the sqrt of the input size
TARGET_SZ = 3, # scale to the target size
TARGET_SZ_SQRT = 4, # scale to the sqrt of the target size
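
To make the modes concrete, here is a pure-Python sketch of the multiplier each mode would apply. It assumes that "scale to the input size" means dividing the loss by the number of input frames and that "scale to the target size" means dividing by the target length. That reading, the helper name scale_factor, and the fallback of 1.0 for empty sequences are assumptions for illustration. The ScaleMode enum only mirrors the values listed above and is not the class exported by wav2letter.

```python
import math
from enum import IntEnum


class ScaleMode(IntEnum):
    """Stand-in mirroring the values listed above (not the wav2letter class)."""
    NONE = 0
    INPUT_SZ = 1
    INPUT_SZ_SQRT = 2
    TARGET_SZ = 3
    TARGET_SZ_SQRT = 4


def scale_factor(mode, input_size, target_size):
    """Return the assumed multiplier applied to the loss for a given mode."""
    if mode == ScaleMode.NONE:
        return 1.0
    # Pick the sequence length that the mode refers to.
    if mode in (ScaleMode.INPUT_SZ, ScaleMode.INPUT_SZ_SQRT):
        size = input_size
    else:
        size = target_size
    if size <= 0:
        return 1.0  # assumption: leave the loss unscaled for empty sequences
    if mode in (ScaleMode.INPUT_SZ, ScaleMode.TARGET_SZ):
        return 1.0 / size
    return 1.0 / math.sqrt(size)
```

Under these assumptions, an utterance with 100 input frames and 25 target tokens gives a factor of 0.01 for INPUT_SZ and 0.2 for TARGET_SZ_SQRT.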

Beam-search decoder

Example of how to define your own language model state:

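# assumption: LMState is exported by wav2letter.decoder
from wav2letter.decoder import LMState

class LMStateNew(LMState):
    # class-level default, overridden per instance in __init__
    some_helpful_var = 1

    def __init__(self, some_helpful_var):
        super().__init__()
        self.some_helpful_var = some_helpful_var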