Python bindings
Tatiana Likhomanenko edited this page Dec 13, 2019
The featurization module provides classes for standard feature extraction from audio data: Ceplifter, Dct, Derivatives, Dither, Mfcc, Mfsc, PowerSpectrum, PreEmphasis, TriFilterbank, Windowing.
All of them have an apply method, which can be used to transform the input data. For example:
# imports
from wav2letter.feature import FeatureParams, Mfcc
import itertools
# read the wave (loading code is not shown here; wavinput is assumed to be a flat sequence of float samples)
# create params struct
params = FeatureParams()
params.sampling_freq = 16000
params.low_freq_filterbank = 0
params.high_freq_filterbank = 8000
params.num_filterbank_chans = 20
params.num_cepstral_coeffs = 13
params.use_energy = False
params.zero_mean_frame = False
params.use_power = False
# define transformation and apply to the wave
mfcc = Mfcc(params)
features = mfcc.apply(wavinput)
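The example above uses wavinput without showing how it is built. As a minimal sketch, assuming apply accepts a flat list of float samples (an assumption this page does not confirm), a synthetic test signal can be generated with the standard library alone. The tone frequency and duration are hypothetical values chosen for illustration.

```python
import math

SAMPLING_FREQ = 16000  # matches params.sampling_freq in the example above
DURATION_S = 1.0       # hypothetical clip length in seconds
TONE_HZ = 440.0        # hypothetical test tone frequency

# Build one second of a sine wave as a flat list of floats in [-1, 1].
num_samples = int(SAMPLING_FREQ * DURATION_S)
wavinput = [
    math.sin(2.0 * math.pi * TONE_HZ * n / SAMPLING_FREQ)
    for n in range(num_samples)
]

# With wav2letter installed, this list could then be passed to the
# transformation defined in the example above (assumed input format):
# features = mfcc.apply(wavinput)
```

A synthetic signal like this is mainly useful for checking that the pipeline runs end to end; real recordings would be loaded from disk instead.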
ASG loss is a PyTorch module (nn.Module) which supports CPU and CUDA backends.
It can be defined as:
from wav2letter.criterion import ASGLoss
asg_loss = ASGLoss(ntokens, scale_mode).to(device)
where ntokens is the number of tokens predicted for each frame (the number of classes) and scale_mode is the scaling mode, which can be:
NONE = 0, # no scaling
INPUT_SZ = 1, # scale to the input size
INPUT_SZ_SQRT = 2, # scale to the sqrt of the input size
TARGET_SZ = 3, # scale to the target size
TARGET_SZ_SQRT = 4, # scale to the sqrt of the target size
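To make the scaling modes above concrete, here is a small pure-Python sketch. It assumes that "scale to the size" means multiplying the loss by the reciprocal of that size (or of its square root), and that a non-positive size leaves the loss unscaled; both are assumptions, not behavior confirmed by this page. ScaleModeSketch and scale_factor are hypothetical names and are not part of the library.

```python
import math
from enum import IntEnum


class ScaleModeSketch(IntEnum):
    """Hypothetical mirror of the scale_mode values listed above."""
    NONE = 0
    INPUT_SZ = 1
    INPUT_SZ_SQRT = 2
    TARGET_SZ = 3
    TARGET_SZ_SQRT = 4


def scale_factor(mode, input_size, target_size):
    """Return the multiplier for the loss (assumed: reciprocal of the size)."""
    if mode == ScaleModeSketch.NONE:
        return 1.0
    # Pick the size that the mode refers to.
    uses_input = mode in (ScaleModeSketch.INPUT_SZ, ScaleModeSketch.INPUT_SZ_SQRT)
    size = input_size if uses_input else target_size
    if size <= 0:
        return 1.0  # assumed fallback: leave the loss unscaled
    # The *_SQRT modes divide by the square root of the size instead.
    uses_sqrt = mode in (ScaleModeSketch.INPUT_SZ_SQRT, ScaleModeSketch.TARGET_SZ_SQRT)
    return 1.0 / math.sqrt(size) if uses_sqrt else 1.0 / size


# Example: 100 input frames, 25 target tokens.
factor_input = scale_factor(ScaleModeSketch.INPUT_SZ, 100, 25)
factor_target_sqrt = scale_factor(ScaleModeSketch.TARGET_SZ_SQRT, 100, 25)
```

Under these assumptions, scaling by input size keeps the loss magnitude comparable across utterances of different lengths, while the square-root variants apply a milder correction.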
Example of how to define your own language model state:
class LMStateNew(LMState):
    some_helpful_var = 1

    def __init__(self, some_helpful_var):
        super().__init__()
        self.some_helpful_var = some_helpful_var
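The following sketch shows how such a subclass behaves once instantiated. So that it runs without wav2letter installed, it defines a hypothetical stand-in LMState base class with an empty constructor; the real base class comes from the library and may carry additional state and methods.

```python
class LMState:
    """Hypothetical stand-in for the library's LMState base class."""

    def __init__(self):
        pass


class LMStateNew(LMState):
    some_helpful_var = 1  # class-level default shared by all instances

    def __init__(self, some_helpful_var):
        super().__init__()
        # The instance attribute shadows the class-level default.
        self.some_helpful_var = some_helpful_var


state = LMStateNew(some_helpful_var=42)
print(state.some_helpful_var)       # value stored on this instance
print(LMStateNew.some_helpful_var)  # class default, left unchanged
```

Calling super().__init__() first matters when the base class sets up its own state, which is likely the case for a class exposed from a compiled extension.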