Faster module for resampling time series. #911
Conversation
The proposed Resampler routine is more efficient than the existing Resample module for resampling time series signals. Speed improvements are obtained by splitting the signal into blocks containing 'input_sr' input samples and 'output_sr' output samples. Each block is treated with a convolution mapping 'input_sr' input channels to 'output_sr' output channels per time step. The existing Resample module uses a for loop to iterate over the first index where each filter is applied, whereas this implementation is fully convolutional. The module is based on https://github.com/danpovey/filtering/blob/master/lilfilter/resampler.py with improvements that add filter types and input parameters aligned with the librosa API.
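To illustrate the block/convolution scheme described above, here is a minimal sketch. The function name `block_resample` and the exact shapes are my own for illustration, not the PR's code; it assumes integer sample rates and a precomputed filter bank.

```python
import torch

# Sketch of the block-convolution idea: each block of `input_sr` samples is
# treated as `input_sr` channels, and a single conv1d maps them to
# `output_sr` channels per block (i.e. per output time step).
def block_resample(signal, input_sr, output_sr, weights):
    # signal:  (batch, seq_len) with seq_len a multiple of input_sr
    # weights: (output_sr, input_sr, kernel_width) filter bank
    batch, seq_len = signal.shape
    num_blocks = seq_len // input_sr
    # Treat the samples within a block as channels: (batch, input_sr, num_blocks)
    blocks = signal[:, :num_blocks * input_sr].view(batch, num_blocks, input_sr).transpose(1, 2)
    out = torch.nn.functional.conv1d(blocks, weights, padding=weights.shape[2] // 2)
    # (batch, output_sr, num_blocks) -> interleave channels back into time order
    return out.transpose(1, 2).reshape(batch, -1)

# Sanity check: a delta filter with input_sr == output_sr == 1 is the identity.
x = torch.arange(6, dtype=torch.float32).view(1, 6)
delta = torch.tensor([[[1.0]]])
y = block_resample(x, 1, 1, delta)

# A 2 -> 3 resampler produces 3 output samples per 2 input samples.
w = torch.randn(3, 2, 5)
z = block_resample(torch.randn(1, 8), 2, 3, w)
```

The key point is that the loop over filter offsets in the existing Resample module becomes a single batched convolution over blocks.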
torchaudio/transforms.py (outdated)
@@ -9,6 +9,9 @@
from torchaudio import functional as F
from torchaudio.compliance import kaldi

import numpy as np
from scipy import special
Thanks for opening a PR. I will try this out to think about how to test. I have not looked into the details yet, but `numpy` and `scipy` are not mandatory dependencies of `torchaudio`. We have a mechanism to defer loading the optional dependencies, but it would be nice if we could implement this without them. Quick googling tells me that `torch` has a `heaviside` function, so removing `numpy` appears plausible. We need to think about what to do for the `scipy` dependency.
`scipy.special.i0` is used to compute the Kaiser window for `kaiser`, `kaiser_best` and `kaiser_fast`. If the scipy dependency is not desirable, then one option would be to allow only the `hann` window mode. (The original goal of including `kaiser` windows was to try to duplicate the behaviour of librosa, but I don't think the results of librosa and this proposed routine are identical to numeric precision, so there is likely some other detail about how the librosa filters were generated that I haven't managed to duplicate.) If the routine only needs to support `hann` windows, then I agree that it should be possible to remove scipy as a dependency.
I also see that there is an issue suggesting that pytorch add the modified Bessel functions, although there doesn't seem to be a PR to merge the code into the pytorch codebase: pytorch/pytorch#7815. If code to calculate the equivalent of `scipy.special.i0` is added to pytorch, then it should also be possible to include `kaiser` windows.
edit: the proposed resample routine also uses `np.sinc`, which would need a definition in pytorch as `sinc = sin(pi*x)/(pi*x)`, and dangerous things happen when x -> 0. One other possibility that would be OK for this resample routine (where in theory nobody is taking any derivatives...) is to say `sinc = sin(pi*x)/(pi*(x+epsilon))`. Any thoughts? Scipy/numpy docs: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.sinc.html
I realized that sinc can be written in terms of the gamma function, and pytorch has a routine to compute the log of the gamma function (`torch.lgamma`)! So I think it should be possible to excise numpy as a dependency as well (see equation 10 at https://mathworld.wolfram.com/SincFunction.html).
- removed numpy and scipy as dependencies
- sinc function uses torch.exp(-torch.lgamma(1-x) - torch.lgamma(1+x)), but a better solution would be a pytorch definition based on a Taylor series expansion
- in 'general' mode the .forward function no longer crops out the portion of the signal longer than (seq_len//input_sr) * input_sr
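For reference, the gamma-function identity behind the lgamma-based sinc above is Euler's reflection formula, sinc(x) = 1 / (Γ(1+x)·Γ(1−x)). A sketch of it, with one caveat worth noting: `torch.lgamma` returns log|Γ|, so this form loses the sign of sinc for |x| > 1 (it still gives exact zeros at the nonzero integers, since Γ(1−x) has poles there).

```python
import math
import torch

def sinc_lgamma(x):
    # sinc(x) = 1 / (Gamma(1+x) * Gamma(1-x)) by Euler's reflection formula.
    # torch.lgamma computes log|Gamma|, so the sign of sinc is lost
    # outside |x| < 1, where Gamma(1-x) becomes negative.
    return torch.exp(-torch.lgamma(1 - x) - torch.lgamma(1 + x))

x = torch.tensor([0.0, 0.5, 1.0], dtype=torch.float64)
y = sinc_lgamma(x)
```

Since resampling filters use sinc values well beyond |x| = 1, the sign caveat is why a direct sin(πx)/(πx) definition (or the upcoming `torch.sinc`) is ultimately preferable.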
Any comments on these changes? @mthrok
I am still thinking about how to support non-integer sample rates.
Sorry for the late reply. I took a look at your PR and it was surprisingly straightforward to follow. I have not got to the point of thinking about the testing phase, but I made a couple of comments.
Specifically, `torch` seems to already have `kaiser_window` and `hann_window`. Did you look at them? `sinc` is also under development (pytorch/pytorch#44713), so it might become available rather soon.
On the interface design aspect, since this module has four distinctive operation modes (`trivial`, `integer_downsample`, `integer_upsample` and `general`), it might be better to split the resampling class into four different classes and introduce a factory function. cc @cpuhrsch what do you think?
torch.arange(input_sr, dtype=dtype).reshape((1, input_sr, 1)) / input_sr -
(torch.arange(kernel_width, dtype=dtype).reshape((1, 1, kernel_width)) - blocks_per_side))

def hann_window(a):
Looks like there is `torch.hann_window`: https://pytorch.org/docs/master/generated/torch.hann_window.html
`torch.hann_window` doesn't have quite the same functionality as the `hann_window` function in the resample routine: the pytorch `hann_window` computes the filter for a fixed window size (the only input is the window size) without the ability to scale the period of the filter (i.e. to change the argument of `cos(2*pi*n/N)` to `cos(2*pi*n*a/N)`). Compare this to the behaviour of the `hann_window` function in the resampler, which takes the 3d `times` array and computes the value of the filter for each element (`cos(2*pi*a)`) using the array element value (`a[i, j, k]`) instead of just the array index.
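A minimal sketch of the elementwise behaviour being described here (the helper name is hypothetical; the support mask is written as a comparison rather than a heaviside call for simplicity):

```python
import math
import torch

def hann_elementwise(a):
    # Evaluate the Hann shape 0.5 + 0.5*cos(pi*a) at arbitrary (possibly
    # scaled) positions `a`, zeroed outside |a| <= 1. This is unlike
    # torch.hann_window, which only accepts an integer window length and
    # evaluates the window at the integer sample indices.
    inside = (torch.abs(a) <= 1).to(a.dtype)
    return inside * (0.5 + 0.5 * torch.cos(math.pi * a))

a = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], dtype=torch.float64)
w = hann_elementwise(a)
```

Because `a` can be any real-valued tensor (such as the 3d `times` array), the window period scales with however `a` was constructed, which is exactly the flexibility `torch.hann_window` lacks.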
return torch.heaviside(1 - torch.abs(a), 0.0) * (0.5 + 0.5 * torch.cos(a * pi))


def kaiser_window(a, beta):
Looks like there is `torch.kaiser_window`: https://pytorch.org/docs/master/generated/torch.kaiser_window.html
Same issue as with `hann_window` above... but the good news is that pytorch now has the `i0` function, so I can rewrite the resampler's `kaiser_window` routine with that: https://pytorch.org/docs/master/generated/torch.i0.html#torch.i0
torchaudio/transforms.py (outdated)
There must be 2 axes, interpreted as (minibatch_size, sequence_length)...
the minibatch_size may in practice be the number of channels.

TODO: make default input dim (minibatch_size, channels, seq_len)?
The typical way we do this in torchaudio is to accept Tensors with a broad range of dimensions. In this case, we can generalize it to "at least two dimensions, with the time axis as the last dimension", then inside of the `forward` function we can reshape the input Tensor to 3D. If the input is 2D, then add an extra axis at 0 or 1; if it's more than 3D, then reshape it to `[batch, -1, seq_len]` (and then back to `[batch, ..., new_seq_len]`).
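A sketch of this reshape convention (hypothetical helper names; I assume the extra axis for 2D input is added at position 0):

```python
import torch

def to_3d(x):
    # Normalize input to (batch, channels, seq_len), remembering the
    # original leading shape so it can be restored after resampling.
    if x.dim() == 2:
        return x.unsqueeze(0), x.shape[:-1]   # (channels, seq_len) -> add batch dim
    return x.reshape(x.shape[0], -1, x.shape[-1]), x.shape[:-1]

def restore(y, lead_shape):
    # y: (batch, channels_flat, new_seq_len) -> original leading dims + new length
    return y.reshape(*lead_shape, y.shape[-1])

x = torch.zeros(4, 2, 3, 100)                 # e.g. (batch, source, mic, time)
x3, lead = to_3d(x)                           # -> (4, 6, 100)
y = restore(x3, lead)                         # -> (4, 2, 3, 100)

x2 = torch.zeros(2, 100)                      # 2D input gets a batch axis
x2_3d, lead2 = to_3d(x2)                      # -> (1, 2, 100)
```

The resampler then only ever has to handle the 3D case internally, while callers can pass any shape whose last axis is time.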
I also like the flexibility of having 2d or 3d inputs.
torchaudio/transforms.py (outdated)
# num_blocks = 1

# data = data[:, 0:(num_blocks*self.input_sr)]  # Truncate input
data = data[:, 0:(num_blocks * self.input_sr)].view(minibatch_size, num_blocks, self.input_sr)
If I understand correctly, `data` is padded so that its length is a multiple of `input_sr`. Do you still need to truncate it?
Nope! Let me tidy up this piece of code a bit more...
self.resample_type = 'trivial'
return


def gcd(a, b):
Kind of a surprise to me, but `torch` has `gcd`: https://pytorch.org/docs/master/generated/torch.gcd.html#torch-gcd
`gcd` is also in the standard python library. I'm fine with switching to the pytorch `gcd`, but is there some underlying reason why it would be compelling to do so?
return None


def sinc(x):
Looks like the `sinc` function is under development: pytorch/pytorch#44713
Oh good. I'll leave the definition of the sinc function as it is for now, but the new function should be used when it's available.
Your idea is to have a base class [...]? What are your thoughts on what the goal of tests should be? Should tests try to reproduce signals resampled with `scipy.signal.resample`? Issue #908 has some discussion about trying to duplicate the behaviour of audio processing tools in scipy with pytorch. Another issue is that the resampler [...] should be padded with zeros.
The latest PR now has the ability to accept 2d or 3d inputs.
I think there is also a minor issue with `input_sr` [...]. Finally, I think it would be good for somebody to really scrutinize the definition I have for [...].
Any comments on the latest revision or the questions in the previous post? @mthrok
Sorry for the late reply. I have not forgotten you and this PR, but I got caught up with some work related to the upcoming release. I will try my best to get back to you before the end of the week.
Sorry for the delayed response. I am still working on release-related tasks; please give me some more time before I can give feedback.
No worries, thanks for keeping this PR in mind. :-)
Sorry for the late reply. torchaudio 0.7.0 is released and I was reviewing the project ideas. I posted an RFC (#1000) regarding the faster I/O, which I have been working on, and realized that in that work we can provide a resampling algorithm faster than the current Python implementation. Therefore I would like to put this PR on hold. I am sorry for the effort you have made based on the interaction here.
I expect most people want to resample data as they load it, and I agree that faster resampling with C++ on the CPU is the functionality most worth focusing on.
Also, maybe it's worth comparing to https://github.com/adefossez/julius, which was just released.
Closing the issue as it is mostly addressed by #1487.