
Faster module for resampling time series. #911

Closed
wants to merge 3 commits

Conversation


@small-yellow-duck commented Sep 15, 2020

The proposed Resampler routine is more efficient than the existing Resample
module for resampling time series signals. Speed improvements are obtained by splitting
the signal into blocks where there are 'input_sr' input samples and 'output_sr' output
samples. Each block is treated with a convolution mapping 'input_sr' input channels to
'output_sr' output channels per time step. The existing Resample module uses a for loop
to iterate over the first index where each filter is applied, but this implementation is
fully convolutional.

The module is based on

https://github.com/danpovey/filtering/blob/master/lilfilter/resampler.py

with improvements to include additional filter types and input parameters that align
with the librosa api.
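
In outline, the block trick looks like this (a minimal sketch with illustrative names; the actual weight construction, windowing and padding in the PR are omitted):

```python
import torch

def block_resample(data, input_sr, output_sr, weights):
    # data: (batch, seq_len), with seq_len a multiple of input_sr.
    # weights: (output_sr, input_sr, kernel_width) filter bank; each of
    # the output_sr filters produces one output phase per block.
    # kernel_width is assumed odd so that padding preserves num_blocks.
    batch, seq_len = data.shape
    num_blocks = seq_len // input_sr
    # Split the signal into blocks of input_sr samples and move the
    # within-block axis to the channel dimension expected by conv1d.
    blocks = data.view(batch, num_blocks, input_sr).transpose(1, 2)
    # A single convolution over the block axis maps input_sr input
    # channels to output_sr output channels per time step, replacing
    # the per-filter Python loop of the existing Resample module.
    out = torch.nn.functional.conv1d(
        blocks, weights, padding=weights.shape[-1] // 2)
    # Interleave the output phases back into a flat signal:
    # (batch, output_sr, num_blocks) -> (batch, num_blocks * output_sr).
    return out.transpose(1, 2).reshape(batch, -1)
```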

#908

  • inputs where the sequence length is not a whole number multiple of input_sr should be padded with zeros
  • adapt module to deal with multi-channel input
  • adapt to deal with non-integer sample rates?

Should tests try to reproduce signals resampled with scipy.signal.resample?

@@ -9,6 +9,9 @@
from torchaudio import functional as F
from torchaudio.compliance import kaldi

import numpy as np
from scipy import special
@mthrok (Collaborator)

Hi @small-yellow-duck

Thanks for opening a PR. I will try this out and think about how to test it. I have not looked into the details yet, but numpy and scipy are not mandatory dependencies of torchaudio.

We have a mechanism to defer loading optional dependencies, but it would be nice if we could implement this without them. A quick googling suggests that torch has a heaviside function, so removing numpy appears plausible. We need to think about what to do for the scipy dependency.

@small-yellow-duck (Author) commented Sep 18, 2020

scipy.special.i0 is used to compute the kaiser window for kaiser, kaiser_best and kaiser_fast. If the scipy dependency is not desirable, then one option would be to allow only the hann window mode. (The original goal of including kaiser windows was to try to duplicate the behaviour of librosa, but I don't think the results of librosa and this proposed routine are identical to numerical precision, so there is likely some other detail about how the librosa filters were generated that I haven't managed to duplicate.) If the routine only needs to support hann windows, then I agree that it should be possible to remove scipy as a dependency.

I also see that there is an issue suggesting that pytorch add the modified bessel functions, although there doesn't seem to be a PR to merge the code into the pytorch codebase: pytorch/pytorch#7815 . If the code to calculate the equivalent of scipy.special.i0 is added to pytorch, then it should also be possible to include kaiser windows.

edit: the proposed resample routine also uses np.sinc, which might need a definition in pytorch as sinc(x) = sin(pi*x)/(pi*x), and dangerous things happen when x -> 0. One other possibility that would be OK for this resample routine (where in theory nobody is taking any derivatives...) is to say sinc(x) = sin(pi*x)/(pi*(x + epsilon)). Any thoughts? Scipy/numpy docs: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.sinc.html
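
For reference, a minimal sketch of a sinc that handles x -> 0 by masking the denominator instead of shifting it by epsilon (not the PR's code):

```python
import math
import torch

def sinc(x):
    # Normalized sinc, sin(pi*x) / (pi*x), with the 0/0 at x = 0
    # avoided by substituting a safe denominator under a mask.
    px = math.pi * x
    near_zero = px.abs() < 1e-8
    safe = torch.where(near_zero, torch.ones_like(px), px)
    return torch.where(near_zero, torch.ones_like(px), torch.sin(safe) / safe)
```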

@small-yellow-duck (Author)

I realized that sinc can be written in terms of the gamma function - and pytorch has a routine to compute the log of the gamma function (torch.lgamma)! So I think it should be possible to excise numpy as a dependency as well.... (see output 10 in https://mathworld.wolfram.com/SincFunction.html)

- removed numpy and scipy as dependencies
- the sinc function uses torch.exp(-torch.lgamma(1-x) - torch.lgamma(1+x)), but a better solution is to have a pytorch definition based on a Taylor series expansion
- in 'general' mode the .forward function no longer crops out the portion of the signal longer than (seq_len//input_sr) * input_sr
small-yellow-duck commented Sep 22, 2020

Any comments on the following changes? @mthrok

  • Removing numpy and scipy as dependencies means that the resampler will no longer support kaiser windows. Kaiser windows can be supported once there is a pytorch definition for scipy.special.i0.

  • In order to remove the numpy dependency, the sinc function is now defined as torch.exp(-torch.lgamma(1-x) - torch.lgamma(1+x)) (see the sketch after this list). This is not a good solution because the exp introduces numerical errors. A better solution would be a pytorch definition based on a Taylor series expansion of sinc.

  • The general mode in forward no longer crops out the portion of the signal that is longer than (seq_len//input_sr) * input_sr. (YAY)
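
A sketch of the lgamma-based definition, which follows from the reflection formula Gamma(1+x)*Gamma(1-x) = pi*x / sin(pi*x). Note that torch.lgamma returns log|Gamma|, so this form recovers |sinc(x)| and loses the sign for |x| > 1:

```python
import math
import torch

def sinc_lgamma(x):
    # 1 / (Gamma(1+x) * Gamma(1-x)) == sin(pi*x) / (pi*x) for |x| < 1.
    return torch.exp(-torch.lgamma(1 - x) - torch.lgamma(1 + x))

x = torch.tensor([0.0, 0.5])
print(sinc_lgamma(x))                              # tensor([1.0000, 0.6366])
print(math.sin(math.pi * 0.5) / (math.pi * 0.5))   # 0.6366...
```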

I am still thinking about how to support non-integer sample rates.
Further modifications to support multi-channel signals are also desirable.

@mthrok (Collaborator) left a comment

Hi @small-yellow-duck

Sorry for the late reply. I took a look at your PR and it was surprisingly straightforward to follow. I have not got to the point of thinking about the testing phase, but I made a couple of comments.

Specifically, torch seems to already have kaiser_window and hann_window. Did you look at them? sinc is also under development (pytorch/pytorch#44713), so it might become available rather soon.

On the interface design aspect, since this module has four distinctive operation modes (trivial, integer_downsample, integer_upsample and general), it might be better to split the resampling class into four different classes and introduce a factory function. cc @cpuhrsch what do you think?
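
Something along these lines, perhaps (the class names are hypothetical, just to illustrate the factory idea):

```python
import math

# Hypothetical classes, named only to illustrate the dispatch.
class ResampleTrivial: ...
class ResampleIntegerDownsample:
    def __init__(self, input_sr, **kwargs): ...
class ResampleIntegerUpsample:
    def __init__(self, output_sr, **kwargs): ...
class ResampleGeneral:
    def __init__(self, input_sr, output_sr, **kwargs): ...

def make_resampler(input_sr, output_sr, **kwargs):
    # Reduce the ratio, then dispatch to one of the four operation modes.
    d = math.gcd(input_sr, output_sr)
    input_sr, output_sr = input_sr // d, output_sr // d
    if input_sr == output_sr:
        return ResampleTrivial()
    if output_sr == 1:
        return ResampleIntegerDownsample(input_sr, **kwargs)
    if input_sr == 1:
        return ResampleIntegerUpsample(output_sr, **kwargs)
    return ResampleGeneral(input_sr, output_sr, **kwargs)
```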

torch.arange(input_sr, dtype=dtype).reshape((1, input_sr, 1)) / input_sr -
(torch.arange(kernel_width, dtype=dtype).reshape((1, 1, kernel_width)) - blocks_per_side))

def hann_window(a):
@small-yellow-duck (Author) commented Sep 24, 2020

torch.hann_window doesn't have quite the same functionality as the hann_window function in the resample routine: the pytorch hann_window computes the filter for a fixed window size (the only input is the window size) without the ability to scale the period of the filter (i.e. to change the argument of cos(2*pi*n/N) to cos(2*pi*n*a/N)). Compare this to the behaviour of the hann_window function in the resampler, which takes the 3d times array and computes the value of the filter for each element (cos(2*pi*a)) using the array element value a[i, j, k] instead of just the array index.
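
To make the contrast concrete, a small sketch (illustrative only):

```python
import math
import torch

# torch.hann_window only evaluates 0.5 - 0.5*cos(2*pi*n/(N-1)) at the
# integer indices n = 0 .. N-1 of a fixed-size window.
fixed = torch.hann_window(8, periodic=False)

# The resampler instead evaluates the window at arbitrary, already
# scaled time offsets a, zeroing everything outside |a| < 1.
def hann_at(a):
    return (a.abs() < 1) * (0.5 + 0.5 * torch.cos(math.pi * a))

times = torch.linspace(-1.5, 1.5, 7)
print(hann_at(times))
```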

return torch.heaviside(1 - torch.abs(a), 0.0) * (0.5 + 0.5 * torch.cos(a * pi))


def kaiser_window(a, beta):
@small-yellow-duck (Author)

Same issue as with hann_window above... but the good news is that pytorch now has the i0 function, so I can rewrite the resampler's kaiser_window routine with that. https://pytorch.org/docs/master/generated/torch.i0.html#torch.i0
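
A possible rewrite along those lines (a sketch, not tested against the PR's current definition):

```python
import torch

def kaiser_at(a, beta):
    # Kaiser window evaluated at arbitrary offsets a in (-1, 1), zero
    # outside; torch.i0 is the zeroth-order modified Bessel function.
    b = torch.as_tensor(beta, dtype=a.dtype)
    inside = a.abs() < 1
    w = torch.i0(b * torch.sqrt(torch.clamp(1 - a * a, min=0))) / torch.i0(b)
    return inside * w
```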

There must be 2 axes, interpreted as (minibatch_size, sequence_length)...
the minibatch_size may in practice be the number of channels.

TODO: make default input dim (minibatch_size, channels, seq_len)?
@mthrok (Collaborator)

The typical way we do this in torchaudio is to accept Tensors with a broad range of dimensions.
In this case, we can generalize it to at least two dimensions where the time axis is the last dimension; then inside the forward function we can reshape the input Tensor to 3D. If the input is 2D, add an extra axis at 0 or 1, and if it's more than 3D, reshape it to [batch, -1, seq_len] (and then back to [batch, ..., new_seq_len]).
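
A minimal sketch of that pattern, where resample_3d stands in for a hypothetical core that expects (batch, channels, time):

```python
import torch

def resample_any_dim(waveform, resample_3d):
    # Normalize any input to 3D, run the core, then restore the shape.
    shape = waveform.shape
    if waveform.dim() <= 2:
        x = waveform.reshape(-1, 1, shape[-1])        # add batch/channel axes
    else:
        x = waveform.reshape(shape[0], -1, shape[-1])  # fold extra dims
    out = resample_3d(x)
    return out.reshape(*shape[:-1], out.shape[-1])
```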

@small-yellow-duck (Author)

I also like the flexibility of having 2d or 3d inputs.

# num_blocks = 1

# data = data[:, 0:(num_blocks*self.input_sr)] # Truncate input
data = data[:, 0:(num_blocks * self.input_sr)].view(minibatch_size, num_blocks, self.input_sr)
@mthrok (Collaborator)

If I understand correctly, data is padded so that its length is a multiple of input_sr. Do you still need to truncate it?

@small-yellow-duck (Author)

Nope! Let me tidy up this piece of code a bit more...

self.resample_type = 'trivial'
return

def gcd(a, b):
@mthrok (Collaborator)

Kind of a surprise to me, but torch has gcd. https://pytorch.org/docs/master/generated/torch.gcd.html#torch-gcd

@small-yellow-duck (Author)

gcd is also in the standard python library. I'm fine with switching to the pytorch gcd, but is there some underlying reason why it would be compelling to do so?

return None


def sinc(x):
@mthrok (Collaborator) commented Sep 23, 2020

Looks like a sinc function is under development: pytorch/pytorch#44713

@small-yellow-duck (Author)

Oh good. I'll leave the definition of the sinc function as it is for now, but the new function should be used when it's available.

@small-yellow-duck (Author)

Your idea is to have a base class resample that does the weight generation, and have resample_trivial, resample_integer_upsample, resample_integer_downsample and resample_general inherit from it? I am pretty agnostic on whether to split the resampler into those four classes: I'm happy to do it if people think it makes the code easier to understand.

What are your thoughts on what the goal of tests should be? This issue has some discussion about trying to duplicate the behaviour of audio processing tools in scipy with pytorch.
#452

Another issue is that the resampler hann_window function needs to have a value for pi, but pytorch doesn't have a definition for pi. I've used a suggested workaround (pi = 2.0*torch.acos(torch.zeros(1)).item()), but there is also an open issue to make pytorch include pi: pytorch/pytorch#19172
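
For the record, the workaround next to the plain stdlib constant:

```python
import math
import torch

pi = 2.0 * torch.acos(torch.zeros(1)).item()  # the acos workaround
assert abs(pi - math.pi) < 1e-6               # agrees with math.pi
```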


small-yellow-duck commented Sep 25, 2020

The latest PR now has the ability to accept 2d or 3d inputs.

  • Is there a compelling reason to switch from using gcd to using torch.gcd?
  • Is there a better workaround for a definition of pi?
  • The existing implementation of Resample uses different names for input_sr and output_sr, while this implementation tries to align with the librosa resample routine. I think it's better to try to reproduce the librosa interface. Any comments?
  • Is it desirable to split the resample class into resample_trivial, resample_integer_upsample, resample_integer_downsample and resample_general, and have each class inherit the weight initializations from resample?
  • What should the goals of unit tests be? (With the existing definitions of sinc and pi, we can't expect numeric agreement with other packages.) Perhaps one option is to downsample a signal and then upsample it again, and vice versa? (See the sketch below.)
    @mthrok
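
For instance, a round-trip sketch along those lines. Resampler and its argument names are taken from this PR's draft interface, so treat them as assumptions; only loose agreement away from the edges is expected:

```python
import math
import torch

# Down-then-up round trip on a band-limited test tone.
# Resampler: the module proposed in this PR (interface assumed).
t = torch.arange(8000, dtype=torch.float32) / 8000.0
x = torch.sin(2 * math.pi * 40.0 * t)[None, :]   # (batch=1, seq_len)

down = Resampler(input_sr=2, output_sr=1)
up = Resampler(input_sr=1, output_sr=2)
y = up(down(x))

# Compare only the interior, away from filter edge effects.
err = (y[:, 200:-200] - x[:, 200:-200]).abs().max()
assert err < 1e-2
```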

I think there is also a minor issue with general mode padding the input with 0's so that the seq_len is an integer multiple of input_sr. When the kernel width is small, this is pretty safe to do, but if the kernel width is large, then the convolution will involve a lot of the padded zeros and the output of the convolution is not going to be scaled correctly (the output will be too small). It should be possible to see this by comparing a signal padded with zeros at the end (resample(torch.cat((data, zeros), dim=-1))[:, 0:new_seq_len]) to the same signal padded with zeros at the beginning (resample(torch.cat((zeros, data), dim=-1))[:, -new_seq_len:]). Maybe there is a smart way to construct an appropriately normalized kernel to take care of the last segment of the input, data[:, :, (seq_len//self.input_sr)*self.input_sr:], or the case where the seq_len is shorter than self.input_sr. (However, this kernel would have to be constructed during the .forward call, as its size depends on the data seq_len.)

Finally, I think it would be good for somebody to really scrutinize the definition I have for kaiser_window to make sure that it is correct. :-)

@small-yellow-duck (Author)

Any comments on the latest revision or the questions in the previous post? @mthrok


mthrok commented Oct 6, 2020

Hi @small-yellow-duck

Sorry for the late reply. I have not forgotten you and this PR, but I got caught up with some work related to the upcoming release. I will try my best to get back to you before the end of the week.


mthrok commented Oct 12, 2020

Hi @small-yellow-duck

Sorry for the delayed response. I am still working on release-related tasks; please give me some more time before I can give feedback.

@small-yellow-duck (Author)

No worries, thanks for keeping this PR in mind. :-)


mthrok commented Nov 3, 2020

@small-yellow-duck

Sorry for the late reply. torchaudio 0.7.0 is released and I was reviewing the project ideas. I posted an RFC #1000 regarding the faster I/O, which I have been working on, and realized that in that work we can provide a resampling algorithm faster than the current Python implementation. Therefore I would like to put this PR on hold. I am very sorry given the efforts you have made based on the interaction here.

@small-yellow-duck (Author)

I expect most people want to resample data as they load it, and I agree that faster resampling with C++ on the CPU is the functionality most worth focusing on.

@faroit (Contributor) commented Nov 4, 2020

Also, maybe it's worth comparing to https://github.com/adefossez/julius, which was just released.


mthrok commented Aug 3, 2021

Closing the issue as it is mostly addressed by #1487.
Please comment in #1487 if we are missing something.
Thank you for the contribution, @small-yellow-duck.

@mthrok closed this Aug 3, 2021