Synthesis results of vocoder #1
Comments
Impressive results! Do you have any samples trained on music, or guidance for someone seeking to create those samples?
@SPGB welcome! Although we haven't tested this model on music data, we may be able to give you some guidance. What is the purpose of your model?
@twidddj Thank you for the response. I'm hoping to use it to generate musical instrument sounds (for example, a drum loop or bass sounds). So far my results have mostly been static. I know I won't be able to replicate the reference samples such as the piano (https://storage.googleapis.com/deepmind-media/pixie/making-music/sample_1.wav), but maybe with the right parameters and enough time an approximation is possible?
You are probably interested in a neural synthesizer. A WaveNet vocoder can help you generate the sounds when the right encoded features are given as the local condition (like what they did). For example, you could use pitch as a local condition and timbre as a global condition for the vocoder.
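To make the conditioning idea concrete, here is a minimal NumPy sketch of how local and global conditions typically enter a WaveNet gated activation unit, following the conditioning equations from the WaveNet paper. All names, shapes, and dimensions are illustrative assumptions, not this repo's actual code:

```python
import numpy as np

def gated_unit(x, local_cond, global_cond, W_f, W_g, V_f, V_g, U_f, U_g):
    """One WaveNet gated activation with conditioning (sketch).
    x:           output of a dilated conv, shape (channels, time)
    local_cond:  per-frame features (e.g. pitch) upsampled to (lc_dim, time)
    global_cond: one vector per utterance (e.g. a timbre embedding), (gc_dim,)
    """
    # Filter and gate each get their own projection of both conditions;
    # the global condition is broadcast across all time steps.
    f = W_f @ x + V_f @ local_cond + (U_f @ global_cond)[:, None]
    g = W_g @ x + V_g @ local_cond + (U_g @ global_cond)[:, None]
    return np.tanh(f) * (1.0 / (1.0 + np.exp(-g)))  # tanh(f) * sigmoid(g)

channels, lc_dim, gc_dim, T = 64, 80, 16, 100
rng = np.random.default_rng(0)
weights = {name: 0.01 * rng.standard_normal((channels, dim))
           for name, dim in [("W_f", channels), ("W_g", channels),
                             ("V_f", lc_dim), ("V_g", lc_dim),
                             ("U_f", gc_dim), ("U_g", gc_dim)]}
out = gated_unit(rng.standard_normal((channels, T)),  # dilated conv output
                 rng.standard_normal((lc_dim, T)),    # pitch frames (local)
                 rng.standard_normal(gc_dim),         # timbre vector (global)
                 **weights)
print(out.shape)  # (64, 100)
```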
Thanks for sharing the link @twidddj. I've been using ibab's WaveNet implementation with moderate success, using a wide receptive field and minimal layers. Is it possible to turn off local conditioning altogether and just create unconditional sounds? It would be really interesting if there were a way to condition it on MIDI, but I wouldn't know where to begin for an addition like that.
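One possible starting point for MIDI conditioning, sketched under assumptions (this repo has no MIDI support, and the frame rate below would have to match the vocoder's hop size): convert the MIDI file to a frame-level piano roll with pretty_midi and feed that as the local condition.

```python
import pretty_midi

def midi_to_local_condition(midi_path, sample_rate=22050, hop_length=256):
    """Turn a MIDI file into frame-level features usable as a local condition.
    The (sample_rate / hop_length) frame rate is an assumption that has to
    agree with whatever frame rate the vocoder's conditioning network expects.
    """
    frames_per_second = sample_rate / hop_length
    pm = pretty_midi.PrettyMIDI(midi_path)
    # Piano roll: a (128 pitches, num_frames) matrix of note velocities.
    roll = pm.get_piano_roll(fs=frames_per_second)
    return (roll > 0).astype("float32")  # binarize to note-on indicators

# cond = midi_to_local_condition("loop.mid")  # shape: (128, num_frames)
```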
Single speaker
We stopped training at 680K steps.
You can find some results at https://twidddj.github.io/docs/vocoder.
We tested the vocoder on two groups of samples: 1) samples from the dataset, and 2) samples generated by Tacotron.
Group 1 overlaps with the training data because of my mistake (so sorry, I did not set aside separate data for testing).
However, I believe the results show the performance to some extent. See the first section on the page.
From the other section, you can gauge the performance of the vocoder:
it can generate audio nearly as good as the target using only the target's mel-spectrogram.
Moreover, some parts of the results have better quality than the target (I hope you think so too). Note that Tacotron was trained on audio at a 24 kHz sample rate, while our vocoder was trained at 22 kHz, so the vocoder has never seen frequencies above 11 kHz. Therefore, if you match the sample rates, your results should be better than the ones we reported.
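For anyone reproducing this, here is a minimal sketch of matching sample rates before feature extraction, assuming librosa; the 22050 Hz target and the mel parameters (n_fft, hop_length, n_mels) are placeholders that should match the vocoder's training configuration, not values confirmed in this issue:

```python
import librosa
import numpy as np

def mel_for_vocoder(wav_path, target_sr=22050, n_fft=1024,
                    hop_length=256, n_mels=80):
    """Resample audio to the vocoder's rate, then extract a log-mel spectrogram."""
    y, sr = librosa.load(wav_path, sr=None)  # keep the file's native rate
    if sr != target_sr:                      # e.g. 24 kHz Tacotron output
        y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
    mel = librosa.feature.melspectrogram(y=y, sr=target_sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return np.log(mel + 1e-5)                # log-compress; avoid log(0)

# mel = mel_for_vocoder("tacotron_sample.wav")  # shape: (80, num_frames)
```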
By the way, we believe the pre-trained model can be used as a teacher model for Parallel WaveNet.
Parallel Wavenet - Single speaker
Not yet tested.
Multi speaker
Not yet tested.