
Synthesis results of vocoder #1

Open
twidddj opened this issue Apr 4, 2018 · 5 comments

twidddj commented Apr 4, 2018

Single speaker

We stopped the training at the 680K step.
You can find some results at https://twidddj.github.io/docs/vocoder.

We tested the vocoder on two groups of samples: 1) samples from the dataset, and 2) samples generated by Tacotron.

This is because of a mistake on my part (so sorry, I did not hold out separate test data).

However, I believe the results still show the performance to some extent. See the first section on the page.

From the other sections, you can get a sense of the vocoder's performance.

It can generate audio nearly as good as the target using only the target's mel-spectrogram.

Moreover, some parts of the results have better quality than the target (I hope you think so too). Note that Tacotron was trained on audio with a 24 kHz sample rate, while our vocoder was trained at 22 kHz. This means the vocoder has never seen frequencies above 11 kHz. Therefore, if you synchronize the sample rates, your results should be better than the results we reported.
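
If you want to synchronize the sample rates before vocoding, a minimal sketch with librosa might look like this (the 22050 Hz rate and file names are just placeholders for illustration, not our exact preprocessing):

```python
import librosa
import soundfile as sf

# Assumed sample rate the vocoder was trained on; adjust to your setup.
VOCODER_SR = 22050

# Load the 24 kHz audio at its native rate, then resample it so the
# mel-spectrogram covers the same frequency range the vocoder saw in training.
wav, sr = librosa.load("tacotron_output.wav", sr=None)
wav_22k = librosa.resample(wav, orig_sr=sr, target_sr=VOCODER_SR)

sf.write("tacotron_output_22k.wav", wav_22k, VOCODER_SR)
```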

By the way, we believe the pre-trained model can be used as a teacher model for Parallel WaveNet.

Parallel WaveNet - Single speaker

Not yet tested.

Multi speaker

Not yet tested.

twidddj closed this as completed Apr 11, 2018
twidddj reopened this Apr 12, 2018

SPGB commented Jul 25, 2018

Impressive results! Do you have any samples trained on music or guidance for someone seeking to create those samples?

twidddj commented Jul 26, 2018

@SPGB Welcome! Although we have not tested this model on music data, we might be able to give you some guidance. What is the purpose of your model?

SPGB commented Jul 27, 2018

@twidddj Thank you for the response. I'm hoping to use it to generate musical instrument sounds (for example, a drum loop or bass sounds). So far my results have mostly been static.

I know I won't be able to replicate the reference samples such as the piano (https://storage.googleapis.com/deepmind-media/pixie/making-music/sample_1.wav) but maybe with the right parameters and enough time an approximation is possible?

twidddj commented Jul 30, 2018

You are probably interested in a neural synthesizer.

A WaveNet vocoder would then help you generate the sounds when the right encoded features are given as the local condition (like what they did). For example, you could use pitch as a local condition and timbre as a global condition for the vocoder.
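
As a rough illustration of that idea, here is a minimal sketch of extracting a frame-level pitch (F0) track with librosa to use as a local condition; the hop length, sample rate, and the global-condition id are assumptions for illustration, not part of this repo:

```python
import numpy as np
import librosa

# Load an instrument recording (22050 Hz is an assumed rate).
wav, sr = librosa.load("instrument.wav", sr=22050)

# Frame-level F0 estimate via pYIN; hop_length should match the hop
# used when computing the vocoder's conditioning features.
f0, voiced_flag, voiced_prob = librosa.pyin(
    wav,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sr,
    hop_length=256,
)
f0 = np.nan_to_num(f0)  # set unvoiced frames to 0

# Local condition: one pitch value per frame, shape (T, 1).
local_cond = f0[:, None].astype(np.float32)

# Global condition: e.g. an integer id per timbre/instrument class.
global_cond = 3  # hypothetical "bass" id
```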

SPGB commented Aug 22, 2018

Thanks for sharing the link @twidddj. I've been using ibab's WaveNet implementation with some moderate success, using a wide receptive field and minimal layers.

Is it possible to turn off local conditioning altogether and just create unconditional sounds? Something similar to `python train.py --data-root=./data/cmu_arctic/ --hparams="cin_channels=-1,gin_channels=-1"` from r9y9/wavenet_vocoder.

It would be really interesting if there was a way to condition it on MIDI but I wouldn't know where to begin for an addition like that.
