How to generate mel spectrogram #4
With the same wavenet model and the same utterance (p225_001.wav), I found that the quality of the waveform generated from the mel spectrogram provided in metadata.pkl is much better than the one I generated myself. Are there any tricks for generating a proper mel spectrogram?
Comments
num_mels: 80
Thanks a lot. The quality is improved with the above hyperparameters when I generate the mel spectrogram, even if I use the default parameters to generate the waveform.
Another question is about the speaker embeddings. The speaker embedding in metadata.pkl is a scalar with 256 dimensions, but I got a matrix of size N*256 when I used the GE2E method to generate the speaker embeddings. What is the relationship between the scalar and the matrix?
The embedding in metadata.pkl should be a vector of length 256.
Yes, the embedding in metadata.pkl is a vector of length 256. But I got several d-vectors of length 256 even when I used a single wave file (p225_001.wav). I did some normalization according to the GE2E paper (section 3.2): "the final utterance-wise d-vector is generated by L2 normalizing the window-wise d-vectors, then taking the element-wise average". The result looks quite different from the vector in metadata.pkl. All the numbers in my vector were positive, while the vector in metadata.pkl has both positive and negative values. Should I just average all the d-vectors without normalization?
You can average all the d-vectors without normalization.
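For reference, a minimal sketch of both options in numpy (the array shape follows the N*256 matrix mentioned above; the function name is just for illustration):

```python
import numpy as np

def utterance_embedding(window_dvecs, l2_normalize=False):
    """Collapse window-wise d-vectors (N x 256) into one utterance-level embedding.

    window_dvecs: array of shape (N, 256), one d-vector per sliding window.
    l2_normalize: if True, follow GE2E section 3.2 (L2-normalize each window
    first, then average); if False, just average the raw d-vectors as
    suggested above.
    """
    d = np.asarray(window_dvecs, dtype=np.float32)
    if l2_normalize:
        d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return d.mean(axis=0)
```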
It didn't work... :( I noticed that the sampling rate of the TIMIT corpus used in https://github.com/HarryVolek/PyTorch_Speaker_Verification is 16 kHz, while the sampling rate of the VCTK corpus is 48 kHz. Should I re-train the d-vector network at the 48 kHz sampling rate?
The details are described in the paper.
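For anyone following along, a minimal sketch of downsampling a 48 kHz VCTK utterance to 16 kHz with librosa before feature extraction (the file paths are examples, not paths from the repo):

```python
import librosa
import soundfile as sf

# Downsample a 48 kHz VCTK utterance to 16 kHz before feature extraction.
# Adjust the input path to your local VCTK layout.
wav, _ = librosa.load("VCTK-Corpus/wav48/p225/p225_001.wav", sr=16000)
sf.write("p225_001_16k.wav", wav, 16000)
```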
I still cannot reproduce your results as shown in the demo; what I got was babble. The sampling rate of all the wave files has been changed to 16 kHz as described in your paper. The network I used to generate the speaker embeddings was Janghyun1230's version (https://github.com/Janghyun1230/Speaker_Verification). I noticed that the method used to generate the mel spectrogram in wavenet is different from the one in speaker verification, so I modified the source code of the speaker verification to match the mel spectrogram of wavenet and retrained the speaker embedding network. But it still doesn't work for AutoVC conversion. I guess the reason lies in the method used to generate the speaker embeddings. Can you give me some advice on that?
You are right. In this case, you have to retrain the model using your speaker embeddings.
Do you clip the mel spectrogram to a specific range, such as [-1, 1] or something else? Thanks!
Clip to [0,1] |
How is the mel spectrogram clipped to [0,1]? What algorithm or method did you use?
@auspicious3000 Can you please release your code for generating speaker embeddings? I have the same question as @liveroomand: I can't reproduce your embedding results for p225, p228, p256 and p270, and retraining the model costs a lot of time. Alternatively, please release all the parameters you set when training the speaker embeddings. Thank you.
@xw1324832579 You can use one-hot embeddings if you are not doing zero-shot conversion. Retraining takes less than 12 hours on a single GPU.
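For reference, a minimal sketch of what a one-hot speaker embedding could look like for a closed set of training speakers (the speaker list and dimensionality here are assumptions, not values from the repo):

```python
import numpy as np

# Closed set of training speakers; the order defines the one-hot index.
speakers = ["p225", "p228", "p256", "p270"]

def one_hot_embedding(speaker_id):
    """Return a one-hot vector identifying a seen speaker.

    Unlike GE2E d-vectors, this cannot generalize to unseen speakers,
    which is why it only works when you are not doing zero-shot conversion.
    """
    emb = np.zeros(len(speakers), dtype=np.float32)
    emb[speakers.index(speaker_id)] = 1.0
    return emb

print(one_hot_embedding("p228"))  # [0. 1. 0. 0.]
```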
@auspicious3000 Are the features (80-bin mel spectrograms) used for the speaker embedding and for content extraction (the encoder input) the same?
They don't have to be the same. |
How to generate speaker mel spectrogram |
@liveroomand Looks fine. You can refer to r9y9's wavenet vocoder for more details on spectrogram normalization and clipping.
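For reference, a minimal sketch of the kind of dB conversion, normalization, and clipping used in r9y9's wavenet_vocoder preprocessing (the dB floor and reference level below are assumed values; check them against the hparams your vocoder was actually trained with):

```python
import numpy as np

# Assumed values in the style of r9y9/wavenet_vocoder's hparams; verify
# against the hparams used to train your vocoder.
MIN_LEVEL_DB = -100
REF_LEVEL_DB = 20

def amp_to_db(x):
    return 20.0 * np.log10(np.maximum(1e-5, x))

def normalize_mel(mel_amplitude):
    """Map an amplitude mel spectrogram to [0, 1].

    Converts to dB, subtracts a reference level, rescales by the dB floor,
    and finally clips, which is where the [0, 1] range comes from.
    """
    mel_db = amp_to_db(mel_amplitude) - REF_LEVEL_DB
    mel_norm = (mel_db - MIN_LEVEL_DB) / -MIN_LEVEL_DB
    return np.clip(mel_norm, 0.0, 1.0)
```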
Do you mean that the mel spectrogram used to pre-train the speaker encoder also needs to be clipped to [0, 1]?
@liveroomand Yes, in our case. But you can design your own speaker encoder or just use one-hot embeddings.
Hi all,
Are these params suitable for other datasets? When I switch to my own dataset, the quality is not as good as on VCTK. Is the reason that the wavenet vocoder was trained specifically on VCTK? Could you give some advice on other datasets? Thanks.
@miaoYuanyuan For other datasets, you need to tune the parameters of the conversion model rather than the feature parameters.
Thanks. Do you mean the wavenet vocoder or the AutoVC conversion model?
@miaoYuanyuan If you change the feature parameters, you will need to retrain the wavenet vocoder as well.
Thank you! I got it. |
from wavs to mel spectrogram: |
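Below is a minimal sketch of one way to do this with librosa, using the values discussed in this thread (80 mel bins, 16 kHz audio) plus assumed STFT settings, normalized and clipped to [0, 1] as above:

```python
import librosa
import numpy as np

def wav_to_mel(path, sr=16000, n_fft=1024, hop_length=256, n_mels=80,
               min_level_db=-100, ref_level_db=20):
    """Load a wav and return a mel spectrogram normalized to [0, 1].

    sr and n_mels follow this thread; the STFT sizes and dB levels are
    assumptions and should match whatever the vocoder was trained with.
    """
    wav, _ = librosa.load(path, sr=sr)
    spec = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop_length))
    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel = np.dot(mel_basis, spec)
    mel_db = 20.0 * np.log10(np.maximum(1e-5, mel)) - ref_level_db
    mel_norm = np.clip((mel_db - min_level_db) / -min_level_db, 0.0, 1.0)
    return mel_norm.T  # shape: (frames, n_mels)

mel = wav_to_mel("p225_001_16k.wav")  # example path, 16 kHz input
```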
Thanks @miaoYuanyuan for making the preprocessing steps clear! I wanted to experiment with AutoVC and the wavenet vocoder separately, and found this thread really useful. In the end I put my experiments in a notebook and made a git repo of it. It could be useful for those of you who are in the shoes of me-a-week-ago. https://github.com/KnurpsBram/AutoVC_WavenetVocoder_GriffinLim_experiments |