how to control upsample scales #169

james20141606 · 2019-09-29T02:28:43Z

I use the default setting of [4,4,4,4] from 20180510_mixture_lj_checkpoint_step000320000_ema.json for the upsample parameters, and I get an assertion error from

if c is not None and self.upsample_net is not None:
    c = self.upsample_net(c)
    assert c.size(-1) == x.size(-1)

in wavenet.py.
I printed the sizes of c and x: torch.Size([2, 32, 19968]) and torch.Size([2, 1, 9984]).
It seems c is twice the length of x. I tried changing the parameters to [2,4,4,4], but it did not work.
Or should I change other parameters?


james20141606 · 2019-09-29T02:36:35Z

By the way, I customized some parameters in the json file as follows:

{
  "name": "wavenet_vocoder",
  "builder": "wavenet",
  "input_type": "raw",
  "quantize_channels": 65536,
  "sample_rate": 16000,
  "silence_threshold": 2,
  "num_mels": 32,
  "fmin": 125,
  "fmax": 7600,
  "fft_size": 1024,
  "hop_size": 128,
  "frame_shift_ms": null,
  "min_level_db": -100,
  "ref_level_db": 20,
  "rescaling": true,
  "rescaling_max": 0.999,
  "allow_clipping_in_normalization": true,
  "log_scale_min": -32.23619130191664,
  "out_channels": 30,
  "layers": 24,
  "stacks": 4,
  "residual_channels": 512,
  "gate_channels": 512,
  "skip_out_channels": 256,
  "dropout": 0.050000000000000044,
  "kernel_size": 3,
  "weight_normalization": true,
  "cin_channels": 32,
  "upsample_conditional_features": true,
  "upsample_scales": [
    2,
    4,
    4,
    4
  ],
  "cin_pad": 2,
  "freq_axis_kernel_size": 3,
  "gin_channels": -1,
  "n_speakers": 1,
  "pin_memory": true,
  "num_workers": 2,
  "test_size": 0.0441,
  "test_num_samples": null,
  "random_state": 1234,
  "batch_size": 2,
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "adam_eps": 1e-08,
  "amsgrad": false,
  "initial_learning_rate": 0.001,
  "lr_schedule": "noam_learning_rate_decay",
  "lr_schedule_kwargs": {},
  "nepochs": 2000,
  "weight_decay": 0.0,
  "clip_thresh": -1,
  "max_time_sec": null,
  "max_time_steps": 10000,
  "exponential_moving_average": true,
  "ema_decay": 0.9999,
  "checkpoint_interval": 10000,
  "train_eval_interval": 10000,
  "test_eval_epoch_interval": 5,
  "save_optimizer_state": true
}

Could you help me see what's wrong with these settings?

james20141606 · 2019-09-29T02:59:49Z

I think I solved it. I found that although I changed the upsample parameters to [2,4,4,4], train.py did not receive them, so I changed the code in build_model from

upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad

to

upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad
upsample_params['upsample_scales'] = hparams.upsample_scales

This time hparams.upsample_params picks up the upsample scales from the json file.
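The effect of that extra line can be illustrated with a small stand-in for the hyperparameter object. This is only a sketch: `hparams` below is a hypothetical `SimpleNamespace`, not the project's real hparams class, and `build_upsample_params` is a made-up helper. It assumes that the json file sets a top-level `upsample_scales` while the model reads the nested `upsample_params["upsample_scales"]`.

```python
from types import SimpleNamespace

# Hypothetical stand-in for hparams: nested defaults plus a top-level
# "upsample_scales" key coming from the json file.
hparams = SimpleNamespace(
    cin_channels=32,
    cin_pad=2,
    upsample_scales=[2, 4, 4, 4],                       # top-level key from the json
    upsample_params={"upsample_scales": [4, 4, 4, 4]},  # nested default
)


def build_upsample_params(hparams, copy_top_level_scales):
    """Assemble the kwargs for the upsample network (illustrative only)."""
    params = dict(hparams.upsample_params)  # copy so the defaults stay untouched
    params["cin_channels"] = hparams.cin_channels
    params["cin_pad"] = hparams.cin_pad
    if copy_top_level_scales:
        # The extra line added in build_model.
        params["upsample_scales"] = hparams.upsample_scales
    return params


before = build_upsample_params(hparams, copy_top_level_scales=False)
after = build_upsample_params(hparams, copy_top_level_scales=True)
print(before["upsample_scales"])  # [4, 4, 4, 4]
print(after["upsample_scales"])   # [2, 4, 4, 4]
```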

r9y9 · 2019-09-29T03:24:34Z

As noted in wavenet_vocoder/hparams.py (line 75 in c0ac05e):

"upsample_scales": [4, 4, 4, 4], # should np.prod(upsample_scales) == hop_size

np.prod(upsample_scales) must be equal to hop_size. This is the reason you got the assertion error.
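This constraint can be checked before training starts. A minimal sketch using the values from this thread; `check_upsample_scales` is a hypothetical helper, not part of wavenet_vocoder, and it uses `math.prod` from the standard library, which gives the same result as `np.prod` for a list of integers.

```python
import math


def check_upsample_scales(upsample_scales, hop_size):
    """Return True if the total upsampling factor equals hop_size."""
    return math.prod(upsample_scales) == hop_size


# hop_size is 128 in the json posted above.
print(check_upsample_scales([4, 4, 4, 4], 128))  # False: the product is 256
print(check_upsample_scales([2, 4, 4, 4], 128))  # True: the product is 128
```

With [4, 4, 4, 4] and hop_size 128, the upsampled features come out twice as long as the audio, which matches the 19968 vs. 9984 sizes reported above.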

Looks like you are using an old json file. Top-level upsample_scales doesn't exist anymore (It did in v0.1.1 though)

r9y9 · 2019-09-29T03:26:08Z

Ah, I haven't updated https://github.com/r9y9/wavenet_vocoder/tree/c0ac05e41f9f563421172034e9398633df172b4f/presets, which may confuse you. I will simply delete them.

james20141606 · 2019-09-30T02:28:40Z

I used the json file you provided under the Hyper params URL in the Pre-trained models section. Do you mean we do not need the upsample_scales parameter anymore? Could you provide the new json file? I encountered similar upsampling problems when I tried to use a trained model to synthesize audio files: it seems that size(-1) of the upsampled c at line 276 in wavenet.py does not match T.

r9y9 · 2019-09-30T02:34:13Z

For pretrained models, please check out the specific git commit as noted in the README.

james20141606 · 2019-09-30T02:37:00Z

Yeah, I checked out the specific version when trying synthesis. But for training a new model on my own data, I think I mixed an older version with the specific version.
As for the error, in one case I have a c with size(-1) 1016, and after upsampling it is 129536. That is a ratio of 127.49606299212599, which does not match the hop size of 128 I provided.
The weird thing is that I think I use the same parameters and the same wavenet.py in my train.py, which also uses upsampling, and it runs well. I am not sure why the upsampling fails in synthesis.py.

james20141606 · 2019-10-01T17:57:55Z

Hey, I'd like to ask again: although the model trains smoothly with the specified upsample scales, it can't be used to synthesize audio with the same json file, since the upsample network does not upsample the conditioning input c by exactly the expected factor (for me it gives 127.xxxx instead of 128). I am not sure what may cause this problem.

r9y9 · 2019-10-04T13:17:19Z

wavenet_vocoder/datasets/wavallin.py (lines 97 to 100 in 8cc0c2d):

# ensure length of raw audio is multiple of hop_size so that we can use
# transposed convolution to upsample
out = out[:N * audio.get_hop_size()]
assert len(out) % audio.get_hop_size() == 0

If you use our preprocessing script, upsampling is expected to work correctly.
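The trimming step quoted above can be sketched in isolation. This is a hedged illustration using a plain Python list; `trim_to_hop_multiple` and its arguments are hypothetical names, whereas the real script uses `audio.get_hop_size()` and the frame count `N`.

```python
def trim_to_hop_multiple(samples, num_frames, hop_size):
    """Cut the waveform to num_frames * hop_size samples."""
    trimmed = samples[: num_frames * hop_size]
    # Same sanity check as the preprocessing script: the audio length must be
    # a whole number of hops so that each frame maps to hop_size samples.
    assert len(trimmed) % hop_size == 0
    return trimmed


# Illustrative values: 78 frames at hop_size 128, with a slightly longer waveform.
samples = list(range(10050))
trimmed = trim_to_hop_multiple(samples, num_frames=78, hop_size=128)
print(len(trimmed), len(trimmed) // 128)  # 9984 78
```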

I'm not really sure what you are hitting. You might want to try pdb or ipdb debugging to isolate your problem.

james20141606 · 2019-10-10T01:37:15Z

Hey, I tried to see what happens in upsample_net. With the scales set to [2, 4, 4, 4] (which is supposed to upsample by 128), I printed c.size(-1) and x.size(-1) in wavenet.py before and after line 196 during training, and I found that the upsampling ratio is not 128 (for example: torch.Size([2, 32, 82]) and torch.Size([2, 32, 9984])). Fortunately, c.size(-1) and x.size(-1) still match.

However, during synthesis, which uses the code at line 275 of wavenet.py:

c = self.upsample_net(c)
assert c.size(-1) == T

this time upsample_net does not produce c.size(-1) == T.

james20141606 · 2019-10-10T03:26:15Z

I did some further debugging, and there is still something confusing me:
At first, in synthesis.py, it seems the arguments of the batch_wavegen function have some problems when it is called at line 243.
Then I found that the length mismatch may be due to cin_pad: cin_pad makes len(x)/len(c) != hop_size, so upsample_net(c) does not produce the same length as x. I am not sure how to deal with it.
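The sizes reported in this thread are consistent with that guess. A minimal sketch, assuming the upsample network first drops `cin_pad` frames from each side of c and then upsamples by the product of the scales; `expected_upsampled_length` is a hypothetical helper, not a function in wavenet.py.

```python
import math


def expected_upsampled_length(num_frames, cin_pad, upsample_scales):
    """Length of c after upsampling, assuming 2 * cin_pad frames are consumed first."""
    return (num_frames - 2 * cin_pad) * math.prod(upsample_scales)


# Sizes reported in this thread, with cin_pad = 2 from the posted json.
print(expected_upsampled_length(82, 2, [2, 4, 4, 4]))    # 9984
print(expected_upsampled_length(1016, 2, [2, 4, 4, 4]))  # 129536
print(expected_upsampled_length(82, 2, [4, 4, 4, 4]))    # 19968
```

Under this assumption, 129536 / 1016 falls below 128 only because the 4 padding frames are counted in the denominator; 129536 / 1012 is exactly 128. The conditioning features passed at synthesis time would then need T / hop_size + 2 * cin_pad frames for the assertion to hold.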

stale · 2019-12-09T03:48:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Dec 9, 2019
