
[Hackathon 7th] Fix the issue with max over mixed int and Value inputs #3903

Merged
merged 1 commit into PaddlePaddle:develop on Nov 25, 2024

Conversation

@megemini (Contributor) commented Nov 22, 2024

PR types

Bug fixes

PR changes

Others

Describe

Fix the issue where max is called on mixed int and Value inputs.

The inputs at this call site can look like:

[Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor<i32>, stop_gradient=True), 2, Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor<i32>, stop_gradient=True), Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor<i32>, stop_gradient=True)]
[Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor<i32>, stop_gradient=True), 1, 1, Value(define_op_name=pd_op.slice, index=0, dtype=builtin.tensor<i32>, stop_gradient=True)]
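
Purely for illustration (this is not the patch in this PR): with inputs like the above, builtins max has to order plain Python ints against symbolic Value objects, which is what breaks. A minimal sketch of one way such a mixed list could be reduced instead; the helper name safe_max is hypothetical:

import paddle

def safe_max(items):
    # Hypothetical helper: reduce a list mixing Python ints and paddle
    # Tensors/Values by falling back to paddle.maximum whenever a
    # tensor-like element is involved, instead of using builtins max.
    result = items[0]
    for item in items[1:]:
        if isinstance(result, paddle.Tensor) or isinstance(item, paddle.Tensor):
            result = paddle.maximum(
                paddle.to_tensor(result), paddle.to_tensor(item))
        else:
            result = max(result, item)
    return result

# Example with mixed inputs, analogous to the slice shapes shown above:
print(safe_max([paddle.to_tensor(3), 2, paddle.to_tensor(5)]))  # -> Tensor(5)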

Additionally, in paddlespeech/t2s/modules/transformer/embedding.py, self.pe = pe is changed to self.pe = paddle.assign(pe); otherwise the following error is reported:

...
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/tensor/creation.py", line 2678, in assign
    _C_ops.assign_out_(input, output)

Sorry about what's happened. In to_static mode, pd_op.assign_out_'s output variable is a viewed Tensor in dygraph. This will result in inconsistent calculation behavior between dynamic and static graphs. You must find the location of the strided ops be called, and call paddle.assign() before inplace input.If you certainly make sure it's safe, you can set env stride_in_no_check_dy2st_diff to 1.

It also runs normally after exporting stride_in_no_check_dy2st_diff=0, so a few open questions:

  • Should self.pe = paddle.assign(pe) be used here (see the sketch after this list)?
  • There are several other assignments of the form self.pe = pe in this file; should they be changed as well?
  • Or should the paddle.assign(pe, self.pe) form be used instead?
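
For reference, a minimal sketch of the kind of change being discussed. The layer below is a simplified stand-in for the positional-encoding code in paddlespeech/t2s/modules/transformer/embedding.py, not the actual file; it only illustrates copying the buffer with paddle.assign instead of keeping a bare reference:

import paddle

class SimplePositionalEncoding(paddle.nn.Layer):
    # Simplified stand-in for the positional-encoding layer pattern.
    def __init__(self, d_model: int = 384, max_len: int = 5000):
        super().__init__()
        position = paddle.arange(0, max_len, dtype="float32").unsqueeze(1)
        div_term = paddle.exp(
            paddle.arange(0, d_model, 2, dtype="float32")
            * -(paddle.log(paddle.to_tensor(10000.0)) / d_model))
        pe = paddle.zeros([max_len, d_model])
        pe[:, 0::2] = paddle.sin(position * div_term)
        pe[:, 1::2] = paddle.cos(position * div_term)
        # Before: self.pe = pe
        # After: copy the buffer explicitly so self.pe does not alias a
        # viewed tensor, which is what trips the dy2st stride check above.
        self.pe = paddle.assign(pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, : x.shape[1]]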

After the change, the following command runs successfully:

$ FLAGS_allocator_strategy=naive_best_fit FLAGS_fraction_of_gpu_memory_to_use=0.01 python3 ${BIN_DIR}/../synthesize_e2e.py   --am=fastspeech2_aishell3   --am_config=fastspeech2_canton_ckpt_1.4.0/default.yaml   --am_ckpt=fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz   --am_stat=fastspeech2_canton_ckpt_1.4.0/speech_stats.npy   --voc=pwgan_aishell3   --voc_config=pwg_aishell3_ckpt_0.5/default.yaml   --voc_ckpt=pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz   --voc_stat=pwg_aishell3_ckpt_0.5/feats_stats.npy   --lang=canton   --text=${BIN_DIR}/../../assets/sentences_canton.txt   --output_dir=exp/default/test_e2e   --phones_dict=fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt   --speaker_dict=fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt   --spk_id=10   --inference_dir=exp/default/inference
========Args========
am: fastspeech2_aishell3
am_ckpt: fastspeech2_canton_ckpt_1.4.0/snapshot_iter_140000.pdz
am_config: fastspeech2_canton_ckpt_1.4.0/default.yaml
am_stat: fastspeech2_canton_ckpt_1.4.0/speech_stats.npy
inference_dir: exp/default/inference
lang: canton
ngpu: 1
nmlu: 0
nnpu: 0
nxpu: 0
output_dir: exp/default/test_e2e
phones_dict: fastspeech2_canton_ckpt_1.4.0/phone_id_map.txt
pinyin_phone: null
speaker_dict: fastspeech2_canton_ckpt_1.4.0/speaker_id_map.txt
speech_stretchs: null
spk_id: 10
text: /home/aistudio/PaddleSpeech/paddlespeech/t2s/exps/fastspeech2/../../assets/sentences_canton.txt
tones_dict: null
use_rhy: false
voc: pwgan_aishell3
voc_ckpt: pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz
voc_config: pwg_aishell3_ckpt_0.5/default.yaml
voc_stat: pwg_aishell3_ckpt_0.5/feats_stats.npy

========Config========
batch_size: 32
f0max: 400
f0min: 110
fmax: 7600
fmin: 80
fs: 24000
max_epoch: 1000
model:
  adim: 384
  aheads: 2
  decoder_normalize_before: True
  dlayers: 4
  dunits: 1536
  duration_predictor_chans: 256
  duration_predictor_kernel_size: 3
  duration_predictor_layers: 2
  elayers: 4
  encoder_normalize_before: True
  energy_embed_dropout: 0.0
  energy_embed_kernel_size: 1
  energy_predictor_chans: 256
  energy_predictor_dropout: 0.5
  energy_predictor_kernel_size: 3
  energy_predictor_layers: 2
  eunits: 1536
  init_dec_alpha: 1.0
  init_enc_alpha: 1.0
  init_type: xavier_uniform
  pitch_embed_dropout: 0.0
  pitch_embed_kernel_size: 1
  pitch_predictor_chans: 256
  pitch_predictor_dropout: 0.5
  pitch_predictor_kernel_size: 5
  pitch_predictor_layers: 5
  positionwise_conv_kernel_size: 3
  positionwise_layer_type: conv1d
  postnet_chans: 256
  postnet_filts: 5
  postnet_layers: 5
  reduction_factor: 1
  spk_embed_dim: 256
  spk_embed_integration_type: concat
  stop_gradient_from_energy_predictor: False
  stop_gradient_from_pitch_predictor: True
  transformer_dec_attn_dropout_rate: 0.2
  transformer_dec_dropout_rate: 0.2
  transformer_dec_positional_dropout_rate: 0.2
  transformer_enc_attn_dropout_rate: 0.2
  transformer_enc_dropout_rate: 0.2
  transformer_enc_positional_dropout_rate: 0.2
  use_scaled_pos_enc: True
n_fft: 2048
n_mels: 80
n_shift: 300
num_snapshots: 5
num_workers: 2
optimizer:
  learning_rate: 0.001
  optim: adam
seed: 10086
updater:
  use_masking: True
win_length: 1200
window: hann
allow_cache: True
batch_max_steps: 24000
batch_size: 8
discriminator_grad_norm: 1
discriminator_optimizer_params:
  epsilon: 1e-06
  weight_decay: 0.0
discriminator_params:
  bias: True
  conv_channels: 64
  in_channels: 1
  kernel_size: 3
  layers: 10
  nonlinear_activation: LeakyReLU
  nonlinear_activation_params:
    negative_slope: 0.2
  out_channels: 1
  use_weight_norm: True
discriminator_scheduler_params:
  gamma: 0.5
  learning_rate: 5e-05
  step_size: 200000
discriminator_train_start_steps: 100000
eval_interval_steps: 1000
fmax: 7600
fmin: 80
fs: 24000
generator_grad_norm: 10
generator_optimizer_params:
  epsilon: 1e-06
  weight_decay: 0.0
generator_params:
  aux_channels: 80
  aux_context_window: 2
  dropout: 0.0
  gate_channels: 128
  in_channels: 1
  kernel_size: 3
  layers: 30
  out_channels: 1
  residual_channels: 64
  skip_channels: 64
  stacks: 3
  upsample_scales: [4, 5, 3, 5]
  use_weight_norm: True
generator_scheduler_params:
  gamma: 0.5
  learning_rate: 0.0001
  step_size: 200000
lambda_adv: 4.0
n_fft: 2048
n_mels: 80
n_shift: 300
num_save_intermediate_results: 4
num_snapshots: 10
num_workers: 4
pin_memory: True
remove_short_samples: True
save_interval_steps: 5000
seed: 42
stft_loss_params:
  fft_sizes: [1024, 2048, 512]
  hop_sizes: [120, 240, 50]
  win_lengths: [600, 1200, 240]
  window: hann
train_max_steps: 1000000
win_length: 1200
window: hann
frontend done!
W1122 10:37:30.571856 22376 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1122 10:37:30.573297 22376 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
/home/aistudio/.local/lib/python3.8/site-packages/paddle/nn/layer/layers.py:2194: UserWarning: Skip loading for encoder.embed.1.alpha. encoder.embed.1.alpha receives a shape [1], but the expected shape is [].
/home/aistudio/.local/lib/python3.8/site-packages/paddle/nn/layer/layers.py:2194: UserWarning: Skip loading for decoder.embed.0.alpha. decoder.embed.0.alpha receives a shape [1], but the expected shape is [].
acoustic model done!
voc done!
convert am and voc to static model.
/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py:747: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect.
You can set full_graph=True, then you can assign input spec.

W1122 10:37:36.563637 22376 pd_api.cc:31283] got different data type, run type promotion automatically, this may cause data type been changed.
/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py:747: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect.
You can set full_graph=True, then you can assign input spec.

001 白云山爬过一次嘅,好远啊,爬上去都成两个钟
I1122 10:37:41.684955 22376 pir_interpreter.cc:1564] New Executor is Running ...
I1122 10:37:42.039050 22376 pir_interpreter.cc:1591] pir interpreter is running by multi-thread mode ...
001, mel: [163, 80], wave: (119700, 1), time: 5376s, Hz: 22.265625, RTF: 1077.8947368421052.
001 done!
002 睇书咯,番屋企,而家好多人好少睇书噶喎
002, mel: [237, 80], wave: (113100, 1), time: 4007s, Hz: 28.225605190915896, RTF: 850.291777188329.
002 done!
003 因为如果唔考试嘅话,工资好低噶
003, mel: [117, 80], wave: (93600, 1), time: 2628s, Hz: 35.61643835616438, RTF: 673.8461538461539.
003 done!
004 冇固定噶,你中意休边日就边日噶
004, mel: [184, 80], wave: (86400, 1), time: 2738s, Hz: 31.555880204528854, RTF: 760.5555555555555.
004 done!

@zxcd @SigureMo @Liyulingyue

paddle-bot bot commented Nov 22, 2024

Thanks for your contribution!

@mergify mergify bot added the T2S label Nov 22, 2024
@zxcd (Collaborator) left a comment


LGTM

@zxcd zxcd merged commit 7fd5abd into PaddlePaddle:develop Nov 25, 2024
5 checks passed