We have observed that the TTS results are sometimes truncated. The key characteristics of this issue are as follows:
- **Incomplete Output**: Only part of `target_text` is spoken.
- **Longer Processing Time**: This appears to happen because the following conditional statement in `t2s_model.py` is never met, causing the loop to run all 1500 iterations:
- **Poor Audio Quality**: The generated audio is degraded and less similar to the target voice.
- **Fixed Output Length**: The output `.wav` file is always 1 minute long, with some of the text read and the remainder padded with silence.
- **Anomalous Log Values**: When inspecting the logs in the `get_tts_wav` function of `inference_webui.py`, the `pred_semantic` shape and `idx_value` differ from those seen in normal operation:
My guess is that in `inference_webui.py`, the `infer_panel` function is called, and inside `infer_panel_naive` the value of `torch.argmax(logits, dim=-1)[0]` or `samples[0, 0]` never equals `self.EOS`, so the early-stop condition is never triggered and the loop only exits when it hits the iteration cap.
The experiment was conducted under identical conditions for the parameters of `infer_panel`, including `all_phoneme_ids`, `all_phoneme_len`, `None if ref_free else prompt`, `bert`, `top_k`, `top_p`, `temperature`, and `early_stop_num`, as well as the same `target_text`, `reference_text`, and `reference_audio`.
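To make the suspected failure mode concrete, here is a minimal sketch of the early-stop logic described above. This is illustrative only, not the actual `t2s_model.py` code: the `EOS` id, `MAX_STEPS`, and the stand-in `next_token` callback are assumptions; `next_token` plays the role of `torch.argmax(logits, dim=-1)[0]` / `samples[0, 0]`.

```python
EOS = 1024        # assumed end-of-sequence token id (illustrative)
MAX_STEPS = 1500  # iteration cap observed in the issue

def decode(next_token, max_steps=MAX_STEPS, eos=EOS):
    """Run the autoregressive loop; stop early only when `eos` appears.

    `next_token(step)` stands in for sampling/argmax over the logits.
    Returns the generated tokens and the index at which the loop ended.
    """
    tokens = []
    for step in range(max_steps):
        tok = next_token(step)
        if tok == eos:           # early-stop condition
            return tokens, step  # normal termination
        tokens.append(tok)
    # EOS never emitted: the loop exhausts all max_steps iterations,
    # matching the truncated, fixed-length output described above.
    return tokens, max_steps

# Normal case: EOS is produced at step 10, so the loop stops early.
_, idx = decode(lambda s: EOS if s == 10 else 42)
print(idx)  # 10

# Abnormal case: EOS is never produced, so the loop runs all 1500 steps.
_, idx = decode(lambda s: 42)
print(idx)  # 1500
```

If this matches the real behavior, the truncation and the longer processing time would both be symptoms of the same missed EOS check.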
Related Code (inference_webui.py)
Normal Output
Abnormal Output
Thank you for taking the time to look into this issue. I truly appreciate your efforts in maintaining and improving this project, and I am happy to provide additional details or conduct further testing if needed. 🙂