TTS Results Occasionally Truncated #1992

Open
KangSquad opened this issue Jan 22, 2025 · 0 comments
We have observed that the TTS results are sometimes truncated. The key characteristics of this issue are as follows:

  1. Incomplete Output: Only part of target_text is spoken.
  2. Longer Processing Time: This seems to occur because the following stop condition in t2s_model.py is never met, causing the loop to run all 1500 iterations:
if torch.argmax(logits, dim=-1)[0] == self.EOS or samples[0, 0] == self.EOS:
    stop = True
if stop:
    if y.shape[1] == 0:
        y = torch.concat([y, torch.zeros_like(samples)], dim=1)
        print("bad zero prediction")
    print(f"T2S Decoding EOS [{prefix_len} -> {y.shape[1]}]")
    break
  3. Poor Audio Quality: The generated audio has degraded quality and is less similar to the target voice.
  4. Fixed Output Length: The output .wav file is always 1 minute long, with some of the text read and the rest padded with silence.
  5. Anomalous Log Values: When inspecting the logs in the get_tts_wav function of inference_webui.py, the pred_semantic shape and idx value differ from normal operation:
pred_semantic shape: torch.Size([1, 1, 1498])
value: 1498
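The symptoms above are consistent with an autoregressive decoding loop that never predicts EOS and therefore runs to its step cap. A minimal stand-alone sketch (plain-Python stand-in for the torch loop; the EOS id and the predictor callbacks are hypothetical, only MAX_STEPS = 1500 mirrors the issue):

```python
EOS = 1024          # hypothetical EOS token id
MAX_STEPS = 1500    # decoding cap, as in t2s_model.py

def decode(next_token):
    """Simulate the T2S loop: stop on EOS, otherwise run to the cap."""
    y = []
    for _ in range(MAX_STEPS):
        tok = next_token(len(y))
        if tok == EOS:   # the stop condition that is sometimes never met
            break
        y.append(tok)
    return y

# Normal run: EOS appears after 200 steps, so output length tracks the text.
normal = decode(lambda step: EOS if step == 200 else 0)

# Abnormal run: EOS is never predicted, so the loop exhausts all 1500 steps,
# giving the fixed-length pred_semantic (~1498) and the long processing time.
abnormal = decode(lambda step: 0)
```

This would explain why the truncated runs always take longer and always produce the same output length.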

My guess is that inference_webui.py calls infer_panel, and inside infer_panel_naive neither torch.argmax(logits, dim=-1)[0] nor samples[0, 0] ever equals self.EOS.
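One way to test this guess is to log the probability mass assigned to self.EOS at each decoding step; if it stays near zero throughout, the stop condition can never fire. A minimal stand-alone sketch (plain-Python softmax over one logits row; the vocabulary size and EOS id are hypothetical):

```python
import math

EOS = 1024  # hypothetical EOS token id (last entry of the logits row)

def eos_probability(logits):
    """Softmax probability assigned to EOS in a single step's logits row."""
    m = max(logits)                             # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    return exps[EOS] / sum(exps)

# Example: a row where EOS is strongly suppressed relative to other tokens,
# so argmax will never select it and the decode loop cannot stop early.
logits = [0.0] * 1025
logits[EOS] = -10.0
p = eos_probability(logits)
```

If such logging confirmed a persistently suppressed EOS logit, the fix would likely lie in the conditioning inputs (prompt, bert features) rather than in the stop condition itself.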

The experiment was conducted under identical conditions for the parameters of infer_panel, including all_phoneme_ids, all_phoneme_len, None if ref_free else prompt, bert, top_k, top_p, temperature, and early_stop_num, as well as the same target_text, reference_text, and reference_audio.

Related Code (inference_webui.py)
[screenshot]

Normal Output
[screenshot]

Abnormal Output
[screenshot]

Thank you for taking the time to look into this issue. I truly appreciate your efforts in maintaining and improving this project, and I am happy to provide additional details or conduct further testing if needed. 🙂
