Lowering the pitch below 50 will cause the voice not to work when using the NaturalVoiceSAPIAdapter #276

Closed
cary-rowen opened this issue Nov 3, 2024 · 12 comments


@cary-rowen
Collaborator

Describe the Problem

Lowering the pitch below 50 will cause the voice not to work when using the NaturalVoiceSAPIAdapter

To Reproduce

Steps to reproduce the behavior:

  1. Follow the guide to use the NaturalVoiceSAPIAdapter to use high-quality Microsoft voices.
  2. Create a new sapi5 voice profile.
  3. Set the pitch to 45 and activate the profile.

Expected behavior

The TTS reads the text aloud.

If the problem is related to a file, indicate the file you have opened

any file

Desktop (please complete the following information):

  • OS: [Windows 10 22H2 (AMD64) build 19045.5011]
  • Bookworm version [Latest Build]
  • Recent settings you may have changed in Bookworm [None]

Additional context

tbd

@mush42
Collaborator

mush42 commented Nov 3, 2024

Is there any way to figure out the accepted pitch range that does not cause the TTS to crash?

If there is no way, we should resort to runtime checks, matching on voice key, to set pitch to a fixed value.
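
The runtime check suggested above could look roughly like the following minimal sketch. The table name `FIXED_PITCH_BY_VOICE_KEY`, the function `effective_pitch`, the key fragment, and the fixed value 50 are all hypothetical; none of them are taken from Bookworm's code.

```python
# Hypothetical table: voice-key fragment -> fixed pitch to use for matching voices.
# The fragment and the value 50 are assumptions for illustration only.
FIXED_PITCH_BY_VOICE_KEY = {
    "NaturalVoiceSAPIAdapter": 50,
}


def effective_pitch(voice_key: str, requested_pitch: int) -> int:
    """Return the pitch to apply, overriding it for voices listed in the table."""
    lowered_key = voice_key.lower()
    for fragment, fixed_pitch in FIXED_PITCH_BY_VOICE_KEY.items():
        if fragment.lower() in lowered_key:
            # Known-problematic voice: ignore the user's value.
            return fixed_pitch
    # Any other voice keeps the pitch the user asked for.
    return requested_pitch
```

With this table, a requested pitch of 45 would be replaced by 50 for any voice whose key contains the fragment, and passed through unchanged otherwise.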

@cary-rowen
Collaborator Author

This comes from community feedback, which I haven't investigated yet.
However, this TTS works as expected in NVDA.

@gexgd0419

It seems that Bookworm adds <xml version="1.0"> and </xml> at the beginning and the end of the XML string.

This is, in fact, not needed by SAPI5. As SAPI does not recognize the tag, it is passed to the TTS engine as an "unknown tag".

Unfortunately, the current release of NaturalVoiceSAPIAdapter (v0.2) does not process unknown XML tags properly, so when a <pitch> tag is inserted in the middle, the resulting SSML can be malformed.

(NaturalVoiceSAPIAdapter converts the SAPI commands back to SSML to send them to the Microsoft voices.)

I'm working on a fix, and the next release should fix this.

It also works if the XML text is not wrapped in an <xml> tag.
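
As a rough illustration of the workaround on the caller's side, the sketch below removes an outer <xml ...> ... </xml> wrapper from a string before it is handed to SAPI. The helper name `strip_xml_wrapper` is hypothetical, and the sample string is only illustrative: `<pitch absmiddle="...">` is SAPI 5 XML markup, but the exact string Bookworm builds is not shown in this thread.

```python
import re

# Match an opening <xml ...> at the very start and a closing </xml> at the very end.
_WRAPPER_OPEN = re.compile(r"^\s*<xml\b[^>]*>", re.IGNORECASE)
_WRAPPER_CLOSE = re.compile(r"</xml>\s*$", re.IGNORECASE)


def strip_xml_wrapper(sapi_xml: str) -> str:
    """Remove an outer <xml>...</xml> wrapper; return the input unchanged otherwise."""
    without_open, opened = _WRAPPER_OPEN.subn("", sapi_xml, count=1)
    without_both, closed = _WRAPPER_CLOSE.subn("", without_open, count=1)
    # Only strip when both halves of the wrapper were present.
    if opened and closed:
        return without_both
    return sapi_xml


# Illustrative input, not Bookworm's actual output.
wrapped = '<xml version="1.0"><pitch absmiddle="-1">Hello</pitch></xml>'
print(strip_xml_wrapper(wrapped))  # prints <pitch absmiddle="-1">Hello</pitch>
```

Strings without the wrapper pass through untouched, so the helper could be applied unconditionally.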

@cary-rowen
Collaborator Author

Many Thanks @gexgd0419
@mush42
Do you have any thoughts on what @gexgd0419 discussed about us adding <xml version="1.0"> and </xml>?

@gexgd0419

The latest release v0.2.1 of NaturalVoiceSAPIAdapter should fix this issue.

@mush42
Collaborator

mush42 commented Nov 12, 2024

@cary-rowen wdyt?

@cary-rowen
Collaborator Author

Many thanks @gexgd0419
I will close this

@cary-rowen
Collaborator Author

cary-rowen commented Nov 12, 2024

Hi @gexgd0419
This issue can still be reproduced using the Microsoft Edge online speech engine. Do you plan to apply the same fix in a future release?

@gexgd0419

It seems that the Edge voice server does not allow more than two <prosody> tags. Although the SSML now has a valid XML structure, the unknown XML tags leave two empty <prosody> tags behind: the XML tags themselves are removed because they are not supported, but the <prosody> tags around them are still generated. As a result, there are three <prosody> tags in total, so the request is rejected.

The fix will be in the next release.
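
To make the tag-count problem concrete, here is a minimal sketch that counts <prosody> elements and drops the empty ones. It is not the adapter's actual fix. The function names and the sample SSML are invented for illustration, and the sketch ignores XML namespaces, which real SSML normally declares.

```python
import xml.etree.ElementTree as ET


def count_prosody(ssml: str) -> int:
    """Count <prosody> elements in an SSML string (namespaces ignored)."""
    return sum(1 for _ in ET.fromstring(ssml).iter("prosody"))


def drop_empty_prosody(ssml: str) -> str:
    """Remove <prosody> elements that have no text and no children (single pass)."""
    root = ET.fromstring(ssml)
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            is_empty = (
                child.tag == "prosody"
                and len(child) == 0
                and not (child.text or "").strip()
            )
            if not is_empty:
                previous = child
                continue
            # Keep any text that followed the removed element.
            if child.tail:
                if previous is None:
                    parent.text = (parent.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Invented sample: two empty <prosody> tags around the one that carries text.
sample = (
    '<speak><prosody pitch="-5%"></prosody>'
    '<prosody pitch="-5%">Hello</prosody>'
    '<prosody pitch="-5%"></prosody></speak>'
)
print(count_prosody(sample))                      # prints 3
print(count_prosody(drop_empty_prosody(sample)))  # prints 1
```

Because the cleanup is a single pass, a <prosody> element that only becomes empty after its children are removed would be left in place.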

@gexgd0419

This should be fixed in v0.2.2.

@cary-rowen
Collaborator Author

@gexgd0419 Thanks again!

@mush42
Collaborator

mush42 commented Nov 13, 2024

@gexgd0419 thank you.

3 participants