Failure example ("letter") #20
For these types of speech models, your phoneset can vary slightly from what is by-the-book correct IPA, as long as it's consistently trained. You can see how the various phonemes are used in this demo: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
I see! Thanks! Does this mean that the following function is the way to convert to IPA?

```python
def to_espeak(ps):
    # Optionally, you can add a tie character in between the 2 replacement characters.
    ps = ps.replace('ʤ', 'dʒ').replace('ʧ', 'tʃ')
    ps = ps.replace('A', 'eɪ').replace('I', 'aɪ').replace('Y', 'ɔɪ')
    ps = ps.replace('O', 'oʊ').replace('Q', 'əʊ').replace('W', 'aʊ')
    return ps.replace('ᵊ', 'ə')
```

(Or is there another way to convert to IPA?)
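A quick, hypothetical illustration of what the mapping above does (the input string is invented for this example, not taken from the thread):

```python
# 'ʤAn' uses misaki-style single-character symbols for /dʒ/ and /eɪ/ ("Jane");
# to_espeak expands them back into multi-character IPA.
print(to_espeak('ʤAn'))  # -> dʒeɪn
```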
If going from standard IPA phonemes to the custom vocab Kokoro v1.0 understands, that would be more like the mapping logic in Lines 22 to 67 in 2432307.
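A rough sketch of what that direction looks like, simply inverting the replacements from `to_espeak` above; the authoritative table lives in the referenced lines, so treat this as an illustration rather than the canonical mapping:

```python
# Inverse of to_espeak: collapse multi-character IPA sequences into the
# single-character custom symbols Kokoro v1.0 expects (illustrative only).
FROM_ESPEAK = [
    ('dʒ', 'ʤ'), ('tʃ', 'ʧ'),
    ('eɪ', 'A'), ('aɪ', 'I'), ('ɔɪ', 'Y'),
    ('oʊ', 'O'), ('əʊ', 'Q'), ('aʊ', 'W'),
]

def from_espeak(ps: str) -> str:
    for ipa, custom in FROM_ESPEAK:
        ps = ps.replace(ipa, custom)
    return ps
```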
Yes, exactly :) So, to convert from misaki to IPA (needed for my use-case), how should the "letter" case be handled?
Oh, if going from misaki to IPA, I think that mapping should work. May I ask what use-case requires misaki to IPA? (That was originally intended for linguists/researchers to understand mapping back to standard phonemes.) If running Kokoro v1.0 inference, just using misaki is the way to go. If doing v0.19, you can use espeak-ng directly.
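For the v1.0 inference path mentioned here, a minimal sketch of calling misaki directly, following the pattern in misaki's README (argument names come from that README; adjust if your installed version differs):

```python
from misaki import en

# American English G2P; misaki's output is the custom phoneme string that
# Kokoro v1.0 consumes directly, so no IPA round-trip is needed for inference.
g2p = en.G2P(trf=False, british=False, fallback=None)

phonemes, tokens = g2p("letter")
print(phonemes)
```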
I'm currently setting up an evaluation framework/benchmark. TLDR: I need a standard format that all models understand, so I chose IPA.
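To make that concrete, here is a hedged sketch of the kind of comparison such a benchmark might run once every model's output is normalized to IPA (the scoring choice and the example strings are assumptions for illustration, not part of this thread):

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two IPA strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def phoneme_error_rate(hyp_ipa: str, ref_ipa: str) -> float:
    """Edit distance normalized by the reference length."""
    return edit_distance(hyp_ipa, ref_ipa) / max(len(ref_ipa), 1)

# Hypothetical hypothesis/reference strings for "letter"; real references
# would come from whatever lexicon the benchmark standardizes on.
print(phoneme_error_rate('lˈɛɾɚ', 'lˈɛtɚ'))
```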
Ah, makes sense. I have definitely spent a fair amount of time thinking about G2P, both neural and non-neural. For English I'm fairly bearish on neural G2P, unless it is (1) implicitly done as part of large end-to-end TTS or (2) used as a last-resort fallback model. From what I've seen, neural English G2P simply does not put up good numbers on the speed vs accuracy tradeoff curve. Feel free to use
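On the fallback point, a minimal sketch, assuming the espeak-based fallback interface shown in misaki's README (a neural G2P model would plug into the same `fallback` slot):

```python
from misaki import en, espeak

# espeak-ng covers out-of-vocabulary words that misaki's dictionary/rules miss;
# a neural last-resort model could be substituted for EspeakFallback here.
fallback = espeak.EspeakFallback(british=False)
g2p = en.G2P(trf=False, british=False, fallback=fallback)

phonemes, tokens = g2p("Zyzzyva is a rare, out-of-vocabulary word.")
print(phonemes)
```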
but it should be