Failure example ("letter") #20

Open
xenova opened this issue Feb 2, 2025 · 7 comments

xenova commented Feb 2, 2025

>>> from misaki import en
>>> g2p = en.G2P(trf=False, british=False, fallback=None)
>>> phonemes, tokens = g2p("the letter")
>>> phonemes
'ðə lˈɛɾəɹ'

but it should be

ðə ˈɫɛtɝ

hexgrad (Owner) commented Feb 2, 2025

the letter => ðə lˈɛɾəɹ is actually correct based on the custom v1.0 English phoneset documented here (which is just missing ɐ): https://github.com/hexgrad/misaki/blob/main/EN_PHONES.md

For these types of speech models, your phoneset can vary slightly from by-the-book IPA, as long as it's trained consistently. You can see how the various phonemes are used in this demo: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
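
To make the divergence concrete, here is a rough sketch of the two symbol-level differences in this example; the "standard" values on the right are my own approximation, not an official misaki mapping:

# Sketch: where misaki's American English output for "letter" diverges from a
# more textbook IPA transcription. The right-hand values are approximations.
differences = {
    'ɾ':  't',   # misaki writes the American flap explicitly
    'əɹ': 'ɚ',   # misaki spells the r-colored schwa as schwa + ɹ
}
approx_standard = 'ðə lˈɛɾəɹ'
for old, new in differences.items():
    approx_standard = approx_standard.replace(old, new)
print(approx_standard)  # ðə lˈɛtɚ (stress placement still follows misaki)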

xenova (Author) commented Feb 3, 2025

I see! Thanks! Does this mean that the to_espeak function should be updated?

def to_espeak(ps):
    # Optionally, you can add a tie character in between the 2 replacement characters.
    ps = ps.replace('ʤ', 'dʒ').replace('ʧ', 'tʃ')
    ps = ps.replace('A', 'eɪ').replace('I', 'aɪ').replace('Y', 'ɔɪ')
    ps = ps.replace('O', 'oʊ').replace('Q', 'əʊ').replace('W', 'aʊ')
    return ps.replace('ᵊ', 'ə')

(or is there another way to convert to IPA?)
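
(For reference, none of to_espeak's replacement rules match this example, so the string comes back unchanged:)

print(to_espeak('ðə lˈɛɾəɹ'))  # ðə lˈɛɾəɹ -- no rule rewrites 'ɾ' or 'əɹ'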

hexgrad (Owner) commented Feb 3, 2025

The to_espeak function converts back to espeak-ng phonemes, aka more standard IPA.

If going from standard IPA phonemes to the custom vocab Kokoro v1.0 understands, that would be more like the mapping logic in the EspeakFallback class:

misaki/misaki/espeak.py (lines 22 to 67 at 2432307):

# EspeakFallback is used as a last resort for English
class EspeakFallback:
    E2M = sorted({
        'ʔˌn\u0329':'tn', 'ʔn\u0329':'tn', 'ʔn':'tn', 'ʔ':'t',
        'a^ɪ':'I', 'a^ʊ':'W',
        'd^ʒ':'ʤ',
        'e^ɪ':'A', 'e':'A',
        't^ʃ':'ʧ',
        'ɔ^ɪ':'Y',
        'ə^l':'ᵊl',
        'ʲo':'jo', 'ʲə':'jə', 'ʲ':'',
        'ɚ':'əɹ',
        'r':'ɹ',
        'x':'k', 'ç':'k',
        'ɐ':'ə',
        'ɬ':'l',
        '\u0303':'',
    }.items(), key=lambda kv: -len(kv[0]))

    def __init__(self, british):
        self.british = british
        self.backend = phonemizer.backend.EspeakBackend(
            language=f"en-{'gb' if british else 'us'}",
            preserve_punctuation=True, with_stress=True, tie='^'
        )

    def __call__(self, token):
        ps = self.backend.phonemize([token.text])
        if not ps:
            return None, None
        ps = ps[0].strip()
        for old, new in type(self).E2M:
            ps = ps.replace(old, new)
        ps = re.sub(r'(\S)\u0329', r'ᵊ\1', ps).replace(chr(809), '')
        if self.british:
            ps = ps.replace('e^ə', 'ɛː')
            ps = ps.replace('iə', 'ɪə')
            ps = ps.replace('ə^ʊ', 'Q')
        else:
            ps = ps.replace('o^ʊ', 'O')
            ps = ps.replace('ɜːɹ', 'ɜɹ')
            ps = ps.replace('ɜː', 'ɜɹ')
            ps = ps.replace('ɪə', 'iə')
            ps = ps.replace('ː', '')
            ps = ps.replace('o', 'ɔ')  # for espeak < 1.52
        return ps.replace('^', ''), 2
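
For context, a rough usage sketch (not from the repo): EspeakFallback only reads token.text, so a minimal stand-in object is enough to try it, assuming phonemizer and espeak-ng are installed.

from types import SimpleNamespace

fallback = EspeakFallback(british=False)
# SimpleNamespace is just a stand-in for misaki's real token objects
ps, rating = fallback(SimpleNamespace(text='letter'))
print(ps)  # espeak-ng phonemes remapped into the Kokoro v1.0 phoneset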

xenova (Author) commented Feb 3, 2025

> The to_espeak function converts back to espeak-ng phonemes, aka more standard IPA.

Yes, exactly :) So, to convert from misaki to IPA (needed for my use-case), how should the "the letter" case be handled?

hexgrad (Owner) commented Feb 3, 2025

Oh, if going from misaki to IPA, I think that to_espeak function should be mostly accurate, but it may not be complete. For example, as you pointed out, misaki writes əɹ where other IPA systems might use ɝ or, in the case of espeak, ɚ.
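
A rough sketch of what a more complete misaki-to-IPA mapping might look like; the extra əɹ rule and the choice of ɚ (rather than ɝ) are assumptions based on this thread, not something misaki ships:

def to_ipa_sketch(ps):
    # Same replacements as to_espeak, plus the r-colored schwa case above.
    ps = ps.replace('ʤ', 'dʒ').replace('ʧ', 'tʃ')
    ps = ps.replace('A', 'eɪ').replace('I', 'aɪ').replace('Y', 'ɔɪ')
    ps = ps.replace('O', 'oʊ').replace('Q', 'əʊ').replace('W', 'aʊ')
    ps = ps.replace('əɹ', 'ɚ')  # assumption: collapse ə + ɹ to ɚ
    return ps.replace('ᵊ', 'ə')

print(to_ipa_sketch('ðə lˈɛɾəɹ'))  # ðə lˈɛɾɚ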

May I ask what use case requires converting misaki to IPA? (That function was originally intended for linguists/researchers to understand the mapping back to standard phonemes.) If you're running Kokoro v1.0 inference, just using misaki is the way to go. If you're on v0.19, you can use espeak-ng directly.

xenova (Author) commented Feb 3, 2025

I'm currently setting up an evaluation framework/benchmark to:

  1. compare different LLMs on the G2P task. I found that many LLMs fail on very basic examples like homographs, even when the context makes the intended pronunciation clear.
  2. generate synthetic data by using multiple LLMs and arriving at a consensus. As part of this, I'll also be using non-LLM approaches to help with voting!

TLDR: I need a standard format all models understand, so I chose IPA.

hexgrad (Owner) commented Feb 3, 2025

Ah, makes sense. I have definitely spent a fair amount of time thinking about G2P, both neural and non-neural. For English I'm fairly bearish on neural G2P, unless it is (1) implicitly done as part of large end-to-end TTS or (2) used as a last resort fallback model. From what I've seen, neural English G2P simply does not put up good numbers on the speed vs accuracy tradeoff curve.

Feel free to use misaki to produce this data or use the .json data files directly, but you should keep in mind these may not losslessly bridge the gap back to standard IPA. For example, misaki[en] will only use the vowel extender ː in British English and not American English, but many other G2P systems will include the vowel extender for both. It still could be helpful for consensus voting, though.
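
As a sketch of that caveat, one simple option is to normalize away the known differences before voting; these rules are my own simplifications, not part of misaki:

def normalize_for_voting(ipa):
    # Crude normalization so outputs from different G2P systems can be
    # compared more fairly for consensus voting.
    ipa = ipa.replace('ː', '')                       # drop vowel length marks
    ipa = ipa.replace('ɚ', 'əɹ').replace('ɝ', 'ɜɹ')  # unify r-colored vowels
    return ipa

print(normalize_for_voting('ɡɹˈiːn'))  # ɡɹˈin -- length mark stripped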
