Index out of bounds in _create_audio() #115

Open
maxpatiiuk opened this issue Feb 23, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@maxpatiiuk

What happened?

Follow-up to #95

I get "IndexError: index 510 is out of bounds for axis 0 with size 510" for any input text that contains a long sentence. In the example below, the text comes to 556 phonemes and has no period to split on.

Steps to reproduce

Python script:

# examples/play.py
from kokoro_onnx import Kokoro

kokoro = Kokoro("kokoro-v1.0.onnx", "voices-v1.0.bin")
# Putting a dot in the middle of this text fixes the issue:
text="It may be that this communication will be considered as a madman's freak but at any rate it must be admitted that in its clearness and frankness it left nothing to be desired The serious part of it was that the Federal Government had undertaken to treat a sale by auction as a valid concession of these undiscovered territories Opinions on the matter were many Some readers saw in it only one of those prodigious outbursts of American humbug which would exceed the limits of puffism if the depths of human credulity were not unfathomable"
kokoro.create(text, voice="af_heart", lang="en-us")

Run the script:

LOG_LEVEL=DEBUG python examples/play.py


Thank you for the work on kokoro-onnx!

What OS are you seeing the problem on?

MacOS

Package version

0.4.2

Relevant log output

DEBUG    [__init__.py:34] koko-onnx version 0.4.2 on macOS-15.3.1-arm64-arm-64bit Darwin Kernel Version 24.3.0: Thu Jan  2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000
DEBUG    [__init__.py:53] Providers: ['CPUExecutionProvider']
DEBUG    [__init__.py:169] Creating audio for 2 batches for 556 phonemes
DEBUG    [__init__.py:76] Phonemes: 
DEBUG    [__init__.py:100] Created audio in length of 0.47s for 0 phonemes in 0.16s (RTF: 0.33
DEBUG    [__init__.py:76] Phonemes: ɪt mˈeɪ biː ðæt ðɪs kəmjˌuːnɪkˈeɪʃən wɪl biː kənsˈɪdɚd æz ɐ mˈædmənz fɹˈiːk bˌʌt æɾ ˌɛni ɹˈeɪt ɪt mˈʌst biː ɐdmˈɪɾᵻd ðæt ɪn ɪts klˈɪɹnəs ænd fɹˈæŋknəs ɪt lˈɛft nˈʌθɪŋ təbi dɪzˈaɪɚd ðə sˈɪɹiəs pˈɑːɹt ʌv ɪt wʌz ðætðə fˈɛdɚɹəl ɡˈʌvɚnmənt hæd ˌʌndɚtˈeɪkən tə tɹˈiːt ɐ sˈeɪl baɪ ˈɔːkʃən æz ɐ vˈælɪd kənsˈɛʃən ʌv ðiːz ʌndɪskˈʌvɚd tˈɛɹɪtˌɔːɹiz əpˈɪniənz ɔnðə mˈæɾɚ wɜː mˈɛni sˌʌm ɹˈiːdɚz sˈɔː ɪn ɪɾ ˈoʊnli wˈʌn ʌv ðoʊz pɹədˈɪdʒəs ˈaʊtbɜːsts ʌv ɐmˈɛɹɪkən hˈʌmbʌɡ wˌɪtʃ wʊd ɛksˈiːd ðə lˈɪmɪts ʌv pˈʌfɪzəm ɪf ðə dˈɛpθs ʌv hjˈuːmən kɹɛdʒˈuːlᵻɾi wɜː nˌɑːt ʌnfˈæðəməbəl
# (I modified the warning in the source code to include more details)
WARNING  [__init__.py:78] Phonemes (556) are too long, truncating to 510 phonemes (ɪt mˈeɪ biː ðæt ðɪs kəmjˌuːnɪkˈeɪʃən wɪl biː kənsˈɪdɚd æz ɐ mˈædmənz fɹˈiːk bˌʌt æɾ ˌɛni ɹˈeɪt ɪt mˈʌst biː ɐdmˈɪɾᵻd ðæt ɪn ɪts klˈɪɹnəs ænd fɹˈæŋknəs ɪt lˈɛft nˈʌθɪŋ təbi dɪzˈaɪɚd ðə sˈɪɹiəs pˈɑːɹt ʌv ɪt wʌz ðætðə fˈɛdɚɹəl ɡˈʌvɚnmənt hæd ˌʌndɚtˈeɪkən tə tɹˈiːt ɐ sˈeɪl baɪ ˈɔːkʃən æz ɐ vˈælɪd kənsˈɛʃən ʌv ðiːz ʌndɪskˈʌvɚd tˈɛɹɪtˌɔːɹiz əpˈɪniənz ɔnðə mˈæɾɚ wɜː mˈɛni sˌʌm ɹˈiːdɚz sˈɔː ɪn ɪɾ ˈoʊnli wˈʌn ʌv ðoʊz pɹədˈɪdʒəs ˈaʊtbɜːsts ʌv ɐmˈɛɹɪkən hˈʌmbʌɡ wˌɪtʃ wʊd ɛksˈiːd ðə lˈɪmɪts ʌv pˈʌfɪzəm ɪf ðə dˈɛpθs ʌv hjˈuːmən kɹɛdʒˈuːlᵻɾi wɜː nˌɑːt ʌnfˈæðəməbəl)
Traceback (most recent call last):
  File "/Users/maxpatiiuk/site/python/tts-nn/kokoro-onnx/examples/play.py", line 6, in <module>
    kokoro.create(text, voice="af_heart", lang="en-us")
  File "/Users/maxpatiiuk/site/python/tts-nn/venv/lib/python3.12/site-packages/kokoro_onnx/__init__.py", line 173, in create
    audio_part, _ = self._create_audio(phonemes, voice, speed)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maxpatiiuk/site/python/tts-nn/venv/lib/python3.12/site-packages/kokoro_onnx/__init__.py", line 88, in _create_audio
    voice = voice[len(tokens)]
            ~~~~~^^^^^^^^^^^^^
IndexError: index 510 is out of bounds for axis 0 with size 510
@maxpatiiuk maxpatiiuk added the bug Something isn't working label Feb 23, 2025
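
The traceback can be read as a plain off-by-one: a table with 510 rows has valid indices 0 through 509, so indexing it with a token count of 510 fails. Below is a minimal sketch of that situation using an ordinary list. The names voice_table and lookup are hypothetical stand-ins, and the assumption that the real voice table has exactly 510 rows is taken only from the "size 510" in the error message.

```python
# Hypothetical stand-in for the voice style table: one row per token count.
# Assumption: 510 rows, matching "size 510" in the traceback above.
TABLE_SIZE = 510
voice_table = [f"style-{i}" for i in range(TABLE_SIZE)]


def lookup(token_count: int) -> str:
    """Index the table by token count, mirroring voice[len(tokens)]."""
    return voice_table[token_count]


# 509 tokens is the largest count that still has a row.
last_ok = lookup(TABLE_SIZE - 1)

# 510 tokens (the truncation target in the warning) has no row.
try:
    lookup(TABLE_SIZE)
    failed = False
except IndexError:
    failed = True
```

If this reading is right, truncating to 510 phonemes is one too many for the lookup, and any batch has to stay at 509 or fewer.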
@maxpatiiuk
Author

My workaround was to override _split_phonemes with a more robust implementation:

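import re

from kokoro_onnx import Kokoro
from kokoro_onnx.config import MAX_PHONEME_LENGTH  # assumed location of the constant

class FixedKokoro(Kokoro):
    # Workaround for https://github.com/thewh1teagle/kokoro-onnx/issues/115
    def _split_phonemes(self, phonemes: str) -> list[str]: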
        batched_phonemes = []
        while len(phonemes) > MAX_PHONEME_LENGTH:
            # Find best split point within limit
            split_idx = MAX_PHONEME_LENGTH
            
            # Try to find the last period before MAX_PHONEME_LENGTH
            period_idx = phonemes.rfind('.', 0, MAX_PHONEME_LENGTH)
            if period_idx != -1:
                split_idx = period_idx + 1  # Include period
            
            else:
                # Try other punctuation
                match = re.search(r'[!?;,]', phonemes[:MAX_PHONEME_LENGTH][::-1])  # Search backwards
                if match:
                    split_idx = MAX_PHONEME_LENGTH - match.start()
                
                else:
                    # Try last space
                    space_idx = phonemes.rfind(' ', 0, MAX_PHONEME_LENGTH)
                    if space_idx != -1:
                        split_idx = space_idx
            
            # If no good split point is found, force split at MAX_PHONEME_LENGTH
            chunk = phonemes[:split_idx].strip()
            batched_phonemes.append(chunk)
            
            # Move to the next part
            phonemes = phonemes[split_idx:].strip()
        
        # Add remaining phonemes
        if phonemes:
            batched_phonemes.append(phonemes)
        
        return batched_phonemes

Happy to open a PR
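
The "search backwards" step in the workaround relies on a small piece of index arithmetic: the first match in the reversed window is the last match in the original, and its offset counts from the end of the window. Here is a standalone sketch of just that step. The helper name last_punctuation_split is hypothetical, and the sample uses plain ASCII text rather than real phonemes to keep the indices easy to follow.

```python
import re


def last_punctuation_split(text: str, limit: int) -> int:
    """Return the index just past the last [!?;,] in text[:limit], or -1."""
    window = text[:limit]
    # The first hit in the reversed window is the last hit in the original.
    match = re.search(r"[!?;,]", window[::-1])
    if match is None:
        return -1
    # match.start() counts characters from the end of the window.
    return len(window) - match.start()


sample = "one, two; three four"
idx = last_punctuation_split(sample, 12)
head, tail = sample[:idx], sample[idx:]
```

The sketch subtracts from len(window) instead of the limit so that it also behaves when the text is shorter than the limit. In the workaround the loop condition already guarantees the text is longer, so both give the same result there.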

@maxpatiiuk
Author

However, I still get an exception when the input is exactly MAX_PHONEME_LENGTH long. Here is an updated _split_phonemes that lowers the limit to MAX_PHONEME_LENGTH - 1:

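import re

from kokoro_onnx import Kokoro
from kokoro_onnx.config import MAX_PHONEME_LENGTH  # assumed location of the constant

class FixedKokoro(Kokoro):
    # Workaround for https://github.com/thewh1teagle/kokoro-onnx/issues/115
    def _split_phonemes(self, phonemes: str) -> list[str]: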
        max_length = MAX_PHONEME_LENGTH - 1
        batched_phonemes = []
        while len(phonemes) > max_length:
            # Find best split point within limit
            split_idx = max_length
            
            # Try to find the last period before max_length
            period_idx = phonemes.rfind('.', 0, max_length)
            if period_idx != -1:
                split_idx = period_idx + 1  # Include period
            
            else:
                # Try other punctuation
                match = re.search(r'[!?;,]', phonemes[:max_length][::-1])  # Search backwards
                if match:
                    split_idx = max_length - match.start()
                
                else:
                    # Try last space
                    space_idx = phonemes.rfind(' ', 0, max_length)
                    if space_idx != -1:
                        split_idx = space_idx
            
            # If no good split point is found, force split at max_length
            chunk = phonemes[:split_idx].strip()
            batched_phonemes.append(chunk)
            
            # Move to the next part
            phonemes = phonemes[split_idx:].strip()
        
        # Add remaining phonemes
        if phonemes:
            batched_phonemes.append(phonemes)
        
        return batched_phonemes
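
To check the splitting logic without loading a model, the same idea can be written as a free function that takes the limit as a parameter. This is a sketch under assumptions, not the library's code: split_phonemes here is a hypothetical standalone function, it uses str.rfind in place of the reversed regex search, and it skips empty chunks.

```python
def split_phonemes(phonemes: str, max_length: int) -> list[str]:
    """Split into chunks of at most max_length characters.

    Prefers the last '.', then the last of [!?;,], then the last space;
    falls back to a hard cut at max_length.
    """
    chunks = []
    while len(phonemes) > max_length:
        window = phonemes[:max_length]
        split_idx = max_length  # fallback: hard cut
        period_idx = window.rfind(".")
        if period_idx != -1:
            split_idx = period_idx + 1  # keep the period in this chunk
        else:
            punct_idx = max(window.rfind(c) for c in "!?;,")
            if punct_idx != -1:
                split_idx = punct_idx + 1  # keep the punctuation mark
            else:
                space_idx = window.rfind(" ")
                if space_idx > 0:
                    split_idx = space_idx
        chunk = phonemes[:split_idx].strip()
        if chunk:  # avoid emitting an empty batch
            chunks.append(chunk)
        phonemes = phonemes[split_idx:].strip()
    if phonemes:
        chunks.append(phonemes)
    return chunks


text = " ".join(["word"] * 50)
chunks = split_phonemes(text, 40)
```

Following the comment above, a caller would pass MAX_PHONEME_LENGTH - 1 as max_length so that no chunk reaches the size that triggers the IndexError.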

@freddyaboulton

Yes, I got the same error!
