Problem with Persian voice #404

Open
DonMonro opened this issue Mar 3, 2025 · 30 comments
@DonMonro

DonMonro commented Mar 3, 2025

Hello
I noticed that when I run your Colab for Persian, it doesn't work properly: the playback speed is too high and the voice sounds very strange, like a robot or an alien. What's the problem? Sometimes it also reads strange text that isn't Persian and doesn't even mean anything...

@ROBERT-MCDOWELL
Collaborator

I don't maintain the Colab, so maybe it's an old version with a bug that has already been fixed. I recall that the default sample rate was wrong for the fairseq TTS engine, and that was fixed a long time ago.
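For context on the "too fast, robot/alien" symptom: if audio generated at one sample rate is saved or played back as if it had a higher rate, it sounds faster and higher-pitched by the ratio of the two rates. The sketch below is a minimal illustration using Python's standard `wave` module; the 16 kHz and 24 kHz values and both helper names are hypothetical, not the fairseq engine's or ebook2audiobook's actual settings.

```python
import struct
import wave


def playback_speed_factor(declared_rate_hz: int, actual_rate_hz: int) -> float:
    """How much faster (and higher-pitched) audio sounds when samples produced
    at actual_rate_hz are played back as if they were declared_rate_hz."""
    return declared_rate_hz / actual_rate_hz


def write_mono_wav(path: str, samples: list, sample_rate_hz: int) -> None:
    """Write 16-bit mono PCM, declaring the rate the model actually generated at."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Hypothetical: a model that generates 16 kHz audio, saved with a 24 kHz header.
print(playback_speed_factor(24_000, 16_000))  # 1.5 -> plays 50% too fast
```

The fix in such a case is simply to pass the model's real output rate to whatever writes the file, as `write_mono_wav` does.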

@DrewThomasson DrewThomasson self-assigned this Mar 3, 2025
@DrewThomasson
Owner

DrewThomasson commented Mar 3, 2025

I'm the one maintaining the Colab.

It should be pulling from the latest main, though, and that issue should have been fixed...

I'll check, but ping me if I forget about this.

@DrewThomasson
Owner

Could you give us a sample text that triggers the issue on your end, so we can test it?

@DrewThomasson
Owner

Because I'm not getting the issue you're describing when running it in Colab or locally.

My results

My input

test_persion.txt

My Local output

test_persion_local.m4b.zip

My Colab output

test_persion_colab.m4b.zip

@DrewThomasson
Owner

@DonMonro ?

@DonMonro
Author

DonMonro commented Mar 4, 2025

Yes, definitely. Here is an example:
https://github.com/DonMonro/test/blob/main/%D8%AF%DB%8C%D9%85%D8%A7%D9%87%201391%20%D8%A8%D9%88%D8%AF%20%D9%88%20%D8%AF%D8%A7%D8%B4%D8%AA%D9%85%20%D8%A7%D8%AE%D8%A8%D8%A7%D8%B1%20%D8%A8%DB%8C%D8%A8%DB%8C%D8%B3%DB%8C%20%D8%B1%D8%A7%20%D9%85%DB%8C%D8%AF%DB%8C%D8%AF%D9%85.pdf
In this case the voice doesn't sound robotic, but the text it reads is completely meaningless and has nothing to do with the text of the book.

@DrewThomasson
Owner

DrewThomasson commented Mar 4, 2025

Have you tried giving it input files that are not PDFs?

Perhaps an EPUB or TXT instead?

PDFs are notoriously difficult to convert into plain text.

That may be the cause of your problem.
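One PDF-specific trap for Persian: PDF text layers sometimes store Arabic-script letters as "presentation forms" (shaped-glyph code points in U+FB50 to U+FDFF and U+FE70 to U+FEFF) instead of ordinary letters, which a TTS front end may not handle as normal Persian text. A minimal sketch, assuming that is the failure mode: measure how much of the extracted text consists of presentation forms, and fold them back to base letters with Unicode NFKC normalization. The helper names are hypothetical, and NFKC does not repair text that a PDF stored in reversed (visual) order.

```python
import unicodedata

# Arabic Presentation Forms-A and -B: glyph-shaped code points that some
# PDF text layers contain instead of ordinary Arabic-script letters.
PRESENTATION_FORM_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFF))


def presentation_form_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are presentation forms."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1 for c in chars
        if any(lo <= ord(c) <= hi for lo, hi in PRESENTATION_FORM_RANGES)
    )
    return hits / len(chars)


def normalize_extracted_text(text: str) -> str:
    """NFKC maps presentation forms to base letters (e.g. U+FEE1 -> U+0645)."""
    return unicodedata.normalize("NFKC", text)
```

A ratio well above zero on a page of Persian text suggests checking the extraction before blaming the voice model; where to draw the line is a judgment call.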

@DonMonro
Author

DonMonro commented Mar 4, 2025

Yeah, it works better with .txt (about 85% right), but it still sometimes reads strange, meaningless text...
Is this because of the dataset?

@DrewThomasson
Owner

Hm, it could be that the Persian model just isn't very good...

Also, when you say "some time read the strange text and meaningless":

  • Are you saying that it's pronouncing things weirdly, or that it's completely hallucinating and making weird noises?

@DonMonro
Author

DonMonro commented Mar 4, 2025

Yes, you're probably right.
I mean it reads Persian letters one after another in a meaningless way, without a single meaningful word.

@ROBERT-MCDOWELL
Collaborator

fairseq is not the same quality as XTTSv2 or Bark. There can be glitches caused by special punctuation or just a space in the wrong place.
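The punctuation and stray-space glitches mentioned above are the kind of thing a small pre-processing pass can reduce. This is a hypothetical sketch, not ebook2audiobook's actual code; `clean_persian_text` and its rules are assumptions. It maps the Arabic Yeh and Kaf (common when text is typed on an Arabic keyboard layout) to their Persian forms, collapses repeated spaces while keeping the zero-width non-joiner (U+200C) that Persian uses as a half-space, and removes spaces placed before punctuation.

```python
import re

# Arabic-keyboard letter variants that often appear in Persian text.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def clean_persian_text(text: str) -> str:
    text = text.translate(ARABIC_TO_PERSIAN)
    # Collapse runs of spaces/tabs; ZWNJ (U+200C) is not matched, so it survives.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces before . ، ؛ ؟ ! :
    text = re.sub(r" +([.\u060C\u061B\u061F!:])", r"\1", text)
    return text.strip()
```

Keeping the ZWNJ matters because removing it would merge or split Persian words that the model expects to see written a certain way.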

@DonMonro
Author

DonMonro commented Mar 4, 2025

OK, whatever; it's just very weak for Persian.
You can close this.

@ROBERT-MCDOWELL
Collaborator

ROBERT-MCDOWELL commented Mar 4, 2025

You can help us improve it rather than give up. Without community help, nothing can evolve.
Since it's your native language, why not look for a fairseq-compatible model, or even an XTTSv2 or other model with better quality?
I remind all of you: an open source project is for everybody, but everybody must contribute to make it better.

@DrewThomasson
Owner

DrewThomasson commented Mar 4, 2025

Looking at the list of coqui-tts models, it seems there's also another Persian glow-tts model.

@ROBERT-MCDOWELL
Collaborator

ROBERT-MCDOWELL commented Mar 4, 2025

I would like a comment from @DonMonro, who could help us find a better model for fairseq or, failing that, for another TTS engine.
We cannot search Persian websites; for him it would be easy.
I will integrate glow-tts after the next PR.

@mahdi155000
Contributor

Hi.

Unfortunately, the Persian community is very weak in this field, and whatever exists usually has to be searched for in English.

I searched a lot, and the only thing I found that has been pre-trained is this:

https://github.com/SadeghKrmi/pertts-streamlit

But I can't tell if this can be used in your project or not.

@ROBERT-MCDOWELL
Collaborator

This model is used with the piper-tts engine, which we wanted to integrate into eb2ab, but the last time we checked it wasn't possible, since piper-tts is locked to Python 3.10 at most. Maybe we should push their devs to upgrade to Python 3.12.

@DrewThomasson
Owner

If anyone ever gets these added to coqui-tts in a PR, it would also help with this:

https://github.com/karim23657/awesome-Persian-Speech?tab=readme-ov-file

@ROBERT-MCDOWELL
Collaborator

@DonMonro, could you provide the text where you got the issue?

@mahdi155000
Contributor

It mispronounces many words, especially those that entered Persian from foreign languages, such as 'system' (in Persian, 'سیستم'). This word is used a lot in Persian, but it is originally English. This problem stems from the language model.

It often inserts irregular, unnecessary pauses between words, and sometimes the speaker's voice changes partway through the text. For example, I tested it on this text:

test_persian.txt

test_persian.m4b.zip

@ROBERT-MCDOWELL
Collaborator

I only need the original text, to see whether the punctuation is causing glitches. The rest, like foreign words pronounced in Persian, cannot be solved.

@ROBERT-MCDOWELL
Collaborator

By the way, perfect AI TTS pronunciation doesn't come only from the model but also from how the text is written; sometimes the word itself even has to be respelled phonetically for it to be pronounced well.
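The phonetic-respelling idea can be sketched as a whole-word substitution table applied before synthesis. This is a hypothetical illustration, not ebook2audiobook's code: `RESPELLINGS` and `apply_respellings` are made-up names, and the single entry (adding a kasra, U+0650, to 'سیستم' to hint the "sistem" vowel) is an assumption; whether a given model honours such diacritics has to be tested per model.

```python
import re

# Hypothetical respelling table: written form -> form fed to the TTS model.
# The kasra (U+0650) under the teh hints the "sistem" pronunciation.
RESPELLINGS = {
    "\u0633\u06CC\u0633\u062A\u0645": "\u0633\u06CC\u0633\u062A\u0650\u0645",  # سیستم -> سیستِم
}


def apply_respellings(text: str, table: dict) -> str:
    """Replace whole words only, so longer words containing the key are untouched."""
    for word, spoken in table.items():
        pattern = rf"(?<!\w){re.escape(word)}(?!\w)"
        # A function replacement avoids backslash handling in the replacement text.
        text = re.sub(pattern, lambda _match, s=spoken: s, text)
    return text
```

The whole-word guard means a derived word such as 'سیستمی' is left alone; it would need its own table entry if it is also mispronounced.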

@ROBERT-MCDOWELL
Collaborator

Maybe fixed in the next update.

@DrewThomasson
Owner

DrewThomasson commented Mar 8, 2025

@mahdi155000

How does the vits female (best) voice sound to you in this free demo Hugging Face space?

Because I made a PR to coqui-tts to add that model (idiap/coqui-ai-TTS#332), and we need feedback from a Persian speaker.

:)

@mahdi155000
Contributor

In fact, it works really well. There is still a lot of room for improvement, but it's the best Persian model I've ever seen. I'm very excited that you were able to find this.

@DrewThomasson
Owner

DrewThomasson commented Mar 9, 2025

Thank you, I will report this to the coqui team on my PR.

@DrewThomasson
Owner

Give it a thumbs up or something to draw more attention to it, as that might help it get merged faster.

idiap/coqui-ai-TTS#332

@ROBERT-MCDOWELL
Collaborator

@mahdi155000, is the result in this attached audio (you must unzip it) better for the text you provided?

persian.zip

@mahdi155000
Contributor

It's just a little better. The issue with the pauses has been resolved, but it still pronounces the words poorly. Overall it has improved, but it doesn't reach the quality of coqui.

@ROBERT-MCDOWELL
Collaborator

Pronunciation depends on the model quality; I worked on the code for the pauses issue. Thanks for your report.
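The thread doesn't describe the actual change made for the pauses. As a hypothetical illustration of one common approach, text can be split at sentence-ending punctuation (including the Persian question mark ؟, U+061F) and sent to the engine as whole sentences packed into bounded chunks, so pauses tend to fall at sentence boundaries. `split_into_chunks` and the 200-character limit are assumptions, not ebook2audiobook's code.

```python
import re

# Split after ".", "!", "?" or the Persian question mark (U+061F).
SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_into_chunks(text: str, max_chars: int = 200) -> list:
    """Pack whole sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is kept as its own chunk."""
    sentences = [s.strip() for s in SENTENCE_END.split(text) if s.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Never cutting mid-sentence is the design choice here; it trades uneven chunk lengths for pauses that land where a reader would pause.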

Projects
None yet
Development

No branches or pull requests

4 participants