Fails when attempting to read Norwegian #42
Comments
@bruce133: yes, the language code variants may cause problems when matching website metadata languages and browser language detection routines against voices' built-in language codes. Norwegian may be a trickier case than usual, depending on how important the differences between Thank you for providing three different example URLs. It would certainly help usability if Talkie "just works" on "all" Norwegian websites.
One approach for Talkie is to attempt to expand all Norwegian language codes to all three variants, and then find voices that match any of them. A perfect match ( Thus all Norwegian voices need to be treated under a single language code, meaning that rather than expanding to all language code variants it is easier to reduce to a single "canonical" code Dialects and regions may also need parsing and special handling, but the example The same should be applied to other languages where mappings are needed, both for parsing website and voice languages. You are right in that Talkie's user interface uses
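The "reduce to a single canonical code" idea could be sketched roughly as below. This is only an illustration, not Talkie's actual code: the function name `canonicalLanguage`, the alias table, and the choice of "no" as the canonical code are all assumptions made for the example.

```typescript
// Hypothetical alias table: collapse the Norwegian variants to one code.
// Using "no" as the canonical code is an assumption for this sketch.
const CANONICAL_LANGUAGE = new Map<string, string>([
  ["nb", "no"], // Norwegian Bokmål
  ["nn", "no"], // Norwegian Nynorsk
]);

// Reduce a language tag such as "nb-NO" or "nn_NO" to a canonical,
// lowercase primary language code. Region and dialect parts are dropped.
function canonicalLanguage(tag: string): string {
  const [primary = ""] = tag.trim().toLowerCase().split(/[-_]/);
  return CANONICAL_LANGUAGE.get(primary) ?? primary;
}
```

With such a helper, "no", "nb-NO" and "nn" would all reduce to the same code, while unrelated tags such as "en-US" would simply reduce to their primary language code.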
Note that voices' self-specified language does not use "plain" ISO 639 but IETF BCP 47, which complicates things further. Talkie itself is fairly neutral with regard to matching website/voice language codes. Most matches are made against the major language code, since relatively few websites specify a dialect. Talkie already has some rudimentary 1:1 language code mapping for issues I stumbled upon myself during testing. For Norwegian variants a single 1:1 mapping would not be enough, but perhaps two mappings would.
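As a rough sketch of matching on the major language code with two extra mappings, the same normalization could be applied to both the detected page language and each voice's language. The `VoiceLike` type, the function names, and the reading of "two mappings" as "nb" and "nn" both pointing to "no" are assumptions for illustration. Browser voices expose comparable `name` and `lang` fields, but this is not Talkie's actual implementation.

```typescript
// Minimal stand-in for a speech synthesis voice (hypothetical type).
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 style tag, for example "nb-NO"
}

// Two assumed 1:1 mappings, one per Norwegian written standard.
const NORWEGIAN_ALIASES = new Map<string, string>([
  ["nb", "no"],
  ["nn", "no"],
]);

// Extract the major (primary) language code and apply the mappings.
function primaryLanguage(tag: string): string {
  const [primary = ""] = tag.trim().toLowerCase().split(/[-_]/);
  return NORWEGIAN_ALIASES.get(primary) ?? primary;
}

// Keep the voices whose normalized language equals the normalized
// detected language.
function voicesForLanguage(
  detected: string,
  voices: readonly VoiceLike[],
): VoiceLike[] {
  const wanted = primaryLanguage(detected);
  return voices.filter((voice) => primaryLanguage(voice.lang) === wanted);
}
```

Normalizing both sides means a page marked "no" can find a voice marked "nb-NO", and a page marked "nn" can still fall back to that same voice.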
I have previously looked at using wooorm's BCP-47 library to parse language tags, at least to reliably extract language/region for some mislabeled voices. It includes limited mapping for Norwegian:
Have a look at https://www.synthesia.io/features/languages; they only list the variant "Norwegian - Natural", which is really a typical Oslo dialect, associated with Bokmål. Normalizing in this manner would probably be the most efficient.
Expected behavior
Should use an available Norwegian voice when trying to read Norwegian text.
Actual behavior
Plays the error message "Sorry, no available voice language detected for the selected text."
Steps to reproduce behavior
Examples of Norwegian websites
Text and language
System information
Additional information
The problem is likely caused by the language being detected as "no", while the ISO language code used by Talkie for Norwegian is "nb". Note that "nb" is actually the ISO code for Norwegian Bokmål, which is one of two official written standards in Norway.
Here are the three Norwegian ISO codes, sourced from https://www.w3schools.com/tags/ref_language_codes.asp:
It might also be worth mentioning that you'll likely see different ISO codes being used on Norwegian sites; on the three examples provided, different HTML lang attribute values are being used: