Issues with languages #38

Open
morethanbooks opened this issue Sep 19, 2024 · 1 comment

@morethanbooks

  • Albanian.xml: the language code in the metadata table is currently "als", which is Tosk Albanian (https://iso639-3.sil.org/code/als). In the TEI file it is "alb", which is the code of the macrolanguage. I would keep "alb".
  • Arabic.xml: in the table the code for Arabic is "arb", and in the XML file it is "ara", which is the code for the macrolanguage. I would use "ara" in both files.
  • Aukan-NT.xml: in both the table and the XML file the code is "djk". However, SIL and Wikipedia prefer the name "Eastern Maroon Creole". Wikipedia gives Aukan as a possible name; SIL does not.
  • Campa-NT.xml: the name of the language in the corpus is Campa, but both SIL and Wikipedia prefer Asháninka: https://iso639-3.sil.org/code/cni https://en.wikipedia.org/wiki/Ash%C3%A1ninka_language
  • Chinese.xml: SIL calls it Mandarin Chinese (https://iso639-3.sil.org/code/cmn). I would consider calling it that.
  • Jakalteko-NT.xml: in the table the code is "jai"; in the XML file it is "jak". Actually, I would use "jac": https://iso639-3.sil.org/code/jac
  • Myanmar.xml: both Wikipedia and SIL use the term "Burmese" and not "Myanmar". I would use "Burmese".
  • Ojibwa-NT.xml: both in the table and XML file the code is "ojb". This is the code for Northwestern Ojibwa; Ojibwa has the code "oji". I would use "oji".
  • Quichua-NT.xml: both in the table and XML file the code is "quw", which is the code of "Tena Lowland Quichua". The code for Quechua or Quichua is "que". I would use "que".
  • Syriac-NT.xml: in the table, the code is "arc", which is a code for Aramaic. In the XML file, the code is "syr", which is correct. Please correct it in the metadata table.
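
Several of the items above are mismatches between the code in the metadata table and the code in the XML file. A check like this could be automated. The following is only a minimal sketch: the function name `find_code_mismatches` and the dict inputs are hypothetical, and the step that reads codes out of the real table and TEI files is left out, since their formats are not shown here. The sample values are the codes reported in this issue.

```python
# Hypothetical sketch: the two dicts stand in for codes read from the
# metadata table and from the XML files. How those are parsed is an
# assumption left out of this example.

def find_code_mismatches(table_codes, xml_codes):
    """Return {filename: (table_code, xml_code)} for files whose codes differ."""
    mismatches = {}
    # Only compare files that appear in both sources.
    for filename in sorted(table_codes.keys() & xml_codes.keys()):
        if table_codes[filename] != xml_codes[filename]:
            mismatches[filename] = (table_codes[filename], xml_codes[filename])
    return mismatches


# Codes as reported in this issue.
table_codes = {
    "Albanian.xml": "als",
    "Arabic.xml": "arb",
    "Jakalteko-NT.xml": "jai",
    "Ojibwa-NT.xml": "ojb",
    "Syriac-NT.xml": "arc",
}
xml_codes = {
    "Albanian.xml": "alb",
    "Arabic.xml": "ara",
    "Jakalteko-NT.xml": "jak",
    "Ojibwa-NT.xml": "ojb",
    "Syriac-NT.xml": "syr",
}

for filename, (table_code, xml_code) in find_code_mismatches(table_codes, xml_codes).items():
    print(f"{filename}: table has {table_code!r}, XML has {xml_code!r}")
```

A check like this only flags disagreement between the two sources. It does not say which code is right, so cases such as Ojibwa-NT.xml, where both sources agree on a code that may still be too specific, need a manual look.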
christos-c self-assigned this Sep 20, 2024
christos-c added a commit that referenced this issue Sep 20, 2024
@christos-c
Owner

Addressed most of the issues with #39 - let me know if all looks good and I can merge.

For Aukan, Wikipedia, Glottolog, and WALS list either Aukan or Ndyuka as a name (together with Eastern Maroon Creole). I would trust the typology databases over SIL but happy to change if it's more important for the community.
