New check: dot of soft dotted characters should disappear #4059

Closed
3 of 7 tasks
Tracked by #3319
moyogo opened this issue Feb 16, 2023 · 3 comments
Labels
New check proposal We expect new check proposals to include a detailed rationale description and a suggested check-id

@moyogo
Contributor

moyogo commented Feb 16, 2023

What needs to be checked?

The dot of soft-dotted characters should disappear when an accent is placed on them.
For example, j́ (U+006A LATIN SMALL LETTER J, U+0301 COMBINING ACUTE ACCENT) should look like ȷ́ (U+0237 LATIN SMALL LETTER DOTLESS J, U+0301 COMBINING ACUTE ACCENT).

Detailed description of the problem

Several Unicode characters have the Soft_Dotted property, as described in https://www.unicode.org/reports/tr44/#Soft_Dotted

Soft_Dotted | Characters with a "soft dot", like i or j. An accent placed on these characters causes the dot to disappear. An explicit dot above can be added where required, such as in Lithuanian. See Section 7.1, Latin in [Unicode].

See "Diacritics on i and j" in Section 7.1, "Latin" in The Unicode Standard.

Diacritics on i and j. A dotted (normal) i or j followed by some common nonspacing marks above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ̈ ≠ ı + ̈ ). The same pattern is used for j. Dotless-j is used in the Landsmålsalfabet, where it does not have a case pair.

To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2).

All characters that use their dot in this manner have the Soft_Dotted property in Unicode.

For a list of soft dotted characters see https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B:Soft_Dotted=Yes:%5D which currently shows "iⅈ𝐢𝑖𝒊𝒾𝓲𝔦𝕚𝖎𝗂𝗶𝘪𝙞𝚒ⁱᵢįị ḭ ɨᶤ 𝼚 ᶖ jⅉ𝐣𝑗𝒋𝒿𝓳𝔧𝕛𝖏𝗃𝗷𝘫𝙟𝚓ʲⱼ ɉ ʝᶨ ϳ і ј". These are Latin, Greek and Cyrillic letters, modifier letters and mathematical symbols.

Optional fix

The ccmp feature should have lookups that substitute soft-dotted glyphs with their dotless equivalents when they are followed by a sequence of marks that includes at least one mark above. In some cases additional glyphs are, or could be, needed where full decomposition is not adequate, such as istroke.dotless, or optionally iogonek.dotless if /idotless/ogonek is not adequate.

For example:

@COMBINING_TOP_MARKS = [gravecomb acutecomb circumflexcomb tildecomb macroncomb diaeresiscomb brevecomb dotaccentcomb caroncomb ringcomb]; # etc.

feature ccmp {
    lookup ccmp_soft_dotted {
        lookupflag UseMarkFilteringSet @COMBINING_TOP_MARKS;
        sub [i j]' @COMBINING_TOP_MARKS by [idotless jdotless];
        sub [i-cy je-cy]' @COMBINING_TOP_MARKS by [idotless jdotless];
        # for iogonek, idotbelow, itildebelow either decompose or substitute for dotless glyphs
        sub iogonek' @COMBINING_TOP_MARKS by idotless ogonekcomb; # or "by iogonek.dotless"
        sub idotbelow' @COMBINING_TOP_MARKS by idotless dotbelowcomb; # or "by idotbelow.dotless"
        sub itildebelow' @COMBINING_TOP_MARKS by idotless tildebelowcomb; # or "by itildebelow.dotless"
        # for istroke, jstroke, imod or others substitute for dotless glyphs
        sub istroke' @COMBINING_TOP_MARKS by istroke.dotless;
        # etc.
    } ccmp_soft_dotted;
} ccmp;
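When a font has many soft-dotted glyphs, a lookup like the one above can be generated instead of written by hand. A minimal Python sketch, assuming a hypothetical `SOFT_DOTTED_REPLACEMENTS` mapping from glyph names to either a single dotless glyph or a decomposition (base + mark), and assuming `@COMBINING_TOP_MARKS` is defined elsewhere as in the example:

```python
# Hypothetical mapping: soft-dotted glyph -> dotless glyph (str) or
# decomposition into dotless base + mark (tuple). Names follow the example above.
SOFT_DOTTED_REPLACEMENTS = {
    "i": "idotless",
    "j": "jdotless",
    "iogonek": ("idotless", "ogonekcomb"),
    "idotbelow": ("idotless", "dotbelowcomb"),
    "istroke": "istroke.dotless",
}


def ccmp_soft_dotted(replacements, mark_class="@COMBINING_TOP_MARKS"):
    """Return feature-file text for a ccmp lookup that drops soft dots
    before top marks. The mark class itself must be defined separately."""
    lines = [
        "feature ccmp {",
        "    lookup ccmp_soft_dotted {",
        f"        lookupflag UseMarkFilteringSet {mark_class};",
    ]
    for base, repl in replacements.items():
        # A tuple becomes a multiple substitution ("by base mark").
        target = " ".join(repl) if isinstance(repl, tuple) else repl
        lines.append(f"        sub {base}' {mark_class} by {target};")
    lines += ["    } ccmp_soft_dotted;", "} ccmp;"]
    return "\n".join(lines)


print(ccmp_soft_dotted(SOFT_DOTTED_REPLACEMENTS))
```

Emitting one rule per glyph keeps single and multiple substitutions side by side; the trade-off is that it does not group glyphs into classes the way the hand-written `[i j]'` rule does.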

Resources and exact process needed to replicate

The characters with the Unicode Soft_Dotted property are defined in https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt and are listed at https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B:Soft_Dotted=Yes:%5D
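Reading the Soft_Dotted set from PropList.txt only requires parsing its `code point or range ; property # comment` lines. A minimal sketch, assuming the file's text has already been read into a string; the sample below is a short excerpt in that format, not the full file, and `parse_property` is a hypothetical helper name:

```python
def parse_property(text, prop="Soft_Dotted"):
    """Collect the code points listed for `prop` in PropList.txt-style text."""
    codepoints = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue  # blank or comment-only line
        field, name = (part.strip() for part in data.split(";"))
        if name != prop:
            continue
        first, _, last = field.partition("..")  # "0069..006A" or "012F"
        start = int(first, 16)
        end = int(last, 16) if last else start
        codepoints.update(range(start, end + 1))
    return codepoints


SAMPLE = """\
# Excerpt in PropList.txt format (illustrative, not the full file)
0069..006A    ; Soft_Dotted # L&   [2] LATIN SMALL LETTER I..LATIN SMALL LETTER J
012F          ; Soft_Dotted # L&       LATIN SMALL LETTER I WITH OGONEK
0020          ; White_Space # Zs       SPACE
"""

soft = parse_property(SAMPLE)
print(sorted(f"U+{cp:04X}" for cp in soft))  # ['U+0069', 'U+006A', 'U+012F']
```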

The combining marks that should cause the dot to disappear are mostly in the Combining Diacritical Marks block, but also in the Combining Diacritical Marks Extended, Combining Diacritical Marks for Symbols, Cyrillic, Cyrillic Extended-A, Cyrillic Extended-B and Cyrillic Extended-D blocks.
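One way to approximate "marks above" from Unicode data is the canonical combining class Above (ccc=230), which Python's `unicodedata.combining()` returns. This is only an approximation, not necessarily what the check uses: some marks placed above have other classes (U+0315 COMBINING COMMA ABOVE RIGHT, for instance, is class 232), and results depend on the Unicode version bundled with Python. `top_marks` is a hypothetical helper:

```python
import unicodedata

ABOVE = 230  # Canonical_Combining_Class "Above"


def top_marks(start, end):
    """Return code points in [start, end] whose canonical combining class is Above."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.combining(chr(cp)) == ABOVE]


# Combining Diacritical Marks block, U+0300..U+036F
marks = top_marks(0x0300, 0x036F)
# U+0301 COMBINING ACUTE ACCENT is Above; U+0323 COMBINING DOT BELOW is not.
print(len(marks) > 0, 0x0301 in marks, 0x0323 in marks)  # True True False
```

The same function can be run over the other blocks listed above to seed a class like `@COMBINING_TOP_MARKS`, with manual review for marks whose class is not 230.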

The sequences that are currently known to be used in orthographies are i̇́ i̇̀ i̋ i̍ i᷆ i᷇ i̓ i̊ i̇̃ i̐ ɨ́ ɨ̀ ɨ̂ ɨ̋ ɨ̏ ɨ̌ ɨ̄ ɨ̃ ɨ̈ ɨ̧́ ɨ̧̀ ɨ̧̂ ɨ̧̌ ɨ̱́ ɨ̱̀ ɨ̱̈ į́ į̇́ į̀ į̂ į̄ į̄́ į̄̀ į̄̂ į̄̌ į̃ į̇̃ į̌ ị́ ị̀ ị̂ ị̄ ị̃ ḭ́ ḭ̀ ḭ̄ j́ j̀ j̄ j̑ j̃ j̇̃ j̈ і́.

Expected Profile

Given that this is a Unicode compliance check, it should probably be in the Universal profile.

  • Vendor-specific: Google Fonts
  • Vendor-specific: Adobe Fonts
  • OpenType (requirements imposed by the OpenType specification)
  • Universal (broadly accepted best practices in the type design community)
  • Other:

Expected Result

Which log result level should it have:

  • 🔥 FAIL
    The check should FAIL when a font displays the dot of a soft-dotted character in a sequence that is used in a language's orthography.
    For example:
    • ị /idotbelow (U+1ECB LATIN SMALL LETTER I WITH DOT BELOW) is used in orthographies like Igbo (20 to 30 million speakers), where ị́ /idotbelow/acutecomb (U+1ECB LATIN SMALL LETTER I WITH DOT BELOW, U+0301 COMBINING ACUTE ACCENT) is used.
    • j /j (U+006A LATIN SMALL LETTER J) is used in orthographies like Dutch (24 million speakers), where j́ /j/acutecomb (U+006A LATIN SMALL LETTER J, U+0301 COMBINING ACUTE ACCENT) is used.
    • į /iogonek (U+012F LATIN SMALL LETTER I WITH OGONEK) is used in orthographies like Navajo (160,000 speakers), where į́ /iogonek/acutecomb (U+012F LATIN SMALL LETTER I WITH OGONEK, U+0301 COMBINING ACUTE ACCENT) is used.
  • ⚠️ WARN
    The check should WARN when a font displays the dot of a soft-dotted character in a sequence not known to be used in an orthography.
    For example:
    • ⁱ (U+2071 SUPERSCRIPT LATIN SMALL LETTER I) is not used, as far as we know, in orthographies in combination with a combining mark above. Its combinations with marks above may be used in more specialized phonetic or scientific notation.
    • 𝐢 (U+1D422 MATHEMATICAL BOLD SMALL I) is not used in orthographies; its combinations with marks above may be used in specialized scientific notation.
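Once each sequence has been shaped, the FAIL/WARN split above reduces to a simple decision. A minimal Python sketch, assuming the shaped glyph names are already available (a real check would obtain them by shaping each sequence with the font, for example with HarfBuzz). `ORTHOGRAPHY_SEQUENCES`, `DOTTED_GLYPHS` and `classify` are hypothetical, the orthography set holds only three precomposed examples from this issue, and the glyph name `uni2071` is an assumption:

```python
# Hypothetical subset of sequences known from orthographies (precomposed bases only).
ORTHOGRAPHY_SEQUENCES = {"\u1ecb\u0301", "j\u0301", "\u012f\u0301"}  # ị́ j́ į́
# Hypothetical glyph names that still carry a visible dot.
DOTTED_GLYPHS = {"i", "j", "iogonek", "idotbelow", "uni2071"}


def classify(sequence, shaped_glyphs):
    """Return 'PASS', 'WARN' or 'FAIL' for one soft-dotted + top-mark sequence,
    given the glyph names the font produced for it."""
    if not any(g in DOTTED_GLYPHS for g in shaped_glyphs):
        return "PASS"  # the dot was removed
    return "FAIL" if sequence in ORTHOGRAPHY_SEQUENCES else "WARN"


print(classify("j\u0301", ["jdotless", "acutecomb"]))     # PASS
print(classify("j\u0301", ["j", "acutecomb"]))            # FAIL
print(classify("\u2071\u0301", ["uni2071", "acutecomb"]))  # WARN
```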

Severity assessment

4: fonts that would FAIL are unusable in the languages they apparently aim to cover.

Edit:

  • 2023-05-20: simplified example ccmp lookup fix with UseMarkFilteringSet and comments.
@moyogo moyogo added the New check proposal We expect new check proposals to include a detailed rationale description and a suggested check-id label Feb 16, 2023
@felipesanches felipesanches modified the milestones: 0.8.12, 0.8.11 Feb 16, 2023
@moyogo
Contributor Author

moyogo commented Feb 27, 2023

From googlefonts/NunitoSans#16 (comment)

I usually create an idotless_ogonek ligature so it can work in InDesign, but I don't know if it is such a good idea.
sub iogonek' @CombiningTopAccents by idotless_ogonek;

That makes sense. InDesign would then need dotless glyphs for both iogonek and idotbelow. Fonts with more soft-dotted characters would need more dotless glyphs for InDesign as well. Should that be the guideline? If so, I can update #4059 accordingly.

Yep, we should add them all to the related glyphsets, and make sure the feature is complete when these glyphs are added.

We could have

  • WARN: "There is an iogonek in this font; you may want to add idotless_ogonek to provide better support for African and Native American languages."
  • FAIL: "This font contains idotless_ogonek, therefore it should have this line in the ccmp feature."
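The WARN/FAIL pair above could be expressed roughly as follows. This is a sketch only: `DOTLESS_PAIRS` and `glyphset_advice` are hypothetical names, idotless_ogonek is just one of the naming options raised in this thread, and extracting the ccmp substitutions from a compiled font is assumed to happen elsewhere.

```python
# Hypothetical pairs: soft-dotted glyph -> precomposed dotless glyph that the
# ccmp feature should substitute for it before marks above.
DOTLESS_PAIRS = {"iogonek": "idotless_ogonek"}


def glyphset_advice(glyphs, ccmp_subs):
    """Return (level, message) tuples.

    glyphs: set of glyph names in the font.
    ccmp_subs: dict mapping a glyph to what the soft-dotted ccmp lookup
    replaces it with (assumed to be extracted from the font beforehand).
    """
    results = []
    for dotted, dotless in DOTLESS_PAIRS.items():
        if dotted not in glyphs:
            continue  # nothing to advise for this pair
        if dotless not in glyphs:
            results.append(("WARN", f"{dotted} present; consider adding {dotless}"))
        elif ccmp_subs.get(dotted) != dotless:
            results.append(("FAIL", f"{dotless} present but ccmp does not use it for {dotted}"))
    return results


print([lvl for lvl, _ in glyphset_advice({"iogonek", "idotless_ogonek"}, {})])  # ['FAIL']
```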

The question is: do we go with idotlessogonek (a new unencoded glyph), idotless_ogonek (a ligature) or iogonek.dotless (a variant) in the glyphsets? The only important thing is that the substitution produces a single glyph.

felipesanches pushed a commit to moyogo/fontbakery that referenced this issue Feb 27, 2023
"Ensure soft_dotted characters lose their dot when combined with marks that replace the dot."
Added to the Universal Profile
(issue fonttools#4059)
felipesanches pushed a commit that referenced this issue Feb 27, 2023
"Ensure soft_dotted characters lose their dot when combined with marks that replace the dot."
Added to the Universal Profile
(issue #4059)
@MariannaPaszkowska

@felipesanches Does this test verify whether the full character set for the languages requiring soft-dotted characters is present in the font? My tests on some fonts suggest it might not. I propose that we not flag a FAIL for fonts that don't support languages requiring the soft dot to be dropped.

@moyogo
Contributor Author

moyogo commented Jun 17, 2023

@MariannaPaszkowska #4140 does change it to WARN among other things.

rimas-kudelis added a commit to rimas-kudelis/jonova that referenced this issue Sep 24, 2024
- Implementing the disappearing soft-dot on letter i per example at fonttools/fontbakery#4059
- Moved lookups inside their relevant feature blocks, since they were only referenced once (not sure if it changes anything in the end, but it seems more logical, at least for now)
3 participants