New check: dot of soft dotted characters should disappear #4059

moyogo · 2023-02-16T20:33:40Z

What needs to be checked?

The dot of soft dotted characters should disappear when an accent is placed on them.
For example, j́ (U+006A LATIN SMALL LETTER J, U+0301 COMBINING ACUTE ACCENT) should look like ȷ́ (U+0237 LATIN SMALL LETTER DOTLESS J, U+0301 COMBINING ACUTE ACCENT).

Detailed description of the problem

Several Unicode characters have the Soft_Dotted property, as described in https://www.unicode.org/reports/tr44/#Soft_Dotted:

Soft_Dotted | Characters with a "soft dot", like i or j. An accent placed on these characters causes the dot to disappear. An explicit dot above can be added where required, such as in Lithuanian. See Section 7.1, Latin in [Unicode].

See "Diacritics on i and j" in Section 7.1, "Latin" in The Unicode Standard.

Diacritics on i and j. A dotted (normal) i or j followed by some common nonspacing marks above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ̈ ≠ ı + ̈ ). The same pattern is used for j. Dotless-j is used in the Landsmålsalfabet, where it does not have a case pair.
To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2).
All characters that use their dot in this manner have the Soft_Dotted property in Unicode.
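The equivalence claims in the quoted passage can be checked with Python's standard `unicodedata` module. This is a minimal sketch (the helper names `nfc` and `canonically_equivalent` are ours, not from any existing check) showing that i + diaeresis composes to ï, that dotless ı + diaeresis does not, and that the Baltic i + overdot + accent sequence keeps its explicit dot above as a separate code point:

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the canonically composed (NFC) form of text."""
    return unicodedata.normalize("NFC", text)


def canonically_equivalent(a: str, b: str) -> bool:
    """Strings are canonically equivalent iff their NFC forms are identical."""
    return nfc(a) == nfc(b)


# i + COMBINING DIAERESIS composes to U+00EF, so "naïve" can be spelled either way.
I_PLUS_DIAERESIS = "i\u0308"
# Turkish dotless i (U+0131) + COMBINING DIAERESIS has no precomposed form.
DOTLESS_I_PLUS_DIAERESIS = "\u0131\u0308"
# Baltic dictionary form: i + COMBINING DOT ABOVE + COMBINING ACUTE ACCENT.
# The dot above is encoded explicitly, so it stays a separate code point.
I_DOT_ABOVE_ACUTE = "i\u0307\u0301"

if __name__ == "__main__":
    for label, seq in [
        ("i + diaeresis", I_PLUS_DIAERESIS),
        ("dotless i + diaeresis", DOTLESS_I_PLUS_DIAERESIS),
        ("i + dot above + acute", I_DOT_ABOVE_ACUTE),
    ]:
        code_points = " ".join(f"U+{ord(c):04X}" for c in nfc(seq))
        print(f"{label:<24}{code_points}")
```

Run as a script, it reports U+00EF for the first sequence, U+0131 U+0308 for the second and U+0069 U+0307 U+0301 for the third. Canonical equivalence only concerns the encoding; whether the dot actually disappears on screen is up to the font, which is what the proposed check is about.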

For a list of soft dotted characters see https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B:Soft_Dotted=Yes:%5D which currently shows "iⅈ𝐢𝑖𝒊𝒾𝓲𝔦𝕚𝖎𝗂𝗶𝘪𝙞𝚒ⁱᵢįị ḭ ɨᶤ 𝼚 ᶖ jⅉ𝐣𝑗𝒋𝒿𝓳𝔧𝕛𝖏𝗃𝗷𝘫𝙟𝚓ʲⱼ ɉ ʝᶨ ϳ і ј". These are Latin, Greek and Cyrillic letters, modifier letters and mathematical symbols.

Optional fix

The ccmp feature should have lookups that substitute soft dotted glyphs with their dotless equivalents when they are followed by marks that include at least one mark above. In some cases additional glyphs are, or could be, needed when full decomposition is not adequate, like istroke.dotless, or optionally iogonek.dotless if /idotless/ogonek is not adequate.

For example:

@COMBINING_TOP_MARKS = [gravecomb acutecomb circumflexcomb tildecomb macroncomb diaeresiscomb brevecomb dotaccentcomb caroncomb ringcomb]; # etc.

feature ccmp {
    lookup ccmp_soft_dotted {
        lookupflag UseMarkFilteringSet @COMBINING_TOP_MARKS;
        sub [i j]' @COMBINING_TOP_MARKS by [idotless jdotless];
        sub [i-cy je-cy]' @COMBINING_TOP_MARKS by [idotless jdotless];
        # for iogonek, idotbelow, itildebelow either decompose or substitute for dotless glyphs
        sub iogonek' @COMBINING_TOP_MARKS by idotless ogonekcomb; # or "by iogonek.dotless"
        sub idotbelow' @COMBINING_TOP_MARKS by idotless dotbelowcomb; # or "by idotbelow.dotless"
        sub itildebelow' @COMBINING_TOP_MARKS by idotless tildebelowcomb; # or "by itildebelow.dotless"
        # for istroke, jstroke, imod or others substitute for dotless glyphs
        sub istroke' @COMBINING_TOP_MARKS by istroke.dotless;
        # etc.
    } ccmp_soft_dotted;
} ccmp;
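To make the effect of the mark filtering set concrete, here is a small Python simulation of the ccmp_soft_dotted lookup on a run of glyph names. It is a sketch under assumptions: the glyph names are the ones used in the feature code above, the split between top marks and other marks is hard-coded instead of being read from the font's GDEF table, and a real shaper handles far more cases.

```python
from typing import Dict, List

# Top marks, mirroring @COMBINING_TOP_MARKS in the feature code (assumed names).
TOP_MARKS = {
    "gravecomb", "acutecomb", "circumflexcomb", "tildecomb", "macroncomb",
    "diaeresiscomb", "brevecomb", "dotaccentcomb", "caroncomb", "ringcomb",
}
# Marks NOT in the filtering set; a real font would define these via GDEF.
OTHER_MARKS = {"dotbelowcomb", "ogonekcomb", "tildebelowcomb", "macronbelowcomb"}

# Soft dotted glyph -> replacement glyph(s), mirroring the ccmp_soft_dotted rules.
SOFT_DOTTED_SUBS: Dict[str, List[str]] = {
    "i": ["idotless"],
    "j": ["jdotless"],
    "i-cy": ["idotless"],
    "je-cy": ["jdotless"],
    "iogonek": ["idotless", "ogonekcomb"],
    "idotbelow": ["idotless", "dotbelowcomb"],
    "itildebelow": ["idotless", "tildebelowcomb"],
    "istroke": ["istroke.dotless"],
}


def followed_by_top_mark(glyphs: List[str], index: int) -> bool:
    """Emulate the lookahead under UseMarkFilteringSet @COMBINING_TOP_MARKS.

    Marks outside the filtering set are skipped; base glyphs are never
    skipped, so the first non-skipped glyph must be a top mark to match.
    """
    for glyph in glyphs[index + 1:]:
        if glyph in OTHER_MARKS:
            continue  # ignored because of the mark filtering set
        return glyph in TOP_MARKS
    return False


def apply_ccmp_soft_dotted(glyphs: List[str]) -> List[str]:
    """Return the glyph run after the soft dotted substitutions."""
    output: List[str] = []
    for index, glyph in enumerate(glyphs):
        if glyph in SOFT_DOTTED_SUBS and followed_by_top_mark(glyphs, index):
            output.extend(SOFT_DOTTED_SUBS[glyph])
        else:
            output.append(glyph)
    return output
```

For instance, `apply_ccmp_soft_dotted(["i", "dotbelowcomb", "acutecomb"])` returns `["idotless", "dotbelowcomb", "acutecomb"]`: dotbelowcomb is not in the filtering set, so it is skipped and the acute still triggers the substitution. `["i", "n", "acutecomb"]` is left unchanged because the base n is never skipped.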

Resources and exact process needed to replicate

The characters with the Unicode Soft_Dotted property are defined in https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt and are listed at https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B:Soft_Dotted=Yes:%5D
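A rough sketch of reading the Soft_Dotted set from PropList.txt with plain Python. The embedded `SAMPLE_PROPLIST` is a simplified, hand-picked excerpt in the file's `code point(s) ; property # comment` format, not the real file; in practice you would download the full file (for example with `urllib.request.urlopen`) and pass its text to the hypothetical `parse_property` helper.

```python
from typing import Set

# Simplified excerpt in the PropList.txt line format (illustrative only).
SAMPLE_PROPLIST = """\
# PropList.txt excerpt
0009..000D    ; White_Space # control characters
0069..006A    ; Soft_Dotted # LATIN SMALL LETTER I..LATIN SMALL LETTER J
012F          ; Soft_Dotted # LATIN SMALL LETTER I WITH OGONEK
0456          ; Soft_Dotted # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
0458          ; Soft_Dotted # CYRILLIC SMALL LETTER JE
1ECB          ; Soft_Dotted # LATIN SMALL LETTER I WITH DOT BELOW
"""


def parse_property(proplist_text: str, prop: str = "Soft_Dotted") -> Set[int]:
    """Collect the code points assigned to `prop` in PropList.txt-style text."""
    code_points: Set[int] = set()
    for line in proplist_text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        cp_field, name = (field.strip() for field in data.split(";", 1))
        if name != prop:
            continue
        # A field is either a single code point or a "START..END" range.
        first, _, last = cp_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        code_points.update(range(start, end + 1))
    return code_points
```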

The combining marks that should cause the dot to disappear are mostly in the Combining Diacritical Marks block, but also in the Combining Diacritical Marks Extended, Combining Diacritical Marks for Symbols, Cyrillic, Cyrillic Extended-A, Cyrillic Extended-B and Cyrillic Extended-D blocks.
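Rather than listing every mark above by hand, a check could approximate "mark above" with the Unicode Canonical_Combining_Class, where 230 means Above. This is only a heuristic sketch with hypothetical helper names: it ignores classes such as 232 (Above_Right) and 234 (Double_Above), and it says nothing about how a particular font positions the mark.

```python
import unicodedata
from typing import List

CCC_ABOVE = 230  # Canonical_Combining_Class value "Above"


def is_mark_above(char: str) -> bool:
    """Heuristic: a nonspacing mark (gc=Mn) whose combining class is Above."""
    return unicodedata.category(char) == "Mn" and unicodedata.combining(char) == CCC_ABOVE


def marks_above_in_range(first: int, last: int) -> List[int]:
    """Code points in [first, last] that is_mark_above() accepts."""
    return [cp for cp in range(first, last + 1) if is_mark_above(chr(cp))]


if __name__ == "__main__":
    # Combining Diacritical Marks block, U+0300..U+036F
    print(" ".join(f"U+{cp:04X}" for cp in marks_above_in_range(0x0300, 0x036F)))
```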

The sequences that are currently known to be used in orthographies are i̇́ i̇̀ i̋ i̍ i᷆ i᷇ i̓ i̊ i̇̃ i̐ ɨ́ ɨ̀ ɨ̂ ɨ̋ ɨ̏ ɨ̌ ɨ̄ ɨ̃ ɨ̈ ɨ̧́ ɨ̧̀ ɨ̧̂ ɨ̧̌ ɨ̱́ ɨ̱̀ ɨ̱̈ į́ į̇́ į̀ į̂ į̄ į̄́ į̄̀ į̄̂ į̄̌ į̃ į̇̃ į̌ ị́ ị̀ ị̂ ị̄ ị̃ ḭ́ ḭ̀ ḭ̄ j́ j̀ j̄ j̑ j̃ j̇̃ j̈ і́.
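Each of the sequences above can be tested programmatically: normalize to NFD, then look for a soft dotted base followed by a mark whose canonical combining class is 230 (Above). In this sketch `SOFT_DOTTED_BASES` is a hand-picked subset (an assumption, not the full property), which is enough here because precomposed letters such as į, ị and ḭ decompose to i plus a mark below:

```python
import unicodedata

# Hand-picked subset of Soft_Dotted code points that can remain as bases after
# NFD: i, j, ɨ (U+0268), Cyrillic і (U+0456) and ј (U+0458).
SOFT_DOTTED_BASES = {0x0069, 0x006A, 0x0268, 0x0456, 0x0458}
CCC_ABOVE = 230  # Canonical_Combining_Class "Above"


def dot_should_disappear(sequence: str) -> bool:
    """True if a soft dotted base in `sequence` carries a mark above (after NFD)."""
    base = None
    for char in unicodedata.normalize("NFD", sequence):
        if not unicodedata.category(char).startswith("M"):
            base = char  # a non-mark character starts a new cluster
        elif (
            base is not None
            and ord(base) in SOFT_DOTTED_BASES
            and unicodedata.combining(char) == CCC_ABOVE
        ):
            return True
    return False
```

Marks below, such as the ogonek or the dot below, are skipped over, so ị́ (ị + acute) is flagged while ị on its own is not.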

Expected Profile

Given that this is a Unicode compliance check, it should probably be in the Universal profile.

Vendor-specific: Google Fonts
Vendor-specific: Adobe Fonts
OpenType (requirements imposed by the OpenType specification)
Universal (broadly accepted best practices on the type design community)
Other:

Expected Result

Which log result level should it have:

🔥 FAIL
The check should FAIL when a font displays the dot of a soft dotted character in a sequence that is used in a language's orthography.
For example:
- ị /idotbelow (U+1ECB LATIN SMALL LETTER I WITH DOT BELOW) is used in orthographies like Igbo (20 to 30 million speakers), where ị́ /idotbelow/acutecomb (U+1ECB LATIN SMALL LETTER I WITH DOT BELOW, U+0301 COMBINING ACUTE ACCENT) is used,
- j /j (U+006A LATIN SMALL LETTER J) is used in orthographies like Dutch (24 million speakers), where j́ /j/acutecomb (U+006A LATIN SMALL LETTER J, U+0301 COMBINING ACUTE ACCENT) is used,
- į /iogonek (U+012F LATIN SMALL LETTER I WITH OGONEK) is used in orthographies like Navajo (160,000 speakers), where į́ /iogonek/acutecomb (U+012F LATIN SMALL LETTER I WITH OGONEK, U+0301 COMBINING ACUTE ACCENT) is used.
⚠️ WARN
The check should WARN when a font displays the dot of a soft dotted character in a sequence not known to be used in an orthography.
For example:
- ⁱ (U+2071 SUPERSCRIPT LATIN SMALL LETTER I) is not used, as far as we know, in orthographies in combination with a combining mark above. Its combinations with a mark above may be used in more specialized phonetic or scientific notation.
- 𝐢 (U+1D422 MATHEMATICAL BOLD SMALL I) is not used in orthographies; its combinations with a mark above may be used in specialized scientific notation.
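The FAIL/WARN split above can be sketched as a small decision function. It assumes the hard part, rendering each sequence with the font and detecting whether the dot is still drawn, has already been done and is passed in as `dot_still_visible`; the hypothetical `ORTHOGRAPHIC_SEQUENCES` holds just the three examples from the FAIL list, normalized to NFC so precomposed and decomposed spellings match.

```python
import unicodedata

# The three FAIL examples above (Igbo ị́, Dutch j́, Navajo į́), stored in NFC.
ORTHOGRAPHIC_SEQUENCES = {
    unicodedata.normalize("NFC", s) for s in ("\u1ecb\u0301", "j\u0301", "\u012f\u0301")
}


def result_level(sequence: str, dot_still_visible: bool) -> str:
    """Map one rendering observation to a result level.

    `dot_still_visible` must come from shaping and rendering the sequence
    with the font under test; that step is not shown here.
    """
    if not dot_still_visible:
        return "PASS"
    if unicodedata.normalize("NFC", sequence) in ORTHOGRAPHIC_SEQUENCES:
        return "FAIL"  # the sequence is used in a known orthography
    return "WARN"  # e.g. superscript or mathematical letters with a mark above
```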

Severity assessment

4: fonts that would FAIL are unusable for the languages they apparently aim to cover.

Edit:

2023-05-20: simplified example ccmp lookup fix with UseMarkFilteringSet and comments.


moyogo · 2023-02-27T05:38:54Z

From googlefonts/NunitoSans#16 (comment)

I usually create an idotless_ogonek ligature so it can work in InDesign, but I don't know if it is such a good idea.
sub iogonek' @CombiningTopAccents by idotless_ogonek;

That makes sense. InDesign would then need dotless glyphs for both iogonek and idotbelow. Fonts with more soft dotted characters would need more dotless glyphs for InDesign as well. Should that be the guideline? If so, I can update #4059 accordingly.

Yep, we should add them all to the related glyphsets, and make sure the feature is complete when these glyphs are added.

We could have

WARN: "There is an iogonek in this font; you may want to add idotless_ogonek to provide better support for African and Native American languages."

FAIL: "This font contains idotless_ogonek, therefore it should have this line in the ccmp feature."

The question is: do we go with idotlessogonek (new unencoded glyph), idotless_ogonek (ligature) or iogonek.dotless (variant) in the glyphsets? The only important thing is that the substitution produces a single glyph.
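The WARN/FAIL pair proposed in this comment could look roughly like the following. It is a hypothetical sketch, not Fontbakery code: the glyph set is a plain set of names, `ccmp_single_subs` is a simplified stand-in for the compiled ccmp lookups, and all three naming options from the question above are accepted for both iogonek and idotbelow.

```python
from typing import Dict, List, Set, Tuple

# Precomposed soft dotted glyphs -> suffix used to build dotless variant names.
PRECOMPOSED = {"iogonek": "ogonek", "idotbelow": "dotbelow"}


def dotless_candidates(base: str) -> Tuple[str, ...]:
    """The three naming options: unencoded glyph, ligature, variant."""
    suffix = PRECOMPOSED[base]
    return (f"idotless{suffix}", f"idotless_{suffix}", f"{base}.dotless")


def check_dotless_variants(
    glyph_names: Set[str], ccmp_single_subs: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (level, message) pairs.

    `ccmp_single_subs` maps a soft dotted glyph to the single glyph that ccmp
    substitutes for it before marks above (a simplified, hypothetical view).
    """
    results: List[Tuple[str, str]] = []
    for base in PRECOMPOSED:
        if base not in glyph_names:
            continue
        candidates = dotless_candidates(base)
        present = [name for name in candidates if name in glyph_names]
        if not present:
            options = " / ".join(candidates)
            results.append(
                ("WARN", f"There is an {base} in this font; you may want to add {options}.")
            )
        elif ccmp_single_subs.get(base) not in present:
            results.append(
                ("FAIL", f"This font contains {present[0]}, therefore ccmp should "
                         f"substitute {base} with it before marks above.")
            )
    return results
```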

MariannaPaszkowska · 2023-06-16T16:12:57Z

@felipesanches Does this test verify whether the full character set for the languages that require soft dotted characters is present in the font? My tests on some fonts suggest that it might not. I propose that we not flag a FAIL for fonts that don't support the languages in which the soft dot must be dropped.

moyogo · 2023-06-17T06:38:48Z

@MariannaPaszkowska #4140 does change it to WARN among other things.

- Implementing the disappearing soft-dot on letter i per example at fonttools/fontbakery#4059
- Moved lookups inside their relevant feature blocks, since they were only referenced once (not sure if it changes anything in the end, but it seems more logical, at least for now)

moyogo added the New check proposal label Feb 16, 2023

moyogo mentioned this issue Feb 16, 2023

Add soft dotted characters check #4060

Merged

felipesanches modified the milestones: 0.8.12, 0.8.11 Feb 16, 2023

felipesanches assigned moyogo Feb 16, 2023

This was referenced Feb 22, 2023

Update ccmp feature for soft dotted characters googlefonts/NunitoSans#16

Merged

New check: Dutch ij with stress mark #4068

Open

felipesanches pushed a commit to moyogo/fontbakery that referenced this issue Feb 27, 2023

new check: com.google.fonts/check/soft_dotted

8c66e27

"Ensure soft_dotted characters lose their dot when combined with marks that replace the dot." Added to the Universal Profile (issue fonttools#4059)

felipesanches pushed a commit that referenced this issue Feb 27, 2023

new check: com.google.fonts/check/soft_dotted

e55fdcc

"Ensure soft_dotted characters lose their dot when combined with marks that replace the dot." Added to the Universal Profile (issue #4059)

felipesanches closed this as completed Feb 27, 2023

felipesanches mentioned this issue Feb 27, 2023

check/soft_dotted needs code-tests #4069

Open

glenda-tn mentioned this issue Mar 15, 2023

Ensure soft_dotted characters lose their dot when combined with marks that replace the dot. canonical/Ubuntu-Sans-fonts#68

Closed

RosaWagner mentioned this issue Mar 17, 2023

diacritic check summary #3319

Open

vv-monsalve mentioned this issue Apr 19, 2024

⚠️ WARN Ensure soft_dotted characters lose their dot when combined with marks that replace the dot TypeTogether/Playwrite#39

Closed

kenmcd mentioned this issue Jul 1, 2024

Diacritics position for į̃ and ị̃ : is this a normal behaviour? psb1558/Junicode-font#288

Closed

kateliev mentioned this issue Aug 7, 2024

[Audit] FB Report Build 1.08: Ensure the font supports case swapping for all its glyphs. googlefonts/science-gothic#343

Closed

vv-monsalve mentioned this issue Nov 13, 2024

Add Asimovian google/fonts#7608

Open
