Category guessing leads to bizarre results #756
Comments
too much magic indeed. |
Only if you set the subcategory as well. |
The ligature-"magic" should only be applied when it finds a glyph info for all parts. As |
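A minimal sketch of the stricter rule suggested in this comment: only guess "Ligature" when every underscore-separated part of the name is known. `KNOWN_PARTS` and `ligature_guess` are hypothetical names for illustration; the table stands in for the real glyph data, and this is not glyphsLib's actual code.

```python
from typing import Optional

# Hypothetical stand-in for the real glyph database (assumption).
KNOWN_PARTS = {"f": "Letter", "i": "Letter", "R": "Letter"}


def ligature_guess(glyph_name: str) -> Optional[str]:
    """Return "Ligature" only if every underscore-separated part is known."""
    base = glyph_name.split(".", 1)[0]  # drop a suffix such as ".alt2"
    parts = base.split("_")
    if len(parts) < 2:
        return None  # no underscore, so there is nothing to guess
    if all(part in KNOWN_PARTS for part in parts):
        return "Ligature"
    return None  # at least one part is unknown: do not guess
```

With this table, `f_i` is guessed as a ligature, while `brm_R` is left alone because `brm` is not a known part.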
I really hoped the response to "this is way too much magic" was not going to be "let's add more magic". |
Fonts depend on this “magic”, if we were to remove it they will break. |
They’re breaking already, it’s just right now they’re breaking in strange ways instead of straightforward ways. |
If one has a glyph named |
What glyphsLib does does not match what Glyphs.app does. I have in the past tried to make them match but failed, there are quite a few edge cases where Glyphs.app thinks differently. I don't know how to improve the situation. Maybe a mechanically verified model for glyph names like the Universal Shaping Engine? |
Do you have examples of those differences? Maybe we can have a shared test suite in the GlyphInfo repo? |
I guess I can collect some. Or loop through all known glyph names plus names taken from random .glyphs files in the wild and compare (sub)categories before/after. Here's a static list of glyph names that I have to manually declare as bases for the purpose of GDEF and anchor attachment in a certain project:
KNOWN_BASES = {
"k_ssa-deva",
"j_nya-deva",
"k_ss-deva",
"k_ss-deva.alt2",
"k_ss-deva.alt3",
"k_ss-deva.alt4",
"k_ss-deva.alt5",
"k_ss-deva.alt6",
"k_ss-deva.alt7",
"j_ny-deva",
"j_ny-deva.alt2",
"j_ny-deva.alt3",
"j_ny-deva.alt4",
"j_ny-deva.alt5",
"j_ny-deva.alt6",
"j_ny-deva.alt7",
"j_ny-deva.alt8",
"ng_ya-deva",
"ch_ya-deva",
"tt_tta-deva",
"tt_ttha-deva",
"tt_ya-deva",
"tth_ttha-deva",
"tth_ya-deva",
"dd_dda-deva",
"dd_ddha-deva",
"dd_ya-deva",
"ddh_ddha-deva",
"ddh_ya-deva",
"t_ta-deva",
"t_ra-deva",
"d_ga-deva",
"d_gha-deva",
"d_da-deva",
"d_dha-deva",
"d_dh_ya-deva",
"d_ba-deva",
"d_bha-deva",
"d_ma-deva",
"d_ya-deva",
"d_ra-deva",
"d_va-deva",
"p_ta-deva",
"sh_ra-deva",
"ss_tta-deva",
"ss_ttha-deva",
"h_nna-deva",
"h_na-deva",
"h_ma-deva",
"h_ya-deva",
"h_ra-deva",
"h_la-deva",
"h_va-deva",
"h_ra_uMatra-deva",
"h_ra_uuMatra-deva",
"ba-khmer",
"ba-khmer.post",
"ba-khmer.post2",
"ba_aaSign-khmer",
"ba_aaSign-khmer.post2_",
"ba_aaSign-khmer.post_",
"ba_auSign-khmer",
"ba_auSign-khmer.post2_",
"ba_auSign-khmer.post_",
"beikoet-khmer",
"beiroc-khmer",
"buonkoet-khmer",
"buonroc-khmer",
"ca-khmer",
"ca_aaSign-khmer",
"ca_auSign-khmer",
"cha-khmer",
"cha_aaSign-khmer",
"cha_auSign-khmer",
"cho-khmer",
"cho-khmer.post",
"cho-khmer.post2",
"cho_aaSign-khmer",
"cho_aaSign-khmer.post2_",
"cho_aaSign-khmer.post_",
"cho_auSign-khmer",
"cho_auSign-khmer.post2_",
"cho_auSign-khmer.post_",
"co-khmer",
"co_aaSign-khmer",
"co_auSign-khmer",
"da-khmer",
"da_aaSign-khmer",
"da_auSign-khmer",
"dapBeikoet-khmer",
"dapBeiroc-khmer",
"dapBuonkoet-khmer",
"dapBuonroc-khmer",
"dapMuoykoet-khmer",
"dapMuoyroc-khmer",
"dapPiikoet-khmer",
"dapPiiroc-khmer",
"dapPramkoet-khmer",
"dapPramroc-khmer",
"dapkoet-khmer",
"daproc-khmer",
"do-khmer",
"do_aaSign-khmer",
"do_auSign-khmer",
"dottedCircle",
"ha-khmer",
"ha_aaSign-khmer",
"ha_auSign-khmer",
"ka-khmer",
"ka_aaSign-khmer",
"ka_auSign-khmer",
"kha-khmer",
"kha_aaSign-khmer",
"kha_auSign-khmer",
"kho-khmer",
"kho-khmer.post",
"kho-khmer.post2",
"kho_aaSign-khmer",
"kho_aaSign-khmer.post2_",
"kho_aaSign-khmer.post_",
"kho_auSign-khmer",
"kho_auSign-khmer.post2_",
"kho_auSign-khmer.post_",
"ko-khmer",
"ko_aaSign-khmer",
"ko_auSign-khmer",
"la-khmer",
"la_aaSign-khmer",
"la_auSign-khmer",
"lo-khmer",
"lo_aaSign-khmer",
"lo_auSign-khmer",
"mo-khmer",
"mo_aaSign-khmer",
"mo_auSign-khmer",
"muoykoet-khmer",
"muoyroc-khmer",
"ngo-khmer",
"ngo_aaSign-khmer",
"ngo_auSign-khmer",
"nno-khmer",
"nno_aaSign-khmer",
"nno_auSign-khmer",
"no-khmer",
"no_aaSign-khmer",
"no_auSign-khmer",
"nyo-khmer",
"nyo-khmer.less",
"nyo_aaSign-khmer",
"nyo_aaSign-khmer.less",
"nyo_auSign-khmer",
"nyo_auSign-khmer.less",
"pathamasat-khmer",
"pha-khmer",
"pha_aaSign-khmer",
"pha_auSign-khmer",
"pho-khmer",
"pho_aaSign-khmer",
"pho_auSign-khmer",
"piikoet-khmer",
"piiroc-khmer",
"po-khmer",
"po_aaSign-khmer",
"po_auSign-khmer",
"pramBeikoet-khmer",
"pramBeiroc-khmer",
"pramBuonkoet-khmer",
"pramBuonroc-khmer",
"pramMuoykoet-khmer",
"pramMuoyroc-khmer",
"pramPiikoet-khmer",
"pramPiiroc-khmer",
"pramkoet-khmer",
"pramroc-khmer",
"qa-khmer",
"qa_aaSign-khmer",
"qa_auSign-khmer",
"ro-khmer",
"ro_aaSign-khmer",
"ro_auSign-khmer",
"sa-khmer",
"sa-khmer.post",
"sa_aaSign-khmer",
"sa_aaSign-khmer.post_",
"sa_auSign-khmer",
"sa_auSign-khmer.post_",
"sha-khmer",
"sha_aaSign-khmer",
"sha_auSign-khmer",
"sso-khmer",
"sso-khmer.post",
"sso_aaSign-khmer",
"sso_aaSign-khmer.post_",
"sso_auSign-khmer",
"sso_auSign-khmer.post_",
"ta-khmer",
"ta_aaSign-khmer",
"ta_auSign-khmer",
"tha-khmer",
"tha_aaSign-khmer",
"tha_auSign-khmer",
"tho-khmer",
"tho_aaSign-khmer",
"tho_auSign-khmer",
"to-khmer",
"to_aaSign-khmer",
"to_auSign-khmer",
"ttha-khmer",
"ttha_aaSign-khmer",
"ttha_auSign-khmer",
"ttho-khmer",
"ttho-khmer.post",
"ttho-khmer.post2",
"ttho_aaSign-khmer",
"ttho_aaSign-khmer.post2_",
"ttho_aaSign-khmer.post_",
"ttho_auSign-khmer",
"ttho_auSign-khmer.post2_",
"ttho_auSign-khmer.post_",
"tuteyasat-khmer",
"vo-khmer",
"vo_aaSign-khmer",
"vo_auSign-khmer",
"yo-khmer",
"yo-khmer.post",
"yo-khmer.post2",
"yo_aaSign-khmer",
"yo_aaSign-khmer.post2_",
"yo_aaSign-khmer.post_",
"yo_auSign-khmer",
"yo_auSign-khmer.post2_",
"yo_auSign-khmer.post_",
} |
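The "compare (sub)categories before/after" idea from this comment could be sketched roughly as below. The two classifier callables are placeholders (assumptions): one would wrap whatever Glyphs.app reports, the other whatever glyphsLib computes. Neither is a real API here.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# (category, subcategory), or None when the classifier has no answer.
Category = Optional[Tuple[str, Optional[str]]]


def diff_classifications(
    names: Iterable[str],
    reference: Callable[[str], Category],
    candidate: Callable[[str], Category],
) -> Dict[str, Tuple[Category, Category]]:
    """Map each glyph name on which the classifiers disagree to both answers."""
    mismatches = {}
    for name in sorted(set(names)):
        expected, actual = reference(name), candidate(name)
        if expected != actual:
            mismatches[name] = (expected, actual)
    return mismatches
```

Feeding it a list such as `KNOWN_BASES` would give a starting point for a shared test suite: each mismatch is one candidate test case.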
I can see three groups in the list.
|
According to some of our designers that deal in Deva, it makes sense. Internally, we avoid the underscore in conjuncts, so we have names like "KSsa-deva". Wouldn't that be more consistent? No more underscores to disambiguate. |
Maybe. But it makes processing the names much more complicated. |
I'm curious, can you please give an example? |
Getting from |
@schriftgestalt Can you please elaborate on this fix? I am having [a similar issue](googlefonts/gftools#497).
|
Can you be more specific about what you would like to know? |
@schriftgestalt I have a similar issue(s) with certain conjuncts when I try to build the font with gftools. The marks for multiple conjuncts like If I change the subcategory from Ligature to Other, the For the glyph to actually work, I need to manually edit the A lot of Devanagari glyphs have this issue including all of the glyphs that end with I have logged multiple issues regarding this. In the latest issue logged, @simoncozens referred to this issue. Issues logged: Thanks! |
I can’t say much about the cjct feature in gftools. Only how it would be done in Glyphs. |
I'm getting really annoyed with this bug. Basically if a glyph name has an underscore in it, mark attachment probably won't work. |
Thinking about this more, the problem is not so much that glyphs are being incorrectly considered ligature glyphs. The problem is more that if they are ligature glyphs, ufo2ft does not perform mark-to-base attachment on them, only mark-to-ligature attachment. So a ligature glyph with a "top" anchor does not get top attachment; only "top_1" would be attached. If the anchor attachment was correct, the GDEF category would be less significant. That said, it's still obviously wrong to consider
But the anchor attachment issue is the real cause of pain here. |
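To make the behaviour described above concrete, here is a toy model of which anchors end up being used for each GDEF class. It only illustrates the rule as stated in this comment (plain anchors for bases, numbered anchors for ligatures); `attachable_anchors` is a hypothetical helper, not ufo2ft's implementation.

```python
import re
from typing import List

# Matches numbered ligature anchors such as "top_1" or "bottom_2".
_NUMBERED = re.compile(r"^.+_[1-9]\d*$")


def attachable_anchors(gdef_class: str, anchors: List[str]) -> List[str]:
    """Return the anchors that would receive marks under the described rule."""
    if gdef_class == "base":
        # Mark-to-base: plain anchors only; "_top" style anchors belong to marks.
        return [
            a for a in anchors
            if not a.startswith("_") and not _NUMBERED.match(a)
        ]
    if gdef_class == "ligature":
        # Mark-to-ligature: only numbered anchors are considered.
        return [a for a in anchors if _NUMBERED.match(a)]
    return []
```

A glyph classified as a ligature that only carries a plain "top" anchor therefore gets an empty list, which is the silent failure reported in this thread.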
However, since the guessing is only heuristic, I am going to strongly recommend (and for Noto insist) that Glyphs sources explicitly set the category and subcategory for all glyphs. |
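A small check along these lines could flag glyphs whose category or subcategory is not set explicitly. It is a sketch that assumes each glyph object exposes `name`, `category` and `subCategory` attributes (the attribute names are an assumption modelled on glyphsLib's glyph objects); `missing_glyph_info` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def missing_glyph_info(glyphs: Iterable) -> Dict[str, List[str]]:
    """Map glyph names to the fields that are not explicitly set."""
    report = {}
    for glyph in glyphs:
        missing = [
            field
            for field in ("category", "subCategory")
            if not getattr(glyph, field, None)  # None or "" counts as unset
        ]
        if missing:
            report[glyph.name] = missing
    return report
```

An empty report would mean every glyph carries explicit values, so no name-based guessing is needed for that source.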
I’m working on this right now and I think I can improve that quite a bit. I’m adding all my test cases from Glyphs and trying to match the results as closely as possible. But for names that Glyphs can’t recognize (like |
Thought: we could look at it the other way around. The only reason the ligature/base difference actually matters (unless we are doing very funky feature code) is for mark attachment. So why can't we just say that if a glyph has |
That could work if your font data is all set correctly. One wrong "top" anchor could make it difficult to decide. And in Glyphs this info is used for more things (anchors, feature code) so I need to have that work anyway. |
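One possible reading of the anchor-based idea, and of the objection to it, is sketched below. The proposal is cut off in this copy of the thread, so the exact rule is an assumption: numbered anchors suggest a ligature, plain anchors suggest a base, and a mix of both is reported as ambiguous. `class_from_anchors` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Matches numbered ligature anchors such as "top_1".
_NUMBERED = re.compile(r"^.+_[1-9]\d*$")


def class_from_anchors(anchor_names: Iterable[str]) -> Optional[str]:
    """Guess a GDEF class from anchor names alone (assumed rule, see above)."""
    # Ignore mark-side anchors such as "_top".
    names = [n for n in anchor_names if not n.startswith("_")]
    numbered = [n for n in names if _NUMBERED.match(n)]
    plain = [n for n in names if not _NUMBERED.match(n)]
    if numbered and plain:
        return "ambiguous"  # e.g. one stray "top" next to "top_1"
    if numbered:
        return "ligature"
    if plain:
        return "base"
    return None  # no attaching anchors, so nothing to decide
```

The "ambiguous" branch is where a single misplaced "top" anchor would make the decision hard.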
I suspect that my algorithm is doing more than needed in glyphsLib. Is there any use in doing name normalization? e.g. asking for info for |
What is the state of the Glyphs Info changes I did in the context of RTL kerning issue? That code will fix this issue. |
Last time I looked at it, it looked unfinished. It should be rebased and cleaned up. |
It won't. The code is still exactly the same in that branch. glyphsLib/Lib/glyphsLib/glyphdata.py Lines 563 to 567 in 5426cce
I do not want glyphsLib to do magic guessing based on glyph names. |
This is all properly implemented in the GlyphsInfo3 branch. |
|
This code is only used for fallback when the glyph data didn’t produce anything. I didn't touch that part. |
Why aren't my marks attaching to my bases? I have a base glyph "brm_RR" and that works fine, and another base glyph "brm_R" and marks don't attach. Turns out it's because brm_R goes into ligatures, not bases, in GDEF, so marks aren't attached. But why is brm_R considered a ligature? Because...
glyphsLib/Lib/glyphsLib/glyphdata.py
Lines 245 to 247 in 10c8b1d
brm_ is skipped, R is looked up, R is a letter, and so _translate_category is called, and inside _translate_category:
glyphsLib/Lib/glyphsLib/glyphdata.py
Lines 295 to 298 in 10c8b1d
This is way too much magic.
I literally told Glyphs that it was Category=Letter, but no, glyphsLib thinks it knows better.
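The chain of events described above can be mimicked with a toy reimplementation. This is a simplified illustration of the behaviour as reported, not the code at the referenced lines; `KNOWN` and `guess_from_name` are hypothetical, and the fact that `RR` is missing from the table is an assumption made so that `brm_RR` escapes the guess.

```python
from typing import Optional, Tuple

# Hypothetical glyph data: "R" is known, "brm" and "RR" are not (assumption).
KNOWN = {"R": "Letter", "a": "Letter", "acutecomb": "Mark"}


def guess_from_name(glyph_name: str) -> Optional[Tuple[str, Optional[str]]]:
    """Guess (category, subcategory) from the parts of a glyph name."""
    base = glyph_name.split(".", 1)[0]  # drop a suffix such as ".alt"
    parts = base.split("_")
    # Unknown parts are silently skipped, which is the surprising step.
    categories = [KNOWN[p] for p in parts if p in KNOWN]
    if not categories:
        return None  # nothing recognised, so no guess is made
    category = categories[0]
    if len(parts) > 1 and category == "Letter":
        # An underscore plus a letter part is treated as a ligature.
        return (category, "Ligature")
    return (category, None)
```

Here `guess_from_name("brm_R")` yields a Letter/Ligature pair even though only one of its two parts was recognised, while `guess_from_name("brm_RR")` yields no guess at all.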