What about Han unification? #2208

sommerluk · 2016-07-02T10:27:42Z

[This description is regularly updated to summarize the current state of discussion in the comments.]

What is the problem of Han unification for openstreetmap-carto?

The problem of Han unification is a general problem that is independent of any specific font!

Unicode encodes abstract characters (“meanings of signs”). It does not encode glyphs (“specific graphical representations of an abstract character”).
There are three Han scripts: The Chinese Han script, the Japanese Han script and the Korean Han script. According to the initials it is abbreviated “CJK scripts”.
A wide variety of abstract characters is shared between the CJK scripts.
There are glyphs that have the same appearance in all CJK scripts. There are other glyphs that are different in all CJK scripts.
Nevertheless, native language speakers expect to see the glyphs they are used to seeing (language-specific glyphs). Different to the Unicode consortium, they consider that the different glyph form makes also a difference in the meaning of the sign. They feel that the other glyph forms are a foreign language.
Furthermore, Chinese Han has two different script variants: simplified (People’s Republic of China) and traditional (Hong Kong, Macao, Taiwan). So it’s not enough to know the language, but you also have to know the script variant.
Furthermore, even Traditional Chinese Han glyphs are usually rendered differently in three different regions (Hong Kong, Macao, Taiwan). So it’s not enough to know the language and the script variant, but you also have to know the target region.
It is not possible with plain Unicode to distinguish these forms. (IVD does not help with Han unification.)
Good CJK fonts provide all these glyphs for all language variants. Via an OpenType feature, you can access the glyph variant that you need.
openstreetmap-carto uses yet the Noto fonts, which do support all Han target languages, Han target variants and Han target regions (except Macao).
The problem is how to make the choice between all the available glyph forms.
Web pages solve the problem by using HTML lang attribute. It contains an IETF language tag (BCP-47), which can provide information about language, script and region. So the rendering engine can easily choose the appropriate default glyph for this language-script-combination because it knows the target language and the target script and the target region.
openstreetmap-carto has currently no knowledge about the target language or the target script or the target region (in the “name” key) and no region-specific rendering rules.
openstreetmap-carto policy is however to display text in the native language.

Question: How to render CJK names and other text in the native language?

Does this problem also exist in other regions of the world?

Yes.

There are four variants of the Cyrillic alphabet: Russian, Bulgarian, Serbian and Macedonian, the style can be selected by locl .
The character 'LATIN CAPITAL LETTER ENG' (U+014A) has different shapes in African and European languages, the style can be selected by locl .
The Syriac script (ISO 15924: Syrc/135) has the variants “Syriac (Eastern Variant), ISO 15924 Syrn / 136” and “Syriac (Western Variant), ISO 15924 Syrj / 137” and “Syriac (Estrangelo variant), ISO 15924 Syre / 138”. The current Noto version provides one single font file for syriac, the style can be selected by locl . However, for the future there are plans to split it up into three different font files: one for each style.

Technically, these problems are almost identical to the problem of Han unification.

Are there other problems that have the same technical base?

Arabic: It seems like Pakistanis prefer a specific typographic style (Nastaliq) when using the Arabic script, which is in widespread use in Pakistan for example for the Urdu language. Almost all over the world, there are several typographic styles that are used simultaneously. (Think of Serif fonts, Sans fonts and Comic fonts for the Latin alphabet.) In Pakistan, however, there seems to be only one single typographic style in use. This is, however, a different question than the above ones: When talking about Han unification, we are talking about often completely different letter forms sharing the same Unicode code point, but having the same typographic style in our map. In Pakistan, however, we are talking about essentially similar letter forms being rendered in a different typographic style in our map (whose design is quite different from the Sans style we use all over the rest of the world). Noto itself provides various styles for Arabic, including also a Nastaliq style, each of them in different font files.

What is the current situation at openstreetmap-carto?

If we default to Chinese glyph forms, then also Japanese city names will be rendered with Chinese glyph forms, and Japanese people will feel like it is a Chinese map. If we default to Japanese glyph forms, then also Korean city names will be rendered with Japanese glyph forms, and Korean people will feel like it is a Japanese map…

Current defaults:

CJK: Default is Japanese. (In Korea, the Hangul script seems more common than the Han script anyway. So we would not gain much by using Korean as default. Between Japanese and Chinese, it seems that Japanese people are more sensible for this issue, so we go for Japanese. However, this is subjective.)
Cyrillic: openstreetmap-carto defaults to whatever the Noto (LGC) default is.
The letter “eng”: openstreetmap-carto defaults to whatever the Noto (LGC) default is.
Syriac: Default is Eastern Syriac Variant. (We suppose there are more speakers of Eastern Syriac dialects than of Western Syriac dialects?)
Arabic: Noto Sans Arabic is used.

What is necessary for a better solution?

1. Knowledge about the target language/script/region of each label

Selection by static polygons around Japan, China, Taiwan, Korea, Hong Kong, Macao: SQL queries are messy when they are based on polygons. CartoCSS does not support multi-polygon selection. Finally, this selection would also not be directly based on the OSM data.
Selection by comparing name with name:jp, name:zh … does not work. Example: The node http://www.openstreetmap.org/node/25248662 (english: Beijing) has name=北京市 and name:ja=北京市 and name:zh=北京市. They are identical. We cannot reliably determine the language of the name value.
Selection by information in the database. It might be the clearest solution to have the language information for the name value in the OSM database itself. As a separate tag that specifies the language code of the language that was used in the name value. Furthermore, for Chinese rendering, an information about the region (Taiwan, Hong Kong, Macau, People’s Republic of China) is necessary.
- A tagging proposal adding this language information on each considered element wasn’t successful. The disadvantage of this solution is that all elements with a name tag in China, Japan, Korea would need the language information key in the database, which would be much data in OSM to create and maintain.
- There are more detailed thoughts from Michael Glanznig and from Christoph Hormann about language information about how to do this more generally by adding the information “default language” only to the yet existing country boundary relations in OSM. There is also a wiki page about the key default_language and this key is yet used on more than 200 relations in the OSM database.

2. A way to get Mapnik actually render the correct localized fonts

Mapnik does not currently allow us to control the localization with OpenType’s locl feature. But we could make individual font lists for each of the four main CJK languages, and choose the actually used font list individually for each label. This is possible because Noto provides us four different versions of the CJK fonts: Each of them contains the same set of glyphs, but each of them defaults to a different representation (Japanese, Traditional Chinese, Simplified Chinese, Korean). So we can work around the lack of support for locl in Mapnik. However, this is only possible for the Noto CJK fonts, and not for the Cyrillic script and the eng letter.
The better solution would obviously be support for locl language settings in Mapnik, and the Mapnik team is interested in implementing this for Mapnik 3.1. Furthermore, we need CartoCSS (our interface to Mapnik) to support this new Mapnik feature also. This would enable us to also handle correctly the Russian, Bulgarian, Serbian and Macedonian variants of the Cyrillic alphabet and the eng letter (here locl is the only way to get this rendered correctly with Noto). This will not work for Noto Sans Arabic vs. Nastialiq, because these are two different font files.

The text was updated successfully, but these errors were encountered:

clkao · 2016-07-02T12:14:36Z

Is it possible to add some sort of regional specific selector (based on country or bbox) as cartocss extension? I think that's the basic facility required for using different glyph forms for their respective regions when preferred language is unspecified.

The default rendering should be using the region-specific glyph forms, assuming name is in that language. And when generating Chinese-version tiles (preferring name:zh) there should be overriding styles using the Chinese glyph forms.

mxa · 2016-07-02T17:11:33Z

I'm not sure, but the biggest issues for the map might be between Traditional Chinese and Japanese. Maybe also for traditional (Taiwan) vs. simplified (Mainland China) Chinese. The map for Korea has almost no place names with Chinese characters, They are all in Hangeul and there is no overlap with Chinese or Japanese. However, some objects in Korea have Hanja names too. It usually is in the name:zh tag, which is wrong, but there is no better proposal. See this discussion https://lists.openstreetmap.org/pipermail/talk-ko/2015-October/000228.html

pnorman · 2016-07-02T20:31:30Z

Is it possible to add some sort of regional specific selector (based on country or bbox) as cartocss extension? I think that's the basic facility required for using different glyph forms for their respective regions when preferred language is unspecified.

Not within CartoCSS as it exists now. If it got added we could consider using it.

It's possible to do something with more complicated SQL queries, but this gets messy, and is even messier without defining functions in PostgreSQL, which we avoid.

pnorman · 2016-07-02T20:42:09Z

And when generating Chinese-version tiles (preferring name:zh) there should be overriding styles using the Chinese glyph forms.

Chinese-specific tiles would require modifying the style, so modifying the font list for a better rendering is pretty easy for someone to do once they've started modifications.

fwiw, I think we're stuck with the Han unification problems with current technologies.

sommerluk · 2016-08-02T22:51:04Z

Unicode 9.0.0 core specification http://www.unicode.org/versions/Unicode9.0.0/UnicodeStandard-9.0.pdf has an implementation guideline about language information in plain text and especially Han unification: Chapter 5.10.

pnorman · 2016-08-03T01:45:18Z

Han unification: Chapter 5.10.

I've read over it and it doesn't help much. In the situations they describe there is implicit language information by the reader being either Japanese or Chinese and having corresponding fonts. With server-side rendering (and most client-side) the fonts are supplied so none of the scenarios are what we have.

I do note that "plain text remains legible in the absence [of format specifications]"

mboeringa · 2016-08-03T09:50:20Z

I do note that "plain text remains legible in the absence [of format specifications]"

And:

"The goal and methods of Han unification were to ensure that the text remained legible."

"There should never be any confusion in Unicode, because the distinctions between the unified characters are all within the range of stylistic variations that exist in each country."

Artoria2e5 · 2017-04-12T01:44:45Z

Some improvements may be possible based on 5e5fb3b by reducing JP coverage to the Noto/Source Han Sans subset font, and padding it up with a SC or TC variant that has all the glyphs. This way all characters used by Japanese would be made Japanese, while the rest can be left to be written in the Chinese ways.

Correction for my comment in #2608: It appears that Japanese, for example, don't quite use the character "门": https://ja.wiktionary.org/wiki/%E9%97%A8. Since it's possibly still in the subset file (according to the source han sans readme the subset still covers all the JIS X characters), someone may have to do some font editing to kick it out.

sommerluk · 2017-04-22T18:06:55Z

Since it's possibly still in the subset file […], someone may have to do some font editing to kick it out.

@Artoria2e5 If I understand you correctly, the proposal with the region-specific subsets is not a solution for our problem, right?

sommerluk · 2017-04-22T18:08:55Z

I’ve made some further investigations and updated the issue description (“first comment”).

It seems to me that the only reliable way to support this is having in the OSM database itself the information about the language that was used in the name tag.

Artoria2e5 · 2017-04-22T23:34:36Z

@sommerluk Taking subsets can be a good enough solution as you can isolate characters not (usually) used by one region and give it a writing style from a region that commonly uses it. The region subset files can appear quite a bit too inclusive for any given region though.

Name tagging is the ideal solution around this.

springmeyer · 2017-04-27T22:54:20Z

It seems to me that the only reliable way to support this is having in the OSM database itself the information about the language that was used in the name tag.

This could be paired nicely with Mapnik if Mapnik were extended to dynamically read this value in from the database and pass it to harfbuzz. I've sketched out how that could work at mapnik/mapnik#3655 (comment).

jojo4u · 2017-04-30T16:28:38Z

Using a name_lang=[lang code] tag or similar would solve the Han problem and the duplication of name tags by because the name:[local lang code]=[local name] tag could be omitted.

sommerluk · 2017-05-01T14:09:23Z

@springmeyer Thanks!

sommerluk · 2017-05-01T14:09:48Z

I’ve written a proposal for language information tagging at https://wiki.openstreetmap.org/wiki/Proposed_features/Language_information_for_name

Feedback is welcome.

nebulon42 · 2017-05-01T14:26:22Z

@sommerluk I have thought about something similar, but limited to multilingual names: https://wiki.openstreetmap.org/wiki/User:Nebulon42/Multilingual_names

Maybe something there is of value for this problem. Or vice versa :)

sommerluk · 2017-05-04T06:55:01Z

@nebulon42 Thanks! I did not know your proposal. Great work!

The syntax is essentially the same: semicolon-separated list of the language codes that are already used for name:=

Additional to your proposal, I simply admit single-language values. Would you consider that my proposal is a superset of your proposal and would be enouth to also serve your purposes?

nebulon42 · 2017-05-04T19:06:42Z

I did not know your proposal.

Yes, I drafted it some time ago but did not have the time to push it further.

Would you consider that my proposal is a superset of your proposal and would be enouth to also serve your purposes?

Definitely. If you have the time and energy to push this further I really appreciate that. If you need some help please tell me. If anything on my Wiki page suits your needs for the proposal like the example renderings etc. please do not hesitate to use it.

I saw that there is some progress on the Mapnik side. If there is anything that needs to be done for CartoCSS, please create an issue and I will try to get it into the next release.

sommerluk · 2017-05-05T11:59:35Z

Yes, that sounds good.

The RFC at the tagging mailing list is done. The multilingual name processing is added as use case. Overall, the proposal is still quite short, also because my english is not so good. Hopefully that’s not an obstacle…

About Mapnik and CartoCSS: The most important part is getting support for controling locl via a property in the stylesheet. As far as I know, that’s not done yet. Once it’s done, will it be automatically available in CartoCSS, or is it necesarry to add in manually?

nebulon42 · 2017-05-07T15:19:35Z

If it's only about the property then adding it to https://github.com/mapnik/mapnik-reference for 3.1.0 or which version it is released in would be sufficient. Then carto needs to used that updated reference in a new version and it is available.

sommerluk · 2017-05-23T18:05:27Z

The voting for a language tag is now open at https://wiki.openstreetmap.org/wiki/Proposed_features/Language_information_for_name

Support is welcome ;-)

sommerluk · 2019-03-21T05:23:38Z

iirc the "beta" character in Greek also have some similar situation? Is there any exhaustive list on all language/script/glyph that are affected by the Unicode glyph unification?

To both questions: I don’t know, neither did I found a good answer searching on the web.

bdon · 2020-08-21T22:04:43Z

Working on Step 2 above here: mapnik/mapnik#3655 (comment)

For Step 1, my feeling is that for the zh-hant locale this problem is so pervasive that an OSM data based approach isn't realistic - essentially any character that uses the 辶 radical is affected including common names of linear features like 大道(Boulevard), 步道(Trail), etc. (before/after using my lang fork:)

For the region-based approach, I'm skeptical that a polygon-based approach will result in an elegant implementation; what about a raster/bitmap solution? For example, a GeoTiff where each pixel encodes an 8-bit value corresponding to BCP47 language tag that can be sampled for every symbolizer - for my own uses I would probably implement this in-memory directly in the mapnik C++ code, but for OSM Carto I'm not sure how this would fit into the tile rendering path.

A bitmap based approximation based on Z14 tiles might be detailed enough and would be 16384x16384 px, which is a reasonable size.

mapmeld · 2025-02-14T18:00:50Z

Some good news re Han Unification: Mapnik merged mapnik/mapnik#4493 which accepts a lang attribute on a TextSymbolizer and passes it to HarfBuzz.
Two notes:

We noticed that older Noto Chinese fonts already restrict output to Simplified or Traditional because they shipped with one variant. Possibly the modern font repos could be built this way, too.
It's unlikely that mapbox/carto (last commit 2020, archived 1 year ago) would update its output XML for Mapnik. If following this route, I would suggest having this OSM repo edit the XML (i.e. if the fontset on a TextSymbolizer puts CJK TC font first, add lang="zh-Hant" or lang="[cjk]").

imagico · 2025-02-14T19:19:42Z

It's unlikely that mapbox/carto (last commit 2020, archived 1 year ago) would update its output XML for Mapnik.

This is not necessarily a barrier - moving to a carto fork would not be out of the question.

Ultimately, of course, support for manually selecting font locl in Mapnik and Carto is only a smaller part of addressing the whole problem. The larger part is determining the language.

A demonstration how this could be done was shown and discussed in

https://imagico.de/blog/en/rethinking-name-labeling-in-openstreetmap-map-styles/

which does not, however, include the code of generating the language_regions data from OSM data.

dch0ph · 2025-02-15T13:06:59Z

@mapmeld I see you're being modest about your contributions to mapnik!

It's great to somebody taking an interest in mapnik development. Most of the commits these days seem to be code maintenance rather fixes.

If you do have a functional mapnik development environment, it might be interesting to look at this issue, which is indirectly related to the current topic. We can probably find examples where ascenders / descenders mess up the placement of text in shield boxes.

dch0ph · 2025-02-17T19:00:02Z

The use of admin boundaries to determine language display preferences has been discussed in another context (community forum?). It would be useful to review options for processing the admin boundaries using the new flex input.

In #4431 a new table was created for admin boundaries. But this deliberately removed overlaps so that only the highest priority admin boundary way was included. This is probably not useful for the current application.

As I read the current flex input, "boundaries" are loaded into both the roads and polygon tables, with "admin level" boundaries being read from roads and others from the polygon table. Is this because "admin" boundaries can't be guaranteed to be closed, and so it is only safe to treat them as a collection of ways?

It would useful to have the admin boundaries stored as a set of nested rings, but I imagine that it will never be as neat as this.

imagico · 2025-02-17T22:02:03Z

The original (unsplit) admin polygons are not suitable for efficient on-the-fly point in polygon lookup.

And given the nature of the problem it would definitely not be necessary for the language lookup data to be updated in realtime.

dch0ph · 2025-02-18T18:32:47Z

I wonder whether we can use a generated column to perform the job of your carto_label_name function?

If the function returned "label to use" and locl, the label and locl can be passed directly to the CSS.

Inefficiency of the polygon lookup is less of an issue if it is only happening when the object is added/changed.

The generated column would need to be rebuilt if the "language polygons" changed, but, as you say, this would not be done in real time (not least to avoid abuse).

mapmeld · 2025-02-18T19:09:05Z

My hope is that we could pilot this in a few areas, before bringing in every design concern (updatable admin boundaries, borders). Currently there's too much uncertainty about whether this will happen at all. We'd be in a better state if the pipeline exists, and we can compare real world tiles somewhere. Based on when I was looking up test cases, the number of pixels changed on real world tiles may be... underwhelming

imagico · 2025-02-20T23:20:27Z

Changes can be made step by step but there needs to be a viable strategy.

mapmeld · 2025-02-21T00:22:38Z

I'm re-reading your post and taking a look at the Noto CJK docs recently.
As you said in the post, there are single language fonts. Documentation link. We can use the latest release from 2021/22 and download Simplified, Traditional, and Hong Kong as separate fonts (I don't know the specifics of HK vs. TC fonts, but wanted to mention it)
Using fonts, there's no need to update mapnik or the carto compiler, just writing carto to support it based on a column.

A default_language column sounds like a good idea, it can be added to the DB import Lua script (when I was testing I named it cjk, but with context, it's better to be open-ended). Then it already has the tag it would be preserved, i.e. WHERE St_Contains(...) AND default_language IS NULL.
I could see there being a question of which tag, but this isn't asking anyone to add or change that tag value. If someone added a bunch of default_language=My Value nodes out there, the labels and data would stay the same, so what would be the objection...

I didn't mention it in this thread yet, I gave in Option 2 of https://github.com/mapmeld/osm2pgsql-cjk some research on the name:zh-Hans and name:zh-Hant tags. Those get used on about 5% of Hong Kong's named nodes. If I can rewrite / summarize what I said at that link: the script could update Chinese names in HK avoiding ones where the name value better matches the name:zh-Hans tag.
If someone wanted to avoid rendering in Traditional Chinese, they could add a name:zh-Hans tag or default_language tag, which are both options that people do today.

Artoria2e5 · 2025-02-21T04:54:12Z

My two cents on polygons is this: the polygons do not need to be very precise in practice. Only the parts that border another Sinosphere/glyph-style region actually matter, and even within this subset of points, the maritime polygon vertices are going to affect way fewer labels than the land vertices.

So really the parts that require "reasonably" high resolution are:

China - North Korea
China - Taiwan (Kinmen)
China - Macau
China - Hong Kong

That's it. The rest can use very crude hand-simplified polygons.

imagico · 2025-02-21T10:16:44Z

Using fonts, there's no need to update mapnik or the carto compiler

That is true for CJK - in other cases it might be necessary to modify the fonts to be single language.

That's it. The rest can use very crude hand-simplified polygons.

We definitively do not want to increase maintenance burden on the style by hand designing and shipping a spatial data set - even if very low detail. And this also would not be compatible with our practice (as also mandated by the Guidelines for new tile layers) to use OSM data where possible.

And substantial long term mismatches between spatially based language assignments and the administrative boundaries as mapped in OSM would just be a source of complaints and disputes in an already sensitive domain. So - for production use - i do not consider this viable. Nothing, however, speaks against starting development of this with a placeholder data set - as long as you keep in mind that this will eventually need to be tackled.

And please everyone keep in mind that this issue - despite its title - is generically about solving the issue that different scripts should be used for different languages and that we have no strait away method to determine the language of the name tag. In my proof-of-concept demonstration i combined this with a move away from showing the generic name tag everywhere - which would be a much bigger strategic change here, but which - as @mapmeld also hints at in his contemplations - naturally ties with practical solutions to this issue since those will often involve analyzing and possibly matching the various name tags.

c933103 · 2025-02-23T03:03:31Z

I'm re-reading your post and taking a look at the Noto CJK docs recently. As you said in the post, there are single language fonts. Documentation link. We can use the latest release from 2021/22 and download Simplified, Traditional, and Hong Kong as separate fonts (I don't know the specifics of HK vs. TC fonts, but wanted to mention it) Using fonts, there's no need to update mapnik or the carto compiler, just writing carto to support it based on a column.

The SC TC HK font variants represent the glyph for Chinese characters adopted by local standard-setting bodies in China, Taiwan, Hong Kong respectively. Note that this difference is not mainly about simplified vs traditional, despite their name. And thus for example the glyph for a Traditional Chinese character in SC set based on China standard could be different from TC/HK set and that represent how the standard-setting body in China think that character should be written. (Although there are more nuances behind the situation).

But given majority of Simplified users are most likely accustomed to the Chinese standard on how the characters should look, it might make sense to use the SC font even if Simplified characters appear in places outside China.

But how to determine a name with Han character in Vancouver or Los Angeles is intended to be rendered using which glyph, I have no idea.

A default_language column sounds like a good idea, it can be added to the DB import Lua script (when I was testing I named it cjk, but with context, it's better to be open-ended). Then it already has the tag it would be preserved, i.e. WHERE St_Contains(...) AND default_language IS NULL. I could see there being a question of which tag, but this isn't asking anyone to add or change that tag value. If someone added a bunch of default_language=My Value nodes out there, the labels and data would stay the same, so what would be the objection...

I didn't mention it in this thread yet, I gave in Option 2 of https://github.com/mapmeld/osm2pgsql-cjk some research on the name:zh-Hans and name:zh-Hant tags. Those get used on about 5% of Hong Kong's named nodes. If I can rewrite / summarize what I said at that link: the script could update Chinese names in HK avoiding ones where the name value better matches the name:zh-Hans tag. If someone wanted to avoid rendering in Traditional Chinese, they could add a name:zh-Hans tag or default_language tag, which are both options that people do today.

For the purpose of name in Hong Kong, I don't think it is an established custom for people to add zh-Hant name as it is the default and would simply duplicate with name:zh. And name:zh-Hans I think are mostly added by Simplified Chinese users visiting/arriving at the city. So I don't think evaluating whether they match the Han character in default name tag is a good way to figure out whether the name is Simplified or not.

mapmeld · 2025-02-24T08:02:38Z

An update: I have a test branch with Simplified-only font + data from Liaoning (near China/DPRK border). I made red text in the same Carto selectors to see what's changed and what's remaining:

Aside from the tagging / post-upload querying which was discussed earlier:

queries in project.mml have to SELECT the default_language column (cjk in my branch), including in nested / UNION queries
didn't yet make a font variable which falls back to oblique
It's verbose; I don't know a way in Carto to either replace @book-fonts with a conditional, or to write a blanket #country, #state, #city { &[default_language = 'zh'] { text-face-name: @simplified-chinese-fonts; } } like I could in SCSS. I believe the compiler is trying to save some trouble by rejecting conflicts / multiple applicable style sections. If there are any Carto wizards out there and I'm missing something, let me know. (edit: when we can pass default_language directly to a future version of Mapnik, this would be much neater)

mapmeld · 2025-03-08T21:05:01Z

I put a server with the Simplified Chinese demo online at 159_65_238_18/openstreetmap-carto/#16/39.8713/124.1505 for review. It's generating tiles dynamically so don't zoom out a lot and expect it to render immediately.

1ec5 · 2025-03-08T22:44:56Z

But how to determine a name with Han character in Vancouver or Los Angeles is intended to be rendered using which glyph, I have no idea.

In the San Francisco Bay Area, it was not too long ago that Toishanese rendered in Traditional (Hong Kong) characters would’ve been the closest thing to a local default. However, that’s definitely no longer the case, especially outside of San Francisco Chinatown proper. Some city governments use Traditional (Hong Kong) Chinese, while some public transportation systems use Traditional Chinese in simplified fonts, and private businesses are an unruly mix of both traditional and simplified usage, depending on the owner’s background or clientele. In OSM, name:zh=* is locally much more common than name:zh-Hant=*, name:zh-Hans=*, name:cmn=* or name:yue=*, though I try to add all of them redundantly whenever possible, converting them using OpenCC or manual Wiktionary lookup. I’m not very confident that other mappers here would be able to do the same without more guidance and tooling.

Anyways, this is just one data point from a non-native speaker. Others in the OSM community might be able to provide more insight about tagging and groundtruth, but it would probably be considered somewhat off-topic for this issue tracker.

sommerluk mentioned this issue Jul 2, 2016

Hangeul (Korean) font not readable #2204

Closed

sommerluk mentioned this issue Sep 15, 2016

Initial Noto support #2349

Merged

6 tasks

kocio-pl added the text label Nov 15, 2016

pnorman mentioned this issue Apr 11, 2017

Locale-aware Typography #2608

Closed

springmeyer mentioned this issue May 1, 2017

Text rendering should attempt to respect locale (OpenType locl) variants for characters mapnik/mapnik#3655

Open

sommerluk mentioned this issue Sep 3, 2017

Simplified Chinese characters rendered using Japanese font #2807

Closed

matthijsmelissen added the general label Sep 11, 2017

pnorman mentioned this issue Sep 23, 2017

Chinese render error #2862

Closed

sommerluk mentioned this issue Dec 29, 2017

Use OpenType features to get better legibility #2647

Closed

jeisenbe mentioned this issue Apr 3, 2020

CJK containing highway numbers placed too low within their rectangular shields #4103

Closed

xmd5a2 mentioned this issue Jul 9, 2020

[Feature request] improve Korean font on the online tiles osmandapp/OsmAnd#9371

Closed

imagico mentioned this issue May 14, 2022

Enable Noto Nastaliq Urdu as the default font for locations in Pakistan #4547

Closed

1ec5 mentioned this issue Jun 18, 2022

Wrong name taken from external source wikidata for German place names osm-americana/openstreetmap-americana#428

Closed

bgo-eiu mentioned this issue Aug 14, 2022

Recommendation to include additional Arabic Naskh fallback font #4644

Open

sommerluk mentioned this issue Aug 26, 2022

Cursive Cyrillic fonts should be avoided #4327

Closed

depth221 mentioned this issue Oct 4, 2022

Add a bash script to allow a user to change the CJK font conveniently #4707

Merged

1ec5 mentioned this issue Dec 10, 2022

Render CJK glyphs locally osm-americana/openstreetmap-americana#613

Closed

imagico mentioned this issue Dec 17, 2022

Reformat semicolons in names #4755

Open

SCLu17 mentioned this issue Oct 20, 2023

[CJK] Add support for Hong Kong SAR and Macau SAR locales tracestrack/tracestrack-maps#3

Closed

imagico mentioned this issue Apr 18, 2024

Render more specific access tags i.e. bicycle, motor_vehicle etc #214

Closed

imagico mentioned this issue Oct 26, 2024

Avoid a new line break at the hyphen in "Co-op" in labels #5028

Open

imagico mentioned this issue Jan 16, 2025

RFC for strategy on font management #5043

Open

6 tasks

What about Han unification? #2208

What about Han unification? #2208

Comments

sommerluk commented Jul 2, 2016 • edited Loading

What is the problem of Han unification for openstreetmap-carto?

Does this problem also exist in other regions of the world?

Are there other problems that have the same technical base?

What is the current situation at openstreetmap-carto?

What is necessary for a better solution?

1. Knowledge about the target language/script/region of each label

2. A way to get Mapnik actually render the correct localized fonts

clkao commented Jul 2, 2016

mxa commented Jul 2, 2016

pnorman commented Jul 2, 2016

pnorman commented Jul 2, 2016

sommerluk commented Aug 2, 2016 • edited Loading

pnorman commented Aug 3, 2016

mboeringa commented Aug 3, 2016

Artoria2e5 commented Apr 12, 2017 • edited Loading

sommerluk commented Apr 22, 2017 • edited Loading

sommerluk commented Apr 22, 2017

Artoria2e5 commented Apr 22, 2017

springmeyer commented Apr 27, 2017

jojo4u commented Apr 30, 2017

sommerluk commented May 1, 2017

sommerluk commented May 1, 2017

nebulon42 commented May 1, 2017

sommerluk commented May 4, 2017

nebulon42 commented May 4, 2017

sommerluk commented May 5, 2017

nebulon42 commented May 7, 2017

sommerluk commented May 23, 2017

sommerluk commented Mar 21, 2019

bdon commented Aug 21, 2020

mapmeld commented Feb 14, 2025

imagico commented Feb 14, 2025

dch0ph commented Feb 15, 2025

dch0ph commented Feb 17, 2025

imagico commented Feb 17, 2025

dch0ph commented Feb 18, 2025

mapmeld commented Feb 18, 2025

imagico commented Feb 20, 2025

mapmeld commented Feb 21, 2025 • edited Loading

Artoria2e5 commented Feb 21, 2025 • edited Loading

imagico commented Feb 21, 2025

c933103 commented Feb 23, 2025

mapmeld commented Feb 24, 2025 • edited Loading

mapmeld commented Mar 8, 2025

1ec5 commented Mar 8, 2025

sommerluk commented Jul 2, 2016 •

edited

Loading

sommerluk commented Aug 2, 2016 •

edited

Loading

Artoria2e5 commented Apr 12, 2017 •

edited

Loading

sommerluk commented Apr 22, 2017 •

edited

Loading

mapmeld commented Feb 21, 2025 •

edited

Loading

Artoria2e5 commented Feb 21, 2025 •

edited

Loading

mapmeld commented Feb 24, 2025 •

edited

Loading