This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Chinese labels should be wrapped by character with balanced lines #1223

Closed
barrycug opened this issue Apr 7, 2015 · 12 comments

Comments

@barrycug

barrycug commented Apr 7, 2015

The font is "Unifont Medium". English labels wrap correctly.

@mikemorris
Contributor

@barrycug Can you please provide more details on what the expected result is versus what is actually rendered?

@barrycug
Author

barrycug commented Apr 8, 2015

[Screenshot]
In the screenshot, you can see that the English text wraps onto 2 lines, but the Chinese text stays on 1 line.
I set text-max-width to 5:
"text-max-width": 5

@barrycug
Author

barrycug commented Apr 8, 2015

I know that line wrapping depends on spaces, but Chinese text does not contain spaces.
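A minimal sketch of why space-based wrapping leaves an unspaced Chinese label on one line. `wrapAtSpaces` is a hypothetical helper, not gl-native's shaping code, and it assumes every character is one unit wide: because it can only break at U+0020, a label with no spaces comes back as a single line however small `maxChars` is.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical space-only wrapper (not gl-native's shaping code). Lines may
// only break at U+0020, and every character is assumed to be one unit wide.
std::vector<std::u32string> wrapAtSpaces(const std::u32string& text, std::size_t maxChars) {
    assert(maxChars > 0);
    std::vector<std::u32string> lines;
    std::u32string line;
    std::size_t start = 0;
    while (start <= text.size()) {
        // The next "word" runs up to the next space or the end of the text.
        std::size_t end = text.find(U' ', start);
        if (end == std::u32string::npos) end = text.size();
        const std::u32string word = text.substr(start, end - start);
        // Start a new line if appending the word would exceed maxChars.
        if (!line.empty() && line.size() + 1 + word.size() > maxChars) {
            lines.push_back(line);
            line.clear();
        }
        if (!line.empty()) line += U' ';
        line += word;
        start = end + 1;
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```

With this sketch, `wrapAtSpaces(U"Chaoyang District Park", 10)` yields three lines, while `wrapAtSpaces(U"北京市朝阳区", 5)` yields one six-character line.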

@friedbunny
Contributor

Japanese has a similar issue, to a lesser degree because names tend to be shorter. The problem is that it would be difficult to determine where to break an un-spaced line — it would likely have to be specific to each language.

Off the top of my head, Japanese could be split on certain characters (like or other punctuation) or when switching character sets, e.g., syllabaries giving way to kanji like in カタカナ漢字 or alphabet to kanji ABC公園, etc.
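One way to find the script-change break opportunities described above, sketched under simplifying assumptions: the `Script` classes, the Unicode ranges chosen, and the names `classify` and `scriptChangeBreaks` are all hypothetical, and real Japanese line breaking (kinsoku rules for small kana and punctuation, for example) needs far more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified script classes for illustration only.
enum class Script { Latin, Hiragana, Katakana, Kanji, Other };

Script classify(char32_t c) {
    if ((c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')) return Script::Latin;
    if (c >= 0x3040 && c <= 0x309F) return Script::Hiragana;  // Hiragana block
    if (c >= 0x30A0 && c <= 0x30FF) return Script::Katakana;  // Katakana block
    if (c >= 0x4E00 && c <= 0x9FFF) return Script::Kanji;     // CJK Unified Ideographs
    return Script::Other;
}

// Returns every index i such that a break is allowed before text[i]
// because the script changes between text[i - 1] and text[i].
std::vector<std::size_t> scriptChangeBreaks(const std::u32string& text) {
    std::vector<std::size_t> breaks;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (classify(text[i - 1]) != classify(text[i])) breaks.push_back(i);
    }
    return breaks;
}
```

For カタカナ漢字 this reports a single break before index 4, and for ABC公園 a single break before index 3; a run of kanji alone yields no breaks at all.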

Here's a Yahoo Japan map, which is probably the best designed web map in Japan, if anyone is looking for inspiration. ;)

@barrycug
Author

barrycug commented Apr 8, 2015

Thank you very much. I split the text by its size, which is the simplest solution. Maybe I could insert spaces into the data instead; that would be the correct solution, because it would break at natural-language boundaries. But it is a lot of work.

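```cpp
// Fragment of the glyph-shaping loop. `string` (assumed to be a std::u32string),
// `metrics`, `shaping`, `x`, `y`, `spacing` and `maxWidth` come from the
// enclosing function.
const uint32_t space = 32;
const bool hasSpace = string.find(space) != std::u32string::npos;

// Assuming 24 px glyphs, maxWidth / 24 approximates how many characters fit
// on a line. If an unspaced label is longer than that, insert a space halfway
// through so the existing space-based wrapping splits it into two lines.
// (string.size() + 1) / 2 keeps even-length labels balanced, e.g. 3 + 3
// rather than 4 + 2 for six characters.
long breakOffset = -1;
if (!hasSpace && string.size() > maxWidth / 24) {
    breakOffset = static_cast<long>((string.size() + 1) / 2);
}

long i = 0;
for (uint32_t chr : string) {
    if (i == breakOffset) {
        // Emit the synthetic space before this character and advance past it.
        shaping.emplace_back(space, x, y);
        auto metric = metrics.find(space);
        if (metric != metrics.end()) {
            x += metric->second.advance + spacing;
        }
    }
    shaping.emplace_back(chr, x, y);
    auto metric = metrics.find(chr);
    if (metric != metrics.end()) {
        x += metric->second.advance + spacing;
    }
    ++i;
}
```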

@barrycug
Author

barrycug commented Apr 8, 2015

The code only supports wrapping onto two lines.
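The two-line snippet above can be generalized to any number of lines. This sketch (a hypothetical `splitBalanced`, not part of gl-native) keeps the same simplification of counting characters instead of measuring glyph advances: it picks the fewest lines that hold at most `maxCharsPerLine` characters each and spreads the characters as evenly as possible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a space-free label into the fewest lines of at
// most maxCharsPerLine characters, spreading characters as evenly as possible.
// It assumes every glyph has the same width.
std::vector<std::u32string> splitBalanced(const std::u32string& text, std::size_t maxCharsPerLine) {
    assert(maxCharsPerLine > 0);
    if (text.empty()) return {};
    const std::size_t n = text.size();
    const std::size_t lineCount = (n + maxCharsPerLine - 1) / maxCharsPerLine;  // ceil(n / max)
    const std::size_t base = n / lineCount;   // minimum characters per line
    const std::size_t extra = n % lineCount;  // the first `extra` lines get one more
    std::vector<std::u32string> lines;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < lineCount; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        lines.push_back(text.substr(pos, len));
        pos += len;
    }
    return lines;
}
```

With a limit of 5, an 11-character label becomes lines of 4, 4 and 3 characters rather than 5, 5 and 1.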

@mikemorris
Contributor

Thanks for the detailed explanation @barrycug and @friedbunny! The "proper" way to implement this would probably be to use something like ICU's BreakIterator for boundary analysis, then calculate the width of glyph clusters similarly to Mapnik and split on the closest break point. A similar approach for complex scripts that require shaping is described at http://lists.freedesktop.org/archives/harfbuzz/2014-February/004140.html
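A sketch of the "split on the closest break point" step, assuming boundary analysis (for example ICU's BreakIterator) has already produced the allowed break positions; ICU itself is not called here. `chooseBreaks` is a hypothetical name and this is not Mapnik's implementation. For each line it picks the allowed break whose line width is closest to `targetWidth`, so a line can overshoot the target when the nearest break lies beyond it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper. `advances` holds each character's glyph advance and
// `allowed` the positions where a break may occur (i = break before
// character i), assumed to come from a prior boundary-analysis pass.
// Returns the chosen break positions in increasing order.
std::vector<std::size_t> chooseBreaks(const std::vector<double>& advances,
                                      const std::vector<std::size_t>& allowed,
                                      double targetWidth) {
    assert(targetWidth > 0);
    const std::size_t n = advances.size();
    // prefix[i] is the total advance of the first i characters.
    std::vector<double> prefix(n + 1, 0.0);
    for (std::size_t i = 0; i < n; ++i) prefix[i + 1] = prefix[i] + advances[i];

    std::vector<std::size_t> chosen;
    std::size_t start = 0;
    // Keep breaking while the rest of the label is wider than the target.
    while (prefix[n] - prefix[start] > targetWidth) {
        std::size_t best = 0;  // 0 = no usable break found yet
        double bestDiff = std::numeric_limits<double>::infinity();
        for (std::size_t b : allowed) {
            if (b <= start || b >= n) continue;  // must split the remaining text
            const double diff = std::fabs(prefix[b] - prefix[start] - targetWidth);
            if (diff < bestDiff) {
                bestDiff = diff;
                best = b;
            }
        }
        if (best == 0) break;  // no allowed break left; the last line stays long
        chosen.push_back(best);
        start = best;
    }
    return chosen;
}
```

With six 24 px glyphs, every interior position allowed and a 72 px target, it returns a single break before index 3.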

@mikemorris
Contributor

Linking Mapnik line-breaking implementation here for future reference https://github.com/mapnik/mapnik/blob/master/src/text/text_layout.cpp

@friedbunny
Contributor

That's really interesting reading and looks like the ideal path to pursue, thanks @mikemorris. Do you have any idea how feasible it would be to do this properly/robustly in GL?

@mikemorris
Contributor

@friedbunny I think it should be pretty straightforward to implement in gl-native because of existing libraries like ICU and HarfBuzz; the challenge will mostly be avoiding performance slowdowns and minimizing bloat. I'd like to tackle this in conjunction with other render-time glyph issues like complex text shaping and RTL/bidirectional script support.

What I expect will be difficult is implementing comparable functionality in mapbox-gl-js, as I'm not sure whether it would be feasible to bundle (or request dynamically) contextual line-breaking support or shaping tables, and I don't know of any existing JavaScript libraries that have already solved these problems.

@mikemorris
Contributor

@1ec5
Contributor

1ec5 commented Oct 19, 2016

CJK line breaking is orthogonal to complex text shaping. For example, the most cartographically sound way to wrap Chinese text (but not Japanese or Korean) would be character by character, keeping the lines balanced.

Reopening.
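A sketch of what wrapping "character by character, keeping the lines balanced" could look like. This is not the code from the eventual gl-native fix, `balancedBreaks` is a hypothetical name, and per-glyph advances are assumed to be known. Every character boundary is a break opportunity; the label gets the fewest lines that fit `maxWidth`, and dynamic programming then picks the breaks that minimize the squared deviation of each line's width from the average.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper. Returns break positions (i = break before character i).
std::vector<std::size_t> balancedBreaks(const std::vector<double>& advances, double maxWidth) {
    const std::size_t n = advances.size();
    std::vector<double> prefix(n + 1, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(advances[i] <= maxWidth);  // every single glyph must fit on a line
        prefix[i + 1] = prefix[i] + advances[i];
    }
    if (n == 0 || prefix[n] <= maxWidth) return {};

    // Fewest lines: greedily fill each line as far as maxWidth allows.
    std::size_t lines = 1;
    for (std::size_t start = 0, i = 0; i < n; ++i) {
        if (prefix[i + 1] - prefix[start] > maxWidth) {
            ++lines;
            start = i;
        }
    }
    const double target = prefix[n] / static_cast<double>(lines);
    const double inf = std::numeric_limits<double>::infinity();

    // cost[k][i]: best cost of placing the first i characters on k lines;
    // from[k][i]: the index where the k-th of those lines starts.
    std::vector<std::vector<double>> cost(lines + 1, std::vector<double>(n + 1, inf));
    std::vector<std::vector<std::size_t>> from(lines + 1, std::vector<std::size_t>(n + 1, 0));
    cost[0][0] = 0.0;
    for (std::size_t k = 1; k <= lines; ++k) {
        for (std::size_t i = 1; i <= n; ++i) {
            for (std::size_t j = 0; j < i; ++j) {
                const double width = prefix[i] - prefix[j];
                if (width > maxWidth || cost[k - 1][j] == inf) continue;
                const double c = cost[k - 1][j] + (width - target) * (width - target);
                if (c < cost[k][i]) {
                    cost[k][i] = c;
                    from[k][i] = j;
                }
            }
        }
    }

    // Walk back from the end of the label to recover the break positions.
    std::vector<std::size_t> breaks(lines - 1);
    for (std::size_t k = lines, i = n; k > 1; --k) {
        i = from[k][i];
        breaks[k - 2] = i;
    }
    return breaks;
}
```

With 24 px glyphs and a 120 px limit, a six-character name is split 3 + 3 instead of the greedy 5 + 1. The cubic-time loop is fine for short map labels but would need a cheaper formulation for long text.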

@1ec5 1ec5 reopened this Oct 19, 2016
@1ec5 1ec5 changed the title Chinese Text can't support label wrapped Line breaking by character for Chinese labels Oct 19, 2016
@1ec5 1ec5 changed the title Line breaking by character for Chinese labels Chinese labels should be wrapped by character with balanced lines Oct 19, 2016
1ec5 added a commit that referenced this issue Oct 26, 2016
Allow a line break to be inserted after any supported Chinese, Japanese, or Yi character in a point-placed label. Balance the lines unless non-ideographic text such as Latin letters are present.

Fixes #1223.
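The commit's two rules (allow a break after ideographic characters, and balance only when no non-ideographic text is present) can be sketched with Unicode block checks. The helper names and the exact block list are assumptions for illustration; this is not the util::i18n code the commit adds.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if a line break may follow this character.
// The block list is an illustrative subset chosen for this sketch.
bool allowsBreakAfter(char32_t c) {
    return (c >= 0x3040 && c <= 0x309F)    // Hiragana
        || (c >= 0x30A0 && c <= 0x30FF)    // Katakana
        || (c >= 0x3400 && c <= 0x4DBF)    // CJK Unified Ideographs Extension A
        || (c >= 0x4E00 && c <= 0x9FFF)    // CJK Unified Ideographs
        || (c >= 0xA000 && c <= 0xA48F)    // Yi Syllables
        || (c >= 0xA490 && c <= 0xA4CF)    // Yi Radicals
        || (c >= 0xF900 && c <= 0xFAFF);   // CJK Compatibility Ideographs
}

// Hypothetical helper: lines are balanced only when every character allows
// ideographic breaking; any other character (a Latin letter, a space) turns
// balancing off.
bool shouldBalanceLines(const std::u32string& label) {
    if (label.empty()) return false;
    for (char32_t c : label) {
        if (!allowsBreakAfter(c)) return false;
    }
    return true;
}
```

Under these assumptions, `shouldBalanceLines(U"北京市朝阳区")` is true, while `shouldBalanceLines(U"ABC公園")` is false because of the Latin letters.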
1ec5 added a commit that referenced this issue Nov 12, 2016
1ec5 added a commit that referenced this issue Nov 14, 2016
1ec5 added a commit that referenced this issue Nov 14, 2016
* [core] Line-break ideographic text by character

Allow a line break to be inserted after any supported Chinese, Japanese, or Yi character in a point-placed label. Balance the lines unless non-ideographic text such as Latin letters are present.

Fixes #1223.

* [core] Moved more character classing into util::i18n

* [core] Detect character properties by Unicode block

* [test] Reenabled ideographic breaking tests