Western fonts cannot be changed in the output docx #7022

Closed
TomBener opened this issue Jan 12, 2021 · 11 comments

@TomBener
Contributor

TomBener commented Jan 12, 2021

I use Pandoc 2.11.3.2 on macOS 11.1.

When lang is set to zh-CN in the YAML metadata block:

lang: zh-CN

Then run the command:

$ pandoc -C input.md -o main.docx 

The Western font in the output main.docx is the same as the Chinese font 宋体, as shown in the screenshot below. In addition, the Western font seems to be frozen and can’t be changed, even though the default Western fonts in Pandoc’s template are Calibri and Cambria.

[Screenshot]

If lang: zh-CN is removed, the issue disappears, but other problems emerge. For example, the Chinese quotation marks “” and ‘’ are rendered in the Western font, which should be avoided.

So I’d like Western text to use a font different from the Chinese font 宋体. Is this possible with lang: zh-CN set, or is there another solution? Thank you!

@jgm
Owner

jgm commented Jan 12, 2021

I don't know. To my knowledge, we don't do anything to change the font depending on the lang.
You might have a look at data/docx/word/styles.xml in this repository and let us know if anything there seems wrong.

@TomBener
Contributor Author

TomBener commented Sep 5, 2022

After nearly 20 months, I’ve looked into this issue again. Sorry for the long delay…


With the option --metadata lang=zh-CN, i.e. pandoc main.md -o main.docx --metadata lang=zh-CN, the output document differs in two ways from the output of the default command pandoc main.md -o main.docx:

  1. The w:val property, which specifies the language of Latin text, is changed from en-US to zh-CN in word/styles.xml.
  2. <dc:language>zh-CN</dc:language> is added to docProps/core.xml.
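Both changes can be checked by reading the two parts straight out of the .docx, which is a ZIP archive. The sketch below is a minimal helper under stated assumptions: the parts are UTF-8, and the language element is written with the w: prefix as in the snippets in this thread. The function name lang_settings is hypothetical.

```python
import re
import zipfile


def lang_settings(docx):
    """Return (w:lang start tags in word/styles.xml, dc:language in docProps/core.xml).

    Hypothetical helper: assumes UTF-8 parts and the w: prefix on the lang element.
    `docx` may be a path or a binary file object (zipfile accepts both).
    """
    with zipfile.ZipFile(docx) as z:
        styles = z.read("word/styles.xml").decode("utf-8")
        names = z.namelist()
        # core.xml is normally present in Pandoc output, but treat it as optional
        core = z.read("docProps/core.xml").decode("utf-8") if "docProps/core.xml" in names else ""
    # Every w:lang start tag, e.g. <w:lang w:val="zh-CN" w:eastAsia="en-US" w:bidi="ar-SA"/>
    lang_tags = re.findall(r"<w:lang\b[^>]*>", styles)
    m = re.search(r"<dc:language>(.*?)</dc:language>", core)
    return lang_tags, (m.group(1) if m else None)
```

Calling lang_settings("main.docx") on the outputs of the two commands should make the en-US/zh-CN difference in w:val visible.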

The first change causes this issue, because Chinese is not a Latin language but an East Asian language.

So why can’t the Western (Latin) font be changed in Microsoft Word once --metadata lang=zh-CN is added? Because Word then treats all text as Chinese. However, typical Latin fonts, for instance Cambria in the default template, are unavailable for Chinese. If you apply a font like LXGW WenKai, it works as expected.

For a typical docx document created by hand in Microsoft Word, the language properties are set as follows:

<w:lang w:val="en-US" w:eastAsia="zh-CN" w:bidi="ar-SA" />

Whereas Pandoc’s output with --metadata lang=zh-CN is:

<w:lang w:val="zh-CN" w:eastAsia="en-US" w:bidi="ar-SA" />

Consequently, I think that when --metadata lang=zh-CN is set, the w:eastAsia property should be set to zh-CN while w:val remains unchanged. I’m not sure what other problems this change might cause, but it may be worth considering as a fix for this issue.
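Until a fix lands, the proposed layout could be approximated by post-processing the generated file. This is a rough workaround sketch, not something Pandoc provides: it rewrites every w:lang tag in word/styles.xml so that w:val is en-US and w:eastAsia is zh-CN. It uses plain regular expressions because an XML round trip through ElementTree can rename prefixes and drop namespace declarations that are only referenced from mc:Ignorable. It assumes UTF-8 and the w: prefix, and it only changes attributes that are already present; swap_lang and fix_docx are hypothetical names.

```python
import re
import zipfile


def swap_lang(styles_xml, latin="en-US", east_asian="zh-CN"):
    """Rewrite each <w:lang .../> tag: w:val gets `latin`, w:eastAsia gets `east_asian`.

    Hypothetical workaround; attributes missing from a tag are not added.
    """
    def fix(m):
        tag = re.sub(r'w:val="[^"]*"', lambda _: f'w:val="{latin}"', m.group(0))
        return re.sub(r'w:eastAsia="[^"]*"', lambda _: f'w:eastAsia="{east_asian}"', tag)

    return re.sub(r"<w:lang\b[^>]*>", fix, styles_xml)


def fix_docx(src, dst):
    """Copy the archive src to dst, rewriting only word/styles.xml (assumed UTF-8)."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/styles.xml":
                data = swap_lang(data.decode("utf-8")).encode("utf-8")
            # Reuse the original entry metadata (name, timestamp, compression type)
            zout.writestr(item, data)
```

For example, fix_docx("main.docx", "main-fixed.docx") would turn the w:val="zh-CN" w:eastAsia="en-US" pair into w:val="en-US" w:eastAsia="zh-CN".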

@jgm
Owner

jgm commented Sep 5, 2022

Thanks @TomBener

@jgm
Owner

jgm commented Sep 5, 2022

This suggests that we should determine whether the language is an East Asian language in addLang and maybe getTextProps, modifying w:val or w:eastAsia depending on the answer. I’d like to be more confident that this is right (e.g. through documentation) and that it won’t have other bad effects. We’d also need a way to identify East Asian languages. I guess we could simply hard-code a list of langs, but I don’t know what that list should be.
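The decision described here could look roughly like the following. It is a Python sketch, not Pandoc’s actual addLang or getTextProps code (which is Haskell); the set EAST_ASIAN_PRIMARY and both function names are hypothetical. Matching on the primary subtag is one possible choice, since it also covers regional variants such as zh-TW.

```python
# Hypothetical set of primary language subtags treated as East Asian.
EAST_ASIAN_PRIMARY = {"zh", "ja", "ko"}


def is_east_asian(tag):
    """True if the first (primary) subtag of a BCP 47 tag is in EAST_ASIAN_PRIMARY.

    BCP 47 tags compare case-insensitively, so the subtag is lower-cased first.
    """
    return tag.split("-", 1)[0].lower() in EAST_ASIAN_PRIMARY


def lang_attribute(tag):
    """Name of the w:lang attribute that should carry the document language."""
    return "w:eastAsia" if is_east_asian(tag) else "w:val"
```

With this sketch, lang_attribute("zh-CN") gives "w:eastAsia" and lang_attribute("en-US") gives "w:val", leaving the other attribute at its template default.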

@TomBener
Contributor Author

TomBener commented Sep 6, 2022

According to Microsoft, East Asian languages, also known as CJK, are Chinese (Simplified), Chinese (Traditional), Japanese, and Korean; Vietnamese is sometimes included as well.

The relevant BCP 47 language tags are:

  • zh-CN Chinese (Simplified), Mainland China
  • zh-HK Chinese (Traditional), Hong Kong
  • zh-TW Chinese (Traditional), Taiwan
  • ja-JP Japanese, Japan
  • ko-KR Korean, Republic of Korea

However, zh-HK is not included in Microsoft Office 2016 and Windows.

I hope this helps as a reference for the list of langs.
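If a hard-coded list is used, the five tags above could be encoded as follows. This is an illustrative sketch (CJK_TAGS and in_cjk_list are hypothetical names); it lower-cases before lookup because BCP 47 tags compare case-insensitively.

```python
# The five tags listed above, stored in lower case for case-insensitive lookup.
# Hypothetical constant; zh-HK is kept even though Office 2016/Windows lack it.
CJK_TAGS = {"zh-cn", "zh-hk", "zh-tw", "ja-jp", "ko-kr"}


def in_cjk_list(tag):
    """Exact (case-insensitive) membership test against CJK_TAGS."""
    return tag.lower() in CJK_TAGS
```

An exact list like this does not match tags such as ja or zh-Hans, which is one reason to consider matching on the primary subtag instead.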

jgm closed this as completed in 47dcb57 on Sep 6, 2022
@jgm
Owner

jgm commented Sep 6, 2022

Okay, thanks for diagnosing this. I've pushed a fix that seems to work, but more testing would be helpful. A nightly with this change should be available within 24 hours.

@TomBener
Contributor Author

TomBener commented Sep 6, 2022

I just found that the language option also affects how citations are rendered.

I want to cite both English and Chinese references in an article. Furthermore, I’d like to:

  1. Sort the English references alphabetically
  2. Sort the Chinese references in Pinyin order

To do this, I ran the following command, as described in the manual:

pandoc main.md -o main.docx --metadata lang=zh-u-co-pinyin -C

The sorting works perfectly. However, it has unwanted effects on the localization of the bibliography. Here is an example:

Morris, Carwyn. 2022. 《Spatial Governance in Beijing: Informality, Illegality and the Displacement of the 〈Low-end Population〉》. The China Quarterly, 八月, 1–21. https://doi.org/10.1017/S0305741022000868.

Here, 《, 〈, 〉, 》 and 八月 (August) are all Chinese conventions, and they are improper for an English reference.

For comparison, here is the output with the default language en-US, which is the expected result:

Morris, Carwyn. 2022. “Spatial Governance in Beijing: Informality, Illegality and the Displacement of the ‘Low-end Population’.” The China Quarterly, August, 1–21. https://doi.org/10.1017/S0305741022000868.

I guess this results from citeproc’s localization, but I have no idea how to avoid it.

What I desire

With lang=zh-u-co-pinyin enabled, only the Chinese references should be sorted by Pinyin, without changing the localization behavior.
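The value zh-u-co-pinyin is a BCP 47 tag carrying a Unicode locale extension: after the -u- singleton, the key co (collation) is set to pinyin, while the base language is still zh. Splitting the tag shows why sorting and localization move together. Presumably the base language zh is what selects the Chinese locale for quotes and month names, while co only affects collation. split_unicode_extension is a hypothetical, simplified helper that ignores private-use subtags.

```python
def split_unicode_extension(tag):
    """Split a BCP 47 tag into (base tag, {key: value}) for its -u- extension.

    Simplified sketch: private-use sequences (-x-...) are not handled.
    """
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if "u" not in lowered[1:]:
        return tag, {}
    i = lowered.index("u", 1)            # position of the "u" singleton
    base = "-".join(parts[:i])           # e.g. "zh" in "zh-u-co-pinyin"
    keywords = {}
    key = None
    for sub in lowered[i + 1:]:
        if len(sub) == 1:                # another singleton ends the -u- extension
            break
        if len(sub) == 2:                # a two-character key such as "co" (collation)
            key = sub
            keywords[key] = ""
        elif key is not None:            # a value subtag for the current key
            keywords[key] = sub if not keywords[key] else keywords[key] + "-" + sub
        # subtags before the first key (extension attributes) are skipped
    return base, keywords
```

For example, split_unicode_extension("zh-u-co-pinyin") returns ("zh", {"co": "pinyin"}): only the collation is Pinyin-specific, and the remaining language is plain Chinese.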

@jgm
Owner

jgm commented Sep 6, 2022

I'm not sure if citeproc offers a way to use different locale-dependent quoting styles for entries in different languages. @denismaier do you know?

Have you set the language field in the CSL JSON bibliography entries?

Note that it would not be desirable behavior, in general, for citeproc to use the quote style appropriate to the language of each source. For example, if you're writing an article in English and citing a French source, you wouldn't want to use French quotes for the title. Chinese/English might be an exception, but we'll need to ask the citeproc people if there's a way to do this.
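For reference, language is a standard CSL variable. A CSL JSON entry for the Morris (2022) reference above, with language set to en-US, might look like the sketch below. The citation key morris2022 is hypothetical, and whether setting language changes the quotation style is exactly the open question here.

```python
import json

# One CSL JSON entry built from the Morris (2022) reference quoted above.
# "morris2022" is a hypothetical citation key.
entry = {
    "id": "morris2022",
    "type": "article-journal",
    "author": [{"family": "Morris", "given": "Carwyn"}],
    "title": "Spatial Governance in Beijing: Informality, Illegality "
             'and the Displacement of the "Low-end Population"',
    "container-title": "The China Quarterly",
    "issued": {"date-parts": [[2022, 8]]},
    "page": "1-21",
    "DOI": "10.1017/S0305741022000868",
    "language": "en-US",  # the CSL "language" variable jgm asks about
}

# A CSL JSON bibliography is a JSON array of such entries.
csl_json = json.dumps([entry], ensure_ascii=False, indent=2)
print(csl_json)
```

The printed JSON could be saved as, for example, references.json and passed to pandoc with --bibliography.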

@TomBener
Contributor Author

TomBener commented Sep 6, 2022

Have you set the language field in the CSL JSON bibliography entries?

Nope. My bibliography file was exported as BibLaTeX from Better BibTeX for Zotero. The language field was not exported.

Chinese/English might be an exception

I think so. Generally, when writing papers in Chinese, we prefer to cite English references as they are.

@denismaier
Contributor

denismaier commented Sep 6, 2022

Vanilla CSL does not support this. There’s an extended variant, CSL-M, that supports that kind of stuff and a lot more. You can have multiple variants of a field (translations, transliteration, original script). Some of that will make its way into the next version of CSL, but, IIRC, complete support for multilingual citations is still out of scope.

@denismaier
Contributor

@fbennett has solved this in CSL-M by allowing multiple layout nodes. One argument against this has always been that it may add too much complexity. As a citeproc author, do you think that would be feasible in your citeproc, or would it indeed be too complicated?
