Do auto-unwrapping of multi-byte chars only for Chinese locale
Fixes #300.
kaushalmodi committed Nov 30, 2019
1 parent 762b7cc commit 782741f
Showing 7 changed files with 135 additions and 20 deletions.
23 changes: 23 additions & 0 deletions doc/ox-hugo-manual.org
@@ -2828,6 +2828,29 @@ Here's an example of inlining an SVG:
#+caption: An SVG with hyperlink
#+attr_html: :inlined t
[[file:../test/site/content-org/images/svg-with-hyperlinks.svg]]
*** Chinese Support
:PROPERTIES:
:EXPORT_FILE_NAME: chinese-support
:END:
**** Auto-unwrapping of lines with multi-byte characters
This issue came up on this [[https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547][emacs-china thread]]. When consecutive
lines in the Org source contained Chinese characters, the exported
HTML separated the last character on one line from the first character
on the next line with a space, which is not grammatically correct in
Chinese.

So in such cases, those lines must be unwrapped _without any spaces_
to separate those characters across the lines.

Removing that space would of course not be grammatically correct in
English, or even in other languages with multi-byte characters (a few
examples: Hindi, Gujarati).

So line-unwrapping _with space removal_ is done *only if*:
1. The /locale/ is [[https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html][auto-detected]] to be Chinese via the environment
variables /LANGUAGE/, /LC_ALL/ or /LANG/, or
2. The /locale/ is *manually set* to Chinese by setting it to *zh*
using the ~#+hugo_locale:~ keyword (or the ~EXPORT_HUGO_LOCALE~ property).
*** COMMENT Hugo Bundle
:PROPERTIES:
:EXPORT_FILE_NAME: hugo-bundle
55 changes: 40 additions & 15 deletions ox-hugo.el
@@ -1635,6 +1635,35 @@ INFO is a plist used as a communication channel."
;; (message "[ox-hugo section-path DBG] section path: %S" section-path)
section-path))

;;;; Get Language
(defun org-hugo--get-lang (info)
"Return the language used for the content.

The returned value is a string that can consist only of English
letters and an underscore.

The first two characters of this string are a language code as per
the ISO 639-1 standard. See
https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes.

INFO is a plist used as a communication channel."
(let ((lang (plist-get info :lang-iso-code)))
(unless lang
(setq lang
(or (plist-get info :hugo-locale)
;; https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html
(getenv "LANGUAGE")
(getenv "LC_ALL")
(getenv "LANG")))
(when (stringp lang)
(setq lang
(replace-regexp-in-string "\\`\\([a-z]+_[A-Z]+\\).*\\'" "\\1" lang)))
(setq lang (org-string-nw-p lang))
(when lang
;; (message "[org-hugo--get-lang DBG] language: %s" lang)
(plist-put info :lang-iso-code lang)))
lang))



;;; Transcode Functions
@@ -2434,15 +2463,18 @@ communication channel."
ret)

;; (message "[org-hugo-paragraph DBG] para 1: %s" contents)

;; Join consecutive Chinese lines into a single long line without
;; unwanted space inbetween.
;; https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547/5
;; Example: 这是一个测试 -> 这是一个测试文本 ("This is a test text")
;; 文本
(setq contents (replace-regexp-in-string
"\\([[:multibyte:]]\\)[[:blank:]]*\n[[:blank:]]*\\([[:multibyte:]]\\)" "\\1\\2"
contents))
;; (message "[org-hugo-paragraph DBG] para 2: %s" contents)
(when (member (org-hugo--get-lang info) '("zh"))
;; https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547/5
;; Example: 这是一个测试 -> 这是一个测试文本 ("This is a test text")
;; 文本
(setq contents (replace-regexp-in-string
"\\([[:multibyte:]]\\)[[:blank:]]*\n[[:blank:]]*\\([[:multibyte:]]\\)" "\\1\\2"
contents))
;; (message "[org-hugo-paragraph DBG] para 2: %s" contents)
)

(unless (org-hugo--plist-get-true-p info :hugo-preserve-filling)
(setq contents (concat (mapconcat 'identity (split-string contents) " ") "\n")))
@@ -3154,14 +3186,7 @@ INFO is a plist used as a communication channel."
(creator (and (plist-get info :with-creator)
(plist-get info :creator)))
(locale (and (plist-get info :hugo-with-locale)
(let* ((lang (or (plist-get info :hugo-locale)
;; https://www.gnu.org/software/gettext/manual/html_node/Locale-Environment-Variables.html
(getenv "LANGUAGE")
(getenv "LC_ALL")
(getenv "LANG")))
(lang (when (stringp lang)
(replace-regexp-in-string "\\`\\([a-z]+_[A-Z]+\\).*\\'" "\\1" lang))))
lang)))
(org-hugo--get-lang info)))
(description (org-string-nw-p (plist-get info :description)))
(aliases-raw (let ((aliases-raw-1 (org-string-nw-p (plist-get info :hugo-aliases))))
(when aliases-raw-1
44 changes: 41 additions & 3 deletions test/site/content-org/all-posts.org
@@ -480,7 +480,7 @@ get escaped.. =foo_bar= must not become =foo\_bar=.
:END:
This post will be exported without =title= in the front-matter because
it is explicitly set to /empty/ using =:EXPORT_TITLE:=.
** En dash --, Em dash ---, Horizontal ellipsis ... in titles :@upstream:
** En dash --, Em dash ---, Horizontal ellipsis ... in titles :@upstream:
:PROPERTIES:
:EXPORT_FILE_NAME: en-dash-em-dash-hellip-in-titles
:END:
@@ -4938,8 +4938,10 @@ current subtree.
abc
def
ghi
这是一个测试
文本

{{{oxhugoissue(300)}}} --
Тест
Тест
** Filling is not preserved
:PROPERTIES:
:EXPORT_FILE_NAME: filling-is-not-preserved
@@ -4948,8 +4950,43 @@ ghi
abc
def
ghi
** Chinese locale :chinese:locale:
:PROPERTIES:
:EXPORT_HUGO_LOCALE: zh
:END:
*** Filling automatically not preserved for Chinese characters (preserve filling on)
:PROPERTIES:
:EXPORT_FILE_NAME: filling-not-preserved-for-chinese-characters--preserve-filling-on
:EXPORT_HUGO_PRESERVE_FILLING: t
:END:
#+begin_description
Ensure that multi-byte characters are force-unwrapped if the locale is
manually set or auto-detected as Chinese.
#+end_description
abc
def
ghi
这是一个测试
文本

[[https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547/5][Reference]]
*** Filling automatically not preserved for Chinese characters (preserve filling off)
:PROPERTIES:
:EXPORT_FILE_NAME: filling-not-preserved-for-chinese-characters--preserve-filling-off
:EXPORT_HUGO_PRESERVE_FILLING:
:END:
#+begin_description
Ensure that even when auto-unwrapping is enabled, no space is kept
between the unwrapped multi-byte characters if the locale is manually
set or auto-detected as Chinese.
#+end_description
abc
def
ghi
这是一个测试
文本

[[https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547/5][Reference]]
* Section Inheritance :section_inheritance:
** Section A
:PROPERTIES:
@@ -6360,6 +6397,7 @@ y slug para múltiples publicaciones, pero en diferentes idiomas.
:EXPORT_AUTHOR: Feng Ruohang
:EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :source https://github.com/Vonng/Math/blob/master/nndl/nn-intro.md
:EXPORT_DATE: 2017-11-29
:EXPORT_HUGO_LOCALE: zh
:END:
- Disclaimer :: This post is from the [[https://github.com/Vonng/Math/blob/master/nndl/nn-intro.md][link]] posted by GitHub user
[[https://github.com/Vonng][*Vonng*]] in [[https://github.com/gohugoio/hugo/issues/234#issuecomment-347725391][this comment]]. All credit for this post
2 changes: 1 addition & 1 deletion test/site/content/posts/filling-is-not-preserved.md
@@ -4,4 +4,4 @@ tags = ["filling"]
draft = false
+++

abc def ghi 这是一个测试文本
abc def ghi
5 changes: 4 additions & 1 deletion test/site/content/posts/filling-is-preserved.md
@@ -7,4 +7,7 @@ draft = false
abc
def
ghi
这是一个测试文本

`ox-hugo` Issue #[300](https://github.com/kaushalmodi/ox-hugo/issues/300) --
Тест
Тест
@@ -0,0 +1,10 @@
+++
title = "Filling automatically not preserved for Chinese characters (preserve filling off)"
description = "Ensure that even when auto-unwrapping is enabled, no space is kept between the unwrapped multi-byte characters if the locale is manually set or auto-detected as Chinese."
tags = ["filling", "chinese", "locale"]
draft = false
+++

abc def ghi 这是一个测试文本

[Reference](https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547/5)
@@ -0,0 +1,16 @@
+++
title = "Filling automatically not preserved for Chinese characters (preserve filling on)"
description = """
Ensure that multi-byte characters are force-unwrapped if the locale is
manually set or auto-detected as Chinese.
"""
tags = ["filling", "chinese", "locale"]
draft = false
+++

abc
def
ghi
这是一个测试文本

[Reference](https://emacs-china.org/t/ox-hugo-auto-fill-mode-markdown/9547/5)
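
The locale-gated unwrapping in this commit can be tried in isolation with a small sketch like the one below. It is not code from ox-hugo: the `my/` function names are hypothetical, and only the two regular expressions are copied from the diff. One deliberate difference, which is an assumption about intent: the commit compares the language against the exact string "zh", whereas this sketch accepts any value starting with "zh" so that a trimmed environment value such as "zh_CN" also qualifies.

```elisp
;; Illustrative sketch only. The `my/' names are hypothetical and are
;; not part of ox-hugo; the two regexps are copied from the commit.

(defun my/normalize-locale (locale)
  "Trim LOCALE such as \"zh_CN.UTF-8\" down to \"zh_CN\".
Return nil if LOCALE is not a string or is blank."
  (when (stringp locale)
    (let ((trimmed (replace-regexp-in-string
                    "\\`\\([a-z]+_[A-Z]+\\).*\\'" "\\1" locale)))
      ;; Treat empty or whitespace-only values as "no locale".
      (unless (string-match-p "\\`[ \t\n]*\\'" trimmed)
        trimmed))))

(defun my/chinese-locale-p (locale)
  "Return non-nil if LOCALE is \"zh\" or starts with \"zh_\".
Assumption: a prefix match, unlike the exact match in the commit."
  (and (stringp locale)
       (string-match-p "\\`zh\\(_\\|\\'\\)" locale)
       t))

(defun my/unwrap-multibyte-lines (text locale)
  "Join lines of TEXT that break between two multi-byte characters.
No space is left at the join.  TEXT is returned unchanged unless
LOCALE is Chinese."
  (if (my/chinese-locale-p (my/normalize-locale locale))
      (replace-regexp-in-string
       "\\([[:multibyte:]]\\)[[:blank:]]*\n[[:blank:]]*\\([[:multibyte:]]\\)"
       "\\1\\2" text)
    text))

;; Usage:
;; (my/unwrap-multibyte-lines "这是一个测试\n文本" "zh")           ; => "这是一个测试文本"
;; (my/unwrap-multibyte-lines "Тест\nТест" "ru_RU.UTF-8")          ; => unchanged
;; (my/unwrap-multibyte-lines "abc\ndef" "zh")                     ; => unchanged
```

ASCII-only line breaks are never touched by this step, so ordinary filling of those lines is still governed separately by the preserve-filling setting, as in the exported test files above.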
