Support all Unicode Versions #23
NOT FINISHED, DO NOT EXECUTE :)
Hi @jquast, I am wondering if you might be interested in the "multiple unicodedata versions" problem being solved in a separate library. I created fonttools/unicodedata2#28 about this, as I know that project is already partially solving that problem.
More than anything, I've been mulling over the question, "How best should users select their Unicode version support level?" And recently, whoa! iTerm2 supports a way to switch versions, see "Unicode Version" in https://iterm2.com/documentation-escape-codes.html

I also think I can devise a way to determine the supported version by introspecting the terminal: display one double-width character that is new in each Unicode version, then use report-cursor-position to determine which version the connected terminal supports.

So, in the years since I first developed wcwidth for Python, there have been some enhancements to the general ecosystem for determining or setting the version level, but nothing particularly universal or portable.

I waited a few years to add 24-bit color support to https://github.com/jquast/blessed because there was no way to determine whether the terminal would support it, and I couldn't decide how to expose an easy API to select 24-bit color support. Over the years, all terminals implementing 24-bit colors added a So now the code is perfectly clear and straightforward for me as a library author, and downstream applications, even users, do not have to specify this terminal support level. Existing applications that use the library can support 24-bit colors without changes by users or the application developers.

So anyway, I do think an environment variable is the best way to go, at least from a terminal support level perspective.
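The detection idea in this comment can be sketched in Python. This is only an illustration under stated assumptions, not the method any released tool uses: the helper names `parse_cpr` and `displayed_width` are hypothetical, and the sketch only parses Cursor Position Report replies, it does not talk to a real terminal. A real probe would write the query, print one candidate character, query again, and compare the two replies.

```python
import re

# Cursor Position Report (CPR): a terminal answers the query ESC [ 6 n
# with ESC [ <row> ; <col> R, both values 1-based.
CPR_QUERY = "\x1b[6n"
CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply):
    """Return (row, col) from a CPR reply, or None if nothing matches.

    Hypothetical helper for illustration only.
    """
    match = CPR_REPLY.search(reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def displayed_width(reply_before, reply_after):
    """Return how many cells the cursor advanced between two CPR replies.

    Returns None when either reply is unparsable or the cursor changed
    rows (for example because the line wrapped), since the difference in
    columns is then meaningless.
    """
    before = parse_cpr(reply_before)
    after = parse_cpr(reply_after)
    if before is None or after is None or before[0] != after[0]:
        return None
    return after[1] - before[1]
```

If a character that first became double-width in some Unicode version advances the cursor by 2 cells, the terminal presumably supports at least that version; an advance of 1 suggests an older table.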
@@ -92,26 +97,48 @@ def flushout():
assert 'narrow Python build' in err.args[0], err.args
LIMIT_UCS = 0x10000

#: printable length of highest unicode character
#: printable length of highest unicode character description
Mistaken comment, revert
if inp.code == term.KEY_ENTER:
break
elif inp.code == term.KEY_ESCAPE or inp == chr(3):
text = None
Should not return None
bin/wcwidth-browser.py
Outdated
for version, boundaries in ZERO_WIDTH.items():
for (begin, end) in boundaries:
if version == _wcmatch_version(unicode_version):
for val in [_val for _val in
Cleanup
.. autofunction:: wcwidth._get_package_version

.. autofunction:: wcwidth._wcmatch_version
Duplicate. Should make function public.
Documentation for And in the README, we will be clear to spell out this transitional time of terminal support, and how to set the environment variable for version level 9, if you like, for terminals like iTerm2, to see the results magically appear in any downstream programs like bpython without changes. And that's the real goal here: if terminal applications or power users start exporting this variable, we can have a language-independent solution for Unicode version level selection.
Codecov Report
@@ Coverage Diff @@
## master #23 +/- ##
=========================================
Coverage ? 97.84%
=========================================
Files ? 3
Lines ? 93
Branches ? 18
=========================================
Hits ? 91
Misses ? 1
Partials ? 1
Support all versions of Unicode, using the UNICODE_VERSION environment variable, when defined, or, for non-shells, explicitly by passing the argument unicode_version to the wcwidth family of functions.

A demonstration utility that determines the Terminal's Unicode Version is made available as a separate package, https://github.com/jquast/ucs-detect/, which contains a Problem and Solution statement, copied here:
Problem
Chinese, Japanese, Korean, and Emoticon characters are "double-wide", occupying 2 cells, instead of 1, and some other special characters are "zero-width".
Any terminal application that formats and displays these characters may have trouble determining how they will be displayed to the end-user.
This problem happens often, because the Unicode Consortium releases new versions of the Unicode Standard periodically, but the source code of libraries and applications is not updated at the same time, or at all!
Many languages and libraries continue to conform only to Unicode 5.0, the version covered by the last wcwidth.c, released by Markus Kuhn in 2007.
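To make the cell-width problem concrete, here is a minimal sketch using only Python's standard unicodedata module. It is not how the wcwidth library computes widths; the names `cell_width` and `text_width` are hypothetical, and zero-width detection is approximated by the combining class alone. Note that unicodedata is tied to the single Unicode version the interpreter was built with, which is exactly the limitation described above.

```python
import unicodedata


def cell_width(char):
    """Approximate the number of terminal cells one character occupies.

    Hypothetical helper: combining marks count as 0, East Asian Wide (W)
    and Fullwidth (F) characters count as 2, everything else as 1.
    """
    if unicodedata.combining(char):
        return 0
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2
    return 1


def text_width(text):
    """Sum the approximate cell widths of every character in text."""
    return sum(cell_width(char) for char in text)


if __name__ == "__main__":
    for sample in ("abc", "\u30b3\u30f3\u30cb\u30c1\u30cf", "e\u0301"):
        print(repr(sample), len(sample), text_width(sample))
```

The gap between `len()` and the displayed width is what breaks column alignment when an application and a terminal disagree about which characters are wide.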
Solution
The most important factor is to determine: What version of Unicode is the Terminal Emulator using?
This program, ucs-detect, is able to automatically detect the version of Unicode that the connecting Terminal supports. The python wcwidth library supports all Unicode versions, 4.1.0 through 12.1.0 at the time of this writing, and so it is able to select and match the correct return value by using the given value of the UNICODE_VERSION environment variable.
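The version selection described above might look roughly like the following sketch. Everything here is an assumption for illustration: the `SUPPORTED` tuple is a made-up subset of versions, `match_version` and `version_from_environment` are hypothetical names, and the rounding policy (nearest supported version at or below the request, falling back to the oldest) is one plausible choice, not necessarily what the library does.

```python
import os

# Hypothetical subset of supported table versions, for illustration only.
SUPPORTED = ("4.1.0", "5.0.0", "9.0.0", "12.1.0")


def _as_tuple(version):
    """Convert '9' or '9.0.0' into a comparable 3-part integer tuple."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def match_version(requested, supported=SUPPORTED):
    """Pick the newest supported version not newer than the request.

    An empty value or 'latest' selects the newest supported version; a
    request older than every table falls back to the oldest one.
    A malformed version string raises ValueError.
    """
    ordered = sorted(supported, key=_as_tuple)
    if not requested or requested == "latest":
        return ordered[-1]
    wanted = _as_tuple(requested)
    candidates = [ver for ver in ordered if _as_tuple(ver) <= wanted]
    return candidates[-1] if candidates else ordered[0]


def version_from_environment(environ=None):
    """Resolve the table version from the UNICODE_VERSION variable."""
    environ = os.environ if environ is None else environ
    return match_version(environ.get("UNICODE_VERSION", "latest"))
```

With this policy, a terminal exporting UNICODE_VERSION=10.0.0 would be served the 9.0.0 table from the hypothetical list above, so a library never claims support for characters newer than the terminal reports.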