Document encode_utf16() endianness; maybe add endianness and BOM options. #83102
Comments
rust/library/core/src/char/methods.rs Lines 1641 to 1646 in 0ab7c1d
The decode functions also assume native endian UTF-16. This makes sense as a default. If necessary, endian conversion can be done before decoding or after encoding by mapping the |
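For illustration, here is a minimal sketch of the conversion described above: since `encode_utf16()` yields `u16` code units rather than bytes, a fixed byte order can be chosen when each unit is turned into bytes. The helper names `utf16_be_bytes`, `utf16_le_bytes`, and `decode_utf16_be` are hypothetical and not part of std; only `encode_utf16`, `to_be_bytes`/`to_le_bytes`, `from_be_bytes`, and `char::decode_utf16` are std APIs.

```rust
/// Hypothetical helper (not in std): UTF-16BE bytes for `s`, without a BOM.
fn utf16_be_bytes(s: &str) -> Vec<u8> {
    // Each u16 code unit is serialized most-significant byte first.
    s.encode_utf16().flat_map(|unit| unit.to_be_bytes()).collect()
}

/// Hypothetical helper (not in std): UTF-16LE bytes for `s`, without a BOM.
fn utf16_le_bytes(s: &str) -> Vec<u8> {
    // Each u16 code unit is serialized least-significant byte first.
    s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

/// Hypothetical helper (not in std): decode UTF-16BE bytes into a String.
/// Returns None for an odd byte count or an unpaired surrogate.
fn decode_utf16_be(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Rebuild the u16 code units from big-endian byte pairs, then decode.
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]));
    char::decode_utf16(units).collect::<Result<String, _>>().ok()
}
```

With these, `utf16_be_bytes("a")` gives `[0x00, 0x61]` and `utf16_le_bytes("a")` gives `[0x61, 0x00]`, regardless of the host platform's byte order.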
Thanks. My read was too quick.
I can submit a PR if folks like.
I am now thoroughly confused, as usual. I swear I saw something with endianness somewhere in std, but I can't find it now. Anyhow, I can add the documentation about endianness and the lack of a BOM in the appropriate spots. LMK what you think of me getting a PR together. Thanks!
Update encode_utf16 to mention it is native endian Fixes rust-lang#83102
Rollup merge of rust-lang#136283 - hkBst:patch-31, r=workingjubilee Update encode_utf16 to mention it is native endian Fixes rust-lang#83102
The documentation does not specify the endianness of str::encode_utf16() and char::encode_utf16(): it looks from the source like they are big-endian (UTF-16BE), but I may be reading it wrong and they are little-endian (UTF-16LE) or native-endian.
This may be a deliberate design decision: if so I think it should be reconsidered, as the encoding is useless for some purposes if you don't know its endianness.
It would also be nice to indicate whether str::encode_utf16() inserts a byte-order mark (BOM): pretty sure it does not from the source, which is fine.
It is probably too late to rename these functions or to add equivalents of opposite endianness at this point, which is too bad. It's an odd API given that the corresponding decode functions have little-endian and big-endian variants.
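As a rough sketch of the BOM point: `str::encode_utf16()` emits only the code units of the string itself, so a caller who wants a byte-order mark has to add U+FEFF explicitly. The names `BOM`, `encode_utf16_with_bom`, and `strip_bom` below are hypothetical helpers for illustration, not std APIs.

```rust
/// U+FEFF, the code unit used as a byte-order mark (constant name is hypothetical).
const BOM: u16 = 0xFEFF;

/// Hypothetical helper (not in std): UTF-16 code units for `s`, prefixed with a BOM.
fn encode_utf16_with_bom(s: &str) -> Vec<u16> {
    // encode_utf16() adds no BOM of its own, so chain one in front.
    std::iter::once(BOM).chain(s.encode_utf16()).collect()
}

/// Hypothetical helper (not in std): drop a leading BOM if one is present.
fn strip_bom(units: &[u16]) -> &[u16] {
    match units {
        [BOM, rest @ ..] => rest,
        _ => units,
    }
}
```

Here `"a".encode_utf16()` yields the single unit `0x0061`, while `encode_utf16_with_bom("a")` yields `[0xFEFF, 0x0061]`. The byte order of those units only becomes a question once they are serialized to bytes.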