RFC: Statically enforce Unicode in std::fmt #526

Merged (1 commit) on Dec 30, 2014


alexcrichton
Member

Statically enforce that the `std::fmt` module can only create valid UTF-8 data
by removing the arbitrary `write` method in favor of a `write_str` method.
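
For readers on a current toolchain, the trait this RFC calls `fmt::Writer` is named `std::fmt::Write` in today's Rust. The sketch below is only an illustration of the guarantee described above, not code from the RFC: `Upper` and `shout` are hypothetical names. Because the single required method takes a `&str`, an implementor can never be handed bytes that are not valid UTF-8.

```rust
use std::fmt::{self, Write};

/// Hypothetical sink that upper-cases everything written to it.
struct Upper(String);

impl Write for Upper {
    // The only required method takes `&str`, so `data` is valid UTF-8 by construction.
    fn write_str(&mut self, data: &str) -> fmt::Result {
        self.0.extend(data.chars().flat_map(char::to_uppercase));
        Ok(())
    }
}

/// Hypothetical helper: format a greeting through the `Upper` sink.
fn shout(name: &str, n: u32) -> String {
    let mut out = Upper(String::new());
    // `write!` calls `write_fmt`, whose default implementation ends up in `write_str`.
    write!(out, "hello {}, you are #{}", name, n).expect("this sink never returns an error");
    out.0
}

fn main() {
    println!("{}", shout("rfc", 526));
}
```

There is no way to pass a raw `&[u8]` to this sink without first converting it to a `&str`, which is where the static enforcement comes from.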

Rendered

@alexcrichton
Member Author

cc @SimonSapin
cc #504

@SimonSapin
Contributor

👍

@SimonSapin
Contributor

> The fmt::Writer trait can also be located as io::TextWriter instead to emphasize its possible future connection with I/O, although there are not concrete plans today to develop these connections.

> It is unclear to what degree a fmt::Writer needs to interact with io::Writer and the various adaptors/buffers. For example one would have to implement their own BufferedWriter for a fmt::Writer.

I think it’s fine for this trait to be in std::fmt. My current plan is to consider it an implementation detail of formatting, experiment on crates.io with separate traits for general-purpose Unicode streams, and if that proves successful propose adding them to std at some point after Rust 1.0. Then, std::fmt::Writer could (maybe, if it’s compatible) become a re-export of std::io::TextWriter (or whatever it’ll end up being named.)
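
One way to picture the interaction with I/O adaptors discussed here is a thin wrapper that exposes a byte sink as a text sink. This is a minimal sketch in current Rust (where the traits are `std::fmt::Write` and `std::io::Write`), not something proposed in the RFC; `TextAdaptor` and `render` are hypothetical names. It reuses the existing `io::BufWriter` for buffering instead of writing a new buffered type.

```rust
use std::fmt::{self, Write as FmtWrite};
use std::io::{self, Write as IoWrite};

/// Hypothetical adaptor: exposes any byte sink (`io::Write`) as a text sink (`fmt::Write`).
struct TextAdaptor<W: IoWrite> {
    inner: W,
}

impl<W: IoWrite> FmtWrite for TextAdaptor<W> {
    fn write_str(&mut self, data: &str) -> fmt::Result {
        // A `&str` is always valid UTF-8, so forwarding its bytes keeps the stream valid.
        // `fmt::Error` carries no payload, so the detail of the I/O error is dropped here.
        self.inner.write_all(data.as_bytes()).map_err(|_| fmt::Error)
    }
}

/// Hypothetical helper: format a value through the adaptor into an in-memory buffer.
fn render(value: i32) -> Vec<u8> {
    let mut text = TextAdaptor { inner: io::BufWriter::new(Vec::new()) };
    write!(text, "value = {}", value).expect("in-memory write cannot fail");
    // `into_inner` flushes the buffer and hands back the underlying `Vec<u8>`.
    text.inner.into_inner().expect("flushing into a Vec cannot fail")
}

fn main() {
    let bytes = render(42);
    println!("{}", String::from_utf8_lossy(&bytes));
}
```

The lossy error mapping is the main design cost of such an adaptor; a real library would probably stash the `io::Error` somewhere so the caller can retrieve it.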

@sfackler
Member

I seem to remember from way back when I was working on the base64 and hex modules that `push_char` was significantly slower than `push_byte`. Would it be worth adding an unsafe `write_u8` as well for things like `{`, `,`, etc.? Note that I haven't actually checked that this is still the case :)

@alexcrichton
Member Author

I'd probably be more in favor of an unsafe function to go from `&'a u8` to `&'a str` and then calling `write_str` instead. Would you be OK with that?

@SimonSapin
Contributor

@sfackler Looks like `String::push` (formerly `push_char`) doesn’t have a fast path for ASCII (single-byte) code points. Would that help?

Also, if the char/byte is literal, how does .push_str("{") (with a &str of length 1) compare?
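
To make the "fast path for ASCII" idea concrete, here is a minimal sketch of what such a branch could look like. It is not the standard library's implementation; `push_char` is a hypothetical free function operating on a plain `Vec<u8>` that is assumed to hold UTF-8.

```rust
/// Hypothetical helper: append `c` to a UTF-8 byte buffer, with a fast path for ASCII.
fn push_char(buf: &mut Vec<u8>, c: char) {
    if c.is_ascii() {
        // Single-byte code point: its UTF-8 encoding is the byte itself.
        buf.push(c as u8);
    } else {
        // General case: encode into a 4-byte scratch buffer, then copy the used part.
        let mut scratch = [0u8; 4];
        buf.extend_from_slice(c.encode_utf8(&mut scratch).as_bytes());
    }
}

fn main() {
    let mut buf = Vec::new();
    for c in "{é}".chars() {
        push_char(&mut buf, c);
    }
    println!("{:?}", String::from_utf8(buf));
}
```

Whether the extra branch is a net win would need measuring; this only shows the shape of the change being asked about.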


```rust
pub trait Writer {
    fn write_str(&mut self, data: &str) -> Result;
```

Contributor

Doesn't Result need to be instantiated?

Contributor

This is std::fmt::Result.

Contributor

Ah ok, thanks. I somehow missed that one looking for other Results.
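
For anyone else tripped up by the bare `Result`: inside `std::fmt`, `Result` refers to the alias `std::fmt::Result`, which is `Result<(), std::fmt::Error>`. A small sketch in current Rust, where `Point` and `spelled_out` are hypothetical names used only for illustration:

```rust
use std::fmt;

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    // `fmt::Result` is the alias `Result<(), fmt::Error>`, so no type parameters are written.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Spelling the alias out in full accepts the very same value, showing the two types are identical.
fn spelled_out(r: fmt::Result) -> Result<(), fmt::Error> {
    r
}

fn main() {
    println!("{}", Point { x: 1, y: 2 });
    println!("{:?}", spelled_out(Ok(())));
}
```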

@sfackler
Member

@SimonSapin I'd imagine that'd help, but I ran into this like ~1.5 years ago so my memory's a bit rusty :)

@alexcrichton Yeah, implementations just transmuting from &[u8] to &str seems fine for now.

@SimonSapin
Contributor

@sfackler You can always use `str::from_utf8_unchecked`, but that seems unnecessary for the example you gave where you can just have a `"{"` string literal. (As opposed to `'{'`.)
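
The two options can be put side by side. This is a minimal sketch in current Rust (trait name `std::fmt::Write`); `write_ascii_byte` and `braces` are hypothetical helpers, and the explicit ASCII check is an assumption added here so that the function can stay safe to call.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: write a single byte through `write_str`, rejecting non-ASCII input.
fn write_ascii_byte<W: Write>(out: &mut W, byte: u8) -> fmt::Result {
    if !byte.is_ascii() {
        return Err(fmt::Error);
    }
    let bytes = [byte];
    // SAFETY: `byte` was checked to be ASCII, and a single ASCII byte is valid UTF-8.
    let s = unsafe { std::str::from_utf8_unchecked(&bytes) };
    out.write_str(s)
}

/// Hypothetical helper: wrap one dynamic byte in literal braces.
fn braces(byte: u8) -> String {
    let mut out = String::new();
    // For a known delimiter, a one-character string literal needs no `unsafe` at all.
    out.write_str("{").unwrap();
    // Only a byte whose value is not known at compile time needs the conversion.
    write_ascii_byte(&mut out, byte).unwrap();
    out.write_str("}").unwrap();
    out
}

fn main() {
    println!("{}", braces(b','));
}
```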

@Sgeo

Sgeo commented Dec 16, 2014

Could it be confusing for newcomers for the stdlib to have two traits called Writer? Having to check to see which is in use, etc.

@SimonSapin
Contributor

@Sgeo I think we more or less have a convention to take advantage of the namespacing provided by modules and not try to duplicate it within names. Rather than have `use std::fmt::Writer;` and later use `Writer` (which is unclear), you can (and maybe should) have `use std::fmt;` and then use `fmt::Writer`.
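
A short sketch of that convention, written against current Rust, where the two traits are named `fmt::Write` and `io::Write` (the thread's `fmt::Writer` and `io::Writer` are the 2014 names). `emit_text` and `emit_bytes` are hypothetical functions.

```rust
use std::fmt;
use std::io;

// Module-qualified names make it obvious which `Write` is meant at each use site.
fn emit_text<W: fmt::Write>(out: &mut W) -> fmt::Result {
    out.write_str("text")
}

fn emit_bytes<W: io::Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"bytes")
}

fn main() {
    let mut s = String::new();
    emit_text(&mut s).unwrap();

    let mut v: Vec<u8> = Vec::new();
    emit_bytes(&mut v).unwrap();

    println!("{} {:?}", s, v);
}
```

Neither trait is imported by its bare name, so the two never collide and a reader does not have to scroll up to the `use` lines to tell them apart.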

@aturon
Member

aturon commented Dec 30, 2014

This RFC, which represents a longstanding request to enforce a strong Unicode convention and appears to have no real downsides, has been accepted. Tracking issue.

@lambda-fairy
Contributor

I notice that in the "after" benchmark, .to_string() is still about 3x slower for small strings. Can we do anything about the remaining overhead?

@alexcrichton alexcrichton deleted the fmt-text-writer branch January 7, 2015 05:23
@Centril Centril added the A-fmt Proposals relating to std::fmt and formatting macros. label Nov 23, 2018