std: Stabilize the std::str module #19741
@alexcrichton / @aturon: Have you considered replacing
fn foo<T: AsSlice>(x: T) {
    let x = x.as_slice();
    ...
}
foo("foo");
foo([1, 2, 3]);
Otherwise, we'd need to make another trait to support this pattern. |
@erickt I think that it leads to ambiguities when you just call
impl ::slice::AsSlice<u8> for str {
    fn as_slice(&self) -> &[u8] { self.as_bytes() }
}
|
See #19612 (comment) |
re: |
@alexcrichton / @aturon: Yeah, we could do
db.set("foo".as_bytes(), "abc".as_bytes()).unwrap();
db.set("bar".as_bytes(), "def".as_bytes()).unwrap();
db.set("baz".as_bytes(), "ghi".as_bytes()).unwrap();
Or add a wrapper for setting string keys with:
db.set_str("foo", "abc".as_bytes()).unwrap();
db.set_str("bar", "def".as_bytes()).unwrap();
db.set_str("baz", "ghi".as_bytes()).unwrap();
There's a bit of line noise in both approaches. It would be much nicer to have something like
db.set("foo", "abc").unwrap();
db.set("bar", "def").unwrap();
db.set("baz", "ghi").unwrap();
I could write a trait for my library to do this, but this pattern would then force people wanting to support |
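For illustration, here is a minimal sketch of the call shape asked for above, written against present-day Rust rather than the 2014 tree under review. It assumes the standard `AsRef<[u8]>` trait as the generic conversion; the `Db` type, its `HashMap` storage, and the `get` helper are hypothetical stand-ins, not the commenter's library.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store standing in for the `db` in the comment above.
#[derive(Default)]
pub struct Db {
    entries: HashMap<Vec<u8>, Vec<u8>>,
}

impl Db {
    /// Accepts anything that can be viewed as bytes: `&str`, `String`,
    /// `&[u8]`, `Vec<u8>`, and so on.
    pub fn set<K, V>(&mut self, key: K, value: V) -> Result<(), String>
    where
        K: AsRef<[u8]>,
        V: AsRef<[u8]>,
    {
        let k: &[u8] = key.as_ref();
        let v: &[u8] = value.as_ref();
        // The store owns its data, so both sides are copied here.
        self.entries.insert(k.to_vec(), v.to_vec());
        Ok(())
    }

    /// Looks up a value by any byte-like key.
    pub fn get<K: AsRef<[u8]>>(&self, key: K) -> Option<&[u8]> {
        let k: &[u8] = key.as_ref();
        self.entries.get(k).map(|v| v.as_slice())
    }
}
```

With a bound like this, `db.set("foo", "abc")` and `db.set("foo".as_bytes(), "abc".as_bytes())` both go through the same method, so no separate `set_str` wrapper is needed.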
@@ -649,10 +655,11 @@ impl BorrowFrom<String> for str {

 #[unstable = "trait is unstable"]
 impl ToOwned<String> for str {
-    fn to_owned(&self) -> String { self.into_string() }
+    fn to_owned(&self) -> String { self.to_string() }
@alexcrichton Please implement this as String(self.as_bytes().to_vec()) (you may need to move it to collections/string.rs). Let's avoid degrading the performance of this method.
(or you could use String::from_str(), I think it does the same thing, and doesn't need moving this impl)
Good catch! I'll switch it over.
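As a rough sketch of what the review suggestion amounts to, the conversion can copy the bytes directly instead of going through the formatting machinery. This is written against present-day Rust, where the tuple-style `String(...)` constructor from the comment is not available; the function name `to_owned_via_bytes` is hypothetical.

```rust
/// Copies the bytes of `s` into a new `String` without formatting
/// and without re-validating UTF-8.
pub fn to_owned_via_bytes(s: &str) -> String {
    // One allocation sized from the source slice.
    let bytes: Vec<u8> = s.as_bytes().to_vec();
    // SAFETY: `bytes` is an unmodified copy of a `&str`, so it is valid UTF-8.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

The result is equal to what `to_string()` produces; the difference under discussion is only how much work is done to get there.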
I dislike that this leaves the obvious thing to do when converting a string literal to a I know that it's more consistent and micro-benchmarks etc., but it feels just silly for the most obvious thing to make a full roundtrip through the formatting infrastructure and a redundant check for valid utf-8, in addition to over-allocating. This was a wart when the answer was |
The question of the efficiency of the formatting subsystem is somewhat orthogonal in my mind because |
@alexcrichton Except that this PR is stabilizing this status quo. |
Remember that this is deprecating |
It's not orthogonal. You're causing a severe performance regression. Attention to performance is part of API design, and even if it was an implementation issue it is still a stupid regression. |
9346257
to
1e730b7
Compare
@alexcrichton: The conflict you're having with
#![feature(lang_items, macro_rules)]
#![no_std]
#![crate_type = "staticlib"]
extern crate core;
pub unsafe fn replace<T>(dest: *mut T, mut src: T) -> T { |
use core::kinds::Sized;
#[lang = "stack_exhausted"] extern fn stack_exhausted() {}
#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "panic_fmt"] fn panic_fmt() -> ! { loop {} }
#[unstable = "may merge with other traits"]
pub trait AsSlice<T> for Sized? {
fn as_slice<'a>(&'a self) -> &'a [T];
}
#[unstable = "trait is unstable"]
impl<T> AsSlice<T> for [T] {
#[inline(always)]
fn as_slice<'a>(&'a self) -> &'a [T] { self }
}
impl<'a, T, Sized? U: AsSlice<T>> AsSlice<T> for &'a U {
#[inline(always)]
fn as_slice<'a>(&'a self) -> &'a [T] { AsSlice::as_slice(*self) }
}
impl<'a, T, Sized? U: AsSlice<T>> AsSlice<T> for &'a mut U {
#[inline(always)]
fn as_slice<'a>(&'a self) -> &'a [T] { AsSlice::as_slice(*self) }
}
impl<'a> AsSlice<u8> for str {
#[inline(always)]
fn as_slice<'a>(&'a self) -> &'a [u8] {
unsafe { core::mem::transmute(self) }
}
}
@aturon: Will the
@alexcrichton: I'm very sad to see
// Trigger a copy for me.
let o = ObjectBuilder::new().insert("foo", ...).unwrap();
// Move the string into the `json::Value` enum with no allocation.
let key = String::from_str("foo");
let o = ObjectBuilder::new().insert(key, ...).unwrap();
Since I'm betting most users are going to use I could have two APIs again, |
@aturon: This might just be your cast trait with a different name, but this variation on
trait BorrowFrom<'a, To> {
fn borrow_from(&'a self) -> To;
}
impl<'a> BorrowFrom<'a, &'a [u8]> for Vec<u8> {
fn borrow_from(&'a self) -> &'a [u8] {
self.as_slice()
}
}
impl<'a> BorrowFrom<'a, &'a str> for String {
fn borrow_from(&'a self) -> &'a str {
self.as_slice()
}
}
impl<'a> BorrowFrom<'a, &'a [u8]> for String {
fn borrow_from(&'a self) -> &'a [u8] {
self.as_bytes()
}
}
impl<'a> BorrowFrom<'a, &'a [u8]> for &'a str {
fn borrow_from(&'a self) -> &'a [u8] {
self.as_bytes()
}
}
impl<'a, T: BorrowFrom<'a, U>, U> BorrowFrom<'a, U> for &'a T {
fn borrow_from(&'a self) -> U {
(**self).borrow_from()
}
}
#[deriving(Show)]
struct Datum<'a> { data: &'a [u8] }
impl<'a> BorrowFrom<'a, Datum<'a>> for String {
fn borrow_from(&'a self) -> Datum<'a> {
self.as_slice().borrow_from()
}
}
impl<'a> BorrowFrom<'a, Datum<'a>> for &'a str {
fn borrow_from(&'a self) -> Datum<'a> {
Datum { data: self.as_bytes() }
}
}
fn foo_slice<'a, T>(t: &'a T) where T: BorrowFrom<'a, &'a [u8]> {
let datum: &'a [u8] = t.borrow_from();
println!("datum: {}", datum);
}
fn foo_str<'a, T>(t: &'a T) where T: BorrowFrom<'a, &'a str> {
let datum: &'a str = t.borrow_from();
println!("datum: {}", datum);
}
fn foo_custom<'a, T>(t: &'a T) where T: BorrowFrom<'a, Datum<'a>> {
let datum: Datum<'a> = t.borrow_from();
println!("datum: {}", datum);
}
fn main() {
let s = "hello world".to_string();
foo_slice(&s);
foo_str(&s);
foo_custom(&s);
} |
e71d542
to
66925c4
Compare
@erickt Yes I didn't rename to For your use case I know @aturon has also been thinking about a generic set of conversion traits recently to serve a more broad purpose. Having lots of little one-off traits would be unfortunate for all types in the standard library (e.g. why should we not have |
@alexcrichton The discomfort (at least for me) is not so much that this basically changes the idiom from |
Yes, that's right -- for traits whose sole purpose is generic programming over conversions (i.e. providing implicit conversions via overloading), we should be able to replace them with a single set of traits that everyone knows/uses/implements. This should cut down on the problem of people having to know and implement your custom trait to be compatible with your library. |
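To make the "single set of conversion traits" idea concrete, here is a minimal sketch in present-day Rust that assumes the standard `Into<String>` trait plays that role. The `ObjectBuilder` below is a hypothetical stand-in with `i64` values, not the JSON builder mentioned earlier in the thread.

```rust
use std::collections::BTreeMap;

/// Hypothetical builder; only the key handling matters for this sketch.
#[derive(Default)]
pub struct ObjectBuilder {
    fields: BTreeMap<String, i64>,
}

impl ObjectBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// One method serves both cases: a `&str` key is copied into a new
    /// `String`, while an owned `String` key is moved in as-is.
    pub fn insert<K: Into<String>>(mut self, key: K, value: i64) -> Self {
        self.fields.insert(key.into(), value);
        self
    }

    pub fn get(&self, key: &str) -> Option<i64> {
        self.fields.get(key).copied()
    }
}
```

Callers can then write `ObjectBuilder::new().insert("foo", 1)` or pass a `String` they already own, without the library defining its own conversion trait.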
Yep. The trait will be
I think that generic conversion traits will serve this role much better, as I mentioned in my previous comment. "Overloading over ownership" is a pattern that's emerging in several APIs (you can see it in the |
@alexcrichton Ok, I've looked this over and it looks good to me -- just a couple of tiny typos. r=me once we've resolved the (We'll need to discuss methods like |
66925c4
to
a89f819
Compare
0f118c7
to
664004f
Compare
664004f
to
213a3de
Compare
This commit starts out by consolidating all `str` extension traits into one
`StrExt` trait to be included in the prelude. This means that
`UnicodeStrPrelude`, `StrPrelude`, and `StrAllocating` have all been merged into
one `StrExt` exported by the standard library. Some functionality is currently
duplicated with the `StrExt` present in libcore.

This commit also currently avoids any methods which require any form of pattern
to operate. These functions will be stabilized via a separate RFC.

Next, the stability of methods and structures is as follows:

Stable

* from_utf8_unchecked
* CowString - after moving to std::string
* StrExt::as_bytes
* StrExt::as_ptr
* StrExt::bytes/Bytes - also made a struct instead of a typedef
* StrExt::char_indices/CharIndices - CharOffsets was renamed
* StrExt::chars/Chars
* StrExt::is_empty
* StrExt::len
* StrExt::lines/Lines
* StrExt::lines_any/LinesAny
* StrExt::slice_unchecked
* StrExt::trim
* StrExt::trim_left
* StrExt::trim_right
* StrExt::words/Words - also made a struct instead of a typedef

Unstable

* from_utf8 - the return type was changed to a `Result`, but the error type has
  yet to prove itself
* from_c_str - this function will be handled by the c_str RFC
* FromStr - this trait will have an associated error type eventually
* StrExt::escape_default - needs iterators at least, unsure if it should make
  the cut
* StrExt::escape_unicode - needs iterators at least, unsure if it should make
  the cut
* StrExt::slice_chars - this function has yet to prove itself
* StrExt::slice_shift_char - awaiting conventions about slicing and shifting
* StrExt::graphemes/Graphemes - this functionality may only be in libunicode
* StrExt::grapheme_indices/GraphemeIndices - this functionality may only be in
  libunicode
* StrExt::width - this functionality may only be in libunicode
* StrExt::utf16_units - this functionality may only be in libunicode
* StrExt::nfd_chars - this functionality may only be in libunicode
* StrExt::nfkd_chars - this functionality may only be in libunicode
* StrExt::nfc_chars - this functionality may only be in libunicode
* StrExt::nfkc_chars - this functionality may only be in libunicode
* StrExt::is_char_boundary - naming is uncertain with container conventions
* StrExt::char_range_at - naming is uncertain with container conventions
* StrExt::char_range_at_reverse - naming is uncertain with container conventions
* StrExt::char_at - naming is uncertain with container conventions
* StrExt::char_at_reverse - naming is uncertain with container conventions
* StrVector::concat - this functionality may be replaced with iterators, but
  it's not certain at this time
* StrVector::connect - as with concat, may be deprecated in favor of iterators

Deprecated

* StrAllocating and UnicodeStrPrelude have been merged into StrExt
* eq_slice - compiler implementation detail
* from_str - use the inherent parse() method
* is_utf8 - call from_utf8 instead
* replace - call the method instead
* truncate_utf16_at_nul - this is an implementation detail of Windows and does
  not need to be exposed.
* utf8_char_width - moved to libunicode
* utf16_items - moved to libunicode
* is_utf16 - moved to libunicode
* Utf16Items - moved to libunicode
* Utf16Item - moved to libunicode
* Utf16Encoder - moved to libunicode
* AnyLines - renamed to LinesAny and made a struct
* SendStr - use CowString<'static> instead
* str::raw - all functionality is deprecated
* StrExt::into_string - call to_string() instead
* StrExt::repeat - use iterators instead
* StrExt::char_len - use .chars().count() instead
* StrExt::is_alphanumeric - use .chars().all(..)
* StrExt::is_whitespace - use .chars().all(..)

Pending deprecation -- while slicing syntax is being worked out, these methods
are all #[unstable]

* Str - while currently used for generic programming, this trait will be
  replaced with one of [], deref coercions, or a generic conversion trait.
* StrExt::slice - use slicing syntax instead
* StrExt::slice_to - use slicing syntax instead
* StrExt::slice_from - use slicing syntax instead
* StrExt::lev_distance - deprecated with no replacement

Awaiting stabilization due to patterns and/or matching

* StrExt::contains
* StrExt::contains_char
* StrExt::split
* StrExt::splitn
* StrExt::split_terminator
* StrExt::rsplitn
* StrExt::match_indices
* StrExt::split_str
* StrExt::starts_with
* StrExt::ends_with
* StrExt::trim_chars
* StrExt::trim_left_chars
* StrExt::trim_right_chars
* StrExt::find
* StrExt::rfind
* StrExt::find_str
* StrExt::subslice_offset
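A few of the replacements named in the Deprecated list can be sketched as follows. This is an illustration in present-day Rust, not code from the commit; the free-function names are hypothetical wrappers chosen to mirror the deprecated methods, and the closures passed to `.all(..)` are one plausible reading of the elided argument.

```rust
/// Stand-in for the deprecated `char_len`: counts Unicode scalar values,
/// not bytes.
pub fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// Stand-in for the deprecated `is_whitespace` on a whole string.
pub fn is_whitespace(s: &str) -> bool {
    s.chars().all(char::is_whitespace)
}

/// Stand-in for the deprecated `is_alphanumeric` on a whole string.
pub fn is_alphanumeric(s: &str) -> bool {
    s.chars().all(char::is_alphanumeric)
}

/// Stand-in for the deprecated free function `from_str`, using the
/// inherent `parse()` method instead.
pub fn parse_int(s: &str) -> Option<i32> {
    s.parse().ok()
}
```

Note that `.all(..)` returns `true` for an empty string, so callers that need a non-empty check have to add it themselves.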
41482f4
to
8c60c0e
Compare
8c60c0e
to
2728a39
Compare
2728a39
to
082bfde
Compare
These were breaking. |