ACP: Add ASCII whitespace trimming functions to &str #313

Closed
okaneco opened this issue Dec 9, 2023 · 1 comment
Labels

  • ACP-accepted: API Change Proposal is accepted (seconded with no objections)
  • api-change-proposal: A proposal to add or alter unstable APIs in the standard libraries
  • T-libs-api

Comments


okaneco commented Dec 9, 2023

Proposal

Problem statement

There is no safe and efficient way to remove leading or trailing ASCII whitespace on &str in core today.

Motivating examples or use cases

The currently available str::trim function family [1] is Unicode-aware, which makes users pay a performance cost due to Unicode code point processing.

Additionally, using str::trim may increase code size due to char::is_whitespace using a 256-byte lookup table to determine Unicode whitespace.

Today, str::split_ascii_whitespace exists as an alternative to str::split_whitespace so users do not have to sacrifice performance when they are not concerned with Unicode. Even on ASCII-only text, split_ascii_whitespace is considerably faster than split_whitespace.
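To make the behavioral difference concrete, here is a small sketch comparing the two splitters. The helper names `split_unicode` and `split_ascii` are made up for illustration; only `str::split_whitespace` and `str::split_ascii_whitespace` come from the standard library.

```rust
/// Hypothetical helper: splits on any Unicode whitespace.
pub fn split_unicode(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

/// Hypothetical helper: splits on ASCII whitespace only.
pub fn split_ascii(s: &str) -> Vec<&str> {
    s.split_ascii_whitespace().collect()
}
```

For an input such as `"a\u{00A0}b c"`, the Unicode-aware version treats the no-break space (U+00A0) as a separator, while the ASCII version leaves it inside the first token.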

The added ASCII trim functions would be an extension of that performance-oriented concept and make the API more consistent between [u8] and &str, mirroring the currently unstable byte slice functions trim_ascii_start, trim_ascii_end, and trim_ascii tracked in rust-lang/rust#94035.
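For reference, a minimal sketch of the byte slice functions being mirrored. It assumes a toolchain where `trim_ascii_start`, `trim_ascii_end`, and `trim_ascii` on `[u8]` are available (they were unstable when this proposal was written); the wrapper names are hypothetical.

```rust
/// Hypothetical wrapper: drops leading ASCII whitespace from a byte slice.
pub fn trim_bytes_start(raw: &[u8]) -> &[u8] {
    raw.trim_ascii_start()
}

/// Hypothetical wrapper: drops trailing ASCII whitespace from a byte slice.
pub fn trim_bytes_end(raw: &[u8]) -> &[u8] {
    raw.trim_ascii_end()
}

/// Hypothetical wrapper: drops ASCII whitespace from both ends.
pub fn trim_bytes(raw: &[u8]) -> &[u8] {
    raw.trim_ascii()
}
```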

Solution sketch

See rust-lang/rust#118523

This proposal seeks to implement the following functions on &str, leveraging the pre-existing functions on byte slices.

pub const fn trim_ascii_start(&self) -> &str {
    // SAFETY: Removing ASCII characters from a `&str` does not invalidate UTF-8.
    unsafe { core::str::from_utf8_unchecked(self.as_bytes().trim_ascii_start()) }
}

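pub const fn trim_ascii_end(&self) -> &str {
    // SAFETY: Removing ASCII characters from a `&str` does not invalidate UTF-8.
    unsafe { core::str::from_utf8_unchecked(self.as_bytes().trim_ascii_end()) }
}

pub const fn trim_ascii(&self) -> &str {
    // SAFETY: Removing ASCII characters from a `&str` does not invalidate UTF-8.
    unsafe { core::str::from_utf8_unchecked(self.as_bytes().trim_ascii()) }
}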

Adding these functions would remove the need for users to write unsafe code to efficiently implement the behavior themselves.
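As a rough illustration of what calling code could look like, the sketch below assumes a toolchain where the proposed `&str` methods exist with the signatures shown above. The constant `TRIMMED` and the helper `trim_parts` are hypothetical names.

```rust
/// Because the proposed functions are `const fn`, they can be used in
/// constant contexts (assuming the toolchain provides them as such).
pub const TRIMMED: &str = "\t key = value \r\n".trim_ascii();

/// Hypothetical helper: returns the input trimmed at the start, at the end,
/// and at both ends, without any `unsafe` in user code.
pub fn trim_parts(s: &str) -> (&str, &str, &str) {
    (s.trim_ascii_start(), s.trim_ascii_end(), s.trim_ascii())
}
```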

Alternatives

Do nothing. This can be implemented safely by users with trim_matches, trim_start_matches, and trim_end_matches.

pub fn trim_ascii(s: &str) -> &str {
    s.trim_matches(|c: char| c.is_ascii_whitespace())
}
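One point worth keeping in mind with this alternative is that ASCII whitespace is a strict subset of Unicode whitespace, so results can differ from `str::trim`. The sketch below contrasts the two; the function names are hypothetical.

```rust
/// Hypothetical helper: the safe `trim_matches` approach, trimming only
/// ASCII whitespace (space, tab, line feed, form feed, carriage return).
pub fn trim_ascii_via_matches(s: &str) -> &str {
    s.trim_matches(|c: char| c.is_ascii_whitespace())
}

/// Hypothetical helper: the Unicode-aware trim, for comparison.
pub fn trim_unicode(s: &str) -> &str {
    s.trim()
}
```

With the input `"\u{00A0}\x0B text \n"`, the ASCII version only removes the trailing space and newline, because neither the no-break space (U+00A0) nor the vertical tab (U+000B) counts as ASCII whitespace for `char::is_ascii_whitespace`. `str::trim` removes both.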

Alternatively, users could use str::as_bytes to trim using u8::is_ascii_whitespace, but they would have to use the unsafe str::from_utf8_unchecked or re-validate with the safe str::from_utf8 to return a &str.
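A minimal sketch of the safe, re-validating variant described above, under the assumption that the caller accepts the extra UTF-8 check. The name `trim_ascii_checked` is hypothetical.

```rust
/// Hypothetical helper: trims ASCII whitespace by scanning bytes, then
/// re-validates the result with the safe `str::from_utf8`.
pub fn trim_ascii_checked(s: &str) -> &str {
    let bytes = s.as_bytes();
    // Index of the first byte that is not ASCII whitespace, or the length
    // of the slice if every byte is whitespace.
    let start = bytes
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(bytes.len());
    // One past the last byte that is not ASCII whitespace.
    let end = bytes
        .iter()
        .rposition(|b| !b.is_ascii_whitespace())
        .map_or(start, |i| i + 1);
    // Only ASCII bytes were removed, so this validation should always
    // succeed; it is the redundant cost the proposal aims to avoid.
    core::str::from_utf8(&bytes[start..end])
        .expect("removing ASCII bytes from valid UTF-8 keeps it valid")
}
```

Replacing the final call with `from_utf8_unchecked` would skip the validation pass, at the price of an `unsafe` block in user code.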

Links and related work

Add ASCII whitespace trimming functions to &str - rust-lang/rust#118523
Tracking Issue for ASCII trim functions on byte slices - rust-lang/rust#94035

rust-lang/rust#94035 contains some discussion and desire for adding these functions to &str.

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

Footnotes

  1. str::trim, str::trim_end, str::trim_start, str::trim_matches

@okaneco okaneco added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Dec 9, 2023
@joshtriplett joshtriplett added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Dec 12, 2023
@joshtriplett
Member

Discussed in today's @rust-lang/libs-api meeting, and we decided to accept these. We can evaluate further during stabilization.
