-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add overloads to string trimming #14386
Comments
I think this would be a good addition. |
Chris, could you please add the following details to this proposal: namespace, type, assembly for these APIs. Once you have it, I will mark it as ready for review. |
Would also be interesting to understand if we need IgoreCase variants and if they are ordinal or linguistic comparisons. |
I've updated the post to clarify & put the details. |
Thanks! I marked it as ready for API review. |
We reviewed this issue today. We don't believe it's quite ready yet. Please see the notes for more details. |
changing state "ready for api review" => "needs more info" per @terrajobst remarks above. |
Might we also take the time to address scenarios with character escaping such as the following example:
|
|
|
@joperezr please talk to me if you are going to do any work here |
We need formal API proposal (taking into account the above feedback) |
@ChrisEelmaa do you have interest in updating the proposal here? |
@danmosemsft yes, let me catch up with the feedback |
@ChrisEelmaa just came across this PR, wondered whether you still had interest. |
@danmosemsft I've learned not to promise anything that I can't keep, been quite busy lately, but I'll definitely do it at one point unless someone else has done it :P |
@ChrisEelmaa sounds good! let us know in case you do want to take it so that we can assign the issue and provide any help if needed. |
I would be happy to take this issue on, but it seems to have stalled. Has there ever been a decision on naming, etc? I've been doing some digging just to get my head around things. I think I've a handle on everything apart from the unit tests. I know I can't just go and submit a PR today but my thinking is that I'd work off CoreClr (mscorlib/System.Private.CoreLib.sln) and this would then get automatically merged across to CoreFx. However, CoreClr has it's own set of String tests as does CoreFx. What happens here? |
First, we need to update the proposal and have it be approved before any implementation work starts. That is what @ChrisEelmaa meant about going on the provided feedback and updating it. Once that is done and assuming that the Api is approved, then the norm is to add the API to coreclr and send out a PR and then add the tests on a separate PR to corefx. We will then review the coreclr PR and make sure that with your corefx one you have enough test coverage so that we can feel comfortable merging the implementation. After that, we would merge both PRs. |
We need to address the design feedback first https://github.com/dotnet/apireviews/blob/267a7c25e418335b8aaf45c466d90425a9a944e4/2015/04-28-misc/README.md#1244-add-overloads-to-string-trimming and then update the proposal with the latest and run it one more time with the design committee. |
[Picked up from It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for a developer to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially in regards to robustness and performance. Here are some references gleaned from the original issue. UsageThere are only 2 overloads for each method, as shown in the following examples: // Default overload
"http://foo.com".RemoveStart("http://").RemoveEnd(".com") == "foo"
// StringComparison
"http://foo.com".RemoveStart("HTTP://", StringComparison.OrdinalIgnoreCase) == "foo.com" Background
Proposed APInamespace System
{
public partial class String
{
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="prefix">The prefix to remove.</param>
public string RemoveStart(string value);
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveStart(string value, StringComparison comparisonType);
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
public string RemoveEnd(string value);
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveEnd(string value, StringComparison comparisonType);
}
// Per @danmosemsft's suggestion
public static partial class MemoryExtensions
{
// Implementation specializes T for byte and char
public static ReadOnlySpan<T> RemoveStart(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
where T : IEquatable<T>;
public static ReadOnlySpan<T> RemoveEnd(this ReadOnlySpan<T> span, ReadOnlySpan<T> value)
where T : IEquatable<T>;
// .Fast
public static ReadOnlySpan<char> RemoveStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<char> RemoveEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
}
} Details, Decisions(Some of these items from @tarekgh's feedback above)
Questions
POCThis is a relatively simple addition, POC code below: /// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
public string RemovePrefix(string prefix)
=> RemovePrefix(prefix, StringComparison.Ordinal); // Or maybe its own body, depending on perf
/// <summary>
/// Removes the specified prefix if it is found at the start of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The prefix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemovePrefix(string prefix, StringComparison comparisonType)
{
if (prefix == null)
throw new ArgumentNullException(nameof(prefix));
// Alternative to the above guard
if (suffix == null)
return this;
if (prefix.Length == 0)
return this;
int len = this.Length - prefix.Length;
if (len < 0)
return this;
// main body elided - @tarekgh will advise implementation
}
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses <see cref="StringComparison.Ordinal"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
public string RemoveSuffix(string suffix)
=> RemoveSuffix(suffix, StringComparison.Ordinal); // Or maybe its own body, depending on perf
/// <summary>
/// Removes the specified suffix if it is found at the end of the string.
/// Uses the specified <see cref="StringComparison"/>.
/// </summary>
/// <param name="suffix">The suffix to remove.</param>
/// <param name="comparisonType">The string comparison method to use.</param>
public string RemoveSuffix(string suffix, StringComparison comparisonType)
{
if (suffix == null)
throw new ArgumentNullException(nameof(suffix));
// Alternative to the above guard
if (suffix == null)
return this;
if (suffix.Length == 0)
return this;
int len = this.Length - suffix.Length;
if (len < 0)
return this;
// main body elided - @tarekgh will advise implementation
} |
Copying @danmosemsft's feedback over from dupe issue:
|
[Edited] |
Updated spec: |
@jnm2 sorry we missed this in 5.0 but I moved the milestone to 6.0 at least we'll address it in the next release. The good news is you can easily have a workaround. Fortunately, @GrabYourPitchforks got #27935 which can help you in writing extension method for now as a workaround till we get this in 6.0. |
We are revisiting this proposal for two reasons:
Here is the new proposal we can look at and discuss. namespace System
{
public partial class String
{
public string TrimStart(string value, bool ignoreCase = false);
public string TrimEnd(string value, bool ignoreCase = false);
}
public static partial class MemoryExtensions
{
public static ReadOnlySpan<char> TrimStart(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase = false);
public static ReadOnlySpan<char> TrimEnd(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, bool ignoreCase = false);
}
}
namespace System.Globalization
{
public class CompareInfo
{
public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);
public static ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
public static ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
}
} |
Tarek's latest proposal should also make implementation easier, as we can ditch the giant switch statement in Downside is that it means |
@GSPP do you have more input why you don't like the proposal? |
While I'm not @GSPP I can say that this is just adding confusion to the String class. It will be harder for newcomers to find and reason about CompareInfo and honestly, I'm in .Net like 4 years and I have never used it because most useful String overloads take StringComparison parameter. Also, I don't really like the fact of String methods being all around the place: some take StringComparison, some don't. Again, it feels very unnatural for a newcomer. Like Contains and StartsWith take it, but TrimStart don't. It feels unnatural to me too. Like an overload zoo which is not consistent. While I can understand that you want to actually clear this confusion, I believe it is impossible with current String class and I feel like it's better to leave it as newcomer friendly as possible. Yeah, it forces you to pass explicit parameter, it might not be that friendly but at least it should be consistent. I believe it's only possible with new classes, maybe specific ones like Utf16String, Utf8String but those are out of question I guess. |
@En3Tho yeah, this is my thinking. This is adding fragmentation to the API surface and discoverability is not good. I have personally very rarely touched these globalization classes. It was rarely necessary and they seemed a bit arcane to me. Handling a Even placing Of course, now that Now There's also a usability issue with the To say concretely what I would do: I would make all new APIs require a |
We are seeing the newcomers mostly interested in the ordinal behavior and not the linguistic behavior. Currently, the newcomers get confused when they use the string class and get linguistic behavior which they don't understand and that is main source of confusion. Switching the default behavior to ordinal can also create confusion. I would recommend you review dotnet/designs#207 and have your input there. This proposal trying to address the issues you are raising here. If you have better ideas to add there, we welcome that.
What options you used to pass with StringComparison? I am curious to learn more about your scenario.
We already have such cases and that is what we are trying to address to reduce that moving forward.
Mostly, only a few sectors of users who need the linguistic functionality (e.g. UI developers). From our experience, most of the users expect the ordinal behavior. That is why we are trying to not inject more linguistic functionality to the string class. I am, not really seeing a big deal for anyone want the linguistic functionality to use CompareInfo instead. Most of such users already using CultureInfo and other Globalization classes. I am curious to learn more about your scenarios when you use linguistic functionality.
In long term if we try to hide the linguistic functionality from the string class, this issue wouldn't be a concern. I want to clarify here; I am not trying to reject any feedback. I am trying to collect more data to decide about the future direction in general. If you didn't add your comments on the proposal dotnet/designs#207, I would encourage you to take some time review it and add your feedback there. |
I think this would be a grave mistake. This is not how real-world development works... People don't want do deal with complex APIs simply to perform linguistic string processing. I really do not intend any disrespect but this strong focus on ordinal APIs appears to be a bit "ivory tower" to me. (A little joking: It appears to be the Java way... 😏). I find the I just made a comment (#43956 (comment)) that is pertinent to this discussion as well. |
@GSPP Thanks for the feedback. We often see devs use FWIW, I speculated a bit on why I think devs are drawn to |
Hashed this out a bit with @tarekgh. We think the below might be a viable proposal for .NET 7. namespace System
{
public static partial class MemoryExtensions
{
public static ReadOnlySpan<T> TrimIfStartsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
public static ReadOnlySpan<char> TrimIfStartsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static Span<T> TrimIfStartsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
public static Span<char> TrimIfStartsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<T> TrimIfEndsWith<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> value);
public static ReadOnlySpan<char> TrimIfEndsWith(this ReadOnlySpan<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static Span<T> TrimIfEndsWith<T>(this Span<T> span, ReadOnlySpan<T> value);
public static Span<char> TrimIfEndsWith(this Span<char> span, ReadOnlySpan<char> value, StringComparison comparisonType);
public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, T startValue, T endValue);
public static ReadOnlySpan<T> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
public static ReadOnlySpan<char> TrimIfSurroundedBy<T>(this ReadOnlySpan<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue, StringComparison comparisonType);
public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, T startValue, T endValue);
public static Span<T> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue);
public static Span<char> TrimIfSurroundedBy<T>(this Span<T> span, ReadOnlySpan<T> startValue, ReadOnlySpan<T> endValue, StringComparison comparisonType);
}
}
// APIs on CompareInfo are optional.
// They can be implemented via existing IsPrefix / IsSuffix methods.
namespace System.Globalization
{
public class CompareInfo
{
public string TrimPrefix(string source, string prefix, CompareOptions options = CompareOptions.None);
public string TrimSuffix(string source, string suffix, CompareOptions options = CompareOptions.None);
public static ReadOnlySpan<char> TrimPrefix(ReadOnlySpan<char> source, ReadOnlySpan<char> prefix, CompareOptions options = CompareOptions.None);
public static ReadOnlySpan<char> TrimSuffix(ReadOnlySpan<char> source, ReadOnlySpan<char> suffix, CompareOptions options = CompareOptions.None);
}
} It differs from the earlier proposal in that it's focused on spans for the time being, which allows callers to use the generic versions on more data types. ReadOnlySpan<char> input1 = "Hello there!";
input1 = input1.TrimIfStartsWith(input1, "Hello");
Console.WriteLine(input1); // " there!"
ReadOnlySpan<char> input2 = "Hi there!";
input2 = input2.TrimIfStartsWith(input2, "Hello");
Console.WriteLine(input2); // "Hi there!"
// using new "u8" literal syntax!
ReadOnlySpan<byte> input3 = "https://example.com/"u8;
input3 = input3.TrimIfStartsWith(input3, "https://"u8);
Console.WriteLine(Encoding.UTF8.GetString(input3)); // "example.com/"
// Removing surrounding parentheses.
ReadOnlySpan<char> input4 = "(Hello world!)";
input4 = input4.TrimIfSurroundedBy('(', ')');
Console.WriteLine(input4); // "Hello world!" (no parens) The behavior for the generic method is Ordinal. For the T = char methods, you can specify a StringComparison parameter, matching the pattern of most other T = char methods on MemoryExtensions. We already have methods called MemoryExtensions.TrimStart and MemoryExtensions.TrimEnd, which both implement "trim any" semantics. That is, To work around this, we propose the method names TrimIfStartsWith and TrimIfEndsWith. This makes it clear that the logical equivalent of a StartsWith / EndsWith check is taking place, and if that call returns a match, then the trim takes place. The methods hanging off of CompareInfo are named TrimPrefix and TrimSuffix to match the other *Prefix and *Suffix methods on the type. We didn't want to bring these names forward onto MemoryExtensions directly since we don't commonly use these terms outside the CompareInfo class. The CompareInfo class would accept both string and span inputs, matching the pattern of the other methods on the type. There are no methods hanging directly off of string. We can add those later once details about how we want to expose culture information on the string type are hashed out and what the default options would be. In the meantime, if you need the equivalent of a string-returning The extension methods themselves would be utilize the new CompareInfo.IsPrefix and CompareInfo.IsSuffix overloads we added in .NET 5. The TrimIfSurroundedBy method is a bit of a special case since it wraps several operations together. It performs an IsPrefix check against the starting value and trims, then it performs an IsSuffix check against the ending value and trims. If both the prefix and the suffix check succeeded, then the final trimmed span is returned. If either the prefix or the suffix check does not succeed, then this is considered "no match" and the original span is returned. The culture-aware behavior of trimming between checking IsPrefix and IsSuffix is to avoid the situation where the start and end values might overlap within the target span. Further discussionI've seen a few cases where callers want to know if any prefix / suffix was matched before trimming, since they'll want to perform some extra operation. For example: ReadOnlySpan<char> originalInput = GetSomeInput();
var trimmedInput = originalInput.TrimIfSurroundedBy('\"', '\"');
DoSomeOperation();
if (wasOriginallySurroundedByQuotes)
ReintroduceQuotesToOutput(); For these cases, rather than complicate the APIs proposed here with In MemoryExtensions, all methods are Ordinal by default unless the caller specifies a custom StringComparison. This includes APIs like StartsWith and EndsWith. However, the APIs string.StartsWith and string.EndsWith use CurrentCulture by default, meaning the two lines below can produce different results: bool startsWith1 = myString.StartsWith("Hello");
bool startsWith2 = myString.AsSpan().StartsWith("Hello"); If we do end up adding these methods directly to string, we should be mindful of the possibility for confusion when people call these APIs. It would be confusing for developers if string.StartsWith returns false (because it's using a CurrentCulture comparison by default) but string.TrimIfStartsWith strips data anyway (because it's using Ordinal comparison by default). |
|
This is the second proposal of the method, these two questions are not connected in any way. |
Updated proposal (new)
Edit May 16, 2022 by @GrabYourPitchforks. See #14386 (comment) for further discussion.
Updated Proposal (old)
It is useful to have methods that trim a specified prefix or suffix from a string. This is somewhat simple for a developer to write themselves, but it seems to be a common enough request that it would be useful to support directly, especially in regards to robustness and performance.
Here are some references gleaned from the original issue.
http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing
http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string
http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string
http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
Usage
There are only 2 overloads for each method, as shown in the following examples:
Details, Decisions
(Some of these items from @tarekgh's feedback above)
Naming aligns with existing patterns StartsWith, EndsWith, TrimStart, TrimEnd
Decision: namespace System
Decision: No bool repeat overloads. The callsite can call this recursively (at the risk of more allocations).
Looking that linked StackOverflow questions it seems folks want the non-repeating behavior, i.e. "SIdId".RemoveSuffix("Id") should return "SId", not "S".
We don't want to introduce overloads for TrimEnd and TrimStart that have a non-repeating behavior because it's inconsistent with the existing ones.
At the same time, we feel the bool option is overkill and not very readable from the call site.
Questions
Should be instance methods on String (as opposed to extension methods)?
Should we add corresponding methods to TextInfo (or whatever they care called in globalization)?
If a null value for prefix or suffix is provided, should we throw or just return this?
Old Proposal
The proposal is to add new extension methods / overloads for string trimming,
As you know, right now, it's only possible to trim individual characters from a string, but I would like to trim suffixes & prefixes.
An example usage:
I feel like they should've be there. It seems kind of weird to me, as I've implemented these by myself in the past few times, and am sure there are quite few people who miss this & would find it useful:
http://stackoverflow.com/questions/7170909/trim-string-from-end-of-string-in-net-why-is-this-missing
http://stackoverflow.com/questions/4101539/c-sharp-removing-strings-from-end-of-string
http://stackoverflow.com/questions/5284591/how-to-remove-a-suffix-from-end-of-string
http://stackoverflow.com/questions/4335878/c-sharp-trimstart-with-string-parameter
Now the following statement is not true, but if it would be, it would describe how I am feeling: "ooh hey, we offer you string replace method which can replace individual chars, but not the whole strings. It's not too hard to craft your own one, give it a try!"
The following applies to
TrimEnd
/TrimStart
, but the overloads would be 1:1, so I will discuss onlyTrimEnd
.First overload:
public string TrimEnd(string suffix)
Behaviour: trim the suffix, case-sensitive, and only once. The comparision is done the same way, as it would be for
string.Replace
."MyTestTest".TrimEnd("Test") == "MyTest" == true
Second overload:
public string TrimEnd(string suffix, StringComparison comparison)
Works as the first one, but allows you to explictly tell how to compare.
"MyTESTtest".TrimEnd("test", StringComparison.InvariantCultureIgnoreCase) == "MyTEST" == true
Third overload(s):
I am not sure if these are needed, but I wanted to throw this out here anyway:
This proposal has nothing to do with string.Trim(), as it would be ambigious.
"tetet".Trim("tet") == ???
Namespace: System
Type: System.String
Assembly: System.Runtime.dll
I'd be willing to work on this :3
The text was updated successfully, but these errors were encountered: