Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make writeable_cmp_bytes a free function #5737

Merged
merged 13 commits into from
Oct 28, 2024
Merged

Conversation

sffc
Copy link
Member

@sffc sffc commented Oct 25, 2024

It simplifies writeable impls. There is no impact on benchmarks, either the new microbenchmark I added or the higher-level strict_cmp benchmark in the icu_locale_core crate:

     Running benches/langid.rs
langid/compare/strict_cmp/langid
                        time:   [271.42 ns 272.76 ns 274.20 ns]
                        change: [-0.2857% +0.0019% +0.3042%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

     Running benches/locale.rs
locale/compare/strict_cmp/locale
                        time:   [370.74 ns 371.09 ns 371.44 ns]
                        change: [-0.4829% -0.1979% +0.0605%] (p = 0.16 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

Manishearth
Manishearth previously approved these changes Oct 25, 2024
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this simplifies. There writeable argument is kind of fundamental so I think this should be a method.

@sffc sffc requested a review from a team as a code owner October 25, 2024 20:54
@sffc
Copy link
Member Author

sffc commented Oct 25, 2024

The trait should only have things that can actually be specialized. Defaulted trait functions create maintenance burden and code bloat. I want to move more toward free utility functions (#5738).

@sffc sffc requested a review from robertbastian October 25, 2024 20:56
@Manishearth
Copy link
Member

I agree on avoiding exposing functions like this for overriding

@sffc
Copy link
Member Author

sffc commented Oct 25, 2024

Note that the assert_writeable_eq impl gets a fair bit shorter with this change, because we reduced the surface of trait impl that we need to test.

@sffc
Copy link
Member Author

sffc commented Oct 26, 2024

I think a free function is good, but if you don't like the ergonomics, it could be added by trait injection

pub trait WriteableCmpBytes {
    pub fn writeable_cmp_bytes(...)
}
impl<T: Writeable + ?Sized> WriteableCmpBytes for T { ... }

But I prefer free functions.

Manishearth
Manishearth previously approved these changes Oct 26, 2024
@sffc
Copy link
Member Author

sffc commented Oct 27, 2024

Additional context: I had originally made this a function on the trait because I had thought that it could be specialized for certain types. However, a year in, we haven't found cases where a concrete type can offer a more efficient implementation of this function than the default, according to the benchmarks in this PR. So, there is no longer a technical reason why it needs to be a trait function.

Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I buy the consistency argument. It would be nice if Rust had final trait methods for things like this (I think we discussed this at Rust conf at some point?).

@@ -72,7 +72,7 @@ writeable::impl_display_with_writeable!(ComplexWriteable<'_>);

const SHORT_STR: &str = "short";
const MEDIUM_STR: &str = "this is a medium-length string";
const LONG_STR: &str = "this string is very very very very very very very very very very very very very very very very very very very very very very very very long";
const LONG_STR: &str = "this is a very very very very very very very very very very very very very very very very very very very very very very very very long string";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change seems unnecessary and invalidates benchmark history.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a longer overlap would make more meaningful benchmarks for cmp_bytes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we rename the benchmark, or have LONG_STR_DIVERGE_EARLY and LONG_STR_DIVERGE_LATE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a longer overlap would make more meaningful benchmarks for cmp_bytes

And it totally is. Just use a new const for it so all the preexisting benchmarks are kept stable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the problem to be solved (the new string is only 2 bytes longer than the old one, and people who track the benchmarks over time can see this commit as the culprit) and prefer merging this as-is but I can make @Manishearth's suggested change if necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

people who track the benchmarks over time can see this commit as the culprit

If you're looking at the graph you don't know whether it's a benchmark setup change, or a behaviour change. All you'll see is "Make writeable_cmp_bytes a free function", which shouldn't change any benchmarks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the string the same length as before. (The bigger change requires a setup with a code editor which I don't have at the moment but will in about an hour.) I still prefer landing this to avoid the context switch later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I didn't hear back about whether my compromise was sufficient so I went ahead and implemented the solution that I believe addresses @robertbastian's blocking concern.

///
/// This returns a lexicographical comparison, the same as if the Writeable
/// were first converted to a String and then compared with `Ord`. For a
/// locale-sensitive string ordering, use an ICU4X Collator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeable should be independent of ICU4X, not sure we should recommend this here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing text.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the text slightly. I recommend to icu_collator as an external crate that does localized string sorting in the same vein as Jiff might recommend icu_calendar as an external crate that does non-Gregorian calendars.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super happy with this, but won't block the PR as this is preexisting. impl Ord for str, and str I guess don't go into locale-aware collation either, and I consider this crate an alternative to those core types.

Manishearth
Manishearth previously approved these changes Oct 28, 2024
Copy link
Member

@robertbastian robertbastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

@sffc sffc requested a review from robertbastian October 28, 2024 17:14
@sffc sffc merged commit 1159228 into unicode-org:main Oct 28, 2024
28 checks passed
@sffc sffc deleted the writeable-util branch October 28, 2024 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants