Use into_string() instead of to_string() on string literals #19708
I requested @gankro remove his r+ because I have some reservations about this move. Specifically, I plan on deprecating the The I'll also note that calling In terms of performance, the |
The only thing that's unknown is the additional cost of the formatting infrastructure on top of the memory allocations, but I think it's going to be insignificant relative to the cost of memory allocation. |
The fact that it over-allocates up to the next power of 2 makes it easy to calculate the excess memory usage. Just take a look at the size classes here: An 85 byte string will consume 160 bytes instead of 96 bytes, a 4.5MiB string will consume 8MiB instead of 5MiB and so on. |
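To make the arithmetic above concrete, here is a minimal sketch that assumes a buffer whose capacity is always rounded up to the next power of two. The helper names `pow2_capacity` and `wasted_bytes` are hypothetical, and the sketch models only the rounding itself, not the allocator's size classes, so it reproduces the 8MiB figure but not the "instead of" figures.

```rust
/// Capacity a buffer ends up with if it always grows to the next power of two.
/// Assumption: this models only the rounding, not any allocator size classes.
fn pow2_capacity(len: usize) -> usize {
    len.next_power_of_two()
}

/// Bytes reserved but never used under that growth strategy.
fn wasted_bytes(len: usize) -> usize {
    pow2_capacity(len) - len
}

fn main() {
    const MIB: usize = 1024 * 1024;
    // The 4.5MiB string from the comment above.
    let len = 4 * MIB + MIB / 2;
    println!(
        "{} bytes -> capacity {} bytes ({} wasted)",
        len,
        pow2_capacity(len),
        wasted_bytes(len)
    );
}
```

A length that is already a power of two wastes nothing under this model, while a length one byte past a power of two wastes almost as much as it uses.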
(If it only did 1 allocation, then this would actually matter) |
A quick benchmark shows the opposite, I have no idea why:
|
Ah, it was the optimization level, my bad:
|
So it looks like it's |
We can make conclusions about memory usage by calculating the waste. You don't need a benchmark to know that it increases the memory usage of many ranges of string sizes by up to 2x. |
@tbu- thanks for collecting some data! I would be very interested in the impact on the compiler as well in terms of this patch. @thestinger I agree that these are drawbacks of the current formatting implementation, but I do not want to jump to conclusions about memory usage or runtime without analyzing the data. For example the profile of @tbu-'s benchmark shows the hottest rust function is
To be clear, I'm not disagreeing with your technical points; I'd just like to emphasize that we still need data to draw conclusions from this change, as it's sometimes surprising. I do not expect, for example, that formatting with a size hint would reduce memory usage of the compiler by half. I would be, however, curious about the impact on the memory usage of the compiler if we inserted a |
@alexcrichton I agree with you in that
I'm going to leave this alternative as a suggestion.
Advantages:
Disadvantages:
Feel free to close this PR, as I don't expect |
Calculating the excess memory usage is not jumping to conclusions.
The compiler is not representative of every application. It's enough to calculate the excess memory usage for the different sizes of strings and then apply that to the observed distribution of string sizes in a few domains (Rust or not) to figure out the increase in memory usage. |
A naive micro-benchmark is unable to measure the cost of memory allocation. It will just be grabbing the same memory over and over again without the usual bookkeeping work and cache misses. |
In-place reallocation rarely succeeds in the real world but it will succeed every time in a naive micro-benchmark because nothing else has been allocated so the cost won't be apparent. It will also be grabbing the same small allocations over and over again from the thread cache. This is a great example of why performing benchmarks can decrease your understanding of the performance rather than increasing it. The suggestion to benchmark / measure with |
@alexcrichton When does this call |
@japaric your suggestion of @tbu- whenever formatting finishes it calls |
I don't like 'to owned' because it's not specific enough. It used to refer to owned pointers, but we don't have those anymore, and for good reason. |
Please don't completely remove it. I just completed a lint to catch exactly this https://github.com/hyperium/hyper/blob/pocketlint/hyperlint/src/lib.rs#L64 |
@seanmonstar |
I'd like to reiterate that we should not be jumping to conclusions about how slow or fast code is without profiling and analyzing it. To be super clear about the
Before collecting these numbers, I made the following changes:
Primarily we can see a drastic improvement by avoiding the |
As I pointed out above, the benchmark does not include the cost of reallocations because it's too naive. I guess you didn't read what I wrote. If you did, then it's blatant misinformation. |
No, you just didn't take the time to read the claims in this thread:
|
I ran into this a while back (based on profiler feedback about my full application), you can see the relevant change in my repo, but the core part was:
|
Now that #19741 has landed, I'm going to close this now that |
The to_string() method goes through the formatting machinery and tends to over-allocate memory (always 2^N bytes). On the other hand, into_string() does a memcpy and doesn't over-allocate. I don't see any reason to use the former if we can use the latter.

r? @alexcrichton
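The two code paths described here can be sketched roughly as follows. This is an illustration only, not the code from this PR: `into_string()` is no longer available in current Rust, so `String::from` stands in for the direct-copy path, and `write!` into an empty `String` stands in for going through the formatting machinery. The function names `via_formatting` and `via_copy` are hypothetical, and the capacities printed depend on the standard library version.

```rust
use std::fmt::Write;

/// Builds the string through the formatting machinery: the buffer starts
/// empty and grows as the formatter writes into it.
fn via_formatting(s: &str) -> String {
    let mut out = String::new();
    write!(out, "{}", s).expect("writing to a String does not fail");
    out
}

/// Builds the string by copying the bytes directly (stand-in for the
/// into_string() path discussed in this PR).
fn via_copy(s: &str) -> String {
    String::from(s)
}

fn main() {
    let a = via_formatting("hello world");
    let b = via_copy("hello world");
    // Contents are identical; only the reserved capacity may differ.
    println!("formatting: len {} capacity {}", a.len(), a.capacity());
    println!("copy:       len {} capacity {}", b.len(), b.capacity());
}
```

Comparing `capacity()` against `len()` for each result is one way to observe whatever over-allocation a given standard library version produces.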