Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<format>: Optimize container insertion #1894

Merged
merged 15 commits into from
Jun 29, 2021
Merged

<format>: Optimize container insertion #1894

merged 15 commits into from
Jun 29, 2021

Conversation

sylveon
Copy link
Contributor

@sylveon sylveon commented May 2, 2021

This nukes the partial specialization of _Fmt_iterator_buffer for containers and instead uses a traits-like class for appending in the generic _Fmt_iterator_buffer. This traits-like class is then partially specialized for any container which has append or insert, preferring to pick append if possible.

There are a few benefits to this:

  • buffered, so if our size estimation is grossly wrong or a user calls format_to with a brand new string, our performance doesn't suffer drastically from it
  • faster than just disabling the existing partial specialization for basic_string, since the generic version of _Fmt_iterator_buffer suffers from the same issue as above.

One drawback, however, is that if the string's capacity is big enough to fit the output, we'll have to do copies that could be avoided.

It brings us to essentially the same performance than fmtlib according to @jovibor's benchmarking code:
image

And I'd wager potentially even faster with long strings, since fmt's implementation falls back to push_back character per character once fmt::memory_buffer runs out.

@sylveon sylveon requested a review from a team as a code owner May 2, 2021 05:45
stl/inc/format Outdated Show resolved Hide resolved
stl/inc/xstring Outdated Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
@AdamBucior
Copy link
Contributor

Should we also add a specialization for vector?

@StephanTLavavej StephanTLavavej added format C++20/23 format performance Must go faster labels May 2, 2021
@sylveon
Copy link
Contributor Author

sylveon commented May 2, 2021

@AdamBucior we could, but that's essentially the same code in triple. I'm thinking it might be better to completely get rid of specializations and use a traits-like class for appending the buffer to the iterator, so we'd have much less duplication.

@sylveon
Copy link
Contributor Author

sylveon commented May 2, 2021

I pushed a commit which enables it for any container with append or insert (so vector gets in on the fun because it has insert).

This matters to me because I'd like to keep using a stack-allocated memory buffer that grows into the heap like fmt::memory_buffer, while not losing the optimization introduced in this PR. Ideally, I shouldn't have to go and specialize std::_Iterator_appender in my own code :)

@sylveon sylveon changed the title <format>: Specialize _Fmt_iterator_buffer for back_insert_iterator<basic_string> <format>: Optimize container insertion May 2, 2021
@StephanTLavavej
Copy link
Member

I think there may be a Clang bug with CTAD here - all of the failures seem to be with clang-cl and the code appears to be well-formed, but Clang doesn't think that _Back_insert_iterator_container_access{_Out} is a structure that you can access .container of. If this indeed a bug, we should reduce and report it to Clang, and mark the workaround as TRANSITION (providing explicit template arguments should workaround the bug).

As an aside, I recommend not defining _Back_insert_iterator_container_access in <iterator> if we need it only in <format> - no reason to pay a throughput cost unnecessarily.

@sylveon
Copy link
Contributor Author

sylveon commented May 4, 2021

I fixed the clang bug with a temporary workaround

stl/inc/format Outdated Show resolved Hide resolved
stl/inc/format Outdated Show resolved Hide resolved
@CaseyCarter CaseyCarter self-assigned this May 5, 2021
@jovibor
Copy link
Contributor

jovibor commented May 6, 2021

While parties have yet to agree with the most precise approach, just want to share some numbers.

This is the performance with the current stable <format>:
image

And this is with this PR (f2d6854):
image

As you can see, while the difference for std::format_to with plain buffer is negligible, the difference with back_inserter
is drastic as is for std::format.

One interesting observation though:
If the number of iterations is reduced to 1'000 the difference with fmt becomes a bit less drastic (current stable <format>):
image

@sylveon
Copy link
Contributor Author

sylveon commented May 6, 2021

These are my results with this branch:
image

std::format_to with a plain buffer is consistently faster by a few ns here, and std::format is a lot closer to fmt::format

So I believe the differences at this point come from:

  • Differences between machines? This was ran on a 3700x
  • SSO on std::string being a lot less characters than fmt::memory_buffer's local array, so we hit allocations sooner.

@jovibor
Copy link
Contributor

jovibor commented May 6, 2021

So I believe the differences at this point come from:

  • Differences between machines?

It's the main reason, I believe.

Anyways, these changes are real performance improvements and I hope they gonna be merged one way or another.

Numbers formatting is something that exists in one form or another in almost every app on this planet. I think it's worth spending some time to improve its fundamental properties for the long term usage.

stl/inc/format Outdated Show resolved Hide resolved
@jovibor
Copy link
Contributor

jovibor commented May 19, 2021

It's been a while since last activity on the subject.
Seems like all the maintainers doing other internal stuff, after finishing C++20.

I really wouldn't be so worried if it wasn't about the upcoming horror of ABI freezing 😝
Hope this important fix will sneak into the 17.0 release at the least.

@StephanTLavavej
Copy link
Member

Fortunately, there's still plenty of time to get this into 17.0. Maintainer activity has indeed been slow recently - as you guessed, partly due to deferred internal work that we need to catch up on (apparently there are other MSVC libraries to maintain - although I can't imagine why anyone would ever need more than the STL 😹). There's also necessary STL work beyond code reviews (e.g. I spent the past several days updating the CI toolset and profiling/bug-reporting for modules). Also, maintainers including myself have been out due to vacations and vaccinations.

Making progress on the PR backlog is now one of our priorities, so the activity should pick up again. 📈

@jovibor
Copy link
Contributor

jovibor commented May 20, 2021

@StephanTLavavej
It's always nice to have a live feedback.
Usually we can only guess on things, so keep it up🚀

stl/inc/format Outdated Show resolved Hide resolved
@sylveon sylveon requested a review from CaseyCarter June 12, 2021 03:55
stl/inc/format Outdated Show resolved Hide resolved
@CaseyCarter CaseyCarter removed their assignment Jun 17, 2021
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I'll validate and push changes for test nitpicks.

@StephanTLavavej StephanTLavavej self-assigned this Jun 24, 2021
@StephanTLavavej StephanTLavavej merged commit 64aa67e into microsoft:main Jun 29, 2021
@StephanTLavavej
Copy link
Member

Thanks for significantly improving <format>'s performance! 🚀 🚀 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format C++20/23 format performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants