Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<algorithm>: count() is missing vectorization opportunities in C++14/17 #3245

Closed
StephanTLavavej opened this issue Nov 29, 2022 · 0 comments · Fixed by #3262
Closed

<algorithm>: count() is missing vectorization opportunities in C++14/17 #3245

StephanTLavavej opened this issue Nov 29, 2022 · 0 comments · Fixed by #3262
Labels
fixed Something works now, yay! performance Must go faster

Comments

@StephanTLavavej
Copy link
Member

While investigating #3244, I found something curious - count() was affected by that correctness bug only in C++20/23 mode when vector iterators are used (see https://godbolt.org/z/dcTscbeTn ). This is because there's a performance bug in C++14/17 mode.

There are 4 uses of _Within_limits in the STL (affected by said correctness bug, which is irrelevant for the purposes of this performance bug). Each use is preceded by checking _Vector_alg_in_find_is_safe.

  • 💚 ranges::_Count_fn::_Count_unchecked:

    STL/stl/inc/algorithm

    Lines 442 to 453 in e28f956

    template <class _It, class _Se, class _Ty, class _Pj>
    _NODISCARD static constexpr iter_difference_t<_It> _Count_unchecked(
    _It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj) {
    _STL_INTERNAL_STATIC_ASSERT(input_iterator<_It>);
    _STL_INTERNAL_STATIC_ASSERT(sentinel_for<_Se, _It>);
    _STL_INTERNAL_STATIC_ASSERT(indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>);
    #if _USE_STD_VECTOR_ALGORITHMS
    if constexpr (is_same_v<_Pj, identity> && _Vector_alg_in_find_is_safe<_It, _Ty>
    && sized_sentinel_for<_Se, _It>) {
    if (!_STD is_constant_evaluated()) {
    if (!_Within_limits(_First, _Val)) {
  • 💚 std::_Find_unchecked:

    STL/stl/inc/xutility

    Lines 5603 to 5612 in e28f956

    template <class _InIt, class _Ty>
    _NODISCARD _CONSTEXPR20 _InIt _Find_unchecked(_InIt _First, const _InIt _Last, const _Ty& _Val) {
    // find first matching _Val; choose optimization
    // activate optimization for contiguous iterators to most scalar types (possibly const-qualified)
    if constexpr (_Vector_alg_in_find_is_safe<_InIt, _Ty>) {
    #if _HAS_CXX20
    if (!_STD is_constant_evaluated())
    #endif // _HAS_CXX20
    {
    if (!_Within_limits(_First, _Val)) {
  • 💚 ranges::_Find_unchecked:

    STL/stl/inc/xutility

    Lines 5674 to 5681 in e28f956

    template <input_iterator _It, sentinel_for<_It> _Se, class _Ty, class _Pj = identity>
    requires indirect_binary_predicate<ranges::equal_to, projected<_It, _Pj>, const _Ty*>
    _NODISCARD constexpr _It _Find_unchecked(_It _First, const _Se _Last, const _Ty& _Val, _Pj _Proj = {}) {
    constexpr bool _Is_sized = sized_sentinel_for<_Se, _It>;
    if constexpr (_Vector_alg_in_find_is_safe<_It, _Ty> && _Sized_or_unreachable_sentinel_for<_Se, _It>
    && same_as<_Pj, identity>) {
    if (!_STD is_constant_evaluated()) {
    if (!_Within_limits(_First, _Val)) {
  • 🐞 std::count:

    STL/stl/inc/xutility

    Lines 5779 to 5795 in e28f956

    _EXPORT_STD template <class _InIt, class _Ty>
    _NODISCARD _CONSTEXPR20 _Iter_diff_t<_InIt> count(const _InIt _First, const _InIt _Last, const _Ty& _Val) {
    // count elements that match _Val
    _Adl_verify_range(_First, _Last);
    if constexpr (_Is_vb_iterator<_InIt> && is_same_v<_Ty, bool>) {
    return _Count_vbool(_First, _Last, _Val);
    } else {
    auto _UFirst = _Get_unwrapped(_First);
    const auto _ULast = _Get_unwrapped(_Last);
    #if _USE_STD_VECTOR_ALGORITHMS
    if constexpr (_Vector_alg_in_find_is_safe<_InIt, _Ty>) {
    #if _HAS_CXX20
    if (!_STD is_constant_evaluated())
    #endif // _HAS_CXX20
    {
    if (!_Within_limits(_UFirst, _Val)) {

The performance bug 🐞 is that std::count is templated on a wrapped iterator _InIt (in my example, a vector iterator), and we check _Vector_alg_in_find_is_safe<_InIt, _Ty> with that wrapped iterator. However, the following code correctly uses the unwrapped iterator _UFirst (with both _Within_limits and __std_count_trivial).

The 14/17 versus 20/23 dependency is because _Vector_alg_in_find_is_safe uses _Iterator_is_contiguous:

STL/stl/inc/xutility

Lines 5591 to 5593 in e28f956

template <class _Iter, class _Ty, class _Elem = _Iter_value_t<_Iter>>
_INLINE_VAR constexpr bool _Vector_alg_in_find_is_safe = // Can we activate the vector algorithms for find/count?
_Iterator_is_contiguous<_Iter> // The iterator must be contiguous so we can get raw pointers.

In C++20/23 mode, it's powered by concepts and can detect arbitrary contiguous iterators, compensating for the perf bug:

STL/stl/inc/xutility

Lines 4226 to 4229 in e28f956

#ifdef __cpp_lib_concepts
// When concepts are available, we can detect arbitrary contiguous iterators.
template <class _Iter>
inline constexpr bool _Iterator_is_contiguous = contiguous_iterator<_Iter>;

But in C++14/17 mode, it can only detect pointers, so the perf bug is revealed:

STL/stl/inc/xutility

Lines 4236 to 4239 in e28f956

#else // ^^^ defined(__cpp_lib_concepts) ^^^ / vvv !defined(__cpp_lib_concepts) vvv
// When concepts aren't available, we can detect pointers. (Iterators should be unwrapped before using this.)
template <class _Iter>
_INLINE_VAR constexpr bool _Iterator_is_contiguous = is_pointer_v<_Iter>;

As the comment says: "(Iterators should be unwrapped before using this.)"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixed Something works now, yay! performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant