uxlfoundation · timmiesmith · Feb 20, 2025 · Feb 14, 2025
diff --git a/documentation/library_guide/introduction.rst b/documentation/library_guide/introduction.rst
@@ -145,6 +145,11 @@ Known Limitations
   (including: ``std::ldexp``, ``std::frexp``), and the following functions when used with ``std::complex<float>``
   as argument(s):  ``std::acosh``, ``std::asin``, ``std::asinh``, ``std::asoc``, ``std::log10``, ``std::log``, ``std::pow``,
   ``std::sqrt`` require device support for double precision.
+* STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
+  the Microsoft Visual C++ standard library.
+* ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or the ``swap`` member function
+  when using the Microsoft Visual C++ standard library. As a workaround, define the ``_USE_STD_VECTOR_ALGORITHMS``
+  macro as ``0`` in the source file before including any headers.
 * ``exclusive_scan``, ``inclusive_scan``, ``exclusive_scan_by_segment``,
   ``inclusive_scan_by_segment``, ``transform_exclusive_scan``, ``transform_inclusive_scan``,
   when used with C++ standard aligned policies, impose limitations on the initial value type if an
@@ -162,5 +167,15 @@ Known Limitations
   the dereferenced value type of the provided iterators should satisfy the ``DefaultConstructible`` requirements.
 * For ``remove``, ``remove_if``, ``unique`` the dereferenced value type of the provided
   iterators should be ``MoveConstructible``.
-
-.. _`SYCL Specification`: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html
+* When compiling with the ``-O0 -g`` options on Linux with the Intel® oneAPI DPC++/C++ Compiler version 2025.0 or earlier,
+  the ``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``, and ``partial_sort_copy`` algorithms may produce
+  incorrect results or cause a segmentation fault when used with a device execution policy on a CPU device. To avoid this
+  issue, pass the ``-fsycl-device-code-split=per_kernel`` option to the compiler or use Intel® oneAPI DPC++/C++ Compiler
+  version 2025.1 or newer.
+* ``esimd::radix_sort`` and ``esimd::radix_sort_by_key`` kernel templates fail to compile when a program
+  is built with the ``-g``, ``-O0``, or ``-O1`` compiler options and a Linux General Purpose Intel GPUs Driver version
+  older than ``2423.32`` (Rolling) or ``2350.61`` (LTS) is used.
+  See `Release Types <https://dgpu-docs.intel.com/releases/releases.html>`_
+  for information about the relevant Rolling and LTS releases.
+
+.. _`SYCL Specification`: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html
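The ``_USE_STD_VECTOR_ALGORITHMS`` workaround for swapping ``std::array`` in the Known Limitations list above can be sketched as follows. This is a minimal sketch, assuming the macro is defined on the first line of the translation unit. The SYCL kernel itself is omitted so the snippet builds with any C++17 compiler, and ``swap_arrays`` is a hypothetical helper standing in for the code that would run inside the kernel. Standard libraries other than Microsoft's do not use this macro, so it is harmless elsewhere.

```cpp
// Must come before the first #include in the translation unit, as the
// known limitation requires. Only the Microsoft Visual C++ standard library
// reads this macro; other standard libraries ignore it.
#define _USE_STD_VECTOR_ALGORITHMS 0

#include <array>
#include <utility>

// Hypothetical helper: in a real program this std::swap call would sit
// inside a DPC++ kernel; it is shown outside one so the sketch compiles
// without a SYCL toolchain.
inline void swap_arrays(std::array<int, 4>& a, std::array<int, 4>& b) {
    std::swap(a, b);
}
```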
diff --git a/documentation/release_notes.rst b/documentation/release_notes.rst
@@ -8,6 +8,68 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
 and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
 creating efficient heterogeneous applications.
 
+New in 2022.8.0
+===============
+
+New Features
+------------
+- Added support for host policies to the ``histogram`` algorithms.
+- Added support for an undersized output range in the range-based ``merge`` algorithm.
+- Improved performance of the ``merge`` algorithm and of the sorting algorithms
+  (``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``) that rely on Merge sort [#fnote1]_
+  when used with device policies on large data sizes.
+- Improved performance of ``copy``, ``fill``, ``for_each``, ``replace``, ``reverse``, ``rotate``, ``transform`` and 30+
+  other algorithms with device policies on GPUs.
+- Improved support for using oneDPL with SYCL implementations other than the Intel® oneAPI DPC++/C++ Compiler.
+
+
+Fixed Issues
+------------
+- Fixed an issue with ``drop_view`` in the experimental range-based API.
+- Fixed compilation errors in ``find_if`` and ``find_if_not`` with device policies when the user-provided predicate is
+  device copyable but not trivially copyable.
+- Fixed incorrect results or synchronous SYCL exceptions for several algorithms when compiled with ``-O0`` and executed
+  on a GPU device.
+- Fixed an issue preventing inclusion of the ``<numeric>`` header after the ``<execution>`` and ``<algorithm>`` headers.
+- Fixed several issues in the ``sort``, ``stable_sort``, ``sort_by_key`` and ``stable_sort_by_key`` algorithms. The fixes:
+
+  * Allow the use of non-trivially-copyable comparators.
+  * Eliminate duplicate kernel names.
+  * Resolve incorrect results on devices with sub-group sizes smaller than four.
+  * Resolve synchronization errors seen on Intel® Arc™ B-series GPU devices.
+
+Known Issues and Limitations
+----------------------------
+New in This Release
+^^^^^^^^^^^^^^^^^^^
+- Incorrect results may be observed when calling ``sort`` with a device policy on Intel® Arc™ Graphics 140V with data
+  sizes of 4-8 million elements.
+- The ``sort``, ``stable_sort``, ``sort_by_key`` and ``stable_sort_by_key`` algorithms fail to compile
+  with Clang 17 and earlier versions, as well as with compilers based on these versions,
+  such as Intel® oneAPI DPC++/C++ Compiler 2023.2.0.
+- When compiling code that uses device policies with the open-source oneAPI DPC++ Compiler (clang++ driver),
+  synchronous SYCL runtime exceptions reporting that kernels cannot be found may be thrown unless an optimization
+  flag (for example, ``-O1``) is specified explicitly rather than relying on the compiler's default optimization level.
+
+Existing Issues
+^^^^^^^^^^^^^^^
+See the oneDPL Guide for other `restrictions and known limitations`_.
+
+- The ``histogram`` algorithm requires the output value type to be an integral type no larger than four bytes
+  when used with an FPGA policy.
+- ``histogram`` may produce incorrect results with device policies in a program built with the ``-O0`` option.
+- Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
+- For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
+  used for both input and destination) with the ``unseq`` or ``par_unseq`` execution policy,
+  the provided input and destination iterators must be equality comparable.
+  Furthermore, comparing the input iterator with the destination iterator must evaluate to ``true``.
+  If these conditions are not met, the result of these algorithm calls is undefined.
+- Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
+  ``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, and ``reduce_by_segment``
+  with the ``unseq`` or ``par_unseq`` policy when compiled by the Intel® oneAPI DPC++/C++ Compiler
+  with the ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, or ``-qopenmp-simd`` options on Linux.
+  To avoid the issue, pass the ``-fopenmp`` or ``-fopenmp-simd`` option instead.
+
 New in 2022.7.0
 ===============
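The in-place requirement for ``exclusive_scan`` with ``unseq`` or ``par_unseq`` policies, listed under Existing Issues in the release notes diff above, can be illustrated with a small sketch. It assumes the simplest way to meet the condition: pass the very same iterator as both the input start and the destination, so the two are equality comparable and compare equal. ``inplace_exclusive_scan`` is a hypothetical helper, and the execution policy argument (for example ``oneapi::dpl::execution::unseq``) is left out so the code builds with a plain C++17 standard library; with oneDPL the policy would be the first argument of the corresponding ``oneapi::dpl::exclusive_scan`` call.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: runs an exclusive scan in place over `data`.
// The destination iterator is the input iterator itself, so the two are
// equality comparable and compare equal, which is the condition the known
// issue requires for unseq / par_unseq policies. No execution policy is
// passed here, so the standard library's sequential overload is used.
template <typename T>
void inplace_exclusive_scan(std::vector<T>& data, T init) {
    auto first = data.begin();
    auto d_first = first;      // destination aliases the input
    assert(first == d_first);  // documents the in-place condition
    std::exclusive_scan(first, data.end(), d_first, init);
}
```

For example, calling ``inplace_exclusive_scan(v, 0)`` on a vector holding ``{1, 2, 3, 4}`` leaves ``{0, 1, 3, 6}`` in it, since each output element is the sum of the elements before it.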