Introduction of release notes and applied resolved comments in #2052 #2068

Merged 26 commits on Feb 20, 2025
4a3646a
Introduction of release notes and applied comments in #2052
timmiesmith Feb 14, 2025
e6e872c
Capturing known issue that was new in 2022.7.0
timmiesmith Feb 14, 2025
466edc5
Remove sub-group size detail.
timmiesmith Feb 14, 2025
9df1dfc
Remove note covered in update to known limitations in library guide.
timmiesmith Feb 14, 2025
9ff8b65
Apply suggestions from code review
timmiesmith Feb 14, 2025
b1005e3
apply code review suggestions
timmiesmith Feb 14, 2025
dab51ea
fix indentation as suggested in code review
timmiesmith Feb 14, 2025
5708ebb
apply suggested change from code review.
timmiesmith Feb 18, 2025
ee1a0e2
Apply suggestions from code review
timmiesmith Feb 19, 2025
4d43cee
Remove note for issue introduced and resolved between releases.
timmiesmith Feb 19, 2025
392bd4d
Apply review comments.
timmiesmith Feb 19, 2025
30d8f00
Shortening fixed item to improve readability.
timmiesmith Feb 19, 2025
9f10b1b
Combine items to eliminate repetition.
timmiesmith Feb 19, 2025
14ee429
Correct nested list formatting.
timmiesmith Feb 19, 2025
b8cfe88
corrected Arc Graphics 140V name
timmiesmith Feb 19, 2025
9293ccf
Move long-standing issue with MS compiler to known limitations in lib…
timmiesmith Feb 19, 2025
3aad0df
Rewording known issue with older compilers.
timmiesmith Feb 19, 2025
43af7b7
Adding full name of Intel GPU driver for Linux.
timmiesmith Feb 20, 2025
beef7f6
Moving long term known issues to library guide introduction and apply…
timmiesmith Feb 20, 2025
a2b79bd
Correcting pluralization of product name.
timmiesmith Feb 20, 2025
2274e8b
Removing note on resolved issue.
timmiesmith Feb 20, 2025
82d71f3
Update the note for merge
akukanov Feb 20, 2025
7dbfcc7
Add new limitation with open-source compiler and scan algorithms
mmichel11 Feb 20, 2025
d762113
Use code case for -O0 and -O1 in new note
mmichel11 Feb 20, 2025
90ae6ef
Update scan limitation based on feedback
mmichel11 Feb 20, 2025
46fc709
update wording.
timmiesmith Feb 20, 2025
19 changes: 17 additions & 2 deletions documentation/library_guide/introduction.rst
@@ -145,6 +145,11 @@ Known Limitations
(including: ``std::ldexp``, ``std::frexp``), and the following functions when used with ``std::complex<float>``
as argument(s): ``std::acosh``, ``std::asin``, ``std::asinh``, ``std::acos``, ``std::log10``, ``std::log``, ``std::pow``,
``std::sqrt`` require device support for double precision.
* STL algorithm functions (such as ``std::for_each``) used in DPC++ kernels do not compile with the debug version of
the Microsoft Visual C++ standard library.
* ``std::array`` cannot be swapped in DPC++ kernels with the ``std::swap`` function or the ``swap`` member function
in the Microsoft Visual C++ standard library. As a workaround, define the
``_USE_STD_VECTOR_ALGORITHMS`` macro to ``0`` in the source file before including any headers.
* ``exclusive_scan``, ``inclusive_scan``, ``exclusive_scan_by_segment``,
``inclusive_scan_by_segment``, ``transform_exclusive_scan``, ``transform_inclusive_scan``,
when used with C++ standard aligned policies, impose limitations on the initial value type if an
@@ -162,5 +167,15 @@ Known Limitations
the dereferenced value type of the provided iterators should satisfy the ``DefaultConstructible`` requirements.
* For ``remove``, ``remove_if``, ``unique`` the dereferenced value type of the provided
iterators should be ``MoveConstructible``.

* When compiling with the ``-O0 -g`` options on Linux with the Intel® oneAPI DPC++/C++ Compiler version 2025.0 or earlier,
the ``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``, and ``partial_sort_copy`` algorithms may work incorrectly
or cause a segmentation fault when used with a device execution policy on a CPU device. To avoid this issue, pass the
``-fsycl-device-code-split=per_kernel`` option to the compiler or use Intel® oneAPI DPC++/C++ Compiler version 2025.1
or newer.
* ``esimd::radix_sort`` and ``esimd::radix_sort_by_key`` kernel templates fail to compile when a program
is built with ``-g``, ``-O0``, ``-O1`` compiler options and uses a Linux General Purpose Intel GPUs Driver version older
than ``2423.32`` (Rolling) or ``2350.61`` (LTS).
See the `Release Types <https://dgpu-docs.intel.com/releases/releases.html>`_
page for information about the relevant Rolling and LTS releases.
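
For the ``std::array`` swap limitation listed above, the workaround might look like the following sketch. It only shows the ordering the note asks for: the macro is defined before the first include. The helper names ``swap_with_std`` and ``swap_with_member`` are hypothetical, and plain host code stands in for a DPC++ kernel body so that the sketch builds without a SYCL toolchain.

```cpp
// Per the limitation above: define the macro before including any headers.
// On standard libraries other than Microsoft Visual C++ it is expected to have no effect.
#define _USE_STD_VECTOR_ALGORITHMS 0

#include <array>
#include <cassert>
#include <utility>

// Hypothetical helper: swaps two arrays with the std::swap free function.
inline void swap_with_std(std::array<int, 3>& a, std::array<int, 3>& b) {
    std::swap(a, b);
}

// Hypothetical helper: swaps two arrays with the swap member function.
inline void swap_with_member(std::array<int, 3>& a, std::array<int, 3>& b) {
    a.swap(b);
}
```

In a real program the same two calls would sit inside the kernel; whether the macro is needed depends on the standard library in use.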

.. _`SYCL Specification`: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html
62 changes: 62 additions & 0 deletions documentation/release_notes.rst
@@ -8,6 +8,68 @@ The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C
and provides high-productivity APIs aimed to minimize programming efforts of C++ developers
creating efficient heterogeneous applications.

New in 2022.8.0
===============

New Features
------------
- Added support of host policies for ``histogram`` algorithms.
- Added support for an undersized output range in the range-based ``merge`` algorithm.
- Improved performance of the ``merge`` and sorting algorithms
(``sort``, ``stable_sort``, ``sort_by_key``, ``stable_sort_by_key``) that rely on Merge sort [#fnote1]_,
with device policies for large data sizes.
- Improved performance of ``copy``, ``fill``, ``for_each``, ``replace``, ``reverse``, ``rotate``, ``transform`` and 30+
other algorithms with device policies on GPUs.
- Improved oneDPL use with SYCL implementations other than Intel® oneAPI DPC++/C++ Compiler.

Contributor: Please update the compiler name: Intel® oneAPI DPC++/C++ Compiler

Contributor Author: Done.

Fixed Issues
------------
- Fixed an issue with ``drop_view`` in the experimental range-based API.
- Fixed compilation errors in ``find_if`` and ``find_if_not`` with device policies where the user-provided predicate is
device copyable but not trivially copyable.
- Fixed incorrect results or synchronous SYCL exceptions for several algorithms when compiled with ``-O0`` and executed
on a GPU device.
- Fixed an issue preventing inclusion of the ``<numeric>`` header after ``<execution>`` and ``<algorithm>`` headers.
- Fixed several issues in the ``sort``, ``stable_sort``, ``sort_by_key`` and ``stable_sort_by_key`` algorithms. The fixes:

  * Allow the use of non-trivially-copyable comparators.
  * Eliminate duplicate kernel names.
  * Resolve incorrect results on devices with sub-group sizes smaller than four.
  * Resolve synchronization errors that were seen on Intel® Arc™ B-series GPU devices.

Contributor: Please add a period at the end of this bullet.

Contributor Author: Done.

Known Issues and Limitations
----------------------------
New in This Release
^^^^^^^^^^^^^^^^^^^
- Incorrect results may be observed when calling ``sort`` with a device policy on Intel® Arc™ graphics 140V with data
sizes of 4-8 million elements.

Contributor: This instance of the name calls for a lowercase "g" in "graphics". Please update the name to Intel® Arc™ graphics.

Contributor Author: Done.

- ``sort``, ``stable_sort``, ``sort_by_key`` and ``stable_sort_by_key`` algorithms fail to compile
when using Clang 17 and earlier versions, as well as compilers based on these versions,
such as Intel® oneAPI DPC++/C++ Compiler 2023.2.0.

Contributor: Please change the (R) to ® for consistency.

Contributor Author: Done.

- When compiling code that uses device policies with the open source oneAPI DPC++ Compiler (clang++ driver),
synchronous SYCL runtime exceptions regarding unfound kernels may be encountered unless an optimization flag is
specified (for example, ``-O1``) as opposed to relying on the compiler's default optimization level.

Contributor: Please change "e.g." to "for example".

Contributor Author: Done.

Existing Issues
^^^^^^^^^^^^^^^
See oneDPL Guide for other `restrictions and known limitations`_.

- ``histogram`` algorithm requires the output value type to be an integral type no larger than four bytes
when used with an FPGA policy.
- ``histogram`` may provide incorrect results with device policies in a program built with ``-O0`` option.
- Compilation issues may be encountered when passing zip iterators to ``exclusive_scan_by_segment`` on Windows.
- For ``transform_exclusive_scan`` and ``exclusive_scan`` to run in-place (that is, with the same data
used for both input and destination) and with an execution policy of ``unseq`` or ``par_unseq``,
it is required that the provided input and destination iterators are equality comparable.
Furthermore, the equality comparison of the input and destination iterator must evaluate to true.
If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by ``exclusive_scan``, ``inclusive_scan``, ``transform_exclusive_scan``,
``transform_inclusive_scan``, ``exclusive_scan_by_segment``, ``inclusive_scan_by_segment``, ``reduce_by_segment``
with ``unseq`` or ``par_unseq`` policy when compiled by Intel® oneAPI DPC++/C++ Compiler
with ``-fiopenmp``, ``-fiopenmp-simd``, ``-qopenmp``, ``-qopenmp-simd`` options on Linux.
To avoid the issue, pass ``-fopenmp`` or ``-fopenmp-simd`` option instead.
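
As an illustration of the in-place requirement for ``exclusive_scan`` described above, the sketch below scans a range onto itself and checks that the input and destination iterators compare equal before the call. This is an assumption-laden stand-in: it uses the serial ``std::exclusive_scan`` from ``<numeric>`` so that it builds without oneDPL, and the helper name ``exclusive_scan_in_place`` is hypothetical. With oneDPL, an execution policy such as ``unseq`` would be passed as the first argument.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: exclusive scan where input and destination are the same range.
inline std::vector<int> exclusive_scan_in_place(std::vector<int> data, int init) {
    auto first = data.begin();
    auto d_first = data.begin();  // destination starts where the input starts

    // The note's precondition for in-place use: the iterators are equality
    // comparable and the comparison evaluates to true.
    assert(first == d_first);

    // Serial standard-library call (C++17), used here in place of a oneDPL policy call.
    std::exclusive_scan(first, data.end(), d_first, init);
    return data;
}
```

Passing iterators of different types that refer to the same memory, or iterators that do not compare equal, is the situation the note warns has an undefined result.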

New in 2022.7.0
===============
