- * Uses a templated backend selector APIs, where the template parameters specify the backends
- * Application is linked with the required oneMKL backend wrapper libraries
- * `$> clang++ -fsycl –I$ONEMKL/include app.cpp`
- * `$> clang++ -fsycl app.o –L$ONEMKL/lib –lonemkl_blas_mklcpu –lonemkl_blas_cublas`
-
-
-
-
-
- ## Exercise
- * Objectives: Learn to use oneMKL GEMM buffer, USM APIs
- * What is provided:
- ** Boiler plate-code provided (a) to perform GEMM on CPU, (b) Helper function to verify results from oneMKL APIs and CPU
- ** Please complete the TODO tasks marked in the `source_*.cpp`.
- ** Refer to the solutions at `solution_*.cpp`
-
-
-
-
-
-
-
-
-
-
diff --git a/Lesson_Materials/common-revealjs/images/oneAPI.png b/Lesson_Materials/common-revealjs/images/oneAPI.png
new file mode 100644
index 00000000..e822aab9
Binary files /dev/null and b/Lesson_Materials/common-revealjs/images/oneAPI.png differ
diff --git a/Lesson_Materials/common-revealjs/images/oneMath-backends.svg b/Lesson_Materials/common-revealjs/images/oneMath-backends.svg
new file mode 100644
index 00000000..b4269c56
--- /dev/null
+++ b/Lesson_Materials/common-revealjs/images/oneMath-backends.svg
@@ -0,0 +1,1427 @@
+
+
+
\ No newline at end of file
diff --git a/Lesson_Materials/common-revealjs/images/uxl.svg b/Lesson_Materials/common-revealjs/images/uxl.svg
new file mode 100644
index 00000000..10ccd074
--- /dev/null
+++ b/Lesson_Materials/common-revealjs/images/uxl.svg
@@ -0,0 +1,205 @@
+
+
+
diff --git a/Lesson_Materials/oneMath_gemm/index.html b/Lesson_Materials/oneMath_gemm/index.html
new file mode 100644
index 00000000..44e278d7
--- /dev/null
+++ b/Lesson_Materials/oneMath_gemm/index.html
@@ -0,0 +1,208 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ oneAPI Math Library (oneMath)
+
+
+
+
+ ## Learning Objectives
+ * Learn what oneMath is and how it works
+ * Learn how to use GEMM APIs from oneMath with both USM and buffer memory models
+
+
+
+
+ ## Do you need to write your own kernels?
+
+
+
+ * Many computationally intensive applications spend most of their time in **common operations / algorithms**
+ * **Numerical libraries** provide reliable solutions to these common problems
+ * You can focus on solving higher-level problems instead of technical details
+ * Libraries optimised for specific hardware provide **superior performance**
+
+
+
+
+
+ ## Numerical libraries
+
+
+
+ * Common APIs like BLAS or LAPACK have multiple CPU implementations and vendor-specific GPU solutions
+ * **Intel CPU/GPU**: Intel oneAPI Math Kernel Library (oneMKL)
+ * **NVIDIA GPU**: cuBLAS, cuSOLVER, cuRAND, cuFFT
+ * **AMD GPU**: rocBLAS, rocSOLVER, rocRAND, rocFFT
+ * Imagine being able to use all of them from a *single source code* → **oneMath**
+
+
+
+
+
+
+ oneAPI and oneMath
+
+
+
+ * Open-source [**oneAPI**](https://oneapi.io/) project governed by the [Unified Acceleration (UXL) Foundation](https://uxlfoundation.org/):
+ * defines SYCL-based APIs and provides library implementations
+ * brings performance and ease of development to SYCL applications
+ * [**oneMath** specification](https://oneapi-spec.uxlfoundation.org/specifications/oneapi/latest/elements/onemath/source/):
+ * defines a SYCL API for numerical computations across several domains
+ * Linear Algebra, Discrete Fourier Transforms, Random Number Generators, Statistics, Vector Math
+ * [**oneMath** library](https://github.com/uxlfoundation/oneMath):
+ * wrapper implementation dispatching SYCL API calls to a multitude of implementations, both generic and vendor-specific
+
+
+
+
+
+
+
+
+
+
+ oneMath library backends
+
+
+
+
+
+
+ #### Run-time dispatching
+
+
+
+#include <oneapi/math.hpp>
+
+sycl::queue q{myDeviceSelector};
+
+sycl::buffer<T,1> a{a_host, m*k};
+sycl::buffer<T,1> b{b_host, k*n};
+sycl::buffer<T,1> c{c_host, m*n};
+
+// Compute C = A*B+C on the device
+oneapi::math::blas::column_major::gemm(q, ..., m, n, k, ..., a, ..., b, ..., c, ... );
+
+
+
+ * Backend is loaded at run time based on the device associated with the SYCL queue
+ * Both buffer and USM APIs available (mind the different synchronisation: buffers synchronise implicitly through accessors, USM calls return a `sycl::event` to wait on)
+ * The same binary can run on different hardware with a generic device selector
+ * Can run on CPU or different GPUs without recompiling
+ * Link the application with the top-level runtime library: `-lonemath`
+
+
+
+
+
+ #### Compile-time dispatching
+
+
+
+#include <oneapi/math.hpp>
+
+sycl::queue cpu_queue{sycl::cpu_selector_v};
+
+sycl::buffer<T,1> a{a_host, m*k};
+sycl::buffer<T,1> b{b_host, k*n};
+sycl::buffer<T,1> c{c_host, m*n};
+
+oneapi::math::backend_selector<oneapi::math::backend::mklcpu> cpu_selector(cpu_queue);
+// Select the Intel oneMKL CPU backend specifically ^^^^^^
+
+oneapi::math::blas::column_major::gemm(cpu_selector, ..., m, n, k, ..., a, ..., b, ..., c, ... );
+
+
+
+ * A specific backend can be selected at compile time with a `backend_selector`
+ * Passed into the API in place of the queue
+ * Avoids the small run-time dispatching overhead at the cost of portability
+ * Link the application with the specific backend library: `-lonemath_blas_mklcpu`
+
+
+
+
+
+ ## Exercise
+
+
+ * Objectives: Learn to use oneMath GEMM buffer and USM APIs
+ * Boilerplate code already provided to:
+ * Initialize matrices on host
+ * Compute reference result on host
+ * Compare the host and device results
+ * Please **complete the TODO tasks** marked in `source_*.cpp`
+ * Create buffers or transfer data with USM
+ * Compute GEMM by calling the oneMath API
+ * Use the provided `VerifyResult` function
+ * If stuck, have a look at `solution_*.cpp`
+
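The exercise slide mentions a host reference result and a comparison step without showing either. As a rough illustration only, the host side could look like the sketch below. It assumes column-major storage with no transposition, matching the `column_major::gemm` calls on the earlier slides. The names `reference_gemm` and `results_match` are hypothetical and are not taken from the exercise sources; the real `VerifyResult` helper may have a different signature and tolerance.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference: C = alpha * A * B + beta * C.
// Column-major storage, no transposition:
//   A is m x k with leading dimension lda >= m,
//   B is k x n with leading dimension ldb >= k,
//   C is m x n with leading dimension ldc >= m.
void reference_gemm(std::size_t m, std::size_t n, std::size_t k,
                    float alpha,
                    const std::vector<float>& a, std::size_t lda,
                    const std::vector<float>& b, std::size_t ldb,
                    float beta,
                    std::vector<float>& c, std::size_t ldc) {
    for (std::size_t j = 0; j < n; ++j) {          // column of C
        for (std::size_t i = 0; i < m; ++i) {      // row of C
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                // Element (i, p) of A times element (p, j) of B.
                acc += a[i + p * lda] * b[p + j * ldb];
            }
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}

// Hypothetical comparison helper: element-wise check with a tolerance
// that is relative for large values and absolute for small ones.
bool results_match(const std::vector<float>& expected,
                   const std::vector<float>& actual,
                   float tol = 1e-4f) {
    if (expected.size() != actual.size()) {
        return false;
    }
    for (std::size_t i = 0; i < expected.size(); ++i) {
        const float scale = std::max(1.0f, std::fabs(expected[i]));
        if (std::fabs(expected[i] - actual[i]) > tol * scale) {
            return false;
        }
    }
    return true;
}
```

A tolerance-based comparison is used rather than exact equality because a device backend may accumulate the dot products in a different order than the host loop, which changes the floating-point rounding.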