Introduces experimental kernels folder
This commit introduces the `experimental/` folder for kernels. This
folder will contain alternative, experimental implementations of our
kernels. These implementations are using our kernel catalog feature from

While experimental, the kernels are not tightly coupled to DAPHNE, e.g.,
they are not used by default unless the required kernel hint is given,
and they are not tested as part of our test suite. Before an
experimental kernel is moved to our default kernels, it must pass all
the tests.

Being able to run kernels with DAPHNE before they are part of our kernel
library helps development when improving kernels:

- Dependencies can be used while being experimental without having to
  worry about making them a requirement for all DAPHNE users
- Prototyping and development are faster: only the kernel has to be
  recompiled, which is drastically faster than having to compile the
  `` each time
- Provides a playing ground for developers to try out alternative
- Makes it easy to benchmark and compare different implementations of
  the same kernel by using different kernel hints

The `gemv/` folder contains a simple example of such an alternative
implementation. It uses AVX2 instructions to implement the SpMV
kernel, which makes it unsuitable as the default kernel implementation
because it requires hardware-specific instructions. Additionally, it uses
the LIKWID library to measure CPU performance counters (similar to PAPI).
Without bringing these dependencies to all DAPHNE users, one can already
test the kernel, compare it with the default, and run benchmarks with
DAPHNE using this kernel. For more information on kernel extensions see
philipportner committed Sep 25, 2024
1 parent 73bf668 commit bfcd803
Showing 4 changed files with 181 additions and 0 deletions.
24 changes: 24 additions & 0 deletions doc/development/
@@ -193,3 +193,27 @@ It is recommended to use exceptions such as `throw std::runtime_error` in a kernel
in case the code runs into an unresolvable issue. We catch these exceptions in
our surrounding code to the kernel and provide, whenever possible, additional
information about the source of the error in the DaphneDSL script.

### Experimental Kernels

As an alternative to implementing a new kernel that is directly integrated into
DAPHNE, one can also work on kernel implementations using the [kernel catalog](doc/
These should reside in [experimental/op/](src/runtime/local/kernels/experimental/op/) where `op` is
the mnemonic of the DaphneIR operation that the kernel is implementing.

Experimental kernels are not directly integrated into DAPHNE and are neither
compiled nor executed by default. They can be used to test new ideas and
provide an easier way of prototyping kernel implementations. One can easily
test multiple different implementations of the same DAPHNE kernel using a
single DaphneDSL script which calls all the kernel implementations.

There are fewer restrictions put on experimental kernels than on built-in
kernels, e.g., they are not tested as part of the CI pipeline. You are also
free to introduce new dependencies that are handled by the accompanying
`Makefile` or build script. Testing and dependency management will have to be
resolved before the experimental kernel is integrated into DAPHNE as a built-in
kernel.

Check out [](doc/ for more information on how to
implement experimental kernels.
44 changes: 44 additions & 0 deletions src/runtime/local/kernels/experimental/gemv/Makefile
@@ -0,0 +1,44 @@
# Copyright 2024 The DAPHNE Consortium
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

CXX = g++
CXXFLAGS = -DLIKWID_PERFMON -ggdb3 -fPIC -fno-omit-frame-pointer -O3 -march=native -fopenmp -std=c++17
ASMFLAGS = -fverbose-asm -S

INCLUDES = -I/usr/local/include/ -I../../../../../

SRCS = gemv.cpp
OBJS = $(SRCS:.cpp=.o)

.PHONY: clean

all: $(TARGET)
	@echo " ==> Built target $(TARGET)"

$(TARGET): $(OBJS)
	@echo " ==> COMPILING $@"
	$(CXX) $(CXXFLAGS) $(INCLUDES) -shared -o $(TARGET) $(OBJS) -llikwid

asm: $(SRCS)
	@echo " ==> COMPILING $@"
	$(CXX) $(ASMFLAGS) $(CXXFLAGS) $(INCLUDES) -o $(SRCS:.cpp=.s) -cpp $< -llikwid

%.o: %.cpp
	@echo " ==> COMPILING $@"
	$(CXX) -c $(CXXFLAGS) $(INCLUDES) -o $@ -cpp $< -llikwid

clean:
	@echo "==> CLEANING"
	$(RM) *.o *.s *.so
103 changes: 103 additions & 0 deletions src/runtime/local/kernels/experimental/gemv/gemv.cpp
@@ -0,0 +1,103 @@
#include <immintrin.h>
#include <runtime/local/datastructures/CSRMatrix.h>
#include <runtime/local/datastructures/DenseMatrix.h>
#include <unistd.h>

#include "runtime/local/datastructures/DataObjectFactory.h"

#include <cstring>
#include <iostream>
#include <stdexcept>

// Use the LIKWID marker API only when built with -DLIKWID_PERFMON;
// otherwise the marker macros expand to nothing.
#ifdef LIKWID_PERFMON
#include <likwid-marker.h>
#else
#define LIKWID_MARKER_START(regionTag)
#define LIKWID_MARKER_STOP(regionTag)
#define LIKWID_MARKER_GET(regionTag, nevents, events, time, count)
#endif

class DaphneContext;

// Horizontal sum of [4 x double] __m256d
inline double hsum_double_avx2(__m256d v) {
    __m128d vlow = _mm256_castpd256_pd128(v);
    __m128d vhigh = _mm256_extractf128_pd(v, 1);
    vlow = _mm_add_pd(vlow, vhigh);
    __m128d high64 = _mm_unpackhi_pd(vlow, vlow);
    return _mm_cvtsd_f64(_mm_add_sd(vlow, high64));
}

extern "C" {

void spmv_simd_parallel_omp(DenseMatrix<double> *&res,
                            const CSRMatrix<double> *lhs,
                            const DenseMatrix<double> *rhs, bool transa,
                            bool transb, DaphneContext *ctx) {
    const size_t nr_lhs = lhs->getNumRows();
    [[maybe_unused]] const size_t nc_lhs = lhs->getNumCols();

    [[maybe_unused]] const size_t nr_rhs = rhs->getNumRows();
    const size_t nc_rhs = rhs->getNumCols();

    if (nc_lhs != nr_rhs) {
        throw std::runtime_error(
            "Gemv - #cols of mat and #rows of vec must be the same");
    }

    // No zero-initialization on allocation; the buffer is cleared below.
    if (res == nullptr)
        res = DataObjectFactory::create<DenseMatrix<double>>(nr_lhs, nc_rhs,
                                                             false);

    const auto *valuesRhs = rhs->getValues();
    auto *valuesRes = res->getValues();
    memset(valuesRes, 0, sizeof(double) * nr_lhs * nc_rhs);

    auto *row_offsets = lhs->getRowOffsets();
    auto *values = lhs->getValues();
    auto *col_idx = lhs->getColIdxs();

#pragma omp parallel
#pragma omp for
    for (size_t row = 0; row < nr_lhs; ++row) {
        double row_sum = 0;
        // Initialize [4 x double] row-accumulator
        __m256d row_acc = _mm256_setzero_pd();
        // Iterate over non-zero elements in row, four at a time
        auto values_in_row = row_offsets[row + 1] - row_offsets[row];
        int rounds = values_in_row / 4;
        for (int i = 0; i < rounds; ++i) {
            int idx = row_offsets[row] + i * 4;
            // Load doubles from LHS matrix
            __m256d mat_v = _mm256_loadu_pd(&values[idx]);
            // Load RHS column indices
            __m256i col_idxs =
                _mm256_loadu_si256((const __m256i *)&col_idx[idx]);
            // Gather values from RHS vector
            __m256d vec_v = _mm256_i64gather_pd(valuesRhs, col_idxs, 8);
            // Multiply and add to accumulator
            row_acc = _mm256_fmadd_pd(mat_v, vec_v, row_acc);
        }
        // Horizontal sum of accumulator
        row_sum = hsum_double_avx2(row_acc);
        // Handle remaining elements
        for (auto i = row_offsets[row] + rounds * 4;
             i < row_offsets[row + 1]; ++i) {
            row_sum += values[i] * valuesRhs[col_idx[i]];
        }
        // Store result
        valuesRes[row] = row_sum;
    }
}

} // extern "C"
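
When comparing an experimental kernel such as the one above against another implementation, a plain scalar CSR SpMV can serve as a reference. The following is only a minimal sketch under stated assumptions: `CsrView`, `spmvReference` and `approxEqual` are hypothetical names introduced here for illustration, and the sketch uses `std::vector` rather than DAPHNE's `CSRMatrix` and `DenseMatrix`. Because the SIMD kernel accumulates products in a different order (and uses fused multiply-add), results are compared with a tolerance rather than bitwise.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CSR matrix (not DAPHNE's CSRMatrix).
struct CsrView {
    std::size_t numRows;
    std::vector<std::size_t> rowOffsets; // numRows + 1 entries
    std::vector<std::size_t> colIdxs;    // column index of each non-zero
    std::vector<double> values;          // value of each non-zero
};

// Scalar reference: y = A * x, accumulating one row at a time.
std::vector<double> spmvReference(const CsrView &a,
                                  const std::vector<double> &x) {
    std::vector<double> y(a.numRows, 0.0);
    for (std::size_t row = 0; row < a.numRows; ++row) {
        double rowSum = 0.0;
        for (std::size_t i = a.rowOffsets[row]; i < a.rowOffsets[row + 1]; ++i)
            rowSum += a.values[i] * x[a.colIdxs[i]];
        y[row] = rowSum;
    }
    return y;
}

// Element-wise comparison with a relative tolerance (absolute below 1.0),
// since a different summation order can change the last bits of a result.
bool approxEqual(const std::vector<double> &lhs,
                 const std::vector<double> &rhs, double relTol = 1e-12) {
    if (lhs.size() != rhs.size())
        return false;
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        const double scale =
            std::max(1.0, std::max(std::fabs(lhs[i]), std::fabs(rhs[i])));
        if (std::fabs(lhs[i] - rhs[i]) > relTol * scale)
            return false;
    }
    return true;
}
```

In such a setup, the output of the experimental kernel would be copied into a `std::vector<double>` and passed to `approxEqual` together with the result of `spmvReference`; the tolerance value is an arbitrary choice and may need adjusting for long rows.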

10 changes: 10 additions & 0 deletions src/runtime/local/kernels/experimental/gemv/gemv.json
@@ -0,0 +1,10 @@
[
  {
    "opMnemonic": "gemv",
    "kernelFuncName": "spmv_simd_parallel_omp",
    "resTypes": ["DenseMatrix<double>"],
    "argTypes": ["CSRMatrix<double>", "DenseMatrix<double>"],
    "backend": "CPP",
    "libPath": ""
  }
]
