[DAPHNE-455] Introducing a kernel catalog for the DAPHNE compiler.
- The DAPHNE compiler usually lowers most domain-specific operations to calls to pre-compiled kernels.
- So far, the DAPHNE compiler did not know which kernel instantiations are available in pre-compiled form.
  - Instead, it generated the expected function name of a kernel based on the DaphneIR operation's mnemonic, its result/argument types, and the processing backend (e.g., CPP or CUDA).
  - If the expected kernel was not available, an error of the form "JIT session error: Symbols not found: ..." occurred during LLVM JIT compilation.
- This commit introduces a kernel catalog that informs the DAPHNE compiler about the available pre-compiled kernels.
  - The kernel catalog stores a mapping from DaphneIR ops (represented by their mnemonic) to information on kernels registered for the op.
    - The information stored for each kernel comprises: the name of the pre-compiled C/C++ function, the result/argument types, and the processing backend (e.g., CPP or CUDA).
  - The kernel catalog provides methods for registering a kernel, retrieving the registered kernels for a specific op, and for dumping the catalog.
  - The kernel catalog is stored inside the DaphneUserConfig.
    - This makes sense, since users will be able to configure the available kernels in the future.
    - That way, the kernel catalog is accessible in all parts of the DAPHNE compiler and runtime.
  - The information on the available kernels is stored in a JSON file named catalog.json (or CUDAcatalog.json).
    - Currently, catalog.json is generated by genKernelInst.py; thus, the system has access to the same kernel specializations as before.
    - catalog.json is read at DAPHNE system start-up in the coordinator and distributed workers.
    - Added a parser for the kernel catalog JSON file.
  - RewriteToCallKernelOpPass uses the kernel catalog to obtain the kernel function name for an operation, instead of relying on a naming convention.
    - However, there are still a few points where kernel function names are built by convention (to be addressed later):
      - lowering of DistributedPipelineOp in RewriteToCallKernelOpPass
      - lowering of MapOp in LowerToLLVMPass
      - lowering of VectorizedPipelineOp in LowerToLLVMPass
- Directly related misc changes
  - DaphneIrExecutor has getters for its DaphneUserConfig.
  - CompilerUtils::mlirTypeToCppTypeName() allows generating either underscores (as before) or angle brackets (new) for template parameters.
- This is a first step towards extensibility w.r.t. the kernels; for now, the main contribution is the representation of the available kernels in a data structure (the kernel catalog).
- Contributes to #455, but doesn't close it yet.
pdamme committed Apr 4, 2024
1 parent b3ab575 commit a7df434
Showing 20 changed files with 688 additions and 118 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -183,6 +183,7 @@ add_subdirectory(src/compiler/inference)
add_subdirectory(src/compiler/lowering)
add_subdirectory(src/compiler/utils)
add_subdirectory(src/parser)
add_subdirectory(src/parser/catalog)
add_subdirectory(src/parser/config)
add_subdirectory(src/parser/metadata)
add_subdirectory(src/runtime/distributed/proto)
3 changes: 3 additions & 0 deletions src/api/cli/DaphneUserConfig.h
@@ -18,6 +18,7 @@
#pragma once

#include <api/daphnelib/DaphneLibResult.h>
#include <compiler/catalog/KernelCatalog.h>
#include <runtime/local/vectorized/LoadPartitioningDefs.h>
#include <runtime/local/datastructures/IAllocationDescriptor.h>
#include <util/LogConfig.h>
@@ -101,4 +102,6 @@ struct DaphneUserConfig {
// TODO Maybe the DaphneLib result should better reside in the DaphneContext,
// but having it here is simpler for now.
DaphneLibResult* result_struct = nullptr;

KernelCatalog kernelCatalog;
};
1 change: 1 addition & 0 deletions src/api/internal/CMakeLists.txt
@@ -26,6 +26,7 @@ set(LIBS
DaphneDSLParser
DaphneIrExecutor
DaphneConfigParser
DaphneCatalogParser
DaphneMetaDataParser
Util
WorkerImpl
29 changes: 25 additions & 4 deletions src/api/internal/daphne_internal.cpp
@@ -25,6 +25,7 @@
#include <parser/daphnedsl/DaphneDSLParser.h>
#include "compiler/execution/DaphneIrExecutor.h"
#include <runtime/local/vectorized/LoadPartitioning.h>
#include <parser/catalog/KernelCatalogParser.h>
#include <parser/config/ConfigParser.h>
#include <util/DaphneLogger.h>

@@ -43,8 +44,11 @@
#include <iostream>
#include <string>
#include <unordered_map>

#include <csignal>
#include <csetjmp>
#include <cstdlib>
#include <cstring>

// global logger handle for this executable
static std::unique_ptr<DaphneLogger> logger;
@@ -505,18 +509,35 @@ int startDAPHNE(int argc, const char** argv, DaphneLibResult* daphneLibRes, int
}

// ************************************************************************
// Parse, compile and execute DaphneDSL script
// Create DaphneIrExecutor and get MLIR context
// ************************************************************************

clock::time_point tpBegPars = clock::now();

// Creates an MLIR context and loads the required MLIR dialects.
DaphneIrExecutor executor(selectMatrixRepr, user_config);
mlir::MLIRContext * mctx = executor.getContext();

// ************************************************************************
// Populate kernel extension catalog
// ************************************************************************

KernelCatalog & kc = executor.getUserConfig().kernelCatalog;
// kc.dump();
KernelCatalogParser kcp(mctx);
kcp.parseKernelCatalog("build/src/runtime/local/kernels/catalog.json", kc);
if(user_config.use_cuda)
kcp.parseKernelCatalog("build/src/runtime/local/kernels/CUDAcatalog.json", kc);
// kc.dump();

// ************************************************************************
// Parse, compile and execute DaphneDSL script
// ************************************************************************

clock::time_point tpBegPars = clock::now();

// Create an OpBuilder and an MLIR module and set the builder's insertion
// point to the module's body, such that subsequently created DaphneIR
// operations are inserted into the module.
OpBuilder builder(executor.getContext());
OpBuilder builder(mctx);
auto loc = mlir::FileLineColLoc::get(builder.getStringAttr(inputFile), 0, 0);
auto moduleOp = ModuleOp::create(loc);
auto * body = moduleOp.getBody();
163 changes: 163 additions & 0 deletions src/compiler/catalog/KernelCatalog.h
@@ -0,0 +1,163 @@
/*
* Copyright 2023 The DAPHNE Consortium
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <compiler/utils/TypePrinting.h>

#include <mlir/IR/Types.h>

#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

/**
* @brief Stores information on a single kernel.
*/
struct KernelInfo {
/**
* @brief The name of the pre-compiled kernel function.
*/
const std::string kernelFuncName;

// TODO Add the path to the shared library containing the kernel function.

/**
* @brief The kernel's result types.
*/
const std::vector<mlir::Type> resTypes;

/**
* @brief The kernel's argument types.
*/
const std::vector<mlir::Type> argTypes;

// TODO Maybe unify this with ALLOCATION_TYPE.
/**
* @brief The targeted backend (e.g., hardware accelerator).
*/
const std::string backend;

KernelInfo(
const std::string kernelFuncName,
const std::vector<mlir::Type> resTypes,
const std::vector<mlir::Type> argTypes,
const std::string backend
// TODO Add the path to the shared library containing the kernel function.
) :
kernelFuncName(kernelFuncName), resTypes(resTypes), argTypes(argTypes), backend(backend)
{
//
}
};

/**
* @brief Stores information on kernels registered in the DAPHNE compiler.
*/
class KernelCatalog {
/**
* @brief The central data structure mapping DaphneIR operations to registered kernels.
*
* The DaphneIR operation is represented by its mnemonic. The kernels are represented
* by their kernel information.
*/
std::unordered_map<std::string, std::vector<KernelInfo>> kernelInfosByOp;

/**
* @brief Prints the given kernel information.
*
* @param opMnemonic The mnemonic of the corresponding DaphneIR operation.
* @param kernelInfos The kernel information to print.
* @param os The stream to print to. Defaults to `std::cerr`.
*/
void dumpKernelInfos(const std::string & opMnemonic, const std::vector<KernelInfo> & kernelInfos, std::ostream & os = std::cerr) const {
os << "- operation `" << opMnemonic << "` (" << kernelInfos.size() << " kernels)" << std::endl;
for(KernelInfo ki : kernelInfos) {
os << " - kernel `" << ki.kernelFuncName << "`: (";
for(size_t i = 0; i < ki.argTypes.size(); i++) {
os << ki.argTypes[i];
if(i < ki.argTypes.size() - 1)
os << ", ";
}
os << ") -> (";
for(size_t i = 0; i < ki.resTypes.size(); i++) {
os << ki.resTypes[i];
if(i < ki.resTypes.size() - 1)
os << ", ";
}
os << ") for backend `" << ki.backend << '`' << std::endl;
}
}

public:
/**
* @brief Registers the given kernel information as a kernel for the DaphneIR
* operation with the given mnemonic.
*
* @param opMnemonic The DaphneIR operation's mnemonic.
* @param kernelInfo The information on the kernel.
*/
void registerKernel(std::string opMnemonic, KernelInfo kernelInfo) {
kernelInfosByOp[opMnemonic].push_back(kernelInfo);
}

/**
* @brief Retrieves information on all kernels registered for the given DaphneIR operation.
*
* @param opMnemonic The mnemonic of the DaphneIR operation.
* @return A vector of kernel information, or an empty vector if no kernels are registered
* for the given operation.
*/
const std::vector<KernelInfo> getKernelInfosByOp(const std::string & opMnemonic) const {
auto it = kernelInfosByOp.find(opMnemonic);
if(it != kernelInfosByOp.end())
return it->second;
else
return {};
}

/**
* @brief Prints high-level statistics on the kernel catalog.
*
* @param os The stream to print to. Defaults to `std::cerr`.
*/
void stats(std::ostream & os = std::cerr) const {
const size_t numOps = kernelInfosByOp.size();
size_t numKernels = 0;
for(auto it = kernelInfosByOp.begin(); it != kernelInfosByOp.end(); it++)
numKernels += it->second.size();
os << "KernelCatalog (" << numOps << " ops, " << numKernels << " kernels)" << std::endl;
}

/**
* @brief Prints this kernel catalog.
*
* @param opMnemonic If an empty string, print registered kernels for all DaphneIR
* operations; otherwise, consider only the specified DaphneIR operation.
* @param os The stream to print to. Defaults to `std::cerr`.
*/
void dump(std::string opMnemonic = "", std::ostream & os = std::cerr) const {
stats(os);
if(opMnemonic.empty())
// Print info on all ops.
for(auto it = kernelInfosByOp.begin(); it != kernelInfosByOp.end(); it++)
dumpKernelInfos(it->first, it->second, os);
else
// Print info on specified op only.
dumpKernelInfos(opMnemonic, getKernelInfosByOp(opMnemonic), os);
}
};
2 changes: 1 addition & 1 deletion src/compiler/execution/DaphneIrExecutor.cpp
@@ -213,7 +213,7 @@ bool DaphneIrExecutor::runPasses(mlir::ModuleOp module) {
"IR after managing object references:"));

pm.addNestedPass<mlir::func::FuncOp>(
mlir::daphne::createRewriteToCallKernelOpPass());
mlir::daphne::createRewriteToCallKernelOpPass(userConfig_));
if (userConfig_.explain_kernels)
pm.addPass(
mlir::daphne::createPrintIRPass("IR after kernel lowering:"));
9 changes: 9 additions & 0 deletions src/compiler/execution/DaphneIrExecutor.h
@@ -31,6 +31,15 @@ class DaphneIrExecutor

mlir::MLIRContext *getContext()
{ return &context_; }

DaphneUserConfig & getUserConfig() {
return userConfig_;
}

const DaphneUserConfig & getUserConfig() const {
return userConfig_;
}

private:
mlir::MLIRContext context_;
DaphneUserConfig userConfig_;
12 changes: 6 additions & 6 deletions src/compiler/lowering/LowerToLLVMPass.cpp
@@ -526,10 +526,10 @@ class MapOpLowering : public OpConversionPattern<daphne::MapOp>
callee << '_' << op->getName().stripDialect().str();

// Result Matrix
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(op.getType());
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(op.getType(), false);

// Input Matrix
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(op.getArg().getType());
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(op.getArg().getType(), false);

// Pointer to UDF
callee << "__void";
@@ -740,7 +740,7 @@ class VectorizedPipelineOpLowering : public OpConversionPattern<daphne::Vectoriz
);

// Append the name of the common type of all results to the kernel name.
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(resultTypes[0]) << "_variadic__size_t";
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(resultTypes[0], false) << "_variadic__size_t";
}

mlir::Type operandType;
@@ -776,7 +776,7 @@ class VectorizedPipelineOpLowering : public OpConversionPattern<daphne::Vectoriz
daphne::VariadicPackType::get(rewriter.getContext(), rewriter.getI1Type()),
attrNumInputs);
// For inputs and numInputs.
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(operandType, true);
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(operandType, false, true);
callee << "_variadic__size_t";
auto vpInputs = rewriter.create<daphne::CreateVariadicPackOp>(loc,
daphne::VariadicPackType::get(rewriter.getContext(), operandType),
@@ -805,11 +805,11 @@ class VectorizedPipelineOpLowering : public OpConversionPattern<daphne::Vectoriz

auto numOutputs = op.getNumResults();
// Variadic num rows operands.
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(rewriter.getIntegerType(64, true));
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(rewriter.getIntegerType(64, true), false);
auto rowsOperands = adaptor.getOperands().drop_front(numDataOperands);
newOperands
.push_back(convertToArray(loc, rewriter, rewriter.getI64Type(), rowsOperands.take_front(numOutputs)));
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(rewriter.getIntegerType(64, true));
callee << "__" << CompilerUtils::mlirTypeToCppTypeName(rewriter.getIntegerType(64, true), false);
auto colsOperands = rowsOperands.drop_front(numOutputs);
newOperands.push_back(convertToArray(loc, rewriter, rewriter.getI64Type(), colsOperands.take_front(numOutputs)));
