Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Offload] Introduce offload-tblgen and initial new API implementation #108413

Merged
merged 22 commits into from
Nov 25, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
841d002
[Offload] Introduce offload-tblgen and initial new API implementation
callumfare Sep 12, 2024
2b14244
Fix liboffload_new linking and formatting
callumfare Sep 13, 2024
bb17c4f
Refactor entry point validation so failures are visible in tracing ou…
callumfare Sep 18, 2024
2c98c64
Add Offload API unittests
callumfare Sep 26, 2024
5d92671
Implement optional error details
callumfare Oct 1, 2024
35e70fb
Check in auto-generated Offload files
callumfare Oct 1, 2024
6d9c1bf
Fix OffloadGenerate target, clang-format generated files
callumfare Oct 2, 2024
dd26654
Fix offload header install location
callumfare Oct 2, 2024
585a239
Tidy generated comments etc
callumfare Oct 2, 2024
147e39e
Rework Offload API errors
callumfare Oct 4, 2024
fb6d775
Add optional code location entry point variants
callumfare Oct 23, 2024
55740e0
Rework API to avoid multiple returns, add init/shutdown, general refa…
callumfare Oct 25, 2024
932d440
Fix style
callumfare Oct 25, 2024
87ddac3
Only check OFFLOAD_TRACE once
callumfare Oct 25, 2024
6e76a79
Add offload-tblgen tests
callumfare Oct 25, 2024
8aa2cdf
Add additional offload-tblgen tests and associated fixes
callumfare Oct 28, 2024
cc1dc59
Add version info to the API definition
callumfare Oct 28, 2024
e92a900
Fix version query; misc style fixes
callumfare Oct 30, 2024
d932cea
Change prefix from `offload` to `ol`
callumfare Oct 31, 2024
a6cff97
Rename new offload library and try to match LLVM style
callumfare Oct 31, 2024
2ec0fff
Fix missing tablegen, tidy Offload tests
callumfare Oct 31, 2024
5c87ef2
Use SmallVector where possible
callumfare Nov 1, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions offload/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -349,6 +349,9 @@ add_subdirectory(tools)
# Build target agnostic offloading library.
add_subdirectory(src)

add_subdirectory(tools/offload-tblgen)
add_subdirectory(new-api)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should call this new-api.


# Add tests.
add_subdirectory(test)

Expand Down
212 changes: 212 additions & 0 deletions offload/new-api/API/APIDefs.td
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
//===-- APIDefs.td - Base definitions for Offload tablegen -*- tablegen -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains the class definitions used to implement the Offload API,
// as well as helper functions used to help populate relevant records.
// See offload/API/README.md for more detailed documentation.
//
//===----------------------------------------------------------------------===//

// Prefix for API naming. This could be hard-coded in the future when a value
// is agreed upon.
defvar PREFIX = "OFFLOAD";
defvar prefix = !tolower(PREFIX);

// Parameter flags
defvar PARAM_IN = 0x1;
defvar PARAM_OUT = 0x2;
defvar PARAM_OPTIONAL = 0x4;
defvar PARAM_IN_OPTIONAL = !or(PARAM_IN, PARAM_OPTIONAL);
defvar PARAM_OUT_OPTIONAL = !or(PARAM_OUT, PARAM_OPTIONAL);

// Does the type end with '_handle_t'?
class IsHandleType<string Type> {
// size("_handle_t") == 9
bit ret = !if(!lt(!size(Type), 9), 0,
!ne(!find(Type, "_handle_t", !sub(!size(Type), 9)), -1));
}

// Does the type end with '*'?
class IsPointerType<string Type> {
bit ret = !ne(!find(Type, "*", !sub(!size(Type), 1)), -1);
}

// Describes the valid range of a pointer parameter that reperesents an array
class Range<string Begin, string End> {
string begin = Begin;
string end = End;
}

// Names the parameters that indicate the type and size of the data pointed to
// by an opaque pointer parameter
class TypeInfo<string TypeEnum, string TypeSize> {
string enum = TypeEnum;
string size = TypeSize;
}

class Param<string Type, string Name, string Desc, bits<3> Flags = 0> {
string type = Type;
string name = Name;
string desc = Desc;
bits<3> flags = Flags;
Range range = Range<"", "">;
TypeInfo type_info = TypeInfo<"", "">;
bit IsHandle = IsHandleType<type>.ret;
bit IsPointer = IsPointerType<type>.ret;
}

// A parameter whose range is described by other parameters in the function.
class RangedParam<string Type, string Name, string Desc, bits<3> Flags, Range ParamRange> : Param<Type, Name, Desc, Flags> {
let range = ParamRange;
}

// A parameter (normally of type void*) which has its pointee type and size
// described by other parameters in the function.
class TypeTaggedParam<string Type, string Name, string Desc, bits<3> Flags, TypeInfo ParamTypeInfo> : Param<Type, Name, Desc, Flags> {
let type_info = ParamTypeInfo;
}

class Return<string Value, list<string> Conditions = []> {
string value = Value;
list<string> conditions = Conditions;
}

class ShouldCheckHandle<Param P> {
bit ret = !and(P.IsHandle, !eq(!and(PARAM_OPTIONAL, P.flags), 0));
}

class ShouldCheckPointer<Param P> {
bit ret = !and(P.IsPointer, !eq(!and(PARAM_OPTIONAL, P.flags), 0));
}

// For a list of returns that contains a specific return code, find and append
// new conditions to that return
class AppendConditionsToReturn<list<Return> Returns, string ReturnValue,
list<string> Conditions> {
list<Return> ret =
!foreach(Ret, Returns,
!if(!eq(Ret.value, ReturnValue),
Return<Ret.value, Ret.conditions#Conditions>, Ret));
}

// Add null handle checks to a function's return value descriptions
class AddHandleChecksToReturns<list<Param> Params, list<Return> Returns> {
list<string> handle_params =
!foreach(P, Params, !if(ShouldCheckHandle<P>.ret, P.name, ""));
list<string> handle_params_filt =
!filter(param, handle_params, !ne(param, ""));
list<string> handle_param_conds =
!foreach(handle, handle_params_filt, "`NULL == "#handle#"`");

// Does the list of returns already contain ERROR_INVALID_NULL_HANDLE?
bit returns_has_inv_handle = !foldl(
0, Returns, HasErr, Ret,
!or(HasErr, !eq(Ret.value, PREFIX#"_RESULT_ERROR_INVALID_NULL_HANDLE")));

list<Return> returns_out = !if(returns_has_inv_handle,
AppendConditionsToReturn<Returns, PREFIX # "_RESULT_ERROR_INVALID_NULL_HANDLE", handle_param_conds>.ret,
!listconcat(Returns, [Return<PREFIX # "_RESULT_ERROR_INVALID_NULL_HANDLE", handle_param_conds>])
);
}

// Add null pointer checks to a function's return value descriptions
class AddPointerChecksToReturns<list<Param> Params, list<Return> Returns> {
list<string> ptr_params =
!foreach(P, Params, !if(ShouldCheckPointer<P>.ret, P.name, ""));
list<string> ptr_params_filt = !filter(param, ptr_params, !ne(param, ""));
list<string> ptr_param_conds =
!foreach(ptr, ptr_params_filt, "`NULL == "#ptr#"`");

// Does the list of returns already contain ERROR_INVALID_NULL_POINTER?
bit returns_has_inv_ptr = !foldl(
0, Returns, HasErr, Ret,
!or(HasErr, !eq(Ret.value, PREFIX#"_RESULT_ERROR_INVALID_NULL_POINTER")));
list<Return> returns_out = !if(returns_has_inv_ptr,
AppendConditionsToReturn<Returns, PREFIX # "_RESULT_ERROR_INVALID_NULL_POINTER", ptr_param_conds>.ret,
!listconcat(Returns, [Return<PREFIX # "_RESULT_ERROR_INVALID_NULL_POINTER", ptr_param_conds>])
);
}

defvar DefaultReturns = [Return<PREFIX#"_RESULT_SUCCESS">,
Return<PREFIX#"_RESULT_ERROR_UNINITIALIZED">,
Return<PREFIX#"_RESULT_ERROR_DEVICE_LOST">];

class APIObject {
string name;
string desc;
}

class Function : APIObject {
list<Param> params;
list<Return> returns;
list<string> details = [];
list<string> analogues = [];

list<Return> returns_with_def = !listconcat(DefaultReturns, returns);
list<Return> all_returns = AddPointerChecksToReturns<params,
AddHandleChecksToReturns<params, returns_with_def>.returns_out>.returns_out;
}

class Etor<string Name, string Desc> {
string name = Name;
string desc = Desc;
string tagged_type;
}

class TaggedEtor<string Name, string Type, string Desc> : Etor<Name, Desc> {
let tagged_type = Type;
}

class Enum : APIObject {
// This refers to whether the enumerator descriptions specify a return
// type for functions where this enum may be used as an output type. If set,
// all Etor values must be TaggedEtor records
bit is_typed = 0;

list<Etor> etors = [];
}

class StructMember<string Type, string Name, string Desc> {
string type = Type;
string name = Name;
string desc = Desc;
}

defvar DefaultPropStructMembers =
[StructMember<prefix#"_structure_type_t", "stype",
"type of this structure">,
StructMember<"void*", "pNext", "pointer to extension-specific structure">];

class StructHasInheritedMembers<string BaseClass> {
bit ret = !or(!eq(BaseClass, prefix#"_base_properties_t"),
!eq(BaseClass, prefix#"_base_desc_t"));
}

class Struct : APIObject {
string base_class = "";
list<StructMember> members;
list<StructMember> all_members =
!if(StructHasInheritedMembers<base_class>.ret,
DefaultPropStructMembers, [])#members;
}

class Typedef : APIObject { string value; }

class FptrTypedef : APIObject {
list<Param> params;
list<Return> returns;
}

class Macro : APIObject {
string value;

string condition;
string alt_value;
}

class Handle : APIObject;
75 changes: 75 additions & 0 deletions offload/new-api/API/Common.td
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
def : Macro {
let name = "OFFLOAD_APICALL";
let desc = "Calling convention for all API functions";
let condition = "defined(_WIN32)";
let value = "__cdecl";
let alt_value = "";
}

def : Macro {
let name = "OFFLOAD_APIEXPORT";
let desc = "Microsoft-specific dllexport storage-class attribute";
let condition = "defined(_WIN32)";
let value = "__declspec(dllexport)";
let alt_value = "";
}

def : Macro {
let name = "OFFLOAD_DLLEXPORT";
let desc = "Microsoft-specific dllexport storage-class attribute";
let condition = "defined(_WIN32)";
let value = "__declspec(dllexport)";
}

def : Macro {
let name = "OFFLOAD_DLLEXPORT";
let desc = "GCC-specific dllexport storage-class attribute";
let condition = "__GNUC__ >= 4";
let value = "__attribute__ ((visibility (\"default\")))";
let alt_value = "";
}

def : Typedef {
let name = "offload_bool_t";
callumfare marked this conversation as resolved.
Show resolved Hide resolved
let value = "uint8_t";
let desc = "compiler-independent type";
}

def : Handle {
let name = "offload_platform_handle_t";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is platform what we call a plugin currently and a device a subset of that? It's tough to think how we should handle this in what's exported, since I would say that ideally the user doesn't need to care about the "plugin" at all. However, that's not true for cases where some queue is only valid with CUDA and not AMDGPU for example. We could probably make it a quality of the device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the idea of this is that platform is roughly equivalent to a plugin (with a big caveat*).

In SYCL, the platforms are exposed to the user (see here), so we can't hide these completely from the interface. Obviously that's not important to OpenMP, so anything omp related built on top of this wouldn't expose them, but in general we need something in the new API.

*In UR we actually split the concept of a plugin (called an adapter in UR) from the platform. This is because the OpenCL adapter can contain multiple platforms (one for every valid OpenCL driver on the system). For other adapters it's just a 1:1 relationship and not important. I didn't want to introduce this complication at this point though, especially since we don't even have an OpenCL plugin.

let desc = "Handle of a platform instance";
}

def : Handle {
let name = "offload_device_handle_t";
let desc = "Handle of platform's device object";
}

def : Handle {
let name = "offload_context_handle_t";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding, I think offload_ in front of everything is too verbose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, by the time I was done it was feeling a little unwieldy but I didn't want to rewrite everything without agreement - it would be good to make a decision as a group on this. ol_ or offl_ might be better choices

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think there was some desire for every API function to accept some identifier information, for use with debug printing. I'm wondering if we should have some _ident or something suffix and #define the normal API entrypoints to be those with a nullptr argument.

let desc = "Handle of context object";
}

def : Enum {
let name = "offload_result_t";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked a lot about error codes, where the returned value is actually an index into some table (zero is success / invalid index). The actual error code could be queried from that index or something. This is mostly because we want the library to provide more helpful error codes than generic "X went wrong" since we already do a lot of stuff like return Error::("Something called %s failed on device %d", name, device) instead of just returning failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely possible, but I have some concerns about that being the default path for returning errors.

With SYCL we programmatically deal with error codes, converting them to appropriate SYCL exceptions etc. See here for an example. Having errors just be strings makes this more difficult to handle, and liable to be unintentionally broken.

Could we have a specific error condition (e.g. RESULT_ERROR_CUSTOM) that represents a custom error state that can then be queried with a function like offloadGetLastCustomError? We have something similar in UR (UR_RESULT_ERROR_ADAPTER_SPECIFIC). Rather than using an index into a table of strings, we have a thread_local buffer for a single error message, and it's the caller's responsibility to query the contents of it after the failing API call (subsequent calls may overwrite it). This design avoids complications with the lifetime of the error strings.

Another possibility is some kind of logging callback where Offload can write out additional error/debug information for the language runtime to handle in whatever way is suitable. That would allow an enum to be used as the normal return code, while additional more descriptive information is optionally available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors are not strings, the errors are indices into a pair of {error_code, error_message} in a table. APIs like CUDA and HSA already have stuff like hsa_status_string(EC) to get the string error message. What we suggested was needing something like hsa_status_code(IDX).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, yeah that makes more sense, thanks!

Would the table contain one or more errors for the last API call or is the idea that the table can be queried for any previous API call that returned an error? If it's the latter then I'm still a bit unsure about how that would work in terms of lifetime; seems like they would just pile up indefinitely since you can never safely clear them. Maybe that's not a big deal in practice but it makes me a bit worried about long-running programs. That's why we made the equivalent in UR thread-local and mandated checking the error message before making another API call on the same thread, since the implementation can safely clear the existing message(s) when setting a new one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also do pointers or something, null is no error. That might make it easier to implement the "table" as a thread safe data structure (Since otherwise an index could never be reused).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now implemented the errors as discussed in previous calls. Success is a nullptr, an error is a pointer to a struct with an error code and an optional string.

let desc = "Defines Return/Error codes";
let etors =[
Etor<"SUCCESS", "Success">,
Etor<"ERROR_INVALID_VALUE", "Invalid Value">,
Etor<"ERROR_INVALID_PLATFORM", "Invalid platform">,
Etor<"ERROR_DEVICE_NOT_FOUND", "Device not found">,
Etor<"ERROR_INVALID_DEVICE", "Invalid device">,
Etor<"ERROR_DEVICE_LOST", "Device hung, reset, was removed, or driver update occurred">,
Etor<"ERROR_UNINITIALIZED", "plugin is not initialized or specific entry-point is not implemented">,
Etor<"ERROR_OUT_OF_RESOURCES", "Out of resources">,
Etor<"ERROR_UNSUPPORTED_VERSION", "[Validation] generic error code for unsupported versions">,
Etor<"ERROR_UNSUPPORTED_FEATURE", "[Validation] generic error code for unsupported features">,
Etor<"ERROR_INVALID_ARGUMENT", "[Validation] generic error code for invalid arguments">,
Etor<"ERROR_INVALID_NULL_HANDLE", "[Validation] handle argument is not valid">,
Etor<"ERROR_INVALID_NULL_POINTER", "[Validation] pointer argument may not be nullptr">,
Etor<"ERROR_INVALID_SIZE", "[Validation] invalid size or dimensions (e.g., must not be zero, or is out of bounds)">,
Etor<"ERROR_INVALID_ENUMERATION", "[Validation] enumerator argument is not valid">,
Etor<"ERROR_UNSUPPORTED_ENUMERATION", "[Validation] enumerator argument is not supported by the device">,
Etor<"ERROR_UNKNOWN", "Unknown or internal error">
];
}
Loading
Loading