ggml-backend : add device and backend reg interfaces #9707
ggml/include/ggml-backend.h
```c
GGML_API void                 ggml_backend_event_wait  (ggml_backend_t backend, ggml_backend_event_t event);
GGML_API ggml_backend_event_t ggml_backend_event_new   (ggml_backend_dev_t device);
GGML_API void                 ggml_backend_event_free  (ggml_backend_event_t event);
GGML_API void                 ggml_backend_event_record(ggml_backend_event_t event, ggml_backend_t backend);
```
Why is it necessary to pass a backend? Is `ggml_backend_dev_t` not backend-specific?
`ggml_backend_t` represents a stream or async queue. Events are associated with a device, but not with a stream. `ggml_backend_event_record` records the event on the stream represented by `backend`, which should be a backend (stream) of the same device as the event. I know that this is a bit confusing at the moment; `ggml_backend_t` should be renamed to something like `ggml_backend_stream`, but I am afraid that doing so would break a lot of code.
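To make the device/stream split concrete, here is a minimal C model of it. The types and functions (`device_t`, `stream_t`, `event_t`, `event_new`, `event_record`) are hypothetical stand-ins, not the ggml API. The sketch only assumes what the reply states: an event belongs to a device, and recording it is only valid on a stream (backend) of that same device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT the ggml API. */
typedef struct { int id; } device_t;

/* A "backend" in the reply's sense: a stream/queue bound to one device.
 * `position` stands in for how much work has been queued so far. */
typedef struct { const device_t * dev; int position; } stream_t;

/* An event is tied to a device, not to any particular stream. */
typedef struct { const device_t * dev; int recorded_at; bool recorded; } event_t;

/* Create an event for a device (analogous in spirit to creating it from a device handle). */
static event_t event_new(const device_t * dev) {
    event_t ev = { dev, 0, false };
    return ev;
}

/* Record the event at the stream's current position.
 * Rejects streams that belong to a different device than the event. */
static bool event_record(event_t * ev, const stream_t * stream) {
    if (stream->dev != ev->dev) {
        return false; /* cross-device record: not allowed in this model */
    }
    ev->recorded_at = stream->position;
    ev->recorded    = true;
    return true;
}
```

In this model the same event can be recorded on any stream of its device, which is why the stream has to be passed explicitly at record time rather than fixed at creation.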
```c
GGML_API ggml_backend_t ggml_backend_cpu_init(void);

GGML_API bool ggml_backend_is_cpu (ggml_backend_t backend);
```
Is the long-term plan to make this check against a device instead of a backend?
I don't intend to change these functions at the moment. Most of the functions that need these checks, like `ggml_backend_cpu_set_n_threads`, operate on a `ggml_backend_t` object, so it is still convenient to have a function that checks whether a `ggml_backend_t` belongs to a specific backend. This could be re-evaluated after all the backends have been adapted to the new interface.
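A sketch of how such a type check on a stream object can work, assuming each backend implementation carries a unique identifier that the check compares against. All names here (`impl_guid`, `backend_t`, `backend_is_cpu`, `backend_cpu_set_n_threads`) are hypothetical; the point is only that a CPU-specific setter receiving a generic backend handle can guard itself cheaply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: each backend implementation is identified by a 16-byte id. */
typedef unsigned char impl_guid[16];

typedef struct {
    const impl_guid * guid; /* which implementation created this backend */
    int n_threads;          /* only meaningful for the CPU implementation */
} backend_t;

/* Unlisted bytes are zero-initialized. */
static const impl_guid cpu_guid  = { 0xc1, 0x01 };
static const impl_guid cuda_guid = { 0xcd, 0x02 };

static bool guid_matches(const impl_guid * a, const impl_guid * b) {
    return memcmp(*a, *b, sizeof(impl_guid)) == 0;
}

/* Type check on the backend (stream) object itself, not on a device. */
static bool backend_is_cpu(const backend_t * backend) {
    return backend != NULL && guid_matches(backend->guid, &cpu_guid);
}

/* A backend-specific function guards itself with the type check. */
static bool backend_cpu_set_n_threads(backend_t * backend, int n_threads) {
    if (!backend_is_cpu(backend)) {
        return false;
    }
    backend->n_threads = n_threads;
    return true;
}
```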
```c
// (optional) tensor copy: dst is in the buffer, src may be in any buffer, including buffers from a different backend (return false if not supported)
bool (*cpy_tensor) (ggml_backend_buffer_t buffer, const struct ggml_tensor * src, struct ggml_tensor * dst);
// clear the entire buffer
void (*clear)      (ggml_backend_buffer_t buffer, uint8_t value);
```
This is essentially the same functionality as `memset_tensor`, just at a different scope. Should we use the same name?
The reason I didn't want to call this function `memset` when it was added is that it does not allow specifying either the offset or the amount of memory to clear; it always applies to the entire buffer. I believe the name `clear` makes it more intuitive that the function applies to the entire buffer and is not as flexible as `memset`. `memset_tensor` is fine, since it effectively provides the full functionality of a `memset` function, although limited to tensors. Anyway, I may be overthinking this; it is a rather minor distinction.
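The scope difference between the two operations can be sketched with a hypothetical host-memory model. `buffer`, `tensor_view`, `buffer_clear` and `tensor_memset` are illustrative names, not ggml's buffer interface: `clear` takes only a value and always covers the whole buffer, while the tensor-scoped memset takes an offset and a size relative to one tensor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host buffer (not the ggml buffer interface). */
typedef struct {
    uint8_t * data;
    size_t    size;
} buffer;

/* A tensor is modeled as the byte range [offset, offset + nbytes) of a buffer. */
typedef struct {
    buffer * buf;
    size_t   offset;
    size_t   nbytes;
} tensor_view;

/* "clear": no offset or size parameter, always fills the entire buffer. */
static void buffer_clear(buffer * buf, uint8_t value) {
    memset(buf->data, value, buf->size);
}

/* "memset_tensor": full memset semantics (offset + size), scoped to one tensor. */
static void tensor_memset(tensor_view * t, uint8_t value, size_t offset, size_t size) {
    assert(offset + size <= t->nbytes); /* must stay inside the tensor */
    memset(t->buf->data + t->offset + offset, value, size);
}
```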
```diff
+    ggml_backend_registry() {
+#ifdef GGML_USE_CUDA
+        register_backend(ggml_backend_cuda_reg());
+#endif
+
-    return ggml_backend_registry_count;
-}
+        register_backend(ggml_backend_cpu_reg());
+
-size_t ggml_backend_reg_find_by_name(const char * name) {
-    ggml_backend_registry_init();
+        // TODO: sycl, metal, vulkan, kompute, cann
     }
```
Is there a meaning behind the order of backends, e.g. the priority with which they are used?
Functions like `ggml_backend_dev_by_type` choose the first device of the given type, so the order can make a difference.
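A minimal sketch of why registration order matters when a lookup returns the first match. The registry, the device names and `dev_by_type` below are hypothetical; they only assume the behavior described in the reply (the first device of the requested type wins).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device types and registry (not the ggml API). */
typedef enum { DEV_TYPE_CPU, DEV_TYPE_GPU } dev_type;

typedef struct {
    const char * name;
    dev_type     type;
} device;

/* Devices are kept in registration order. */
typedef struct {
    device devices[8];
    size_t n_devices;
} registry;

static void register_device(registry * reg, const char * name, dev_type type) {
    assert(reg->n_devices < sizeof(reg->devices) / sizeof(reg->devices[0]));
    reg->devices[reg->n_devices].name = name;
    reg->devices[reg->n_devices].type = type;
    reg->n_devices++;
}

/* Linear scan that returns the first device of the requested type,
 * so whichever matching device was registered first is chosen. */
static const device * dev_by_type(const registry * reg, dev_type type) {
    for (size_t i = 0; i < reg->n_devices; i++) {
        if (reg->devices[i].type == type) {
            return &reg->devices[i];
        }
    }
    return NULL;
}
```

In this model, registering the GPU devices in a different order would change which GPU a type-based lookup returns, without any other code changing.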
Co-authored-by: Johannes Gäßler <[email protected]>
I've started adapting the Metal backend to the new interfaces and everything is working out smoothly. Feel free to merge this PR at any point and in the meantime I will continue the Metal implementation in #9713.
Hi, I'm a little confused by this error message. What exactly does it mean?
It's mostly to prevent host buffers from being used with the wrong backend. It is not an error; it is a debug message meant to help developers understand why async uploads are not being used. In llama.cpp it shouldn't be printed unless run with
Adds the backend device and backend registry interfaces. These interfaces represent an entry point to the backend and aim to replace commonly used custom backend functions, paving the way for dynamically loadable backends.
The backend registry interface provides a way to enumerate the devices exposed by the backend, obtain function pointers to custom backend functions, and other functionality that is common to the entire backend.
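The registry responsibilities listed above (enumerating devices and exposing custom backend functions by name) can be sketched as a small vtable. This is an illustration under assumptions, not ggml's actual interface: the struct `reg_iface`, its members, and the lookup string `"set_n_threads"` are hypothetical. Custom functions are returned as a generic function pointer and cast back to their real type by the caller, which is a valid conversion in ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type used to hand out backend-specific functions. */
typedef void (*generic_fn)(void);

/* Hypothetical registry interface (not the real ggml struct). */
typedef struct {
    const char * (*get_name)(void);
    size_t       (*get_device_count)(void);
    const char * (*get_device_name)(size_t index);
    generic_fn   (*get_proc_address)(const char * name);
} reg_iface;

/* Real signature of one custom, CPU-only function. */
typedef void (*set_n_threads_fn)(int n_threads);

static int cpu_n_threads = 1;

static void cpu_set_n_threads(int n_threads) { cpu_n_threads = n_threads; }

static const char * cpu_get_name(void) { return "CPU"; }
static size_t cpu_get_device_count(void) { return 1; }
static const char * cpu_get_device_name(size_t index) { return index == 0 ? "CPU" : NULL; }

/* Look up a custom function by name; NULL means "not supported". */
static generic_fn cpu_get_proc_address(const char * name) {
    if (strcmp(name, "set_n_threads") == 0) {
        return (generic_fn) cpu_set_n_threads;
    }
    return NULL;
}

static const reg_iface cpu_reg = {
    cpu_get_name,
    cpu_get_device_count,
    cpu_get_device_name,
    cpu_get_proc_address,
};
```

A caller that only holds `cpu_reg` can then reach the CPU-specific setter without linking against a CPU-specific symbol, which is the property that matters once backends are loaded dynamically.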
The backend device interface has functions to create backend instances and query information about the devices. Some of the functions of the backend interface have been moved to the device interface.
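Similarly, here is a hypothetical sketch of a device object that answers queries and acts as a factory for backend instances. `dev`, `stream`, `dev_get_memory` and `dev_init_backend` are illustrative names, and the sketch assumes, following the review discussion above, that a backend instance is a stream bound to a single device, so one device can create several independent backends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backend instance: a stream bound to one device. */
typedef struct stream {
    const struct dev * dev;
} stream;

/* Hypothetical device: static information plus a factory for streams. */
typedef struct dev {
    const char * name;
    size_t mem_free;
    size_t mem_total;
} dev;

/* Query function: report free and total memory of the device. */
static void dev_get_memory(const dev * d, size_t * free_bytes, size_t * total_bytes) {
    *free_bytes  = d->mem_free;
    *total_bytes = d->mem_total;
}

/* Factory: each call creates a new, independent stream on the same device.
 * Returns NULL if allocation fails. */
static stream * dev_init_backend(const dev * d) {
    stream * s = malloc(sizeof(*s));
    if (s != NULL) {
        s->dev = d;
    }
    return s;
}

static void stream_free(stream * s) {
    free(s);
}
```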
Currently, only the CUDA and CPU backends implement these interfaces; support in other backends will be added progressively. During the transition period, existing backends that do not implement these interfaces can still be used, but eventually llama.cpp will be refactored to use only the backend registry API. Most backends already implement the functions in these interfaces, so this should only require shuffling some code around.
`test-backend-ops` will stop working for backends that do not implement these interfaces.

Other changes:
- Removed the `GGML_CALL` macro: it was added to support llamafile, but it is never used within ggml. As a result, it is very hard to maintain because we don't know which functions need it, and it keeps creeping into new functions in a very inconsistent manner. Once support for loading backends dynamically is added to ggml, other projects can use this implementation rather than rolling their own.