
ggml-backend : add device and backend reg interfaces #9707

Merged: slaren merged 14 commits into master on Oct 2, 2024

Conversation

slaren (Member) commented on Oct 1, 2024:

Adds the backend device and backend registry interfaces. These interfaces represent an entry point to the backend, and aim to replace commonly used custom backend functions and pave the way to support dynamically loadable backends.

The backend registry interface provides a way to enumerate the devices exposed by the backend and to obtain function pointers to custom backend functions, along with other functionality that is common to the entire backend.

The backend device interface has functions to create backend instances and query information about the devices. Some of the functions of the backend interface have been moved to the device interface.

Currently, only the CUDA and CPU backends implement these interfaces; support in the other backends will be added progressively. During the transition period, backends that do not implement these interfaces can still be used, but eventually llama.cpp will be refactored to use the backend registry API only. Most backends already implement the functions in these interfaces, so this should only require shuffling some code around. test-backend-ops will stop working for backends that do not implement these interfaces.
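For illustration, a minimal sketch of how an application might enumerate devices and create backend instances through these interfaces (the snippet itself is not from the PR; the names follow the declarations it adds to ggml-backend.h):

#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev), ggml_backend_dev_description(dev));

        // create a backend (stream) instance on this device, then free it
        ggml_backend_t backend = ggml_backend_dev_init(dev, /*params =*/ NULL);
        ggml_backend_free(backend);
    }
    return 0;
}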

Other changes:

  • Removes the GGML_CALL macro: this was added to support llamafile, but it is never used within ggml. As a result, it is very hard to maintain, because we don't know which functions need it, and it keeps creeping into new functions in a very inconsistent manner. Once support for loading backends dynamically is added to ggml, other projects will be able to use this implementation rather than rolling their own.

GGML_API void ggml_backend_event_wait (ggml_backend_t backend, ggml_backend_event_t event);
GGML_API ggml_backend_event_t ggml_backend_event_new (ggml_backend_dev_t device);
GGML_API void ggml_backend_event_free (ggml_backend_event_t event);
GGML_API void ggml_backend_event_record (ggml_backend_event_t event, ggml_backend_t backend);
Collaborator:

Why is it necessary to pass a backend? Is ggml_backend_dev_t not backend-specific?

slaren (Member, Author) replied on Oct 1, 2024:

ggml_backend_t represents a stream or async queue. Events are associated with a device, but not with a stream. ggml_backend_event_record records the event on the stream represented by backend, which should be a backend (stream) of the same device as the event. I know that this is a bit confusing at the moment; ggml_backend_t should be renamed to something like ggml_backend_stream, but I am afraid that would break a lot of code.
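A minimal usage sketch based only on the declarations quoted above (dev, stream_a, and stream_b are assumed to come from earlier initialization):

ggml_backend_event_t ev = ggml_backend_event_new(dev); // an event belongs to a device
ggml_backend_event_record(ev, stream_a);               // record it on one stream of that device
ggml_backend_event_wait(stream_b, ev);                 // another stream of the same device waits on it
ggml_backend_event_free(ev);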


GGML_API ggml_backend_t ggml_backend_cpu_init(void);

GGML_API bool ggml_backend_is_cpu (ggml_backend_t backend);
Collaborator:

Is the long-term plan to make this check against a device instead of a backend?

slaren (Member, Author) replied:

I don't intend to change these functions at the moment. Most of the functions that need these checks, like ggml_backend_cpu_set_n_threads, operate on a ggml_backend_t object, so it is still convenient to have a function to check if a ggml_backend_t belongs to a specific backend. After all the backends have been adapted to the new interface this could be re-evaluated.
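For instance, a sketch of the common pattern (backend is assumed to be any previously created ggml_backend_t):

if (ggml_backend_is_cpu(backend)) {
    // ggml_backend_cpu_set_n_threads is only valid on a CPU backend instance
    ggml_backend_cpu_set_n_threads(backend, 8);
}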

// (optional) tensor copy: dst is in the buffer, src may be in any buffer, including buffers from a different backend (return false if not supported)
bool (*cpy_tensor) (ggml_backend_buffer_t buffer, const struct ggml_tensor * src, struct ggml_tensor * dst);
// clear the entire buffer
void (*clear) (ggml_backend_buffer_t buffer, uint8_t value);
Collaborator:

This is in essence the same functionality as memset_tensor, just at a different scope; should we be using the same name?

slaren (Member, Author) replied:

The reason I didn't want to call this function memset when it was added is that it does not allow specifying either the offset or the amount of memory to clear; it always applies to the entire buffer. I believe the name clear makes it a bit more intuitive that the function applies to the entire buffer and is not as flexible as memset. memset_tensor is fine since it effectively provides the full functionality of a memset function, although limited to tensors. Anyway, I may be overthinking this; it is a rather minor distinction.
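A sketch of the distinction using the public wrappers (ggml_backend_tensor_memset is assumed to be available in this revision):

ggml_backend_buffer_clear(buffer, 0);                // always the entire buffer, no offset or size
ggml_backend_tensor_memset(tensor, 0, offset, size); // an arbitrary range within a single tensor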

Comment on lines +523 to +531
ggml_backend_registry() {
#ifdef GGML_USE_CUDA
    register_backend(ggml_backend_cuda_reg());
#endif

    register_backend(ggml_backend_cpu_reg());

    // TODO: sycl, metal, vulkan, kompute, cann
}
Collaborator:

Is there a meaning behind the order of backends, e.g. the priority with which they are used?

slaren (Member, Author) replied on Oct 1, 2024:

Functions like ggml_backend_dev_by_type choose the first device of the given type, so the order can make a difference.
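For example (a sketch; the exact enum value name is an assumption based on this revision of ggml-backend.h):

// returns the first registered GPU device, so CUDA wins whenever it is compiled in
ggml_backend_dev_t gpu = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);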

slaren marked this pull request as ready for review on Oct 2, 2024.
ggerganov (Member) left a comment:

I've started adapting the Metal backend to the new interfaces and everything is working out smoothly. Feel free to merge this PR at any point and in the meantime I will continue the Metal implementation in #9713.

slaren merged commit c83ad6d into master on Oct 2, 2024
53 checks passed
LostRuins (Collaborator) commented:
Hi,
buffer type %s is not the default buffer type for device %s for async uploads

I'm a little confused by this error message, what exactly does it mean?

slaren (Member, Author) commented on Oct 6, 2024:

It's mostly there to prevent host buffers from being used with the incorrect backend. It is not an error; it is a debug message meant to help developers understand why async uploads are not being used. In llama.cpp it shouldn't be printed unless run with --verbose.
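Conceptually (a sketch of the check, not the actual implementation), the backend only takes the async upload path when the source buffer was allocated from the device's own host buffer type:

if (ggml_backend_buffer_get_type(buffer) == ggml_backend_dev_host_buffer_type(dev)) {
    // pinned host memory belonging to this device: async copy is safe
}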

slaren deleted the sl/backend-registry-2 branch on Oct 29, 2024.