[CANN] Adapt to dynamically loadable backends mechanism #9970
Conversation
src/llama.cpp
Outdated
All the instances of GGML_USE_CANN should be removed from this file; there are still a few remaining.
Thanks for your review; I have removed all instances of GGML_USE_CANN from llama.cpp.
ggml/src/ggml-backend.cpp
Outdated
#ifdef GGML_USE_AMX
    register_backend(ggml_backend_amx_reg());
#endif

// TODO: kompute, cann
#ifdef GGML_USE_CANN
    register_backend(ggml_backend_cann_reg());
#endif
Do not add extra lines; use the same format as the rest of the backends.
Thanks for your review; I have removed the blank lines, and the formatting is now consistent with the rest of the backends. Functionality remains normal after the change.
ggml/include/ggml-cann.h
Outdated
@@ -33,6 +33,9 @@ extern "C" {
 * @brief Maximum number of CANN devices supported.
 */
#define GGML_CANN_MAX_DEVICES 16
#define GGML_CANN_NAME "CANN"
I suspect this macro was taken from the CUDA backend implementation, but it exists there because the same code is used to build the ROCm and MUSA backends, so the backend's name can change depending on the build flags. I don't think that is necessary in the CANN backend.
Thanks for your review; this macro is no longer exposed to llama.cpp. I have moved it into the CANN implementation file.
Force-pushed from 6d36f48 to abf5be4.
The review comments have been addressed. Looks good to me. Approved.
* [CANN] Adapt to dynamically loadable backends mechanism
* Fix bug: inference results are garbled when running LM models whose type is the Q4_0 class in debug mode
* Handle the review comments of this pull request

Build passed.
The corresponding Feature Request PR is Feature Request: [CANN] backend adapts to llama.cpp dynamic backend loading mechanism #9862.
Functionality is normal (screenshots attached).
Performance is the same as before (screenshots attached).