-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows ARM runner and build fixes #5979
Conversation
@@ -857,7 +857,7 @@ inline static float vaddvq_f32(float32x4_t v) { | |||
#define GGML_F16x8 float16x8_t | |||
#define GGML_F16x8_ZERO vdupq_n_f16(0.0f) | |||
#define GGML_F16x8_SET1(x) vdupq_n_f16(x) | |||
#define GGML_F16x8_LOAD(x) vld1q_f16((const __fp16 *)(x)) | |||
#define GGML_F16x8_LOAD(x) vld1q_f16((const ggml_fp16_internal_t *)(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would call vld1q_f16
with uint16_t *
argument.
Does this produce correct results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this produce correct results?
Hm, yes - no reason not to.
Wouldn't it be better if instead of introducing ggml_fp16_internal_t
, we simply change these to:
diff --git a/ggml.c b/ggml.c
index 80efa6f2..ac0d15b2 100644
--- a/ggml.c
+++ b/ggml.c
@@ -857,7 +857,7 @@ inline static float vaddvq_f32(float32x4_t v) {
#define GGML_F16x8 float16x8_t
#define GGML_F16x8_ZERO vdupq_n_f16(0.0f)
#define GGML_F16x8_SET1(x) vdupq_n_f16(x)
- #define GGML_F16x8_LOAD(x) vld1q_f16((const __fp16 *)(x))
+ #define GGML_F16x8_LOAD(x) vld1q_f16((const uint16_t *)(x))
#define GGML_F16x8_STORE vst1q_f16
#define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
#define GGML_F16x8_ADD vaddq_f16
@@ -900,7 +900,7 @@ inline static float vaddvq_f32(float32x4_t v) {
#define GGML_F32Cx4 float32x4_t
#define GGML_F32Cx4_ZERO vdupq_n_f32(0.0f)
#define GGML_F32Cx4_SET1(x) vdupq_n_f32(x)
- #define GGML_F32Cx4_LOAD(x) vcvt_f32_f16(vld1_f16((const __fp16 *)(x)))
+ #define GGML_F32Cx4_LOAD(x) vcvt_f32_f16(vld1_f16((const uint16_t *)(x)))
#define GGML_F32Cx4_STORE(x, y) vst1_f16(x, vcvt_f16_f32(y))
#define GGML_F32Cx4_FMA(a, b, c) vfmaq_f32(a, b, c)
#define GGML_F32Cx4_ADD vaddq_f32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll get some warnings (like incompatible pointer types assigning to 'const __fp16 *' from 'const uint16_t *')
from GGML_F16_VEC_LOAD
in this case.
But if it's ok, I'll commit the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it like proposed
* windows arm ci * fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64 * fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned` * fix `error C2065: '__fp16': undeclared identifier`
* windows arm ci * fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64 * fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned` * fix `error C2065: '__fp16': undeclared identifier`
* windows arm ci * fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64 * fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned` * fix `error C2065: '__fp16': undeclared identifier`
I would like to add several fixes for building the repo on Windows ARM, as well as a CI runner for automated build control.
error C2078: too many initializers with ggml_vld1q_u32
with a macro for MSVC ARM64error C2065: '__fp16': undeclared identifier
warning C4146: unary minus operator applied to unsigned type, result still unsigned