add loongarch lsx and lasx optimize code #6454
@junchao-loongson Thanks for this PR. Just a heads-up: I will only be able to review this after #6412 and #6414, so it may take me some time. Sorry about that. In the meantime, feel free to continue the review with other devs.
Let's resolve the conflicts from the recent
Okay, I rebased the code.
Test OK.
I don't suppose GitHub Actions supports this architecture, but if it does, it would be nice to add a CI workflow.
Have you done some inference/perplexity runs to make sure the generation looks fine?
ggml.c
Outdated
typedef union
{
    int32_t i;
    float f;
} FloatInt;

/* float type data load instructions */
static __m128 __lsx_vreplfr2vr_s(float val)
{
    FloatInt fi_tmpval = {.f = val};
    return (__m128)__lsx_vreplgr2vr_w(fi_tmpval.i);
}

static __m256 __lasx_xvreplfr2vr_s(float val)
{
    FloatInt fi_tmpval = {.f = val};
    return (__m256)__lasx_xvreplgr2vr_w(fi_tmpval.i);
}
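For readers without LoongArch hardware, here is a minimal portable sketch of what the two helpers above appear to do: take the raw bit pattern of a float and copy it into every 32-bit lane. The names `f32_bits`, `vec4_i32`, `splat_i32`, and `splat_f32_bits` are hypothetical stand-ins and not part of this PR. `memcpy` is used in place of the union as an equivalent way to reinterpret the bits.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer (no value conversion).
 * Hypothetical helper; the PR uses a union (FloatInt) for the same purpose. */
static inline int32_t f32_bits(float val) {
    int32_t bits;
    memcpy(&bits, &val, sizeof bits);
    return bits;
}

/* Scalar stand-in for a 128-bit vector register: four 32-bit lanes. */
typedef struct { int32_t lane[4]; } vec4_i32;

/* Stand-in for an integer broadcast: the same word goes into every lane. */
static inline vec4_i32 splat_i32(int32_t w) {
    vec4_i32 v = {{ w, w, w, w }};
    return v;
}

/* Broadcast a float by broadcasting its bit pattern, mirroring the
 * structure of __lsx_vreplfr2vr_s in the snippet above. */
static inline vec4_i32 splat_f32_bits(float val) {
    return splat_i32(f32_bits(val));
}
```

Reading any lane back as a float yields the original value, because the bits are never converted, only copied.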
Deduplicate this code by moving it into ggml-impl.h and reusing it in ggml.c and ggml-quants.c.
I was thinking to just deduplicate the __lsx_vreplfr2vr_s and __lasx_xvreplfr2vr_s code. The rest of the lsx/lasx code that is used only inside ggml-quants.c should remain in ggml-quants.c.
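A rough sketch of how the shared part of a header such as ggml-impl.h might look after this change, so that both ggml.c and ggml-quants.c pick up a single definition. This is an assumption, not the code that was merged: the include guard name is hypothetical, and the feature macros `__loongarch_sx` / `__loongarch_asx` and the headers `<lsxintrin.h>` / `<lasxintrin.h>` are assumed to be what the LoongArch toolchain provides.

```c
#ifndef LOONGARCH_SPLAT_HELPERS_H   /* hypothetical guard name */
#define LOONGARCH_SPLAT_HELPERS_H

#include <stdint.h>

/* Shared by both helpers: view the same 32 bits as int or float. */
typedef union {
    int32_t i;
    float   f;
} FloatInt;

#if defined(__loongarch_sx)          /* assumed macro for 128-bit LSX */
#include <lsxintrin.h>
/* Broadcast a float into all four lanes of a 128-bit vector. */
static inline __m128 __lsx_vreplfr2vr_s(float val) {
    FloatInt fi_tmpval = {.f = val};
    return (__m128)__lsx_vreplgr2vr_w(fi_tmpval.i);
}
#endif

#if defined(__loongarch_asx)         /* assumed macro for 256-bit LASX */
#include <lasxintrin.h>
/* Broadcast a float into all eight lanes of a 256-bit vector. */
static inline __m256 __lasx_xvreplfr2vr_s(float val) {
    FloatInt fi_tmpval = {.f = val};
    return (__m256)__lasx_xvreplgr2vr_w(fi_tmpval.i);
}
#endif

#endif /* LOONGARCH_SPLAT_HELPERS_H */
```

Marking the helpers `static inline` lets each translation unit include the header without producing duplicate symbols, and the guards keep builds for other architectures unaffected.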
Btw, for long-term support it would be very useful to add CI for this arch. If there is someone who can donate a machine we can deploy
We have LoongArch machines available for remote connection. Can we use them for CI?
Great! If you could spare a machine, we can add it as a node to the ggml-ci fleet. The easiest way would be to give me SSH access so I can log in and configure it. If that is possible, send me an email and we can set it up.
I apologize for the late reply. We are checking with the colleagues responsible for this and should have it ready within the next week.
Description
Hello, we (@lixing-star @MQ-mengqing) are developers on the Loongson team.
We have added 128-bit (LSX) and 256-bit (LASX) vector-optimized code for the LoongArch architecture.
test-quantize-fns
benchmark
LoongArch Documents