-
Notifications
You must be signed in to change notification settings - Fork 305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Apple Silicon Mac support #164
Conversation
Signed-off-by: Taiju Yamada <[email protected]>
Signed-off-by: Taiju Yamada <[email protected]>
Signed-off-by: Taiju Yamada <[email protected]>
Signed-off-by: Taiju Yamada <[email protected]>
Signed-off-by: Taiju Yamada <[email protected]>
Thanks for the submission. This has a lot of ifdefs for For the dispatcher it seems cleaner and easier to read as:
It would be good to test a native compile. |
crc/aarch64/crc_aarch64_dispatcher.c
Outdated
@@ -34,6 +34,8 @@ DEFINE_INTERFACE_DISPATCHER(crc16_t10dif) | |||
unsigned long auxval = getauxval(AT_HWCAP); | |||
if (auxval & HWCAP_PMULL) | |||
return PROVIDER_INFO(crc16_t10dif_pmull); | |||
#elif defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion is add it like
#ifndef __MACH__
unsigned long auxval=getauxval(AT_HWCAP);
if(auxval & HWCAP_PMULL)
return PROVIDER_INFO(crc16_t10dif_pmull);
return PROVIDER_BASIC(crc16_t10dif);
#else
return PROVIDER_INFO(crc16_t10dif_pmull);
#endif
And another thing I must confirm with you . If the transparent layer can be remove , I think this file should not be compiled in Apple Silicon Mac.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might not be answering correct question, but removing __MACH__
causes getauxval undefined
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rewritten dispatchers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- aarch64's neon is spec
- my very first assumption was Apple would not ever sell aarch64 CPU without pmull. In this way I don't need dispatcher.
in above condition, removing 12575f5 solves "aarch64_multibinary.h issue" in workaround way.
crc/aarch64/crc_aarch64_dispatcher.c
Outdated
@@ -81,6 +87,8 @@ DEFINE_INTERFACE_DISPATCHER(crc32_iscsi) | |||
if (auxval & HWCAP_PMULL) { | |||
return PROVIDER_INFO(crc32_iscsi_refl_pmull); | |||
} | |||
#elif defined(__aarch64__) | |||
return PROVIDER_INFO(crc32_iscsi_refl_pmull); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function might not be best choice . As I know, crc32_iscsi_crc_ext or crc32_iscsi_3crc_fold should be better choice.
You can test the real performance and pick up the best one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm sorry.
Although I have asked an acquaintance of mine to test the same binary on ARM mac (it worked, so "it does support ARM mac"), my primary machine is Intel mac and my test environment is iPad (well, jailbroken and sshd enabled).
Also, at least crc32 instruction caused SIGILL on my iPad (from libslz).
The worse thing, as far as I know, there are no runtime cpu feature detection API
on Darwin. That's why I adjust the dispatcher to middle-range...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems undocumented _get_cpu_capabilities
can be used as "runtime cpu feature detection API".
Now my concern is https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms The platforms reserve register x18. Don’t use this register.
Allowing CRC32 instruction could get into this codepath...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x18
problem should be fix . I will raise an issue later.
And it looks runtime cpu feature detection
should be added in Apple. As I known , M1 (mac mini arm64 ) are available now. I guess it has more feature support.
Could you review aarch64_multibinary.h
? I am not sure if there are anything that can not match Apple spec.
crc/aarch64/crc_aarch64_dispatcher.c
Outdated
@@ -105,6 +113,8 @@ DEFINE_INTERFACE_DISPATCHER(crc32_gzip_refl) | |||
|
|||
if (auxval & HWCAP_PMULL) | |||
return PROVIDER_INFO(crc32_gzip_refl_pmull); | |||
#elif defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above comment. crc32_gzip_refl_crc_ext and crc32_gzip_refl_3crc_fold are better choice.
crc/aarch64/crc32_ieee_norm_pmull.h
Outdated
.align 4 | ||
.set .lanchor_crc_tab,. + 0 | ||
#ifndef __MACH__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it requirement from Clang? If yes , I think clang is better choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems controlled by llvm::MCAsmInfo::HasDotTypeDotSizeDirective (https://llvm.org/doxygen/classllvm_1_1MCAsmInfo.html#a7c3b8692b75d4808f7c888e61f01e1c8) and it is false in Darwin (https://github.com/llvm/llvm-project/blob/release/9.x/llvm/lib/MC/MCAsmInfoDarwin.cpp#L89).
Well, if ARM Windows support will be added as well, #if !defined(__MACH__) && !defined(__WIN32__)
is more proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes , you are right.
But now , there are no enough information about run time cpu feature detection
on WIN32 . That's same with Apple .
@cielavenir , Thanks your hard work. How about re-org #162 and #164 ? It looks this PR includes patches for clang . About Apple Silicon Mac support. I have some questions.
|
Signed-off-by: Taiju Yamada <[email protected]>
Signed-off-by: Taiju Yamada <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please modify the commit message .
It looks my name appear in commit message :)
And please re-org the patches . It looks some commit can not pass CI tests . |
I re-org this PR into #168 . |
WIP until aarch_multibinary.h issue is cleared. |
by the way I report here as well that reading ID_AA64ISAR0_EL1 in user mode is prohibited (causes SIGILL) on iDevice. |
Signed-off-by: Taiju Yamada <[email protected]>
@kirbyzhou @rhpvorderman checked compilation by my (intel) macbook |
checked compilation by my macbook (apple m1)
|
But in my benchmark, the isa-l version of ungzip is much slower than cloudflare version https://github.com/cloudflare/zlib
|
Under a linux arm host.
|
Any news on this? From the last couple of comments from @kirbyzhou , it sounds like this version builds & runs, but the performance seems to have a regression? |
@cdevers-es I can confirm that the performance regression that @kirbyzhou mentions also happens on an Olimex Olinuxino A64 development board. This is only for decompression (compression sitll is faster). As such it is a common aarch64 issue, not a Mac specific one. Therefore it is my opinion that this PR should be merged as soon as it is ready. The performance regression can be handled later by the people who understand the arm64 code. |
I get an illegal instruction exception on this when I cross compile when I don't on main branch. Perhaps it's breaking something in the existing dispatcher.
|
compression is faster but compression ratio is worse. |
oh this fixes the test actually diff --git a/crc/aarch64/crc16_t10dif_pmull.S b/crc/aarch64/crc16_t10dif_pmull.S
index 2ae3fb7..29af534 100644
--- a/crc/aarch64/crc16_t10dif_pmull.S
+++ b/crc/aarch64/crc16_t10dif_pmull.S
@@ -201,13 +201,7 @@ v_tmp1_x3 .req v27
q_fold_const .req q17
v_fold_const .req v17
- ldr q_fold_const, fold_constant
-
-fold_constant:
- .word 0x87e70000
- .word 0x00000000
- .word 0x371d0000
- .word 0x00000000
+ ldr q_fold_const, =0x371d00000000000087e70000;
.align 2
.crc_fold_loop: could someone tell me why (my) this fold_constant does not work? |
I'm terribly sorry, cielavenir@825d080 works. I feel so ashame. Let me check other parts soon. |
Signed-off-by: Taiju Yamada <[email protected]>
fixed crc16_t10dif_copy_pmull.S as well. It seems those two are |
checked
igzip_wrapper_hdr_test did not pass but master is similarly not passing. |
Thanks @cielavenir. I was able to rebase and I don't see any more issues so I should be able to integrate as soon as I can push through our internal CI. |
Integrated |
should be fixed by #226 |
As the further work of #162 , I was able to assemble aarch64 code for Apple Silicon Mac.
I needed slight modification but the assembled Android binary still works.
How I tested my work:
Signed-off-by: Taiju Yamada [email protected]