Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DYNAMIC_ARCH support on Android #2931

Closed
ognjentodic opened this issue Oct 21, 2020 · 12 comments · Fixed by #5041
Closed

DYNAMIC_ARCH support on Android #2931

ognjentodic opened this issue Oct 21, 2020 · 12 comments · Fixed by #5041

Comments

@ognjentodic
Copy link

I've been experimenting with using DYNAMIC_ARCH support on Android (aarch64) and had some questions about this. I am using 0.3.10 release (commit 63b03ef).

On couple of devices I've been using for testing, upon program startup it seems that get_coretype method gets called and it fails to obtain core type info because

if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {

fails, so core type defaults to ARMV8; this seems to happen when the program starts (without any openblas methods being called). One of these devices is Samsung S7, another is custom hardware.

  1. Does this match other people experience? I saw some mentions on https://developer.android.com/ndk/guides/cpu-features#the_google_cpu_features_library that using AT_HWCAP may not be accurate on old devices ("for example, claim to have integer division instructions but do not"), but no mention of not being able to access this info at all.

  2. Alternative would be to use Google's cpu_features library (https://github.com/google/cpu_features) and dynamically set core type via OPENBLAS_CORETYPE env variable. But, because gotoblas is already set, calling gotoblas_dynamic_init() doesn't do anything because of this line: https://github.com/xianyi/OpenBLAS/blob/f6e4cf2f9dac6f90b36f49659d883767d63791f5/driver/others/dynamic_arm64.c#L232

Am I missing something obvious here?

Thank you,
Ogi

@martin-frbg
Copy link
Collaborator

The problem goes a little deeper, see #2715

@ognjentodic
Copy link
Author

Looks like there is some overlap, but I think the ability to set the core type externally via env variable is currently broken. I'll provide some more comments in #2715.

@martin-frbg
Copy link
Collaborator

Why do you think it is broken ? "gotoblas" points to the cpu model in use - if we find this already set, the initialization routine must already have been run (in another thread). If gotoblas is still undefined, we look in the environment if the user wants a particular model, and if that is not the case the cpu detection is begun. This works fine for the other architectures, the only thing it does not let you do is change your mind and try to switch to another cpu model when the library is already running.

@ognjentodic
Copy link
Author

In my case -- and I admit this may not be how most people use openblas, especially on other platforms -- I have a product that's an SDK that runs on Android. Environment variable for core type can only be set in my code (i.e. it cannot be set externally), and by the time I can do it, openblas had already been initialized (and due to failing to properly detect core type, it it's not setup in an optimal way).

Perhaps I am missing a part about initialization... I don't think I am doing anything explicitly to initialize openblas. If I had control over initialization, I could then set environment variable prior to init.

@martin-frbg
Copy link
Collaborator

martin-frbg commented Oct 21, 2020

Initialization happens the moment you load the library, if that is before your own code gets a chance then I'm not sure what to do except try to find a fallback cpu detection method that works on your devices. Would /proc/cpuinfo be available ? (This is what gets parsed by the build-time check when not building DYNAMIC_ARCH, hwcap was just supposed to be more elegant and reliable)

@ognjentodic
Copy link
Author

I agree, but the problem is that at that point I cannot call gotoblas_dynamic_init(), i.e. it won't do anything.

Maybe to ask a different question: what is the use case for calling gotoblas_dynamic_init() with the current setup? Unless I am missing something, there isn't really a way to use it properly right now.

Btw, I can make the changes to my fork to address my use case; I am just trying to understand if I am doing something wrong and/or if the API could be improved.

@ognjentodic
Copy link
Author

ognjentodic commented Oct 21, 2020

To answer your other question, I assume /proc/cpuinfo would be available (I don't know for sure as my SDK is deployed on variety of Android devices)... Based on the fact that Google is using it in https://github.com/google/cpu_features I would assume so, although @yuyichao suggests alternative approach in #2715.

From my comment in #2715 it might be better not to force this to happen under the hood, or at least to allow developers to change it. With the current setup, autodetect could silently fail (and default to ARMV8), and it's not possible to change the coretype after that happens.

@martin-frbg
Copy link
Collaborator

AFAIK gotoblas_dynamic_init was never meant to be exposed, as I mentioned it is called by the constructor code when the library is loaded and so far an environment variable that gets read at that point was all that was needed (and usually only for debugging).
Do you really see (or expect) a dramatic difference in performance between ARMV8 and whatever is optimal for your target hardware ? (The S7 would be using CortexA53 or similar I guess). In any case the quickest solution is probably to copypaste the relevant parts of the "detect()" function from cpuid_arm64.c into the error path of get_coretype()

@ognjentodic
Copy link
Author

OK, thanks. The first sentence explains what I needed to know.

I am seeing quite a bit of difference... I think I'll just determine the core type in my code, for the relevant cores (to avoid the issue mentioned in #2715, where the type could be different among subsets of cores), and then use modified gotoblas_dynamic_init to set it via env variable (in conjunction with setting thread affinity).

Thanks!

@brada4
Copy link
Contributor

brada4 commented Oct 22, 2020

Works as is on android 10 (e.g termux), is there any problem on Android 11?

@martin-frbg
Copy link
Collaborator

Maybe HWCAP access is restricted on some embedded devices, or termux is emulating priviledged syscalls under the hood ? Anyway a more intelligent fallback than just assuming ARMV8 should be doable, I just do not want to experiment before 0.3.12 is out.

@martin-frbg
Copy link
Collaborator

develop branch now tries to read the MIDR from /sys/devices when the CPUID call fails. (Still not ideal as it queries only the first cpu instead of iterating over all and choosing one in a mixed configuration - I have not decided what to do with big.little setups)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants