set MKL_NUM_THREADS=1 at runtime? #414

Closed
bstabler opened this issue Apr 2, 2021 · 3 comments
Comments

@bstabler
Contributor

bstabler commented Apr 2, 2021

Inspired by this, should we add something like the following to the asim main runner to ensure we always override the Intel MKL threading settings on Windows and when multiprocessing?

import os
# Set this before numpy/pandas are imported so that MKL sees the limit when it loads.
os.environ['MKL_NUM_THREADS'] = '1'
@bstabler
Contributor Author

bstabler commented Apr 6, 2021

@guyrousseau asked whether the Intel MKL library underlying numpy and pandas works as well on AMD processors. Based on this Wikipedia entry and some other web searching, it may not. However, this information may be out of date, as Intel, AMD, numpy/pandas, etc. are always making improvements.

@jpn--
Member

jpn-- commented Apr 7, 2021

If we want to squash threads not just for MKL but also for other BLAS implementations that a user may end up with (and many of the other usual suspects), we can set the same ='1' for all of these: OMP_NUM_THREADS, NUMBA_NUM_THREADS, OPENBLAS_NUM_THREADS, MKL_NUM_THREADS, VECLIB_MAXIMUM_THREADS, NUMEXPR_NUM_THREADS. We'd need to make sure we don't accidentally squash anything we need, but at first glance squashing all these threads probably won't ruin anything.
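
To make the idea concrete, here is a minimal sketch of setting all six variables in one place. The helper name `limit_threads` and its `override` and `environ` parameters are hypothetical, not part of ActivitySim; the sketch also assumes the libraries read these variables when they are first loaded, which is the usual behavior but worth verifying per library.

```python
import os

# The variables listed in the comment above; each one caps a different
# threading backend (OpenMP, Numba, OpenBLAS, MKL, Accelerate/vecLib, numexpr).
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "NUMBA_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def limit_threads(n=1, override=True, environ=None):
    """Hypothetical helper: set every thread-count variable to ``n``.

    With ``override=False`` a value the user already exported is left alone.
    ``environ`` defaults to ``os.environ``; a plain dict can be passed for testing.
    Returns a dict with the resulting value of each variable.
    """
    env = os.environ if environ is None else environ
    result = {}
    for name in THREAD_ENV_VARS:
        if override or name not in env:
            env[name] = str(n)
        result[name] = env[name]
    return result


# Assumption: this runs at the very top of the runner script, before numpy,
# pandas, numba or numexpr are imported.
limit_threads(1)
```

Passing `override=False` would be one way to avoid squashing a setting that a user has deliberately chosen.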

Alternatively, we can look at threadpoolctl for less heavy-handed runtime control.
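
A rough sketch of what the threadpoolctl route might look like follows. It relies on `threadpool_limits` from the third-party threadpoolctl package; the wrapper name `single_threaded` is hypothetical, and falling back to a no-op when the package is missing is an assumption made for this example, not something proposed in the thread.

```python
from contextlib import contextmanager

try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl is an optional third-party package
    threadpool_limits = None


@contextmanager
def single_threaded(user_api=None):
    """Hypothetical wrapper: cap native thread pools at one thread inside the block.

    ``user_api`` is forwarded to threadpool_limits (for example "blas" or
    "openmp"); None applies the limit to every library threadpoolctl detects.
    Yields True if the limit was applied and False if threadpoolctl is missing.
    """
    if threadpool_limits is None:
        yield False
        return
    with threadpool_limits(limits=1, user_api=user_api):
        yield True


# Usage: only the code inside the block is restricted; the previous limits
# are restored on exit.
with single_threaded(user_api="blas") as applied:
    pass  # heavy numpy/pandas work would go here
```

Unlike the environment variables, this limit can be applied after the libraries are already imported and can be scoped to a single section of the run.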

@guyrousseau

ARC completely agrees with the recommendation. Setting the maximum number of threads to align with the number of available processors will reduce overhead. While it is possible to create more threads than available processors, and at times this can be useful, there is no benefit to doing so when you have a distributed computational platform such as the ActivitySim ABM. A higher thread count could actually create CPU contention, which is then reliant on the internal CPU scheduler to ensure all threads are processed before accepting new inputs, thus slowing down the system during high computation events.
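
As an illustration of aligning thread counts with the available processors, the sketch below divides the cores evenly among worker processes. The function `threads_per_worker` and its parameters are hypothetical, and even division is only one possible policy.

```python
import os


def threads_per_worker(num_processes, cpu_count=None):
    """Hypothetical helper: threads each worker may use without oversubscribing.

    ``cpu_count`` defaults to os.cpu_count(), falling back to 1 if unknown.
    The result is never below 1, so with more workers than cores each worker
    still gets a single thread.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus // num_processes)


# Example: 16 cores shared by 4 worker processes leaves 4 threads per worker.
per_worker = threads_per_worker(4, cpu_count=16)
```

When there are as many worker processes as cores, this yields one thread per worker, which matches the `'1'` setting proposed at the top of the thread.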

@bstabler bstabler mentioned this issue Jun 9, 2021