This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Thinking a bit more about this, I am a bit confused about why there is any synchronisation at all. I'm really new to mxnet, so I might be missing something, but shouldn't the engine be able to tell if there are any outstanding operations at all? And if not, couldn't it just skip the ThreadedVar::WaitForVar call entirely? If there is nothing that might want to change any variable, then that variable in particular should be fine, right? My guess would be that this is the case most of the time when executing things synchronously.
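To make the question concrete, here is a toy model of the fast path I have in mind. It is only a sketch and not how the mxnet engine is written: `ToyEngine`, `push`, `wait_for_var` and the `slow_waits` counter are all hypothetical names, the real engine is C++ and tracks dependencies per variable, and this version tracks a single global count of unfinished operations.

```python
import threading


class ToyEngine:
    """Hypothetical stand-in for a dependency engine (not mxnet's code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0    # operations pushed but not yet finished
        self.slow_waits = 0  # how often wait_for_var had to take the lock

    def push(self, fn):
        """Run `fn` asynchronously on a worker thread."""
        with self._cond:
            self._pending += 1

        def run():
            try:
                fn()
            finally:
                with self._cond:
                    self._pending -= 1
                    self._cond.notify_all()

        threading.Thread(target=run).start()

    def wait_for_var(self):
        """Block until nothing is outstanding.

        Fast path: if the calling thread sees no pending operations,
        return without touching the lock. In C++ this check would need
        an atomic counter; in CPython a plain read is enough for a toy.
        """
        if self._pending == 0:
            return
        with self._cond:
            self.slow_waits += 1
            self._cond.wait_for(lambda: self._pending == 0)
```

With nothing pushed, `ToyEngine().wait_for_var()` returns immediately and never acquires the condition variable. Whether the real engine can make the same check cheaply is exactly what I am unsure about.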
While investigating a performance issue, I noticed that setting the values of an `mx.nd.NDArray` is somewhat slow on OS X (Sierra):

For comparison, pure numpy takes about 400ns.
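A comparison like this can be reproduced with `timeit`. The sketch below is not the original snippet: the array shape, the source values, and the helper names `ns_per_call` and `build_cases` are assumptions, and the mxnet case is skipped when mxnet cannot be imported.

```python
import timeit


def ns_per_call(fn, number=100_000, repeat=5):
    """Best-of-`repeat` wall time per call of `fn`, in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


def build_cases(shape=(1,)):
    """Return {library name: callable that overwrites an array of `shape`}.

    Libraries that cannot be imported are skipped. The shape is an assumption.
    """
    cases = {}
    try:
        import numpy as np
    except ImportError:
        return cases

    src = np.ones(shape, dtype=np.float32)  # values to copy in (assumed)
    a_np = np.zeros(shape, dtype=np.float32)

    def set_numpy():
        a_np[:] = src

    cases["numpy"] = set_numpy

    try:
        import mxnet as mx
    except Exception:  # not installed, or fails to import in this environment
        return cases

    a_mx = mx.nd.zeros(shape)  # float32 by default

    def set_mxnet():
        # Assigning a numpy array is expected to reach _sync_copyfrom,
        # going by the profile described in this issue.
        a_mx[:] = src

    cases["mxnet"] = set_mxnet
    return cases


if __name__ == "__main__":
    for name, fn in build_cases().items():
        print(f"{name}: {ns_per_call(fn):.0f} ns per assignment")
```

Taking the minimum over several repeats reduces noise from other processes. The absolute numbers will depend on the machine and the array size.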
Some of this seems to be Python overhead (the largest contributors I found were `a.shape` with about 2μs and `a.ctypes.data_as(ctypes.c_void_p)` with 4μs in `a._sync_copyfrom`). Most of it is on the C side, however:

On a Linux machine the same test runs in 900ns.
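The two Python-level costs can be timed in isolation. This is a rough sketch: `python_side_costs` and `ns_per_call` are hypothetical helpers, and it assumes that `.shape` is read from the destination array while the `ctypes` pointer is taken from the numpy source array, which the text above does not spell out.

```python
import ctypes
import timeit


def ns_per_call(fn, number=100_000, repeat=3):
    """Best-of-`repeat` wall time per call of `fn`, in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


def python_side_costs(dst, src, number=100_000):
    """Time the two attribute accesses mentioned above.

    dst: array being written to (only its .shape is read)
    src: numpy array holding the new values (its data pointer is taken)
    """
    return {
        "dst.shape": ns_per_call(lambda: dst.shape, number),
        "src.ctypes.data_as(c_void_p)": ns_per_call(
            lambda: src.ctypes.data_as(ctypes.c_void_p), number
        ),
    }
```

For example, `python_side_costs(mx.nd.zeros((1,)), np.ones((1,), dtype=np.float32))` returns a dict of nanosecond timings. Passing a numpy array as `dst` instead gives a baseline for how cheap `.shape` is when it does not go through the mxnet wrapper.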
I am using version `0.11.1` according to `mx.__version__`, installed via `pip install --pre mxnet-mkl`.

I sampled the stack trace while `MXNDArraySyncCopyFromCPU` was running in a loop:

![image](https://user-images.githubusercontent.com/1882397/31045828-67a8aa42-a5ec-11e7-8465-c5711749f55c.png)