This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Overhead of MXNDArraySyncCopyFromCPU on osx #8112

Open
aseyboldt opened this issue Sep 30, 2017 · 2 comments

Comments

@aseyboldt

aseyboldt commented Sep 30, 2017

While investigating a performance issue, I noticed that setting the values
of an mx.nd.NDArray is somewhat slow on OS X (Sierra):

import mxnet as mx
import numpy as np
import ctypes

a = mx.nd.zeros(4)
b = np.zeros(4, dtype='f')
%timeit a[:] = b
28.3 µs ± 653 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For comparison, pure numpy takes about 400 ns.
Some of this seems to be Python overhead (the largest contributors I found were a.shape at about 2 µs and a.ctypes.data_as(ctypes.c_void_p) at about 4 µs in a._sync_copyfrom). Most of it is on the C side, however:

handle = a.handle
b_addr = b.ctypes.data_as(ctypes.c_void_p)
b_size = ctypes.c_size_t(b.size)
%timeit mx.base._LIB.MXNDArraySyncCopyFromCPU(handle, b_addr, b_size)
14.3 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

On a Linux machine, the same test runs in 900 ns.
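
To reproduce these timings outside IPython, a small helper built on the standard-library timeit module may be enough. This is only a sketch: per_call_us is a hypothetical name, and it reports the best run rather than the mean ± std. dev. that %timeit prints, so the numbers will not match exactly. The mxnet calls appear only as comments and assume the variables defined in the snippets above.

```python
import timeit


def per_call_us(fn, number=10000, repeat=7):
    """Return the best observed time per call of fn(), in microseconds.

    Hypothetical helper for illustration; it is not part of mxnet or numpy.
    """
    timer = timeit.Timer(fn)
    # Each entry of `runs` is the total wall time for `number` calls.
    runs = timer.repeat(repeat=repeat, number=number)
    return min(runs) / number * 1e6


if __name__ == "__main__":
    # Baseline: the cost of calling an empty Python function.
    print("no-op call: %.3f us" % per_call_us(lambda: None))

    # With mxnet installed and a, b, handle, b_addr, b_size defined as in
    # the snippets above, the two measurements would be (not run here):
    #   per_call_us(lambda: a.__setitem__(slice(None), b))
    #   per_call_us(lambda: mx.base._LIB.MXNDArraySyncCopyFromCPU(
    #       handle, b_addr, b_size))
```

Wrapping the call in a lambda adds a small constant cost, which the no-op baseline makes visible and which can be subtracted when comparing platforms.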

I am using version 0.11.1 according to mx.__version__, installed via pip install --pre mxnet-mkl.

I sampled the stack trace while MXNDArraySyncCopyFromCPU was running in a loop:
[image: sampled stack trace of the MXNDArraySyncCopyFromCPU loop]

@sergeykolychev
Contributor

@tlby something that you noticed as well

@aseyboldt
Author

aseyboldt commented Oct 1, 2017

Thinking a bit more about this, I am a bit confused about why there is any synchronisation at all. I'm really new to mxnet, so I might be missing something, but shouldn't the engine be able to tell if there are any outstanding operations at all? And if not, couldn't it just skip the ThreadedVar::WaitForVar call entirely? If there is nothing that might want to change any variable, then that variable in particular should be fine, right? My guess would be that this is the case most of the time when executing things synchronously.
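
To make the question concrete, here is a toy model of the shortcut I have in mind. It is not MXNet's engine: ToyVar, schedule, complete and slow_path_waits are hypothetical names, and whether an equivalent check would be safe inside ThreadedVar::WaitForVar is exactly what I am unsure about.

```python
import threading


class ToyVar:
    """Toy stand-in for an engine variable (not MXNet's ThreadedVar)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0          # operations scheduled but not finished
        self.slow_path_waits = 0   # how often wait() had to take the lock

    def schedule(self):
        """Record that an operation touching this variable was queued."""
        with self._cond:
            self._pending += 1

    def complete(self):
        """Record that a queued operation finished; wake any waiters."""
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait(self):
        """Block until no operation on this variable is outstanding."""
        # Fast path: nothing outstanding, so return without locking.
        # Assumption: a plain int read is good enough in this toy; a C++
        # engine would need an atomic load here.
        if self._pending == 0:
            return
        with self._cond:
            self.slow_path_waits += 1
            while self._pending:
                self._cond.wait()


if __name__ == "__main__":
    var = ToyVar()
    var.wait()                      # idle variable: fast path
    var.schedule()
    worker = threading.Timer(0.01, var.complete)
    worker.start()
    var.wait()                      # blocks until complete() runs
    worker.join()
    print("slow-path waits:", var.slow_path_waits)
```

In this model a synchronous caller that never has pending work pays only for one integer comparison per wait.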
