Overhead of MXNDArraySyncCopyFromCPU on osx #8112

aseyboldt · 2017-09-30T12:36:04Z

While investigating a performance issue I noticed that setting the values
of an mx.nd.NDArray is somewhat slow on OSX (Sierra):

import mxnet as mx
import numpy as np
import ctypes

a = mx.nd.zeros(4)
b = np.zeros(4, dtype='f')
%timeit a[:] = b
28.3 µs ± 653 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For comparison, pure numpy takes about 400 ns.
Some of this seems to be Python overhead (the largest contributions I found were a.shape with about 2 μs and a.ctypes.data_as(ctypes.c_void_p) with about 4 μs in a._sync_copyfrom). Most of it is on the C side, however:

handle = a.handle
b_addr = b.ctypes.data_as(ctypes.c_void_p)
b_size = ctypes.c_size_t(b.size)
%timeit mx.base._LIB.MXNDArraySyncCopyFromCPU(handle, b_addr, b_size)
14.3 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

On a Linux machine the same test runs in 900 ns.
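A rough way to judge how much of such a number is plain ctypes call overhead, rather than work done inside the library, is to time a trivial foreign call on a buffer of the same size. The sketch below uses only the standard library; ctypes.memmove is a stand-in chosen for illustration and does not go through MXNet, and per_call_ns is a hypothetical helper name. The absolute numbers will differ between machines.

```python
import ctypes
import timeit

# Two 16-byte buffers, the same size as four float32 values.
src = (ctypes.c_float * 4)()
dst = (ctypes.c_float * 4)()
nbytes = ctypes.sizeof(src)


def copy_once():
    # Stand-in for a cheap foreign call; this is NOT an MXNet call.
    ctypes.memmove(dst, src, nbytes)


def per_call_ns(func, number=100_000, repeat=5):
    """Best-of-`repeat` wall time per call, in nanoseconds (hypothetical helper)."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e9


baseline_ns = per_call_ns(copy_once)
print(f"ctypes.memmove of {nbytes} bytes: {baseline_ns:.0f} ns per call")
```

If this baseline comes out well below a microsecond, the bulk of the time measured for MXNDArraySyncCopyFromCPU is probably spent inside the library and not in the ctypes call itself.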

I am using version 0.11.1 according to mx.__version__, installed via pip install --pre mxnet-mkl.

I sampled the stack trace while MXNDArraySyncCopyFromCPU was running in a loop:


sergeykolychev · 2017-10-01T01:07:05Z

@tlby something that you noticed as well

aseyboldt · 2017-10-01T19:18:56Z

Thinking a bit more about this, I am a bit confused about why there is any synchronisation at all. I'm really new to mxnet, so I might be missing something, but shouldn't the engine be able to tell if there are any outstanding operations at all? And if not, couldn't it just skip the ThreadedVar::WaitForVar call entirely? If there is nothing that might want to change any variable, then that variable in particular should be fine, right? My guess would be that this is the case most of the time when executing things synchronously.
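The idea in this comment can be illustrated with a toy model. The sketch below is not MXNet's engine and makes no claim about how ThreadedVar::WaitForVar is implemented; ToyVar, schedule, complete and wait_to_read are hypothetical names. It only shows the proposed shape: keep a count of outstanding operations per variable and return immediately when that count is zero.

```python
import threading


class ToyVar:
    """Hypothetical stand-in for an engine variable; not MXNet's ThreadedVar."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0    # operations scheduled but not yet finished
        self.slow_waits = 0  # number of calls that actually had to block

    def schedule(self):
        with self._cond:
            self._pending += 1

    def complete(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait_to_read(self):
        with self._cond:
            # Fast path: nothing outstanding, so return without blocking.
            if self._pending == 0:
                return
            # Slow path: block until every pending operation has completed.
            self.slow_waits += 1
            while self._pending:
                self._cond.wait()


v = ToyVar()
v.wait_to_read()  # idle variable: takes the fast path

v.schedule()      # one outstanding operation
worker = threading.Timer(0.05, v.complete)
worker.start()
v.wait_to_read()  # blocks until the worker calls complete()
worker.join()

print("slow waits:", v.slow_waits)
```

In this toy the fast path still takes a lock; a real engine would more likely use an atomic counter, and whether such a check is safe there depends on details of the scheduler that the sketch does not model.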

szha added Installation Discussion Performance and removed Installation labels Oct 1, 2017

Jerryzcn mentioned this issue Aug 30, 2019

[Discussion] Overhead in MXNet Execution #14883

Open
