[MXNET-381] Enhancement of take operator #11326
Conversation
@reminisce @piiswrong @anirudh2290 @rahul003 @eric-haibin-lin Please give a review when you have time, thanks!
@junrushao1994 you might want to keep an eye on this PR.
@piiswrong @reminisce @anirudh2290 @rahul003 ping for review
.set_default(0)
.describe("The axis of input array to be taken.");
.describe("The axis of input array to be taken."
NumPy's take currently has `raise` as the default mode, but it doesn't seem to be supported in MXNet yet. We could also make `raise` the default, but that would be a breaking change. We should open an issue to add it for the 2.0 release.
I totally agree, but it's worth noting that adding 'raise' mode may impact performance a bit, since you need another kernel to check that all indices are within the legal range.
You don't need another kernel; you can do it inside the same one. You already have the bounds check for indices inside the Take kernel, so you can just maintain state recording whether the bounds check passed or failed.
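A minimal sketch of that idea, with hypothetical names rather than code from this PR: the kernel that already clips (or wraps) each index can also record an out-of-range hit in a one-element flag that the host checks after the launch.

```cpp
#include <mshadow/base.h>  // MSHADOW_XINLINE (assumed include path)

// Hypothetical axis-0 take kernel: K rows of length M in `data`, one output
// element per flattened index i. `out_of_bound` is an assumed one-element
// flag, zero-initialized by the host before the launch.
struct TakeClipWithFlag {
  template <typename DType, typename IType>
  MSHADOW_XINLINE static void Map(int i, DType* out, const DType* data,
                                  const IType* idx, const int M, const int K,
                                  int* out_of_bound) {
    int j = static_cast<int>(idx[i / M]);
    if (j < 0 || j >= K) {
      *out_of_bound = 1;        // benign race: every writer stores the same value
      j = (j < 0) ? 0 : K - 1;  // existing clip behaviour is kept
    }
    out[i] = data[j * M + i % M];
  }
};
// After the launch, the host reads the flag back and raises if it is nonzero.
```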
src/operator/tensor/indexing_op.h
Outdated
oshape[i + idxshape.ndim()] = arrshape[i + 1];
const int actual_axis = param.axis + ((param.axis < 0) ? arrshape.ndim() : 0);
TShape oshape(idxshape.ndim() + arrshape.ndim() - 1);
for (int i = 0; i < static_cast<int>(idxshape.ndim()); ++i) {
We can use index_t here and avoid the static_cast.
Good catch, will do.
src/operator/tensor/indexing_op.h
Outdated
for (int i = 0; i < static_cast<int>(idxshape.ndim()); ++i) {
  oshape[i + actual_axis] = idxshape[i];
}
for (int i = 0; i < static_cast<int>(arrshape.ndim()); i++) {
We can use index_t here and avoid the static_cast.
Good catch, will do.
src/operator/tensor/indexing_op.h
Outdated
}
for (size_t i = 0; i < arrshape.ndim() - 1; i++) {
oshape[i + idxshape.ndim()] = arrshape[i + 1];
const int actual_axis = param.axis + ((param.axis < 0) ? arrshape.ndim() : 0);
index_t here?
Will do
const int in_ndims, const int out_ndims, const int idx_ndims,
const int axis_dim, const int axis) {
// i is the global flattened index in the output
const int out_head_index = (axis == 0) ? 0 : (i / out_stride[axis - 1]);
IType can be used for all indexes here.
There's a possibility that IType is a floating-point type, so the compiler would complain. That's also why the legacy Map function above uses a cast.
Okay, makes sense.
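For context, a small hypothetical helper (not the PR's code) showing why the cast is needed: the index tensor's dtype may be a floating-point type such as float32, and a floating-point value cannot be used directly as an array subscript.

```cpp
#include <mshadow/base.h>  // MSHADOW_XINLINE (assumed include path)

// Hypothetical row gather: IType may be float32, so each index is cast to int
// before it is used for addressing; using IType itself for the subscript would
// not compile for floating-point index types.
template <typename DType, typename IType>
MSHADOW_XINLINE void GatherRow(const int i, DType* out, const DType* data,
                               const IType* idx, const int row_len) {
  const int j = static_cast<int>(idx[i]);   // cast: IType could be float
  for (int k = 0; k < row_len; ++k) {
    out[i * row_len + k] = data[j * row_len + k];
  }
}
```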
@@ -389,7 +389,7 @@ Examples::
)code" ADD_FILELINE)
Given an input array with shape ``(d0, d1, d2)`` and indices with shape ``(i0, i1)``, the output
This only holds true for axis = 0, right?
Will update that doc.
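For reference, the general rule the updated doc needs to state is that the indices' shape replaces the input's axis dimension. Below is a standalone sketch of that shape computation, mirroring (not copying) the PR's shape-inference logic and using the index_t loops suggested above; include paths and the function name are assumptions.

```cpp
#include <mshadow/base.h>  // mshadow::index_t (assumed include path)
#include <nnvm/tuple.h>    // nnvm::TShape    (assumed include path)
using mshadow::index_t;
using nnvm::TShape;

// out.shape = arr.shape[:axis] + idx.shape + arr.shape[axis+1:]
// e.g. arr (d0, d1, d2), idx (i0, i1), axis=1  ->  out (d0, i0, i1, d2);
// only for axis == 0 does it reduce to idx.shape + arr.shape[1:].
TShape TakeOutShape(const TShape& arrshape, const TShape& idxshape, int axis) {
  const index_t r = arrshape.ndim();
  const index_t actual_axis = (axis < 0) ? axis + r : axis;  // normalize negative axis
  TShape oshape(idxshape.ndim() + r - 1);
  for (index_t i = 0; i < actual_axis; ++i)       oshape[i] = arrshape[i];
  for (index_t i = 0; i < idxshape.ndim(); ++i)   oshape[actual_axis + i] = idxshape[i];
  for (index_t i = actual_axis + 1; i < r; ++i)   oshape[idxshape.ndim() + i - 1] = arrshape[i];
  return oshape;
}
```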
src/operator/tensor/indexing_op.h
Outdated
MSHADOW_TYPE_SWITCH(idx.type_flag_, IType, {
// get size of temporary storage for sort
char* temp_storage_ptr = nullptr;
int* src_indptr_ptr = nullptr;
Can we use dim_t instead of int here?
Since we're using cub's DeviceHistogram to histogram the indices here, we need to stick to int32, which should suffice for now. Alternatively, we could switch to our own histogram kernel that supports all types, but that would be slower than cub's implementation.
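To make the constraint concrete, here is a rough sketch of the kind of cub call involved, with illustrative names and a standalone allocation instead of the op's requested workspace: DeviceHistogram writes its bin counts through a CounterT pointer, and int is the counter type used here, which is why the per-row count / indptr buffer stays int32.

```cpp
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Histogram already-sorted, int-converted indices into per-row counts.
// d_counts has num_rows bins; cub's CounterT is int here, hence the int32 buffers.
void HistogramSortedIndices(const int* d_sorted_idx, int* d_counts,
                            int num_rows, int num_idx, cudaStream_t stream) {
  size_t temp_bytes = 0;
  // First call (null temp pointer) only queries the required workspace size.
  cub::DeviceHistogram::HistogramEven(nullptr, temp_bytes,
                                      d_sorted_idx, d_counts,
                                      num_rows + 1,   // num_levels = #bins + 1
                                      0, num_rows,    // sample range [0, num_rows)
                                      num_idx, stream);
  void* d_temp = nullptr;
  cudaMalloc(&d_temp, temp_bytes);  // the real op carves this from its workspace
  cub::DeviceHistogram::HistogramEven(d_temp, temp_bytes,
                                      d_sorted_idx, d_counts,
                                      num_rows + 1, 0, num_rows,
                                      num_idx, stream);
  cudaFree(d_temp);
}
```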
src/operator/tensor/indexing_op.h
Outdated
s, idxshape.Size(), sorted_idx_ptr, sorted_idx_ptr, static_cast<int>(arrshape[axis]));
}
Tensor<cpu, 1, int> original_idx(original_idx_ptr, Shape1(idxshape.Size()), s);
Tensor<cpu, 1, char> temp_storage(temp_storage_ptr, Shape1(temp_storage_bytes), s);
We can move this to the start, use temp_storage.dptr_ to reuse it, and remove temp_storage_ptr.
Sorry that I did not notice this comment earlier. The tensor is purely for the SortByKey call, so keeping its declaration closer to that call makes more sense.
You can also keep it in the same place. I am essentially suggesting that temp_storage_ptr seems unnecessary and can be removed.
It's used as a shorthand for the calculated pointer within the whole workspace pool: https://github.com/apache/incubator-mxnet/pull/11326/files#diff-ed06b8d9798aca630313f2a9dd3fcd68R950
You can do the following:
Tensor<cpu, 1, char> temp_storage(workspace.dptr_ + 2 * original_idx_bytes + src_indptr_bytes, Shape1(temp_storage_bytes), s);
and use temp_storage or temp_storage.dptr_ for the pointer.
Okay.
});
});
}

#ifdef __CUDACC__
Can this GPU-specific code be moved to the .cuh file?
I would like to reuse the kernel here; if I move this to the .cuh file, the CPU compiler will not see that kernel.
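In other words, the element-wise body stays in indexing_op.h so the Kernel<Op, xpu>::Launch pattern can instantiate it for both devices, and only genuinely GPU-only pieces (the cub calls, the sort-based backward) sit behind #ifdef __CUDACC__. A hedged sketch with a hypothetical kernel name:

```cpp
#include <mshadow/base.h>  // MSHADOW_XINLINE (assumed include path)

// Hypothetical flat gather kernel kept in the .h: the same Map body is compiled
// by the host compiler for the cpu build and by nvcc for the gpu build.
struct GatherFlat {
  template <typename DType, typename IType>
  MSHADOW_XINLINE static void Map(int i, DType* out, const DType* data,
                                  const IType* idx) {
    out[i] = data[static_cast<int>(idx[i])];
  }
};
// cpu path: mxnet_op::Kernel<GatherFlat, cpu>::Launch(s, n, out, data, idx);
// gpu path: mxnet_op::Kernel<GatherFlat, gpu>::Launch(s, n, out, data, idx);
```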
src/operator/tensor/indexing_op.cc
Outdated
- `axis`- Only slicing along axis 0 is supported for now.
- `mode`- Only `clip` mode is supported for now.
- `axis`- Could be from -r to r-1 where r is the rank of input tensor
- `mode`- Could be either `clip` or `wrap`.
You can move this explanation to the respective arguments and delete the note.
Done
src/operator/tensor/indexing_op.cc
Outdated
- `axis`- Only slicing along axis 0 is supported for now.
- `mode`- Only `clip` mode is supported for now.
- `axis`- Could be from -r to r-1 where r is the rank of input tensor
- `mode`- Could be either `clip` or `wrap`.

Examples::
This was due to a lack of extra blank lines; fixed.
@piiswrong @reminisce please give a review when you have time, thanks!
@@ -272,7 +274,7 @@ inline void AddTakeGradLargeBatch(mshadow::Tensor<gpu, 2, DType> dst,
const mshadow::Tensor<gpu, 1, IndexType>& sorted,
const mshadow::Tensor<gpu, 1, IndexType>& index,
const mshadow::Tensor<gpu, 2, DType> &src,
mshadow::Tensor<gpu, 1, char>* workspace) {
mshadow::Tensor<gpu, 1, char>* workspace = NULL) {
Nit: `NULL` -> `nullptr`. `NULL` has more semantic meanings than `nullptr` and should be treated as deprecated in C++11.
Will change.
@piiswrong Please give a review once you have a minute.
* take forward for any axis with enhanced test
* general take backward on gpu
* backward of enhanced take op
Description
Previously our `take` operator only supported the axis=0 and mode='clip' case; this PR adds support for axis in the range [-r, r-1] and an additional mode, 'wrap'.

Checklist
Essentials
Changes
Comments
The legacy implementation for axis=0 and mode='clip' is still preserved to ensure there's no performance or accuracy regression after the enhancement.
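Roughly, that dispatch looks like the sketch below; the helper names and the mode enum are illustrative, not the PR's exact identifiers.

```cpp
// Illustrative dispatch only: the pre-existing fast path still serves the
// common axis == 0, mode == 'clip' case; everything else takes the new
// general-axis implementation.
enum class TakeMode { kClip, kWrap };

template <typename xpu>
void TakeForwardDispatch(int actual_axis, TakeMode mode) {
  if (actual_axis == 0 && mode == TakeMode::kClip) {
    // legacy implementation, unchanged, so this case keeps its old performance
    // TakeLegacyAxisZeroClip<xpu>(/* ctx, inputs, outputs */);
  } else {
    // new general path: arbitrary axis in [-r, r-1], 'clip' or 'wrap'
    // TakeGeneralAxis<xpu>(/* ctx, inputs, outputs, actual_axis, mode */);
  }
}
```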