Update Operator Implementation Tutorial (apache#12230)
* update op creation docs

* add flakiness checker and link to gradient checking

* address comments

* update reference line number

* fix comments
anirudhacharya authored and piyushghai committed Oct 19, 2018
1 parent 21f177b commit 6a79f63
Showing 2 changed files with 112 additions and 55 deletions.
129 changes: 87 additions & 42 deletions docs/faq/add_op_in_backend.md
@@ -55,7 +55,7 @@ Next, we are going to
for registering `a`, `b`, and `c` in `quadratic_op-inl.h`.
2. Define type and shape inference functions in `quadratic_op-inl.h`.
3. Define forward and backward functions in `quadratic_op-inl.h`.
4. Register the operator using [nnvm](https://docs.tvm.ai/dev/nnvm_overview.html)
in `quadratic_op.cc` and `quadratic_op.cu` for
CPU and GPU computing, respectively.

@@ -318,7 +318,10 @@ parallel computation on both CPU and GPU. This allows most of
the simple operators to share the same piece of code for CPU and GPU as
parallelization approaches are often identical on both types of devices.
The kernel function is defined as the following, where the function
`Map` is executed by each thread for each input element. The `out_data.Size()`
argument in the `Kernel::Launch` call specifies the number of work items to be
partitioned among the threads, which here
corresponds to the number of elements in the output array. To explain a little
bit more on the two macros used in the kernel struct: (1) `MSHADOW_XINLINE` is
a consolidated macro for inlining functions compiled by both CPU and GPU
compilers. It enables CPU and GPU computing to share the same piece of code.
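
To make the launch semantics concrete, here is a minimal Python sketch (ours,
for illustration only; the real kernel is C++) of what `Kernel::Launch` amounts
to: `Map` is applied once per output element, and the `size` argument
(`out_data.Size()`) defines the iteration space that gets partitioned among
threads. The names `quadratic_map` and `launch` below are hypothetical stand-ins.

```python
import numpy as np

def quadratic_map(i, out, data, a, b, c):
    # Stand-in for the kernel's Map function: handles exactly one element i.
    out[i] = a * data[i] ** 2 + b * data[i] + c

def launch(map_fn, size, *args):
    # Stand-in for Kernel::Launch: MXNet partitions the `size` work items
    # (out_data.Size()) among CPU or GPU threads; here we loop sequentially.
    for i in range(size):
        map_fn(i, *args)

data = np.array([0.0, 1.0, 2.0])
out = np.empty_like(data)
launch(quadratic_map, out.size, out, data, 2.0, 3.0, 1.0)  # a=2, b=3, c=1
print(out)  # [ 1.  6. 15.]
```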
@@ -543,22 +546,21 @@ If you use Python, when you type `import mxnet as mx`, two Python
functions for invoking your backend implementation are
generated on the fly: one is for imperative programming
registered as `mxnet.ndarray.quadratic` or `mx.nd.quadratic` for short;
the other one is for symbolic programming registered under
module `mxnet.symbol.quadratic` or `mx.sym.quadratic` for short.
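
For instance, assuming the operator ends up registered under the `contrib`
namespace (as in the final version of this tutorial's code), the two entry
points can be exercised as follows:

```python
import mxnet as mx

# Imperative (NDArray) API: executes eagerly.
x = mx.nd.array([1.0, 2.0, 3.0])
y = mx.nd.contrib.quadratic(x, a=2.0, b=3.0, c=1.0)
print(y.asnumpy())  # [ 6. 15. 28.]

# Symbolic API: build a graph first, then bind and run it.
data = mx.sym.Variable('data')
quad = mx.sym.contrib.quadratic(data=data, a=2.0, b=3.0, c=1.0)
executor = quad.bind(ctx=mx.cpu(), args={'data': x})
print(executor.forward()[0].asnumpy())  # [ 6. 15. 28.]
```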
In order to unit test it in the frontend, we need to add the following code
to the python file `test_operator.py`. A typical operator implementation
is exercised through both the `symbol` API and the `ndarray` API, and the
following test covers both. The imperative test exercises the `ndarray` API,
`mx.nd.contrib.quadratic`. The `symbol` test exercises the complete
functionality of the operator: the forward pass and the backward
pass. To facilitate the testing of these functionalities we use three
helper functions available in the `mxnet.test_utils` module:
- `check_symbolic_forward`
- `check_symbolic_backward`
- `check_numeric_gradient`
```python
def test_quadratic_function():
    def f(x, a, b, c):
        return a * x**2 + b * x + c
    a = np.random.random_sample()
    b = np.random.random_sample()
    c = np.random.random_sample()
    data = mx.symbol.Variable('data')
    quad_sym = mx.sym.contrib.quadratic(data=data, a=a, b=b, c=c)
    for dtype in [np.float16, np.float32, np.float64]:
        for ndim in range(1, 6):
            shape = rand_shape_nd(ndim, 5)
            data_np = np.random.randn(*shape).astype(dtype)
            expected = f(data_np, a, b, c)
            backward_expected = 2 * a * data_np + b
            # check imperative forward
            output = mx.nd.contrib.quadratic(mx.nd.array(data_np), a=a, b=b, c=c)
            assert_almost_equal(output.asnumpy(), expected,
                                rtol=1e-2 if dtype is np.float16 else 1e-5,
                                atol=1e-2 if dtype is np.float16 else 1e-5)
            # check forward
            check_symbolic_forward(quad_sym, [data_np], [expected],
                                   rtol=1e-2 if dtype is np.float16 else 1e-5,
                                   atol=1e-2 if dtype is np.float16 else 1e-5)
            # check backward
            check_symbolic_backward(quad_sym, [data_np], [np.ones(expected.shape)],
                                    [backward_expected],
                                    rtol=1e-2 if dtype is np.float16 else 1e-5,
                                    atol=1e-2 if dtype is np.float16 else 1e-5)
            # check backward using finite difference
            check_numeric_gradient(quad_sym, [data_np], atol=0.001)
```

In the above test we create a `quadratic` symbol and feed it into the three
utility functions. `check_symbolic_forward` and `check_symbolic_backward`
compare the computed values against the expected values that we pass
as arguments; since the operator computes f(x) = a*x^2 + b*x + c, the expected
gradient passed to `check_symbolic_backward` is `2 * a * x + b`. The
`check_numeric_gradient` utility function
performs [gradient checking](http://ufldl.stanford.edu/tutorial/supervised/DebuggingGradientChecking/)
to verify the implementation of the operator's backward function.
It perturbs the input and calculates the response
rate of the output using the
[finite difference method](https://en.wikipedia.org/wiki/Finite_difference_method).
Then it compares the gradient from the backward pass with the values
from the finite difference method. Each of these three checks succeeds
once the comparison satisfies the user-specified `rtol` and `atol` values, which
stand for relative tolerance and absolute tolerance respectively. They
specify how far the computed values may deviate from the expected values.
The comparison is defined as follows:
```
abs(Expected_Value - Computed_Value) < RTOL * abs(Expected_Value) + ATOL
```
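
Expressed as a Python sketch (ours, not the exact check inside
`mxnet.test_utils`), the rule reads:

```python
import numpy as np

def within_tolerance(expected, computed, rtol, atol):
    # Elementwise form of: abs(expected - computed) < rtol * abs(expected) + atol
    return bool(np.all(np.abs(expected - computed) < rtol * np.abs(expected) + atol))

# With rtol = atol = 1e-5, an expected value of 1.5623145 tolerates an absolute
# deviation of 1e-5 * 1.5623145 + 1e-5 ≈ 2.56e-5 in either direction.
print(within_tolerance(1.5623145, 1.5623100, 1e-5, 1e-5))  # True
print(within_tolerance(1.5623145, 1.5624000, 1e-5, 1e-5))  # False
```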

For example, if `rtol` is `1e-5` and `atol` is `1e-5` and the expected value is
`1.5623145`, then the computed value must lie within the range
`(1.562288876855, 1.562340123145)`, or the test will fail. Make sure you
tune the `rtol` and `atol` values accordingly: very small values for `rtol`
and `atol` will likely make the test flaky. It is recommended that you
use the flakiness checker tool to check whether the test you have written is
flaky. You can run the flakiness checker tool for the above test with the
following command:

```bash
python tools/flakiness_checker.py test_operator.test_quadratic_function
```

Please note that for `check_symbolic_forward` and `check_symbolic_backward` we pass
both the operator symbol and the expected results for comparison, while for
`check_numeric_gradient` we pass only the operator symbol, because
`check_numeric_gradient` computes the expected gradient itself using the finite
difference method. This is why we highly recommend adding a
`check_numeric_gradient` test for every operator with a backward function
implemented: it eliminates the possibility of passing incorrect expected
results into `check_symbolic_backward`.
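
To illustrate what `check_numeric_gradient` is doing under the hood, here is a
minimal sketch of a central-difference gradient check for the quadratic
function (our simplification; the real utility additionally handles symbols,
multiple inputs, and tolerance plumbing):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-4):
    # Central finite difference: df/dx ≈ (f(x + eps) - f(x - eps)) / (2 * eps),
    # evaluated independently for each element of x.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        plus = f(x).sum()   # sum() plays the role of an all-ones output gradient
        x[idx] = orig - eps
        minus = f(x).sum()
        x[idx] = orig
        grad[idx] = (plus - minus) / (2 * eps)
        it.iternext()
    return grad

a, b, c = 2.0, 3.0, 1.0
x = np.random.randn(3, 4)
numeric = numeric_grad(lambda v: a * v**2 + b * v + c, x)
analytic = 2 * a * x + b  # the gradient our backward function implements
assert np.allclose(numeric, analytic, atol=1e-3)
```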


## Summary
@@ -601,8 +646,8 @@ using nnvm. Congratulations! You now know how to add operators.
We welcome your contributions to MXNet.

**Note**: Source code in the tutorial can be found in
[quadratic_op-inl.h](https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/quadratic_op-inl.h),
[quadratic_op.cc](https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/quadratic_op.cc),
[quadratic_op.cu](https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/quadratic_op.cu),
and
[test_operator.py](https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_operator.py#L6514).
38 changes: 25 additions & 13 deletions tests/python/unittest/test_operator.py
@@ -6540,19 +6540,31 @@ def f(x, a, b, c):
    a = np.random.random_sample()
    b = np.random.random_sample()
    c = np.random.random_sample()
    data = mx.symbol.Variable('data')
    quad_sym = mx.sym.contrib.quadratic(data=data, a=a, b=b, c=c)
    for dtype in [np.float16, np.float32, np.float64]:
        for ndim in range(1, 6):
            shape = rand_shape_nd(ndim, 5)
            data_np = np.random.randn(*shape).astype(dtype)
            expected = f(data_np, a, b, c)
            backward_expected = 2 * a * data_np + b

            # check imperative forward
            output = mx.nd.contrib.quadratic(mx.nd.array(data_np), a=a, b=b, c=c)
            assert_almost_equal(output.asnumpy(), expected,
                                rtol=1e-2 if dtype is np.float16 else 1e-5,
                                atol=1e-2 if dtype is np.float16 else 1e-5)
            # check forward
            check_symbolic_forward(quad_sym, [data_np], [expected],
                                   rtol=1e-2 if dtype is np.float16 else 1e-5,
                                   atol=1e-2 if dtype is np.float16 else 1e-5)
            # check backward
            check_symbolic_backward(quad_sym, [data_np], [np.ones(expected.shape)],
                                    [backward_expected],
                                    rtol=1e-2 if dtype is np.float16 else 1e-5,
                                    atol=1e-2 if dtype is np.float16 else 1e-5)
            # check backward using finite difference
            check_numeric_gradient(quad_sym, [data_np], atol=0.001)


@with_seed()