This repository was archived by the owner on Nov 17, 2023. It is now read-only.

[Numpy] Bugfix of slice operator export (MXNet to ONNX) #17827

Closed

Conversation

QueensGambit
Contributor

@QueensGambit QueensGambit commented Mar 13, 2020

Description

This PR fixes the slice operator export from MXNet into ONNX.

@MoritzMaxeiner reported this problem already: Incorrect ONNX export of SliceChannel #13061.
The corresponding pull request, ONNX export: Support equal length splits #14121, however, only partially solved the problem.

The output of the slice operator was corrected, but it is still not handled properly in the get_inputs() function.

The following unit test demonstrates this:

class SplitConcatBlock(HybridBlock):
    """Block which creates two splits and later concatenates them"""
    def __init__(self, name):
        super(SplitConcatBlock, self).__init__(name)

    def hybrid_forward(self, F, x):
        splits = F.split(x, axis=1, num_outputs=2)
        return F.concat(*splits)

def test_onnx_export_slice(self):
    net = nn.HybridSequential(prefix='slice_net')
    with net.name_scope():
        net.add(nn.Dense(100, activation='relu'), SplitConcatBlock("splitConcat"), nn.Dense(10))
    _check_onnx_export(net)

Before this PR, the conversion of the model from MXNet into ONNX resulted in the following error:

Traceback (most recent call last):
  File "mxnet_export_test.py", line 134, in test_onnx_export_slice
    _check_onnx_export(net)
  File "mxnet_export_test.py", line 66, in _check_onnx_export
    onnx_file_path=onnx_file_path)
  File "/opt/mxnet/python/mxnet/contrib/onnx/mx2onnx/export_model.py", line 87, in export_model
    verbose=verbose)
  File "/opt/mxnet/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 312, in create_onnx_graph_proto
    checker.check_graph(graph)
  File "/usr/local/lib/python3.6/dist-packages/onnx/checker.py", line 51, in checker
    proto.SerializeToString(), ctx)
onnx.onnx_cpp2py_export.checker.ValidationError: Nodes in a graph must be topologically sorted, however input 'slice_netsplitConcatsplit0' of node: 
input: "slice_netsplitConcatsplit0" input: "slice_netsplitConcatsplit0" output: "slice_netsplitConcatconcat0" name: "slice_netsplitConcatconcat0" op_type: "Concat" attribute { name: "axis" i: 1 type: INT }
 is not output of any previous nodes.

Now, this is resolved by replacing:

        input_nodes.append(proc_nodes[input_node_id].name)

with:

        try:
            # ip[1] defines which output index to use
            input_nodes.append(proc_nodes[input_node_id].output[ip[1]])
        except AttributeError:
            # fallback to the name attribute as output if the output attribute does not exist (e.g. for data nodes)
            input_nodes.append(proc_nodes[input_node_id].name)
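The effect of this change can be illustrated with a minimal, self-contained sketch. The `Node`/`DataNode` mocks and the tensor names below are hypothetical stand-ins, not the actual mx2onnx classes; they only mimic the relevant attributes:

```python
from collections import namedtuple

# An ONNX NodeProto-like object: a split node has one name but two outputs.
Node = namedtuple("Node", ["name", "output"])
# A data/input node carries only a name (no 'output' attribute).
DataNode = namedtuple("DataNode", ["name"])

proc_nodes = [DataNode("data"), Node("split0", ["split0_out0", "split0_out1"])]

def resolve_input(proc_nodes, input_node_id, ip):
    """Return the tensor name feeding the next node, as in the patched get_inputs()."""
    try:
        # ip[1] selects which output of the producing node to use
        return proc_nodes[input_node_id].output[ip[1]]
    except AttributeError:
        # data nodes have no 'output' field, so fall back to the name
        return proc_nodes[input_node_id].name

# The Concat node consumes both outputs of the split node:
inputs = [resolve_input(proc_nodes, 1, (1, 0)), resolve_input(proc_nodes, 1, (1, 1))]
print(inputs)  # ['split0_out0', 'split0_out1'] instead of ['split0', 'split0'] twice
```

With the old code both Concat inputs would resolve to the node name `split0`, which is exactly the duplicated input seen in the checker error above.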

Ping: @Roshrini @vandanavk @ChaiBapchya

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Added unit test: test_onnx_export_slice()
  • Updated: get_inputs()

Comments

  • This PR is backward compatible

@QueensGambit QueensGambit requested a review from szha as a code owner March 13, 2020 16:12
@QueensGambit QueensGambit changed the title [Numpy] Bugfix of MXNet to ONNX export of slice operator [Numpy] Bugfix of slice operator export (MXNet to ONNX) Mar 13, 2020
@QueensGambit
Contributor Author

Hi, there is actually an additional problem:
The function create_onnx_graph_proto() assumes that in_shape remains fixed throughout the whole network:
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/contrib/onnx/mx2onnx/export_onnx.py#L244

                converted = MXNetGraph.convert_layer(
                    node,
                    is_input=False,
                    mx_graph=mx_graph,
                    weights=weights,
                    in_shape=in_shape,
                    in_type=in_type,
                    proc_nodes=all_processed_nodes,
                    initializer=initializer,
                    index_lookup=index_lookup,
                    idx=idx
                )

I suggest inferring the current shape before each layer and refactoring create_onnx_graph_proto() accordingly.
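As a rough illustration of the idea (pure Python, with made-up toy shape rules rather than the real MXNet shape inference), the shape seen by each layer can be tracked as the conversion walks the graph, instead of reusing a fixed in_shape:

```python
# Toy sketch (hypothetical helpers, not the actual export_onnx.py code)
# of updating the input shape before converting each layer.

def dense_shape(in_shape, units):
    """Shape of a Dense layer output: (batch, units)."""
    return (in_shape[0], units)

def split_shape(in_shape, axis, num_outputs):
    """Shape of each piece produced by a split along `axis`."""
    out = list(in_shape)
    out[axis] //= num_outputs
    return tuple(out)

# Walk a simple chain and record the input shape seen by every layer.
shape = (1, 64)                        # network input
layers = [("dense", 100), ("split", 2), ("dense", 10)]
per_layer_in_shapes = []
for kind, arg in layers:
    per_layer_in_shapes.append(shape)  # shape *before* this layer
    if kind == "dense":
        shape = dense_shape(shape, arg)
    elif kind == "split":
        shape = split_shape(shape, axis=1, num_outputs=arg)

print(per_layer_in_shapes)  # [(1, 64), (1, 100), (1, 50)]
```

A converter that always receives the original `(1, 64)` would mis-handle the layers after the split, which is the failure mode described here.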

@ruro
Contributor

ruro commented Apr 16, 2020

@QueensGambit I am personally interested in fixing this issue. I've also developed a fix for the in_shape problem. Could you please review and accept my changes if there are no problems? Alternatively, if you don't have the time to work on this, I can open my own PR for this bugfix and we can close this one.

P.S. You might also want to rebase on top of current master, since I think there were some recent CI/CD changes that might resolve the problems with CI validation.

@QueensGambit
Contributor Author

QueensGambit commented Apr 18, 2020

Hello @ruro, great to see that you have been working on the shape inference issue.
Surprisingly, I didn't receive any notification for your PR and only noticed it after you mentioned it.
Passing graph_shapes as a dict is a working solution but is cumbersome for deep models with changing shape sizes.
I noticed that there is existing functionality for inferring the out-shapes of a symbol object.
For instance, you can plot the model structure, including the shapes of each layer, like this:

    display(mx.viz.plot_network(
        symbol,
        shape={'data':(input_shape)},
        node_attrs={"shape":"oval","fixedsize":"false"}
    ))

The corresponding code for inferring the shapes looks like this:

    internals = symbol.get_internals()
    input_name = "data"
    _, out_shapes, _ = internals.infer_shape(**{input_name: input_shape})

This way you would only require input_name as an additional parameter which could be set to "data" by default.
Do you want to update your code or should I do it?

@ruro
Contributor

ruro commented Apr 18, 2020

@QueensGambit I am not quite sure what you mean by cumbersome for deep models with changing shape sizes.

The approach I use in my PR extracts the shapes in exactly the way you describe. graph_shapes is a dict mapping node names to shape tuples. It is built in advance

graph_shapes = MXNetGraph.get_outputs(sym.get_internals(), params, in_shape, output_label)

by calling get_outputs, which effectively does exactly

_, out_shapes, _ = internals.infer_shape(**inputs)

P.S. You are free to change the code in my PR for graph_shapes however you want.
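The name-to-shape lookup described above can be sketched without mxnet. The node names and shapes below are invented for illustration; they stand in for the parallel lists returned by infer_shape on sym.get_internals():

```python
# Hypothetical sketch of building a name -> shape lookup like graph_shapes,
# using stand-in data instead of a real mxnet Symbol.

# Suppose infer_shape on the internals returned these parallel lists:
internal_names = ["data", "dense0_fwd", "split0_out0", "split0_out1"]
out_shapes = [(1, 64), (1, 100), (1, 50), (1, 50)]

# graph_shapes maps each internal node name to its output shape tuple,
# so each layer's converter can look up its true input shape.
graph_shapes = dict(zip(internal_names, out_shapes))

print(graph_shapes["split0_out0"])  # (1, 50)
```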

@QueensGambit
Contributor Author

@ruro see QueensGambit#1 for reply on your PR.

@ruro
Contributor

ruro commented Jun 10, 2020

@QueensGambit I am interested in getting this fix accepted.

Could you solve the merge conflict and rebase on top of master? The conflict is in the unittest, but it's pretty easy to fix. Just add a @with_seed decorator to the test function and update the diff context.

A patch like this should work:

diff --git a/tests/python/unittest/onnx/mxnet_export_test.py b/tests/python/unittest/onnx/mxnet_export_test.py
index 40d7d4e3e..3b039b426 100644
--- a/tests/python/unittest/onnx/mxnet_export_test.py
+++ b/tests/python/unittest/onnx/mxnet_export_test.py
@@ -28,6 +28,7 @@ from common import setup_module, teardown_module, with_seed
 from mxnet import nd, sym
 from mxnet.test_utils import set_default_context
 from mxnet.gluon import nn
+from mxnet.gluon import HybridBlock
 from mxnet.contrib import onnx as onnx_mxnet
 import mxnet as mx
 
@@ -80,6 +81,16 @@ def _check_onnx_export(net, group_outputs=False, shape_type=tuple, extra_params=
             mx.test_utils.assert_almost_equal(out, imp_out, atol=1e-5, rtol=1e-5)
 
 
+class SplitConcatBlock(HybridBlock):
+    """Block which creates two splits and later concatenates them"""
+    def __init__(self, name):
+        super(SplitConcatBlock, self).__init__(name)
+
+    def hybrid_forward(self, F, x):
+        splits = F.split(x, axis=1, num_outputs=2)
+        return F.concat(*splits)
+
+
 class TestExport(unittest.TestCase):
     """ Tests ONNX export.
     """
@@ -126,3 +137,10 @@ class TestExport(unittest.TestCase):
             net.add(nn.Dense(100, activation='relu'), nn.Dense(10))
         _check_onnx_export(net, extra_params={'extra_param': nd.array([1, 2])})
 
+    @with_seed()
+    def test_onnx_export_slice(self):
+        net = nn.HybridSequential(prefix='slice_net')
+        with net.name_scope():
+            net.add(nn.Dense(100, activation='relu'), SplitConcatBlock("splitConcat"), nn.Dense(10))
+        _check_onnx_export(net)
+

@QueensGambit
Contributor Author

@ruro I added a verbose parameter to get_outputs to disable the info message when using sym.get_internals().

        graph_shapes = MXNetGraph.get_outputs(sym.get_internals(), params, in_shape, output_label, verbose=False)

I also added another unit test for using the slice operation with changing shapes.
Finally, I rebased my branch onto master, but I probably did something wrong there, because this resulted in tons of additional commits.
Maybe it is better to create a new clean PR from current master.

@ruro
Contributor

ruro commented Jun 10, 2020

Yeah. Don't worry. I am guessing you merged instead of rebasing. What I meant is to do git pull --rebase upstream master. You can try to restore the previous state from the reflog and then rebase and force-push again. Or you can create a new PR.

@QueensGambit QueensGambit force-pushed the onnx_export_slice_fix branch from 09eee04 to 79ba0e5 Compare June 10, 2020 22:34
@QueensGambit
Contributor Author

QueensGambit commented Jun 10, 2020

@ruro Ok, thank you. I was able to fix this mess by going back and using a force push.
Is it correct now?

@ruro
Contributor

ruro commented Jun 11, 2020

Unfortunately, it seems that this doesn't reset the code owners, so the PR still requires reviews from 10 different maintainers. You'll probably have to create a new PR after all. :(

The last "added ONNX unit test export" commit doesn't seem correct, since it updates the submodules. I think, you can just drop this commit, since there are no actual changes left in it apart from the accidental submodules update.

@QueensGambit QueensGambit force-pushed the onnx_export_slice_fix branch from 79ba0e5 to 0c6411f Compare June 11, 2020 09:34
@QueensGambit
Contributor Author

@ruro I created a new PR in #18535.
