[RFC] Need to export all HybridBlocks in a Gluon model #19535

samskalicky · 2020-11-14T07:01:31Z

Description

Gluon can have any hierarchy of Blocks where a top-level Block is not a HybridBlock, and lower-level blocks are HybridBlocks. These HybridBlocks can be dispersed throughout the model architecture. Users can optimize the parts of their model that support it by calling hybridize on the top level block to trigger a recursive call throughout all child blocks. But there is currently no way to export all HybridBlocks without going through and manually calling export on each one. Further, theres no way to reload those exported symbol files back without changing the model design and swapping those HybridBlocks for SymbolBlocks and than one-by-one calling imports to reload.

I would like to ask for suggestions from the community and have an active discussion on the different ways to address this problem. To kick things off, here is a proposal to start the discussion around:

def save_cached_graphs(block):
    def _save_cached_graphs(blk, index):
        if isinstance(blk, mx.gluon.nn.HybridBlock):
            blk.export(blk.name + str(index[0]))
        for child in blk._children.values():
            index[0] += 1
            _save_cached_graphs(child, index)
    #save top-level block                                                                                  
    index = [0]
    _save_cached_graphs(block, index)

def load_cached_graphs(block):
    def _load_cached_graphs(blk, index):
        if isinstance(blk, mx.gluon.nn.HybridBlock):
            sym = symbol.load(blk.name + str(index[0]) + '-symbol.json')
            blk._cached_graph = sym
        for child in blk._children.values():
            index[0] += 1
            _load_cached_graphs(child, index)
    #load top-level block                                                                                  
    index = [0]
    _load_cached_graphs(block, index)

With these two functions, we can recursively export each hybrid block and then reload the symbols. Obviously the code is not complete or even functional (_cached_graph is actual a tuple of symbols and sym.var inputs). But should serve as a point of reference.

Items on v1.x

Since a Block's children are stored in a dictionary, need to save/restore their unique names
Since parameters are mapped to their block's name, need to synchronize names after reloading model architecture to match save params

Items on master

Since parameters have a UUID, need to save/restore mapping of a Block's params to UUID

General Approach

Recursively create a dictionary structure of blocks mimicking the model architecture
Be able to uniquely identify the block in the model
For HybridBlocks, save/restore the cached graph (symbol + inputs) and in/out formats
Save the model architecture dictionary & parameters
Restore the model architecture with unique identifiers to synchronize with parameter naming/IDs

The text was updated successfully, but these errors were encountered:

samskalicky · 2020-11-14T07:16:57Z

Heres a silly example of a hierarchical Block/HybridBlock to play around with:

class MyBlock(mx.gluon.nn.Block):
    def __init__(self, **kwargs):
        super(MyBlock, self).__init__(**kwargs)
    def add(self, block):
        self._children[block.name + str(len(self._children))] = block
    def forward(self, x, *args):
        out = (x,) + args
        for block in self._children.values():
            out = block(*out)
        return out

# create the Model
inside = MyBlock()
inside.add(mx.gluon.nn.Dense(10))
net = MyBlock()
net.add(inside)
net.add(mx.gluon.nn.Dense(10))
net.initialize()
x = mx.nd.empty((1,10))
out = net(x)

#hybridize and create cached_graphs
net.hybridize()
out = net(x)

#save cached_graphs
save_cached_graphs(net)

The hierarchy should look like this, where a top level Block(0) has a child Block(1) and a HybridBlock(1) and the child Block(1) has a child HybridBlock(0).

- MyBlock(0)
         |
         - MyBlock(1)
         |     |
         |     - Dense(0)
         - Dense(1)

samskalicky · 2020-11-14T21:44:04Z

Heres a complete, working solution on v1.8.x branch. Notice the new save and load functions perform the full model export/reload of both model architecture and params.

import mxnet as mx
import json

class MyBlock(mx.gluon.nn.Block):
    def __init__(self, **kwargs):
        super(MyBlock, self).__init__(**kwargs)
    def add(self, block):
        self._children[block.name + str(len(self._children))] = block
    def forward(self, x, *args):
        out = (x,) + args
        for block in self._children.values():
            out = block(*out)
        return out
    def save(self, prefix):
        # create empty model structure
        model = {}
        def _save_cached_graphs(blk, index, structure):
            # create new entry for this block
            mdl = {'orig_name': blk.name}
            # encode unique name based on block type and ID
            name = type(blk).__name__.lower()
            structure[name+str(index[0])] = mdl
            if isinstance(blk, mx.gluon.nn.HybridBlock):
                # save in/out formats
                mdl['in_format'] = blk._in_format
                mdl['out_format'] = blk._out_format
                # save cached graph & input symbols
                syms, out = blk._cached_graph
                mdl_syms = []
                for sym in syms:
                    mdl_syms.append(sym.tojson())
                mdl['inputs'] = mdl_syms
                mdl['symbol'] = out.tojson()
            children = dict()
            mdl['children'] = children
            # recursively save children
            for ch_name, child in blk._children.items():
                index[0] += 1
                # save child's original name in this block's map
                children[child.name] = ch_name
                _save_cached_graphs(child, index, mdl)
        # save top-level block
        index = [0]
        _save_cached_graphs(self, index, model)
        # save model
        fp = open(prefix+'-model.json','w')
        json.dump(model, fp)
        fp.close()
        # save params
        self.save_parameters(prefix+'-model.params')
        
    def load(self, prefix):
        # load model json from file
        fp = open(prefix+'-model.json')
        model = json.load(fp)
        fp.close()
        def _load_cached_graphs(blk, index, log):
            # get block name
            name = type(blk).__name__.lower()
            # lookup previous encoded name based on block type and ID
            mdl = log[name+str(index[0])]
            # rename block to what it was when saved
            blk._name = mdl['orig_name']
            if isinstance(blk, mx.gluon.nn.HybridBlock):
                # restore in/out formats
                blk._in_format = mdl['in_format']
                blk._out_format = mdl['out_format']
                # get saved symbol
                out = mx.sym.load_json(mdl['symbol'])
                syms = []
                # recreate inputs for this symbol
                for inp in mdl['inputs']:
                    syms.append(mx.sym.load_json(inp))
                # reset cached_graph and active status
                blk._cached_graph = (syms, out)
                blk._active = True
            # rename params with updated block name
            pnames = list(blk.params.keys())
            for p in pnames:
                param = blk.params._params[p]
                new_name = blk.name +'_'+ p[len(blk.params._prefix):]
                blk.params._params.pop(p)
                blk.params._params[new_name] = param            
            # recursively reload children
            for ch_name, child in blk._children.items():
                index[0] += 1
                _load_cached_graphs(child, index, mdl)
            # current set of child names
            ch_names = list(blk._children.keys())
            # original child names
            children = mdl['children']
            # loop and remap children with original names
            for ch_name in ch_names:
                child = blk._children[ch_name]
                blk._children.pop(ch_name)
                orig_name = children[child.name]
                blk._children[orig_name] = child
        # load top-level block
        index = [0]
        _load_cached_graphs(self, index, model)
        # load params
        self.load_parameters(prefix+'-model.params')

def createNet():
    inside = MyBlock()
    dense = mx.gluon.nn.Dense(10)
    inside.add(dense)
    net = MyBlock()
    net.add(inside)
    net.add(mx.gluon.nn.Dense(10))
    return net

# create and initialize model
net = createNet()
net.initialize()
# run first inference to test
x = mx.nd.empty((1,10))
out = net(x)
# hybridize (the hybridizeable blocks, ie. the Dense layers)
net.hybridize()
out = net(x)

# save hybridized model
net.save('MyModel')

# create a new model, uninitialized
net = createNet()
# reload hybridized model
net.load('MyModel')
# run inference again
out = net(x)

And heres a complete, working solution on master branch:

import mxnet as mx
import json

class MyBlock(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(MyBlock, self).__init__(**kwargs)
        self.layers = []
    def add(self, block):
        self.layers.append(block)
        self.register_child(block)
    def forward(self, x, *args):
        out = (x,) + args
        for block in self._children.values():
            out = block()(*out)
        return out                                    
    def save(self, prefix):
        # create empty model structure
        model = {}
        def _save_cached_graphs(blk, index, structure):
            # create new entry for this block
            mdl = {}
            # encode unique name based on block type and ID
            name = type(blk).__name__.lower()
            structure[name+str(index[0])] = mdl
            if isinstance(blk, mx.gluon.nn.HybridBlock):
                # save in/out formats
                mdl['in_format'] = blk._in_format
                mdl['out_format'] = blk._out_format
                # save cached graph & input symbols
                syms, out = blk._cached_graph
                mdl_syms = []
                for sym in syms:
                    mdl_syms.append(sym.tojson())
                mdl['inputs'] = mdl_syms
                mdl['symbol'] = out.tojson()
            # save param uuids
            pmap = {}
            mdl['params'] = pmap
            pnames = list(blk.params.keys())
            for p in pnames:
                param = blk.params[p]
                pmap[p]=param._uuid
            # recursively save children
            for ch_name, child in blk._children.items():
                index[0] += 1
                _save_cached_graphs(child(), index, mdl)
        # save top-level block
        index = [0]
        _save_cached_graphs(self, index, model)
        # save model
        fp = open(prefix+'-model.json','w')
        json.dump(model, fp)
        fp.close()
        # save params
        self.save_parameters('MyModel-model.params')
    def load(self, prefix):
        # load model json from file
        fp = open(prefix+'-model.json')
        model = json.load(fp)
        fp.close()
        def _load_cached_graphs(blk, index, structure):
            # get block name
            name = type(blk).__name__.lower()
            # lookup previous encoded name based on block type and ID
            mdl = structure[name+str(index[0])]
            if isinstance(blk, mx.gluon.nn.HybridBlock):
                # restore in/out formats
                blk._in_format = mdl['in_format']
                blk._out_format = mdl['out_format']
                # get saved symbol
                out = mx.sym.load_json(mdl['symbol'])
                syms = []
                # recreate inputs for this symbol
                for inp in mdl['inputs']:
                    syms.append(mx.sym.load_json(inp))
                # reset cached_graph and active status
                blk._cached_graph = (syms, out)
                blk._active = True
            # reload param uuids
            pmap = mdl['params']
            for p, uuid in pmap.items():
                param = blk.params[p]
                param._uuid = pmap[p]
            # recursively reload children
            for ch_name, child in blk._children.items():
                index[0] += 1
                _load_cached_graphs(child(), index, mdl)
        # load top-level block
        index = [0]
        _load_cached_graphs(self, index, model)
        # load params
        self.load_parameters('MyModel-model.params')
        
def createNet():
    inside = MyBlock()
    dense = mx.gluon.nn.Dense(10)
    inside.add(dense)
    net = MyBlock()
    net.add(inside)
    net.add(mx.gluon.nn.Dense(10))
    return net

# create and initialize model
net = createNet()
net.initialize()
# run first inference to test
x = mx.nd.random.randn(1,10)
out = net(x)
# hybridize (the hybridizeable blocks, ie. the Dense layers)
net.hybridize()
out = net(x)

# save hybridized model
net.save('MyModel')

# create a new model, uninitialized
net = createNet()
# reload hybridized model
net.load('MyModel')
# run inference again
out = net(x)

fhieber · 2020-11-16T09:26:33Z

Thanks @samskalicky, this would sound like an interesting feature for Sockeye as well. In Sockeye, we use a mix of Hybrid and non-Hybrid blocks, both during training and inference. We currently do not use the export mechanism of Gluon, but keep our own configuration file to build the model graph at inference time, according to its specifications, we therefore tie code and model artifacts closely together. An export functionality that would allow us to export the inference computation after training sounds useful.

My main question about this though would be how exported HybridBlocks (i.e. SymbolBlocks) behave w.r.t changing input shapes? Before Gluon, we had to make heavy use of the BucketingModule in MXNet to account for varying shapes at inference time.

samskalicky · 2020-11-16T18:06:47Z

Thanks @fhieber is your approach for Sockeye's configuration file generalizable to something we could build in to mxnet? Can you point me to an example where you do this?

My main question about this though would be how exported HybridBlocks (i.e. SymbolBlocks) behave w.r.t changing input shapes? Before Gluon, we had to make heavy use of the BucketingModule in MXNet to account for varying shapes at inference time.

Nothing would change for that problem with exporting models. The restrictions around input shapes is specific to the MXNet engine memory allocation scheme where reshaping causes re-allocation. Its much lower level in the stack. If you wanna discuss that lets start a separate GitHub issue.

samskalicky · 2020-11-19T08:03:16Z

Created PRs on v1.x (#19565) and master (#19564) branches. Closing this RFC issue and migrating to PRs for further discussion.

samskalicky added the Feature request label Nov 14, 2020

This was referenced Nov 19, 2020

Save/Load Gluon Blocks & HybridBlocks #19564

Merged

[v1.x] Save/Load Gluon Blocks & HybridBlocks #19565

Merged

samskalicky closed this as completed Nov 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Need to export all HybridBlocks in a Gluon model #19535

[RFC] Need to export all HybridBlocks in a Gluon model #19535

samskalicky commented Nov 14, 2020 •

edited

Loading

samskalicky commented Nov 14, 2020 •

edited

Loading

samskalicky commented Nov 14, 2020 •

edited

Loading

fhieber commented Nov 16, 2020

samskalicky commented Nov 16, 2020

samskalicky commented Nov 19, 2020

[RFC] Need to export all HybridBlocks in a Gluon model #19535

[RFC] Need to export all HybridBlocks in a Gluon model #19535

Comments

samskalicky commented Nov 14, 2020 • edited Loading

Description

Items on v1.x

Items on master

General Approach

samskalicky commented Nov 14, 2020 • edited Loading

samskalicky commented Nov 14, 2020 • edited Loading

fhieber commented Nov 16, 2020

samskalicky commented Nov 16, 2020

samskalicky commented Nov 19, 2020

samskalicky commented Nov 14, 2020 •

edited

Loading

samskalicky commented Nov 14, 2020 •

edited

Loading

samskalicky commented Nov 14, 2020 •

edited

Loading