
Write Mappers and Reducers

Yangqing edited this page Apr 22, 2013 · 4 revisions

At a Glance

Mappers and reducers are defined as derived classes of mincepie.mapreducer.BasicMapper and mincepie.mapreducer.BasicReducer, respectively. You need to write your own map() and reduce() functions that carry out your job.

Mappers

In your mapper, define the map() function, which takes a key and a value and yields zero or more key-value pairs. A simple example, which takes a file name as the value and yields (word, 1) tuples for word counting, looks like this:

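import mincepie.mapreducer

class WordCountMapper(mincepie.mapreducer.BasicMapper):
    """The wordcount mapper"""
    def map(self, key, value):
        # value is the name of a text file; key is not used here
        with open(value, 'r') as fid:
            for line in fid:
                for word in line.split():
                    # emit one (word, 1) pair per occurrence
                    yield word, 1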

Optionally, you can define a setUp(self) member function, which takes no parameters other than self, to do any initialization your mapper needs when the mapper instance is created. You can also add any other member variables or functions to your class.
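As a rough illustration of the setUp() hook, the sketch below uses a stand-in base class instead of mincepie itself, so that it runs on its own. StandInMapper and StopWordMapper are hypothetical names, and the assumption that the base constructor calls setUp() once is taken only from the description above, not from the mincepie source.

```python
class StandInMapper(object):
    """Hypothetical stand-in for mincepie.mapreducer.BasicMapper."""
    def __init__(self):
        # Assumption: the base class calls setUp() once on creation.
        self.setUp()

    def setUp(self):
        pass

    def map(self, key, value):
        raise NotImplementedError


class StopWordMapper(StandInMapper):
    """Counts words in a line of text, skipping a few stop words."""
    def setUp(self):
        # One-time initialization: build the stop word set once per instance.
        self.stop_words = set(["a", "an", "the"])

    def map(self, key, value):
        # Here value is a line of text rather than a file name.
        for word in value.split():
            if word.lower() not in self.stop_words:
                yield word, 1


mapper = StopWordMapper()
pairs = list(mapper.map(0, "the quick fox saw a fox"))
print(pairs)
```

Building the stop word set in setUp() rather than in map() means the work is done once per mapper instance instead of once per input.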

Reducers

In your reducer, define the reduce() function. It takes a key and a list of the values that map() yielded for that key, and returns the output value for that key. A simple example, which takes a word and the list of counts produced by WordCountMapper, looks like this:

class WordCountReducer(mincepie.mapreducer.BasicReducer):
    """The wordcount reducer"""
    def reduce(self, key, values):
        # values holds every count emitted for this word
        return sum(values)
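To see how the mapper and reducer fit together, here is a minimal single-process sketch that does not use mincepie at all. It copies the map and reduce logic from the word count example into plain functions, and run_locally is a hypothetical driver that groups the mapper output by key before reducing; the real framework distributes this work, so treat it only as a way to check the logic.

```python
import collections
import os
import tempfile


def word_count_map(key, value):
    """Same logic as WordCountMapper.map(): value is a file name."""
    with open(value, 'r') as fid:
        for line in fid:
            for word in line.split():
                yield word, 1


def word_count_reduce(key, values):
    """Same logic as WordCountReducer.reduce()."""
    return sum(values)


def run_locally(filenames):
    """Hypothetical driver: map every file, group by key, then reduce."""
    grouped = collections.defaultdict(list)
    for index, name in enumerate(filenames):
        for out_key, out_value in word_count_map(index, name):
            grouped[out_key].append(out_value)
    return dict((k, word_count_reduce(k, v)) for k, v in grouped.items())


# Write two small input files to a temporary directory.
tmp_dir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(["to be or not to be\n", "to do\n"]):
    path = os.path.join(tmp_dir, "input%d.txt" % i)
    with open(path, 'w') as fid:
        fid.write(text)
    paths.append(path)

counts = run_locally(paths)
print(sorted(counts.items()))
```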

Register your Mapper and Reducer

In your Python source file, you can define multiple mappers and reducers and then choose which ones to use with command-line options (see Launch your mapreducer for more details). You therefore need to register each mapper and reducer so that mincepie can find the right class by name. For example, to register your mapper class MyAwesomeMapper:

mincepie.mapreducer.REGISTER_MAPPER(MyAwesomeMapper)

Alternatively, you can register your mapper as the default mapper so that you don't need to specify it explicitly on the command line:

mincepie.mapreducer.REGISTER_DEFAULT_MAPPER(MyAwesomeMapper)

Because classes are registered by name, you can define your mappers and reducers in one module and use them from another script simply by importing that module.

Predefined Mappers and Reducers

We have defined a set of simple default mappers and reducers:

| Name | Output |
| --- | --- |
| `IdentityMapper` | The same key and value pair |
| `IdentityReducer` | The same list of values |
| `SumReducer` | The sum of the input values |
| `FirstElementReducer` | The first element of the value list |
| `NoPassReducer` | Nothing |
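The behaviors listed in the table can be mimicked with a few plain functions, which may help when deciding whether a predefined class already covers your case. These are stand-ins written from the table alone, not the mincepie classes themselves; in particular, modeling "Nothing" as a None return value is an assumption.

```python
def identity_map(key, value):
    """Like IdentityMapper: pass the key and value through unchanged."""
    yield key, value


def identity_reduce(key, values):
    """Like IdentityReducer: return the list of values as is."""
    return values


def sum_reduce(key, values):
    """Like SumReducer: add up the input values."""
    return sum(values)


def first_element_reduce(key, values):
    """Like FirstElementReducer: keep only the first value."""
    return values[0]


def no_pass_reduce(key, values):
    """Like NoPassReducer: produce no output (modeled here as None)."""
    return None


values = [3, 1, 2]
print(list(identity_map("k", 5)))
print(identity_reduce("k", values))
print(sum_reduce("k", values))
print(first_element_reduce("k", values))
print(no_pass_reduce("k", values))
```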