Write Mappers and Reducers
Mappers and reducers are defined as derived classes of `mincepie.mapreducer.BasicMapper` and `mincepie.mapreducer.BasicReducer`, respectively. In short, you need to write your own `map()` and `reduce()` functions that carry out your job.
In your mapper you need to define the `map()` function, which takes a key and a value and yields any number of key/value pairs. A simple example, which takes a file name as the value and yields `(word, 1)` tuples for word counting, looks like this:

    class WordCountMapper(mincepie.mapreducer.BasicMapper):
        """The wordcount mapper"""
        def map(self, key, value):
            # value is the name of a text file; key is not used here
            with open(value, 'r') as fid:
                for line in fid:
                    for word in line.split():
                        yield word, 1
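
To see what `map()` produces, the sketch below runs the same generator logic outside mincepie. `word_count_map` is a hypothetical free function standing in for `WordCountMapper.map`, and the temporary input file exists only for this illustration; nothing here is part of the mincepie API.

```python
import os
import tempfile


def word_count_map(key, value):
    """Stand-in for WordCountMapper.map: value is a file name."""
    with open(value, 'r') as fid:
        for line in fid:
            for word in line.split():
                yield word, 1


# Write a small input file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("to be or not\nto be\n")
    input_path = tmp.name

try:
    # This mapper ignores the key; 0 is an arbitrary placeholder.
    pairs = list(word_count_map(0, input_path))
finally:
    os.remove(input_path)

print(pairs)
```

Each word is emitted once per occurrence; combining the counts is left to the reducer.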
Optionally, you can define a `setUp(self)` member function, which takes no parameters other than `self`, to do any initialization work your mapper needs when the mapper instance is created. You can also add any other member variables or functions to your class.
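
A sketch of how `setUp()` might be used, assuming (as described above) that it runs once when the mapper instance is created. `FakeBasicMapper` is a hypothetical stand-in so the example runs without mincepie; with the library you would derive from `mincepie.mapreducer.BasicMapper` instead. The stop-word list and the `FilteredWordMapper` class are likewise invented for illustration.

```python
class FakeBasicMapper(object):
    """Hypothetical stand-in for mincepie.mapreducer.BasicMapper.

    Assumption: the base class calls setUp() when an instance is created.
    """
    def __init__(self):
        self.setUp()

    def setUp(self):
        pass


class FilteredWordMapper(FakeBasicMapper):
    """Yields (word, 1) for every word that is not a stop word."""
    def setUp(self):
        # One-time initialization shared by all later map() calls.
        self.stop_words = set(['the', 'a', 'of'])

    def map(self, key, value):
        # In this sketch, value is a line of text rather than a file name.
        for word in value.split():
            if word.lower() not in self.stop_words:
                yield word, 1


mapper = FilteredWordMapper()
result = list(mapper.map(0, "The price of a mince pie"))
print(result)
```

Keeping the stop-word set on the instance means it is built once rather than on every call to `map()`.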
In your reducer you need to define the `reduce()` function. It takes a key and a list containing the values that the `map()` function yielded for that key, and returns the output value for that key. A simple example, which takes a word and the list of counts produced by the `WordCountMapper`, looks like this:

    class WordCountReducer(mincepie.mapreducer.BasicReducer):
        """The wordcount reducer"""
        def reduce(self, key, values):
            # values holds every count emitted for this word
            return sum(values)
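
The reducer sees one key at a time, together with all values emitted for it. The sketch below imitates that grouping step locally with a dictionary. `group_by_key` and `word_count_reduce` are hypothetical helpers; how mincepie itself collects and distributes values is not shown here.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect the values of (key, value) pairs into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def word_count_reduce(key, values):
    """Stand-in for WordCountReducer.reduce."""
    return sum(values)


# Pairs as a word-count mapper would yield them.
mapped = [('to', 1), ('be', 1), ('or', 1), ('not', 1), ('to', 1), ('be', 1)]

grouped = group_by_key(mapped)
counts = dict((k, word_count_reduce(k, v)) for k, v in grouped.items())
print(counts)
```

Because each call depends only on its own key and value list, the reduce calls for different keys are independent of one another.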
In your Python source file, you can define multiple mappers and reducers, and then choose which mapper and reducer to use with command-line options (see Launch your mapreducer for more details). For this to work, you need to register your mapper and reducer so that mincepie can find the right class by its name. For example, to register your mapper class `MyAwesomeMapper`, simply do

    mincepie.mapreducer.REGISTER_MAPPER(MyAwesomeMapper)

Alternatively, you can register your mapper as the default mapper, so that you don't need to specify it explicitly on the command line. This is done by

    mincepie.mapreducer.REGISTER_DEFAULT_MAPPER(MyAwesomeMapper)

Once registered, mappers and reducers defined in one module can be used from another script simply by importing that module.
We have defined a set of simple default mappers and reducers. Here is a list of them:
Name | Output |
---|---|
`IdentityMapper` | The same key and value pair |
`IdentityReducer` | The same list of values |
`SumReducer` | The sum of the input values |
`FirstElementReducer` | The first element of the value list |
`NoPassReducer` | Nothing |
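
Read as plain functions, the behaviours in the table amount to roughly the following. These are paraphrases of the table for illustration, not the library's code, and the function names are hypothetical. In particular, representing the "nothing" output of `NoPassReducer` as `None` is an assumption.

```python
def identity_map(key, value):
    """IdentityMapper: pass the key/value pair through unchanged."""
    yield key, value


def identity_reduce(key, values):
    """IdentityReducer: return the list of values as-is."""
    return values


def sum_reduce(key, values):
    """SumReducer: add up the input values."""
    return sum(values)


def first_element_reduce(key, values):
    """FirstElementReducer: keep only the first value."""
    return values[0]


def no_pass_reduce(key, values):
    """NoPassReducer: produce no output (modelled here as None)."""
    return None


values = [3, 1, 2]
print(list(identity_map('k', 'v')))
print(identity_reduce('k', values))
print(sum_reduce('k', values))
print(first_element_reduce('k', values))
print(no_pass_reduce('k', values))
```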