Apache Accumulo Word Count example

The WordCount example (WordCount.java) uses MapReduce and Accumulo to compute word counts for a set of documents. This is accomplished using a map-only MapReduce job and an Accumulo table with combiners.

To run this example, create a directory in HDFS containing text files. You can use the Accumulo README for data:

$ hdfs dfs -mkdir /wc
$ hdfs dfs -copyFromLocal /path/to/accumulo/README.md /wc/README.md

Verify that the file was created:

$ hdfs dfs -ls /wc

Run the WordCount MapReduce job with your HDFS input directory. The job creates its output table, so no table setup is needed first:

$ ./bin/runmr mapreduce.WordCount -i /wc

WordCount.java creates an Accumulo table (examples.wordcount unless another name is given with -t) with a SummingCombiner iterator attached to it. It then runs a map-only MapReduce job that reads the text files in the specified HDFS directory and writes word counts to that table.
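
To illustrate why no reducer is needed, here is a minimal in-memory sketch of the idea. It does not use Hadoop or Accumulo. The class and method names (WordCountSketch, mapLine, combine) are hypothetical, and splitting on whitespace is an assumption about tokenization rather than a statement of what WordCount.java does. The map step emits a value of 1 per word, and the table side sums values that share a row, which is the role the SummingCombiner plays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {

    // Map step: emit (word, 1) for every whitespace-separated token.
    // Whitespace tokenization is an assumption made for this sketch.
    public static List<Map.Entry<String, Long>> mapLine(String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1L));
            }
        }
        return out;
    }

    // Table side: sum all values written under the same row,
    // standing in for what a summing combiner does inside the table.
    public static Map<String, Long> combine(List<Map.Entry<String, Long>> entries) {
        Map<String, Long> table = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries) {
            table.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return table;
    }

    // Run the map step over every line, then combine. There is no reduce phase.
    public static Map<String, Long> count(List<String> lines) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(mapLine(line));
        }
        return combine(emitted);
    }

    public static void main(String[] args) {
        Map<String, Long> table = count(List.of("the quick fox", "the lazy dog and the fox"));
        System.out.println(table); // {and=1, dog=1, fox=2, lazy=1, quick=1, the=3}
    }
}
```

Because the summing happens where the data is stored, each mapper can write its 1s independently and the totals still come out right.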

After the MapReduce job completes, query the Accumulo table to see word counts.

$ accumulo shell
username@instance> table examples.wordcount
username@instance examples.wordcount> scan -b the
the count:20080906 []    75
their count:20080906 []    2
them count:20080906 []    1
then count:20080906 []    1
...
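
The -b option of scan sets the begin row, and Accumulo returns rows in sorted order from that point on, which is why the output above starts at "the" and continues with "their", "them" and "then". As a rough sketch of that behavior, assuming ASCII row names, a sorted in-memory map can stand in for the table. RangeScanSketch and scanFrom are hypothetical names, and the counts for "apache" and "table" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class RangeScanSketch {

    // Return up to `limit` rows in sorted order, starting at beginRow (inclusive).
    public static List<String> scanFrom(NavigableMap<String, Long> table, String beginRow, int limit) {
        List<String> rows = new ArrayList<>();
        for (String row : table.tailMap(beginRow, true).keySet()) {
            if (rows.size() == limit) {
                break;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = new TreeMap<>();
        table.put("then", 1L);
        table.put("apache", 4L); // made-up count
        table.put("the", 75L);
        table.put("them", 1L);
        table.put("table", 3L);  // made-up count
        table.put("their", 2L);

        // Rows sorting before "the" (here "apache" and "table") are skipped.
        System.out.println(scanFrom(table, "the", 4)); // [the, their, them, then]
    }
}
```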

When the WordCount MapReduce job was run above, the client properties were serialized into the MapReduce configuration. This is insecure if the properties contain sensitive information like passwords. A more secure option is to store accumulo-client.properties in HDFS and run the job with the -d option, which configures the MapReduce job to obtain the client properties from HDFS:

$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/myuser
$ hdfs dfs -copyFromLocal /path/to/accumulo/conf/accumulo-client.properties /user/myuser/
$ ./bin/runmr mapreduce.WordCount -i /wc -t examples.wordcount2 -d /user/myuser/accumulo-client.properties
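
The difference between the two approaches can be sketched without Hadoop or Accumulo. In this illustration, a plain map stands in for the job configuration. The class name ClientPropsSketch, the configuration keys client.properties.inline and client.properties.path, and the sample credentials are all hypothetical. The point is only that an inlined properties file carries its password wherever the configuration goes, while a path does not.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

class ClientPropsSketch {

    // Inline approach: the full properties text becomes a configuration value.
    public static Map<String, String> inlineConfig(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Map.of("client.properties.inline", out.toString()); // hypothetical key
    }

    // Path approach: the configuration holds only the file location in HDFS.
    public static Map<String, String> pathConfig(String hdfsPath) {
        return Map.of("client.properties.path", hdfsPath); // hypothetical key
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("auth.principal", "myuser");
        props.setProperty("auth.token", "not-a-real-password"); // sample value only

        String inline = inlineConfig(props).toString();
        String byPath = pathConfig("/user/myuser/accumulo-client.properties").toString();

        System.out.println(inline.contains("not-a-real-password")); // true
        System.out.println(byPath.contains("not-a-real-password")); // false
    }
}
```

With the path approach, access to the secret is governed by the HDFS permissions on the properties file rather than by who can read the job configuration.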

After the MapReduce job completes, query the examples.wordcount2 table. The results should be the same as before:

$ accumulo shell
username@instance> table examples.wordcount2
username@instance examples.wordcount2> scan -b the
the count:20080906 []    75
their count:20080906 []    2
...
