Multitool

Welcome

This is the Cascading.Multitool (Multitool) application.

Multitool provides a simple command line interface for building data processing jobs. Think of this as grep, sed, and awk for Hadoop, which also supports joins between multiple data-sets.

For example, with $HADOOP_HOME/bin/ in your PATH, the following command,

$ hadoop jar multitool-<release-date>.jar source=input.txt select=Monday sink=outputDir

will start a Hadoop job to read from the source file input.txt, grep all lines with the word Monday, then output the results into the outputDir directory.

Multitool will inherit the underlying Hadoop configuration, so if the default FileSystem is HDFS, all paths will be relative to the cluster filesystem, not local. Using fully qualified urls will override the defaults (file://some/path or s3n:/bucket/file).

This application is built with Cascading.

Cascading is a feature rich API for defining and executing complex, scale-free, and fault tolerant data processing workflows on a Hadoop cluster. It can be found at http://www.cascading.org/

Installing

This step is not necessary if you wish to run Multitool directly from the uncompressed distribution folder or Multitool was pre-installed with your Hadoop distribution.

To see if Multitool has already been added to your PATH, type:

$ which multitool

To install for all users into /usr/local/bin:

$ sudo ./bin/multitool install

or for the current user only into ~/.multitool:

$ ./bin/multitool install

For detailed instructions:

$ ./bin/multitool help install

Choose the method that best suites your environment.

If you are running Multitool on AWS Elastic MapReduce, you need to follow the Elastic MapReduce instructions on the AWS site, which typically expect the multitool-<release-date>.jar to be uploaded to AWS S3.

Using

The environment variable HADOOP_HOME should always be set first before using Multitool.

To run from the command line with the jar, Hadoop should be in the path:

$ hadoop jar multitool-<release-date>.jar <args>

...or if Multitool has been installed based on the instructions above:

$ multitool source=data/artist.100.txt cut=0 sink=output

This will cut the first fields out of the file artists.100.txt and save the results to output file.

If no args are given, a comprehensive list of commands will be printed. That list is also available as COMMANDS.md in this directory.

Examples

For more detailed examples of using Multitool, see also: http://cascading.org/multitool/

Copying:

$ ./bin/multitool source=input.txt sink=outputDir

Copying while removing the first header line, and overwriting output:

$ ./bin/multitool source=input.txt source.skipheader=true sink=outputDir sink.replace=true

Filter out data:

$ ./bin/multitool source=input.txt "reject=some words" sink=outputDir

For a more complex example:

$ ./bin/multitool source=data/topic.100.txt cut=0 \
"pgen=(\b[12][09][0-9]{2}\b)" group=0 count=0 group=1 \
sink=output sink.replace=true sink.parts=1

This will find all years in the input file, count them, and sort them by counts.

Building

To build Multitool, you may download the source code from GitHub:

https://github.com/cascading/cascading.multitool

This release will pull all dependencies from the relevant maven repos, including http://conjars.org

To build a jar,

$ ant retrieve jar

To test,

$ ant test

License

See apl.txt in this directory.

Name		Name	Last commit message	Last commit date
Latest commit History 228 Commits
bin		bin
data		data
src		src
.gitignore		.gitignore
COMMANDS.md		COMMANDS.md
README.md		README.md
apl.txt		apl.txt
build.xml		build.xml
ivy.xml		ivy.xml
test_cli.sh		test_cli.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multitool

Welcome

Installing

Using

Examples

Building

License

About

Releases

Packages

Languages

lior2b/cascading.multitool

Folders and files

Latest commit

History

Repository files navigation

Multitool

Welcome

Installing

Using

Examples

Building

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages