Word-Document-Index-Application

Given a large collection of documents distributed in the secondary memories (i.e. hard disks) of multiple nodes in a cluster, designed a program that extracts words and their frequency of occurrence from each document and created a word-document index in each node ranked on the frequency.Each index is local to a node a word is associated with only those documents in that node.

Also designed a program to merge all indices into one large word-document index ranked by decreasing frequency to be stored in node.

Implemented this program using MPI in C.

Measured the time taken for different numbers and sizes of documents and independently varying the number of processors used. For each input (of a given number of documents of certain sizes), plotted a curve of time taken against number of processors used.

Compiling and running:

Compile to generate a.out file mpic++ -std=c++11 driver.cpp

Run: mpirun -np num_processes ./a.out path_to_directory

assumptions:

in all nodes directory has same path.
all nodes have file names as numbers and no two nodes have same file name.
keep the stopwords file in the nfs.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
PerformanceResults.pdf		PerformanceResults.pdf
README.md		README.md
driver.cpp		driver.cpp
globalIndexCreation.cpp		globalIndexCreation.cpp
header.h		header.h
localIndexCreation.cpp		localIndexCreation.cpp
stopwords.txt		stopwords.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Word-Document-Index-Application

About

Releases

Packages

Languages

VigneshN1997/Word-Document-Index-Application

Folders and files

Latest commit

History

Repository files navigation

Word-Document-Index-Application

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages