Yang Gao's implementation of the tf-idf text indexing scheme, which predicts document similarity by cosine similarity. Refer to: http://en.wikipedia.org/wiki/Tf–idf; note that I use normalized tf instead of raw tf.
It can serve as a baseline for more complicated text indexing and retrieval models, such as topic models.
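
For orientation, here is a minimal sketch of the scheme described above, written with the C++ standard library only. It is not the code in this package. The names (Doc, SparseVec, tfidf, cosine) are hypothetical, and two details are assumptions, since the text above does not spell them out: tf is normalized by document length (term count / number of terms in the document), and idf = log(N / df).

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration: a document is a list of tokens,
// and a sparse vector maps each term to its weight.
using Doc = std::vector<std::string>;
using SparseVec = std::map<std::string, double>;

// Build one tf-idf vector per document.
// Assumption: tf = count / document length, idf = log(N / df).
std::vector<SparseVec> tfidf(const std::vector<Doc>& docs) {
    assert(!docs.empty());
    const double n = static_cast<double>(docs.size());

    // Document frequency: number of documents that contain each term.
    std::map<std::string, int> df;
    for (const Doc& d : docs) {
        std::set<std::string> seen(d.begin(), d.end());
        for (const std::string& t : seen) ++df[t];
    }

    std::vector<SparseVec> out;
    for (const Doc& d : docs) {
        SparseVec v;
        // Normalized tf: each occurrence adds 1 / (document length).
        for (const std::string& t : d) v[t] += 1.0 / d.size();
        // Scale each tf by the idf of its term.
        for (auto& kv : v) kv.second *= std::log(n / df[kv.first]);
        out.push_back(v);
    }
    return out;
}

// Cosine similarity of two sparse vectors; 0 if either has zero norm.
double cosine(const SparseVec& a, const SparseVec& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& kv : a) {
        na += kv.second * kv.second;
        auto it = b.find(kv.first);
        if (it != b.end()) dot += kv.second * it->second;
    }
    for (const auto& kv : b) nb += kv.second * kv.second;
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

With this weighting, a term that occurs in every document gets idf = log(1) = 0 and so does not contribute to the similarity. Two documents with the same term proportions score 1, and documents that share only such zero-weight terms score 0.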
- See "run_examples.sh" for example usage.
- External libraries, such as Eigen and tclap, are included, so the code is ready to build and run.
- For the initial build, type "make".
- If you modify the code, type "make rebuild".
For questions or comments, or to report bugs, contact Yang Gao (USC/ISI) at [email protected]