Live Blog Analysis
This is our final project for the Advances in Data Science and Architectures course at Northeastern University, under the guidance of Professor Srikanth Krishnamurthy.
We used a dataset from WebHose.io to train text analytics algorithms on blogs, and then tested these algorithms on live blogs scraped with the WebHose API. We implemented TF-IDF (Term Frequency-Inverse Document Frequency) and LDA (Latent Dirichlet Allocation) to classify and cluster the blogs into groups based on their text (blog content).
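As a rough illustration of the TF-IDF step, here is a minimal pure-Python sketch that turns blog texts into TF-IDF vectors and compares them with cosine similarity, which is the basic building block for grouping similar blogs. It is not the project's actual code. The tokenizer, the sample blog texts, and the helper names (`tokenize`, `tf_idf`, `cosine`, `most_similar`) are hypothetical, and the unsmoothed weighting tf × log(N / df) is only one common TF-IDF variant. The LDA step is not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep alphanumeric runs.
    # A real pipeline would likely also drop stop words and stem.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in the document; idf = log(N / df).
        vectors.append({
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(vectors, i):
    """Index of the document most similar to document i (excluding itself)."""
    others = [j for j in range(len(vectors)) if j != i]
    return max(others, key=lambda j: cosine(vectors[i], vectors[j]))


# Hypothetical sample "blogs": two about politics, two about phones.
blogs = [
    "the election results and the new senate vote",
    "senate vote delayed after election recount",
    "new smartphone camera and battery review",
    "battery life test for the new smartphone",
]
vectors = tf_idf(blogs)
print(most_similar(vectors, 0))  # index of the closest politics post
print(most_similar(vectors, 2))  # index of the closest phone post
```

Because idf down-weights words that occur in many documents (such as "new" here), the similarity is driven by topic-specific words like "senate" or "battery", so each post ends up closest to the other post on the same topic.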