
Shai1-Levi/wiki_serach


wiki_serach

Hi! This repo contains an app that retrieves articles from an August Wikipedia corpus.

Files:

Ranker.py: PageRank.
Most_views.py: the most-viewed Wikipedia pages as of August 2020.
Searcher.py: the best search method the search engine offers.
indexer.py: the class that contains the inverted index, plus additional methods such as cosine similarity and binary match.
Utils.py: a class that loads and returns the document titles, and returns the normalized documents for the cosine similarity formula.
BM25.py: a class that implements the BM25 ranking method.
search_frontend.py: the "main" of the search engine app.

The app has 6 routes:

1. /search
2. /search_body
3. /search_title
4. /search_anchor
5. /get_pagerank
6. /get_pageview
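
To give a feel for the BM25 ranking mentioned for BM25.py, here is a minimal, self-contained sketch of the standard Okapi BM25 formula. It is not the code in this repo: the function name `bm25_scores`, the in-memory `docs` dictionary, and the default values `k1=1.5` and `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    docs maps a doc id to its list of tokens (hypothetical layout,
    not the inverted index used by indexer.py).
    """
    if not docs:
        return {}
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


docs = {
    1: ["python", "search", "engine"],
    2: ["wikipedia", "page", "views"],
    3: ["search", "search", "index"],
}
scores = bm25_scores(["search"], docs)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, document 3 ranks first because it repeats the query term, and document 2 scores zero because it does not contain it.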

Link to engine: http://34.136.131.91:8080
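
As a rough sketch of how a client might address the search routes, the helper below only builds the request URL. The helper name `build_search_url` and the parameter name `query` are assumptions, not confirmed by this README, and the server at the address above may no longer be running.

```python
from urllib.parse import urlencode

# Address taken from the engine link; it may not be live anymore.
BASE_URL = "http://34.136.131.91:8080"


def build_search_url(route, query):
    """Build a URL for one of the search routes.

    The "query" parameter name is an assumption made for illustration.
    """
    return f"{BASE_URL}/{route.strip('/')}?{urlencode({'query': query})}"


url = build_search_url("/search_title", "machine learning")
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)`.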

INCOMING

Query expansion with a thesaurus or WordNet, using the nltk package.
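
Since this feature is not implemented yet, the following is only a sketch of what thesaurus-based expansion could look like. The function `expand_query`, the `max_synonyms` cap, and the tiny hand-written `thesaurus` dictionary are hypothetical; with nltk, the synonym lists could instead be gathered from `nltk.corpus.wordnet.synsets(word)` and each synset's `lemma_names()`.

```python
def expand_query(tokens, thesaurus, max_synonyms=2):
    """Append up to max_synonyms synonyms after each query token.

    thesaurus maps a word to a list of synonyms (hypothetical data,
    standing in for WordNet lookups).
    """
    expanded = []
    seen = set()
    for token in tokens:
        candidates = [token] + thesaurus.get(token, [])[:max_synonyms]
        for candidate in candidates:
            # Keep the first occurrence only, so order stays stable.
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


thesaurus = {"car": ["automobile", "vehicle", "auto"]}
expanded = expand_query(["fast", "car"], thesaurus)
```

Capping the number of synonyms per term is one way to limit query drift; the expanded token list would then be passed to the ranking method in place of the original query.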
