Johnmaras/BestRouteTo

BestRouteTo

Find the best route to the inner links of a website, find dead links and create the sitemap.xml

Implements Dijkstra's algorithm to find the shortest paths between pages. It is not professionally made, but it provides three utilities (shortest routes, dead-link detection and sitemap generation) that can be easily extended. All results are saved in separate XML files.
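As an illustration of the shortest-path step, here is a minimal sketch of Dijkstra's algorithm over a link graph. It is not the code from web_crawler.py: the function names `shortest_paths` and `route_to`, the dictionary-based graph and the choice of weight 1 per link are all assumptions made for this example.

```python
import heapq


def shortest_paths(graph, start):
    """Dijkstra over a link graph given as {page: [linked pages]}.

    Every link is assumed to cost 1 click (an assumption of this sketch).
    Returns (dist, prev): the click distance to each reachable page and the
    predecessor of each page on one shortest route from `start`.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for target in graph.get(page, ()):
            nd = d + 1
            if nd < dist.get(target, float("inf")):
                dist[target] = nd
                prev[target] = page
                heapq.heappush(heap, (nd, target))
    return dist, prev


def route_to(prev, start, target):
    """Rebuild the route start -> target from the predecessor map, or None."""
    path = [target]
    while path[-1] != start:
        if path[-1] not in prev:
            return None  # target is not reachable from start
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical site: each key links to the pages in its list.
    site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/d"], "/c": ["/d"], "/d": []}
    dist, prev = shortest_paths(site, "/")
    print(dist["/d"], route_to(prev, "/", "/d"))
```

Because every link has the same cost here, a plain breadth-first search would give the same distances; the priority queue only matters if links are given different weights.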

An HTTP server for testing purposes and a script that creates randomly interlinked HTML pages are also provided.
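To give an idea of what such a test-site generator might look like, here is a rough sketch using only the standard library. It is not the script shipped with the project (which may well use dominate to build the HTML); the names `random_link_graph`, `render_page` and `write_site`, and the `pageN.html` naming scheme, are hypothetical.

```python
import random
from pathlib import Path


def random_link_graph(n_pages, max_links, seed=None):
    """Return {page name: [linked page names]} with random links, no self-links."""
    rng = random.Random(seed)
    names = [f"page{i}.html" for i in range(n_pages)]
    graph = {}
    for name in names:
        others = [n for n in names if n != name]
        k = rng.randint(0, min(max_links, len(others)))
        graph[name] = rng.sample(others, k)
    return graph


def render_page(name, links):
    """Render a bare HTML page containing one anchor per outgoing link."""
    anchors = "\n".join(f'    <a href="{t}">{t}</a><br>' for t in links)
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"  <head><title>{name}</title></head>\n"
        f"  <body>\n{anchors}\n  </body>\n</html>\n"
    )


def write_site(graph, out_dir):
    """Write every page of the graph into out_dir, creating it if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, links in graph.items():
        (out / name).write_text(render_page(name, links), encoding="utf-8")


if __name__ == "__main__":
    write_site(random_link_graph(10, 4, seed=42), "testsite")
```

A folder generated this way can be served locally with Python's built-in `python3 -m http.server`, run from inside that folder, as a stand-in for the project's own test server.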

Dependencies

It requires the following libraries:

  • bs4
  • dominate

Running

Runs on Python 3.x

From terminal:
    python3 web_crawler.py --domain http://www.example.com --firstpage thefirstpage.html

  • -d, --domain is required
  • -f, --firstpage defaults to /
  • -so, --sitemapout defaults to sitemap.xml
  • -po, --pathsout defaults to paths.xml
  • -do, --deadout defaults to dead.xml
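For readers who want to extend the command line, the options above could be declared with argparse roughly as follows. This is a sketch that mirrors the documented flags and defaults, not a copy of the parser in web_crawler.py; the function name `build_parser` and the help texts are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Find the best route to the inner links of a website, "
                    "find dead links and create the sitemap.xml"
    )
    parser.add_argument("-d", "--domain", required=True,
                        help="site to crawl, e.g. http://www.example.com")
    parser.add_argument("-f", "--firstpage", default="/",
                        help="page where the crawl starts")
    parser.add_argument("-so", "--sitemapout", default="sitemap.xml",
                        help="output file for the sitemap")
    parser.add_argument("-po", "--pathsout", default="paths.xml",
                        help="output file for the shortest paths")
    parser.add_argument("-do", "--deadout", default="dead.xml",
                        help="output file for the dead links")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.domain, args.firstpage, args.sitemapout, args.pathsout, args.deadout)
```

With this layout, omitting --domain makes argparse print a usage message and exit, which matches the "is required" note above.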
