Zauba Scraper

My first web scraping project, using Beautiful Soup 4, lxml, and urllib3.

Objective: To crawl www.zaubacorp.com to understand Director relationships.

Background: Zaubacorp is a website that neatly categorizes publicly available information filed with the registrar of Indian companies.

Inputs: A starting URL and a search depth (DEPTH).

Output: CSV file containing URL, DIN, Director Name, Designation, Appointment Date, Search Depth.
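To make the output layout concrete, here is a minimal sketch of writing rows with those six columns using Python's standard csv module. The function name `write_rows` and the sample record are hypothetical placeholders for illustration; they are not taken from zauba.py, and the real script may build its rows differently.

```python
import csv
import io

# Column names as listed in the output description above.
FIELDNAMES = [
    "URL",
    "DIN",
    "Director Name",
    "Designation",
    "Appointment Date",
    "Search Depth",
]


def write_rows(rows, stream):
    """Write a header line followed by one CSV line per director record."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder record (not real data), only to show the shape of a row.
sample_rows = [
    {
        "URL": "https://example.com/company",
        "DIN": "00000000",
        "Director Name": "Example Director",
        "Designation": "Director",
        "Appointment Date": "2000-01-01",
        "Search Depth": 1,
    }
]

# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, open the file with `open("output.csv", "w", newline="")` and pass it as `stream`.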

File description:

zauba.py : Python script that implements the web scraper.
output.csv : Output for the URL and DEPTH = 3.
requirements.txt : Requirements for the above script to work. Install them in a virtual environment and run the script from there.
firstDraft.py : My initial approach to the problem (without multi-processing). It is not documented, but it gives an idea of the approach.
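For readers who want a feel for what a depth-limited crawl looks like, here is a rough sketch. It is not the code in zauba.py or firstDraft.py. It assumes that "depth" means the number of link hops from the starting page, and the names `crawl`, `fetch_links`, `map_fn`, and `FAKE_LINKS` are hypothetical. A small in-memory link graph stands in for real HTTP requests and HTML parsing.

```python
def crawl(start_url, max_depth, fetch_links, map_fn=map):
    """Breadth-first crawl that stops after max_depth link hops.

    fetch_links(url) must return the list of URLs found on that page.
    map_fn applies fetch_links to every page of one depth level; the
    built-in map runs them one by one, and a parallel map can be
    swapped in because pages on the same level are independent.
    Returns a dict mapping each discovered URL to its search depth.
    """
    depth_of = {start_url: 0}
    frontier = [start_url]
    for depth in range(1, max_depth + 1):
        if not frontier:
            break
        next_frontier = []
        for links in map_fn(fetch_links, frontier):
            for link in links:
                # Record a URL only the first time it is seen, so the
                # stored depth is the shortest distance from the start.
                if link not in depth_of:
                    depth_of[link] = depth
                    next_frontier.append(link)
        frontier = next_frontier
    return depth_of


# Hypothetical link graph used in place of the real website.
FAKE_LINKS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}


def fake_fetch(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_LINKS.get(url, [])


result = crawl("a", 2, fake_fetch)
```

With a depth of 2, the page "e" is never reached because it sits three hops from "a". One way to add multi-processing would be to pass the `map` method of a `multiprocessing.Pool` as `map_fn`, in which case `fetch_links` has to be a picklable top-level function.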

I have tried my best to document the code wherever possible so that you don't have a hard time figuring out what I wanted to do.
