Scraping data from the web using the Python libraries BeautifulSoup and Requests.
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic.
Link-> https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Link-> http://docs.python-requests.org/en/master/
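As a quick illustration of how the two libraries fit together, here is a minimal sketch. The HTML snippet and tag names are made up for demonstration; a real run would fetch the page with requests.get(url).text instead.

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded page; in practice you would use
# html = requests.get(url).text for the page you want to scrape.
html = """
<html><body>
  <h2 class="title">Example Product</h2>
  <span class="price">$19.99</span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
print(soup.find("h2", class_="title").get_text())    # Example Product
print(soup.find("span", class_="price").get_text())  # $19.99
```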
- Inspect the page
- Obtain HTML
- Choose a parser (lxml, html5lib, html.parser)
- Create a BeautifulSoup object
- Extract tags that we need
- Store the data in lists
- Make a dataframe
- Download a CSV file that contains all the scraped data

Specification

- Just run
  jupyter notebook
  in a terminal and it will open in your browser. Install Jupyter first if you haven't.
- Install BeautifulSoup by running
  pip install beautifulsoup4
  in the command prompt / Anaconda prompt if you haven't.
- from bs4 import BeautifulSoup
- import requests
- import pandas as pd
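Putting the steps above together, here is a minimal end-to-end sketch. The HTML snippet, tag names, example URL, and output filename are placeholders for demonstration; a real scrape would start from requests.get on the page you inspected.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in for the page HTML; in a real scrape you would do:
#   import requests
#   html = requests.get("https://example.com/books").text  # hypothetical URL
html = """
<html><body>
  <div class="book"><h3>Book A</h3><span class="price">$10</span></div>
  <div class="book"><h3>Book B</h3><span class="price">$15</span></div>
</body></html>
"""

# Create a BeautifulSoup object with the built-in parser
soup = BeautifulSoup(html, "html.parser")

# Extract the tags we need and store the data in lists
titles, prices = [], []
for book in soup.find_all("div", class_="book"):
    titles.append(book.find("h3").get_text())
    prices.append(book.find("span", class_="price").get_text())

# Make a DataFrame and save it as a CSV file
df = pd.DataFrame({"title": titles, "price": prices})
df.to_csv("scraped_data.csv", index=False)  # hypothetical filename
```

Swapping "html.parser" for "lxml" or "html5lib" only requires installing that parser; the rest of the code is unchanged.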
Pull requests are always welcome. For major changes, please contact me on my LinkedIn account https://www.linkedin.com/in/rahulsisodia06/