Web-Scraping-Using-Python-BeautifulSoup

Scraping data from the web using the Python libraries BeautifulSoup and Requests.

BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
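For example, here is a minimal sketch of navigating and searching a parse tree; the HTML snippet, tag names, and class names are made up purely for illustration:

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML snippet used only for illustration.
html = """
<html>
  <body>
    <h1 class="title">Laptops</h1>
    <div class="product"><span class="name">Aspire 5</span><span class="price">$499</span></div>
    <div class="product"><span class="name">IdeaPad 3</span><span class="price">$399</span></div>
  </body>
</html>
"""

# Parse the document with the built-in html.parser.
soup = BeautifulSoup(html, "html.parser")

# Navigate and search the parse tree.
print(soup.h1.text)  # Laptops
for product in soup.find_all("div", class_="product"):
    name = product.find("span", class_="name").text
    price = product.find("span", class_="price").text
    print(name, price)
```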

Requests

Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic.
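A minimal sketch of what that means in practice (httpbin.org is used here only as a convenient test endpoint):

```python
import requests

# Query strings are built for you from a dict; no manual URL encoding needed.
response = requests.get("https://httpbin.org/get", params={"q": "laptops", "page": 2})
print(response.url)          # https://httpbin.org/get?q=laptops&page=2
print(response.status_code)  # 200

# POST data is form-encoded automatically.
response = requests.post("https://httpbin.org/post", data={"name": "value"})
print(response.json()["form"])
```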

Documentation

BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Requests: http://docs.python-requests.org/en/master/

Workflow

  1. Inspect the page
  2. Obtain the HTML
  3. Choose a parser (lxml, html5lib, html.parser)
  4. Create a BeautifulSoup object
  5. Extract the tags we need
  6. Store the data in lists
  7. Make a DataFrame
  8. Download a CSV file that contains all the scraped data (see the sketch below)
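A minimal end-to-end sketch of these steps; the URL, tag names, and CSS classes below are placeholders, so adapt them to the page you inspected:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# 1-2. Inspect the page in your browser, then obtain its HTML.
url = "https://example.com/products"  # placeholder URL
html = requests.get(url).text

# 3-4. Choose a parser and create a BeautifulSoup object.
soup = BeautifulSoup(html, "lxml")  # or "html.parser" / "html5lib"

# 5-6. Extract the tags we need and store the data in lists.
names, prices = [], []
for item in soup.find_all("div", class_="product"):  # placeholder tag/class
    names.append(item.find("h2").text.strip())
    prices.append(item.find("span", class_="price").text.strip())

# 7. Make a DataFrame.
df = pd.DataFrame({"name": names, "price": prices})

# 8. Save a CSV file that contains all the scraped data.
df.to_csv("scraped_data.csv", index=False)
```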

Usage

  • Just run jupyter notebook in the terminal and the notebook will open in your browser.

    Install Jupyter first if you haven't already.

  • Install BeautifulSoup with pip install beautifulsoup4 in the command prompt / Anaconda prompt if you haven't already.

Packages used:

- from bs4 import BeautifulSoup
- import requests
- import pandas as pd

Contributing

Pull requests are always welcome. For major changes, please reach out to me on LinkedIn: https://www.linkedin.com/in/rahulsisodia06/
