Web Scraping Project

This project demonstrates web scraping with Scrapy to extract data from multiple websites, including Reddit, the Steam gaming platform, and the Inshorts news site. The extracted data is then processed and saved in various formats, including CSV and PDF.

Project Description

The project consists of the following main parts:

  1. Reddit Scraper: Scrapes posts, comments, and metadata from a specified subreddit.
  2. Steam Scraper: Scrapes top-selling game data from the Steam platform.
  3. Inshorts Scraper: Scrapes news articles from Inshorts.

Reddit Scraper

  • A Scrapy spider designed to scrape posts and comments from a specified subreddit on https://old.reddit.com.
  • The spider extracts post titles, links, and comments, storing them in a structured format.
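
For reference, the sketch below shows the general shape of such a spider. It is a minimal illustration, not the spider shipped in this repository: the CSS selectors are assumptions about old.reddit.com's markup and the subreddit argument is hypothetical, so adjust both to match the actual code.

    # Minimal sketch of a subreddit spider (illustrative; selectors are
    # assumptions about old.reddit.com's markup and may need adjusting).
    import scrapy

    class SubredditDataSpider(scrapy.Spider):
        name = "subreddit_data"

        def __init__(self, subreddit="python", *args, **kwargs):
            # "subreddit" is a hypothetical argument; pass it with -a subreddit=<name>
            super().__init__(*args, **kwargs)
            self.start_urls = [f"https://old.reddit.com/r/{subreddit}/"]

        def parse(self, response):
            # Each post on old.reddit.com sits in a div with class "thing"
            for post in response.css("div.thing"):
                yield {
                    "title": post.css("a.title::text").get(),
                    "link": response.urljoin(post.css("a.title::attr(href)").get()),
                    "comments_link": post.css("a.comments::attr(href)").get(),
                }
            # Follow pagination if a "next" button is present
            next_page = response.css("span.next-button a::attr(href)").get()
            if next_page:
                yield response.follow(next_page, callback=self.parse)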

Steam Scraper

  • Extracts data such as game name, game URL, image URL, release date, price, and review summary.
  • Saves the extracted data into a CSV file.
  • Converts the CSV data into a formatted PDF.
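
The fields above map naturally onto a Scrapy Item. The definition below only illustrates that mapping; the field names are assumptions, and the repository's own items.py may define them differently.

    # Illustrative Scrapy Item for the fields listed above (field names are
    # assumptions; the repo's items.py may differ).
    import scrapy

    class SteamGameItem(scrapy.Item):
        game_name = scrapy.Field()
        game_url = scrapy.Field()
        image_url = scrapy.Field()
        release_date = scrapy.Field()
        price = scrapy.Field()
        review_summary = scrapy.Field()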

Inshorts Scraper

  • Extracts news articles including titles, content, author, and timestamp.
  • Saves the extracted data into a CSV file.
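
A parse callback for these fields might look like the sketch below. The CSS selectors are assumptions about inshorts.com's markup and are not taken from this repository, so treat them as a starting point only.

    # Illustrative Inshorts spider (selectors are assumptions about
    # inshorts.com's markup and may not match the repo's spider).
    import scrapy

    class InshortsSpider(scrapy.Spider):
        name = "inshorts"
        start_urls = ["https://inshorts.com/en/read"]

        def parse(self, response):
            # Each headline on the page is rendered as a "news card"
            for card in response.css("div.news-card"):
                yield {
                    "title": card.css("[itemprop='headline']::text").get(),
                    "content": card.css("[itemprop='articleBody']::text").get(),
                    "author": card.css("span.author::text").get(),
                    "timestamp": card.css("span.time::text").get(),
                }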

Overview of How a Web Scraper Works

Flow diagram of a Scraping bot

Setup Instructions

  1. Clone the Repository

    git clone https://github.com/ManikSinghSarmaal/Web-Scraping
    cd Web-Scraping
  2. Create and Activate a Virtual Environment

    # On macOS and Linux
    python3 -m venv venv
    source venv/bin/activate
    
    # On Windows
    python -m venv venv
    venv\Scripts\activate
  3. Install Required Packages

    pip install -r requirements.txt
  4. Configure Scrapy Settings

    • Ensure settings.py is configured correctly for each Scrapy spider; a typical set of options is sketched below.
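
The values below are examples of commonly tuned Scrapy settings, not the repository's actual configuration; adjust them per spider.

    # Example settings.py values (illustrative, not the repo's actual config)
    BOT_NAME = "steam_scraper"

    ROBOTSTXT_OBEY = True                  # respect robots.txt
    DOWNLOAD_DELAY = 1                     # throttle requests to be polite
    CONCURRENT_REQUESTS_PER_DOMAIN = 4     # keep per-site concurrency low
    USER_AGENT = "Mozilla/5.0 (compatible; WebScrapingBot/1.0)"

    FEED_EXPORT_ENCODING = "utf-8"         # keep CSV output readable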

Usage

Running the Subreddit Scraper

  1. Navigate to the Reddit Scraper Directory

    cd subreddit
  2. Run the Scraper

    scrapy crawl subreddit_data -o data.csv

Note

If you want to use a rotating proxy to avoid getting banned as the request count increases, create an account on ScrapeOps, add your API key, and uncomment the relevant lines. For more information on using scrapeops-scrapy-proxy-sdk, see https://github.com/ScrapeOps/scrapeops-scrapy-proxy-sdk#integrating-into-your-scrapy-project
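
For reference, the integration described in the linked SDK README amounts to a few settings.py entries along the lines of the snippet below; treat the linked documentation as authoritative if the package has changed.

    # settings.py additions for the ScrapeOps proxy SDK (see the linked README
    # for the authoritative, up-to-date integration steps)
    SCRAPEOPS_API_KEY = "YOUR_API_KEY"     # replace with your own key
    SCRAPEOPS_PROXY_ENABLED = True

    DOWNLOADER_MIDDLEWARES = {
        "scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk": 725,
    }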

Running the Steam Scraper

  1. Navigate to the Steam Scraper Directory

    cd steam_scraper
  2. Run the Scraper

    scrapy crawl infinite_scroll -o steam_best_sellers.csv
  3. Convert CSV to PDF

    python csv_to_pdf.py
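
The repository's csv_to_pdf.py handles this conversion. As an illustration of the idea only (not the actual script, which may use a different library or layout), a CSV-to-PDF conversion with reportlab can look like this:

    # Sketch of a CSV-to-PDF conversion (illustrative; assumes reportlab is
    # installed: pip install reportlab)
    import csv

    from reportlab.lib import colors
    from reportlab.lib.pagesizes import A4, landscape
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    def csv_to_pdf(csv_path, pdf_path):
        # Read every row from the CSV, header included
        with open(csv_path, newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))

        # Lay the rows out as a simple grid in a landscape A4 PDF
        doc = SimpleDocTemplate(pdf_path, pagesize=landscape(A4))
        table = Table(rows, repeatRows=1)
        table.setStyle(TableStyle([
            ("GRID", (0, 0), (-1, -1), 0.25, colors.grey),
            ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
        ]))
        doc.build([table])

    if __name__ == "__main__":
        csv_to_pdf("steam_best_sellers.csv", "steam_best_sellers.pdf")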

Running the Inshorts Scraper

  1. Navigate to the Inshorts Scraper Directory

    cd inshorts_scraper
  2. Run the Scraper

    scrapy crawl inshorts -o inshorts_news.csv

File Structure

Clone this repo and run the tree command in your terminal to view the directory structure.

Demo

Sample of the scraped CSV data, as in steam_scraper/steam_bestsellers_ALL.csv.

Contribution Guidelines

  1. Fork the Repository
  2. Create a New Branch
    git checkout -b feature-branch
  3. Make Changes and Commit
    git commit -m "Description of changes"
  4. Push to Your Fork
    git push origin feature-branch
  5. Create a Pull Request

Contact

For any questions or suggestions, feel free to contact me at [[email protected]].
