Contest BotScraper

Overview

The Contest BotScraper is a web crawler that crawls Brazilian contest websites and extracts structured data from their pages.

See the BotScraper homepage for more information, including a list of features.

Requirements

  • Python 3.7+
  • Tested on Linux (Ubuntu).

Install

The quick way:

$ git clone https://github.com/jefersonjlima/botscraper.git
$ cd botscraper
$ echo 'export TELEGRAM_TOKEN="YOUR_ACCESS_TOKEN"' >> ~/.bashrc
$ make prepare-dev
$ make run
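
The steps above store the token in the TELEGRAM_TOKEN environment variable. A line appended to ~/.bashrc only takes effect in shells started afterwards (or after running `source ~/.bashrc`), so it can help to check that the variable is visible before running the bot. The snippet below is a minimal sketch of such a check. The function name `load_telegram_token` is hypothetical and is not taken from the BotScraper code base.

```python
import os


def load_telegram_token(env=None):
    """Return the Telegram token from the environment, or raise if it is missing.

    Hypothetical helper for illustration; BotScraper may read the token differently.
    """
    # Allow a plain dict to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    token = env.get("TELEGRAM_TOKEN", "").strip()
    # Reject an unset variable and the placeholder from the install steps.
    if not token or token == "YOUR_ACCESS_TOKEN":
        raise RuntimeError("TELEGRAM_TOKEN is not set; export it before `make run`.")
    return token
```

Calling `load_telegram_token()` with no arguments reads the real environment and raises a `RuntimeError` if the token is missing or still set to the placeholder.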

Configuration

First, if you want to use BotScraper with Telegram, you need an access token. The easiest way to generate one is to talk to @BotFather and follow a few simple steps.

Edit config/configs.cfg to change the configuration. For example, to add or remove a keyword, change GERAL.keywords (you can also do this with the Telegram commands). If you are looking for an internship, change url_base to https://www.pciconcursos.com.br/estagios/.

TELEGRAM.etl_schedule defines the UTC time at which the ETL starts.
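
To make the layout of these options concrete, here is a small sketch that parses a sample configuration with Python's standard `configparser`. The section and option names (GERAL.keywords, url_base, TELEGRAM.etl_schedule) come from the text above. Everything else is an assumption: the sample values, the comma-separated keyword format, the HH:MM time format, and the placement of url_base under GERAL. Check the real config/configs.cfg for the actual format.

```python
import configparser

# Hypothetical file contents. The value formats (comma-separated keywords,
# HH:MM time) and the placement of url_base under GERAL are assumptions.
SAMPLE = """
[GERAL]
url_base = https://www.pciconcursos.com.br/estagios/
keywords = engenheiro, professor

[TELEGRAM]
etl_schedule = 12:00
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Split the assumed comma-separated keyword list into clean strings.
keywords = [k.strip() for k in parser["GERAL"]["keywords"].split(",") if k.strip()]
url_base = parser["GERAL"]["url_base"]
etl_schedule = parser["TELEGRAM"]["etl_schedule"]  # interpreted as UTC

print(keywords, url_base, etl_schedule)
```

To read the real file instead of the sample string, `parser.read("config/configs.cfg")` would replace the `read_string` call.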

Commands

  • /help to see all commands.
  • /set <keyword> to add a new keyword.
  • /unset <keyword> to remove a keyword.
  • /notify to enable notifications.
  • /non_notify to disable notifications.
  • /keywords to show all keywords.
  • /show <keyword> to list contests by keyword.
  • /show to show all filtered contests.
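
The keyword and notification commands above can be modelled as a small piece of in-memory state. The sketch below is a toy illustration only: it does not talk to Telegram, and the class name `KeywordBot` and the reply strings are hypothetical, not taken from the BotScraper code.

```python
class KeywordBot:
    """Toy, in-memory model of the commands listed above (no Telegram involved)."""

    def __init__(self, keywords=()):
        self.keywords = list(keywords)
        self.notify = False

    def handle(self, message):
        # Split "/set professor" into the command and its optional argument.
        command, _, arg = message.strip().partition(" ")
        arg = arg.strip()
        if command == "/set" and arg:
            if arg not in self.keywords:
                self.keywords.append(arg)
            return "added " + arg
        if command == "/unset" and arg:
            if arg in self.keywords:
                self.keywords.remove(arg)
                return "removed " + arg
            return "unknown keyword " + arg
        if command == "/notify":
            self.notify = True
            return "notifications on"
        if command == "/non_notify":
            self.notify = False
            return "notifications off"
        if command == "/keywords":
            return ", ".join(self.keywords) or "(no keywords)"
        return "unsupported command"


bot = KeywordBot(["professor"])
bot.handle("/set engenheiro")
print(bot.handle("/keywords"))  # professor, engenheiro
```

The /show commands are left out because they depend on the scraped contest data, which this sketch does not model.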

Documentation

Documentation is available online on the BotScraper site and in the docs directory.

Contributing
