This project contains a Python-based web scraper that collects information about past conferences from the Midwifery Today website. For each conference it extracts the place, title, and time, and saves the results to a CSV file.
- Uses Selenium to navigate to the "Past Conferences" page.
- Extracts conference data using BeautifulSoup.
- Filters out irrelevant data and entries.
- Saves the extracted data to a CSV file.
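To make this flow concrete, here is a minimal sketch of how such a scraper can be wired together. The URL, CSS selectors, and output path below are illustrative assumptions; the actual logic lives in main.py and may differ.

```python
# Illustrative sketch only; the real main.py may be organized differently.
# The URL and the CSS selectors are assumptions, not taken from this repository.
import csv
import os

from bs4 import BeautifulSoup
from selenium import webdriver

URL = "https://midwiferytoday.com/conferences/"  # assumed location of the "Past Conferences" page

driver = webdriver.Chrome()  # recent Selenium releases fetch a matching driver automatically
driver.get(URL)
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "html.parser")

rows = []
for entry in soup.select(".conference"):  # selector is a guess at the page structure
    title = entry.select_one("h3")
    place = entry.select_one(".location")
    when = entry.select_one(".date")
    # Filtering step: skip entries that are missing any of the expected fields
    if not (title and place and when):
        continue
    rows.append([
        place.get_text(strip=True),
        title.get_text(strip=True),
        when.get_text(strip=True),
    ])

os.makedirs("data", exist_ok=True)
with open("data/conferences.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["place", "title", "time"])
    writer.writerows(rows)
```

Using Selenium to load the page and BeautifulSoup to parse it keeps dynamically rendered content reachable while still allowing simple CSS-selector extraction.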
It's recommended to set up a virtual environment within the cloned project folder. This ensures that dependencies required by this project do not interfere with packages globally installed on your system.
You can set up a virtual environment using the following steps:
- First, clone the repository to your local machine:
git clone [repository-link]
- Navigate to the cloned project directory:
cd webscraper-midwife
- Make sure Python's venv module is available (it ships with Python 3 as part of the standard library). If it is missing on your system, you can install the third-party virtualenv package as a substitute:
pip install virtualenv
- Within the project directory, create the virtual environment:
python -m venv venv
- Activate the virtual environment. On Windows:
.\venv\Scripts\activate
On macOS/Linux:
source venv/bin/activate
- Once activated, you'll see (venv) in the terminal prompt. This indicates that the virtual environment is active. Now, you can install the project dependencies:
pip install -r requirements.txt
When you're done working on the project, you can deactivate the virtual environment by simply typing:
deactivate
This project requires:
- Python 3
- Selenium
- BeautifulSoup
- Requests
You can install all of the required packages with pip, as shown in the setup steps above:
pip install -r requirements.txt
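For reference, the dependencies above correspond to roughly the following requirements.txt entries (PyPI package names; exact version pins are up to the project and are not shown here):

```
selenium
beautifulsoup4
requests
```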
Run the scraper with:
python main.py
After execution, the extracted data will be saved as conferences.csv in the data directory.
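If you want to spot-check the output, you can read it back with Python's built-in csv module (the column names shown in the comment mirror the fields described above and are an assumption, not a guaranteed match for the actual header):

```python
import csv

# Assumes the scraper has already produced data/conferences.csv
with open("data/conferences.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row)  # e.g. {'place': ..., 'title': ..., 'time': ...}
```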
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.