NSF Next Launch Scraper is a Python module that scrapes data about the next upcoming space launch from Next Spaceflight and exports it. The runtime environment (local, AWS, or GCP) and storage locations are configurable, making it flexible for a range of use cases.
- Scrapes detailed information about the next space launch: date, organization, rocket, mission details, and more (illustrated below).
- Supports local export (JSON file) as well as AWS S3 and GCP Cloud Storage integration.
- Easily configurable via the `ENV` environment variable.
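
For a sense of the output, a scraped record might look like the sketch below. The field names are illustrative only, based on the fields listed above, not the scraper's exact schema:

```json
{
  "Date": "YYYY-MM-DD HH:MM UTC",
  "Organization": "Example Space Agency",
  "Rocket": "Example Rocket Block 1",
  "Mission": "Example Satellite Deployment"
}
```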
Clone the repository and install the package:
```bash
git clone https://github.com/Tanguy9862/Next-Launch-Scraper.git
cd Next-Launch-Scraper
pip install -r requirements.txt
```
Create a `.env` file in the directory where you’ll run the scraper, and specify the environment:
- `ENV=local` (default): Export to a local JSON file.
- `ENV=aws`: Export to an S3 bucket (requires proper IAM permissions).
- `ENV=gcp`: Export to a GCP Cloud Storage bucket (requires appropriate credentials).
Example `.env`:

```
ENV=local
```
Import and call the main function:
```python
from next_launch_scraper.scraper import scrape_next_launch_data

scrape_next_launch_data()
```
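
If your entry point doesn’t pick up the `.env` file automatically, a small wrapper using `python-dotenv` can load it first. This is a minimal sketch that assumes `ENV` is read when the function runs:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

from next_launch_scraper.scraper import scrape_next_launch_data

# Load ENV (and any other settings) from the .env file in the working directory.
load_dotenv()

# Fall back to local export when ENV is unset, mirroring the documented default.
os.environ.setdefault("ENV", "local")

scrape_next_launch_data()
```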
- Local Mode: Exports data to a `data/` folder in the current directory (see the read-back sketch below).
- AWS/GCP Mode: Uploads the data to your specified S3/Cloud Storage bucket (requires IAM setup).
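
In local mode you can read the export back afterwards. The exact filename isn’t fixed here, so this sketch simply loads every JSON file that lands in `data/`:

```python
import json
from pathlib import Path

# The scraper writes its JSON export into ./data/ in the current directory;
# pick up each JSON file found there.
for path in Path("data").glob("*.json"):
    with path.open(encoding="utf-8") as f:
        launch_data = json.load(f)
    print(path.name, launch_data)
```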
This scraper can be seamlessly integrated into pipelines. See Space-App for a practical example:
- A Lambda function calls this scraper to update data in an S3 bucket.
- The Space-App consumes the data for visualization.
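
As a hypothetical sketch of that Lambda pattern (the handler itself is not part of this package, and it assumes `ENV` is read when the scraper runs):

```python
import os

from next_launch_scraper.scraper import scrape_next_launch_data

def lambda_handler(event, context):
    # Force AWS mode so the scraper pushes its export to the configured S3 bucket.
    os.environ["ENV"] = "aws"
    scrape_next_launch_data()
    return {"statusCode": 200, "body": "Launch data updated."}
```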
If using `ENV=aws`, ensure:

- Your AWS credentials are configured in your environment or via `~/.aws/credentials`.
- The Lambda function or local user has the appropriate permissions: `s3:PutObject` and `s3:GetObject` (a sample policy follows).
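
A minimal IAM policy granting those two actions might look like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```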