## Group No. 7

| No. | Name              | Registration Number |
|-----|-------------------|---------------------|
| 1   | NDUWAYEZU Placide | 223027936           |
| 2   | UWASE Aline       | 218009283           |
| 3   | MUREMYI Samuel    | 223026694           |
## Overview

In an era of increasing environmental concern, real-time air quality monitoring has become crucial for public health and environmental policy. The Air Quality Monitor project aims to create a robust, scalable solution for tracking and analyzing air quality data.
## Objectives

- Real-time Data Collection: develop a system to continuously fetch air quality data from official sources.
- Data Processing Pipeline: create an efficient mechanism to transform raw API data into meaningful insights.
- Distributed Storage: implement a scalable storage solution using HDFS and cloud databases.
- Data Visualization: build an interactive dashboard for accessible environmental insights.
## Architecture

- Data Ingestion: retrieve data from the ACT Government Air Quality API
- Stream Processing: Apache Kafka for real-time data streaming (see the producer sketch below)
- Data Storage:
  - Distributed Storage: Apache Hadoop HDFS
  - Persistent Storage: AWS MySQL RDS
- Web Framework: Django
- Data Processing: Pandas, PyArrow

Data flow:

```
[API Source] → [Kafka Stream] → [Data Processing] → [HDFS Storage] → [MySQL RDS] → [Django Dashboard]
```
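As a rough illustration of the ingestion and streaming steps, the sketch below fetches readings from the ACT API and publishes them to Kafka. The topic name, the local bootstrap server, and the `$limit` query parameter are illustrative assumptions, not values taken from the project.

```python
# Minimal ingestion sketch (hypothetical): fetch readings from the ACT API
# and publish each record to a Kafka topic. The topic name, bootstrap
# server, and $limit parameter are assumptions for illustration.
import json

import requests
from confluent_kafka import Producer

API_ENDPOINT = "https://www.data.act.gov.au/resource/94a5-zqnn.json"

def publish_readings(bootstrap_servers: str = "localhost:9092",
                     topic: str = "air-quality-readings") -> int:
    # The ACT open-data portal is Socrata-based; $limit caps the result size.
    records = requests.get(API_ENDPOINT, params={"$limit": 100}, timeout=30).json()

    producer = Producer({"bootstrap.servers": bootstrap_servers})
    for record in records:
        # One JSON message per monitoring-station reading.
        producer.produce(topic, value=json.dumps(record).encode("utf-8"))
    producer.flush()  # block until all queued messages are delivered
    return len(records)

if __name__ == "__main__":
    print(f"Published {publish_readings()} readings")
```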
## Challenges and Solutions

### Dataset Availability

- Challenge: lack of accessible air quality datasets for Rwanda
  - The initial project goal was to build an air quality monitoring system for Rwanda.
  - Significant obstacles were encountered in obtaining comprehensive, reliable air quality data.
  - Public APIs and open data sources for environmental monitoring in Rwanda are limited.
- Solution: used the ACT's robust air quality monitoring system as a proof-of-concept model.

### Local Infrastructure Setup: Kafka and Hadoop

- Challenge: overcoming Windows-specific installation barriers
  - Installing Kafka and Hadoop natively on Windows is significantly complex.
  - Multiple compatibility and configuration issues arose with the distributed components.
- Solution: ran HDFS inside a Docker container instead of installing Hadoop natively on Windows (see the HDFS Storage section below).
## Data Source

The project retrieves real-time air quality data from the ACT (Australian Capital Territory) Ambient Air Quality Monitoring API: https://www.data.act.gov.au/resource/94a5-zqnn.json

## Features

- Real-time data retrieval from the official air quality API
- Data processing pipeline (see the consumer sketch after this list)
- HDFS storage using Docker
- AWS MySQL RDS data persistence
- Interactive dashboard for air quality visualization
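To make the processing stage concrete, here is a sketch of the consumer side, draining a batch of readings from Kafka into a pandas DataFrame. The topic and consumer group id mirror the assumed values in the producer sketch above and are not taken from the project.

```python
# Hypothetical consumer sketch: drain a batch of readings from Kafka
# into a pandas DataFrame for downstream processing.
import json

import pandas as pd
from confluent_kafka import Consumer

def consume_batch(bootstrap_servers: str = "localhost:9092",
                  topic: str = "air-quality-readings",
                  max_messages: int = 100) -> pd.DataFrame:
    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "air-quality-monitor",   # assumed consumer group id
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([topic])
    rows = []
    try:
        while len(rows) < max_messages:
            msg = consumer.poll(timeout=5.0)
            if msg is None:
                break                        # no more messages for now
            if msg.error():
                continue                     # skip transport-level errors
            rows.append(json.loads(msg.value()))
    finally:
        consumer.close()
    return pd.DataFrame(rows)
```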
## Technology Stack

- Backend: Django
- Data Processing:
  - Pandas
  - PyArrow
- Data Storage:
  - Apache Hadoop (HDFS)
  - AWS MySQL RDS
- Message Streaming: Apache Kafka
- Additional Libraries:
  - Requests
  - Confluent Kafka
  - Pytz
## Prerequisites

- Python 3.8+
- Docker (optional, for HDFS)
- An AWS RDS MySQL instance
- Apache Kafka
- Apache Hadoop
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/air_quality_monitor.git
  cd air_quality_monitor
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure the database:
  - Set up your AWS MySQL RDS credentials in settings.py (see the sketch at the end of this section)
  - Configure the Kafka and Hadoop connection details

- Run the database migrations:

  ```bash
  python manage.py makemigrations
  python manage.py migrate
  ```

- Create a superuser (optional):

  ```bash
  python manage.py createsuperuser
  ```
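The RDS configuration in settings.py might look like the sketch below. Django's built-in MySQL backend is real, but the environment variable names simply follow the .env example later in this README, and the database name is a placeholder.

```python
# settings.py (sketch) -- point Django's MySQL backend at the RDS instance.
# The database NAME is a placeholder; the other variables match the .env
# example in the Environment Variables section.
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": os.environ.get("AWS_RDS_DB_NAME", "air_quality"),  # placeholder
        "HOST": os.environ.get("AWS_RDS_HOST"),
        "USER": os.environ.get("AWS_RDS_USER"),
        "PASSWORD": os.environ.get("AWS_RDS_PASSWORD"),
        "PORT": "3306",
    }
}
```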
## Running the Project

Start the data pipeline, then launch the development server:

```bash
python manage.py run_air_quality_pipeline
python manage.py runserver 9006
```
## HDFS Storage

To copy a processed Parquet file into the Hadoop container:

```bash
docker cp <local_parquet_file> <hadoop_container>:/path/in/hdfs
```
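If the HDFS NameNode is reachable directly and libhdfs is available in the environment, PyArrow can also write to HDFS without the docker cp step. This is an alternative sketch, not the project's documented method; the host and port below are illustrative.

```python
# Alternative (hypothetical) approach: write the Parquet table straight to
# HDFS with PyArrow instead of copying files into the container by hand.
# Requires libhdfs; the host and port are illustrative defaults.
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

def write_table_to_hdfs(table: pa.Table, hdfs_path: str,
                        host: str = "localhost", port: int = 9000) -> None:
    hdfs = fs.HadoopFileSystem(host=host, port=port)
    pq.write_table(table, hdfs_path, filesystem=hdfs)
```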
## Configuration

Ensure the following are configured:

- API endpoint
- Kafka bootstrap servers
- HDFS connection
- AWS RDS credentials
## Data Pipeline

Each pipeline run performs the following steps (a sketch of the Parquet steps follows the list):

- Fetch data from the ACT Air Quality API
- Process the raw data
- Generate a unique Parquet filename
- Convert the data to a PyArrow table
- Save the file to HDFS
- Persist the records in AWS MySQL RDS
- Visualize the results in the dashboard
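The Parquet-related steps could be sketched as follows, assuming a pandas DataFrame `df` produced by the consumer stage. The filename pattern and output directory are assumptions for illustration.

```python
# Sketch of the 'unique filename -> PyArrow table -> Parquet file' steps,
# assuming a pandas DataFrame `df` from the consumer stage. The filename
# pattern and output directory are illustrative assumptions.
import uuid
from datetime import datetime, timezone

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

def save_snapshot(df: pd.DataFrame, out_dir: str = ".") -> str:
    # Unique Parquet filename: UTC timestamp plus a short random suffix.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    path = f"{out_dir}/air_quality_{stamp}_{uuid.uuid4().hex[:8]}.parquet"

    # Convert to a PyArrow table and write it; the file can then be moved
    # into HDFS (see the HDFS Storage section above).
    pq.write_table(pa.Table.from_pandas(df), path)
    return path
```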
## Environment Variables

Create a .env file for sensitive data such as credentials:

```env
API_ENDPOINT=https://www.data.act.gov.au/resource/94a5-zqnn.json
KAFKA_BOOTSTRAP_SERVERS=your-kafka-servers
HDFS_HOST=your-hdfs-host
AWS_RDS_HOST=your-rds-endpoint
AWS_RDS_USER=your-username
AWS_RDS_PASSWORD=your-password
```
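One common way to load these values at startup (for example, near the top of settings.py) is the python-dotenv package. Note that python-dotenv is an assumption here; it is not in the project's library list above.

```python
# Loading .env values at startup -- assumes the python-dotenv package,
# which is not in the project's library list above.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into os.environ

API_ENDPOINT = os.environ["API_ENDPOINT"]
KAFKA_BOOTSTRAP_SERVERS = os.environ["KAFKA_BOOTSTRAP_SERVERS"]
HDFS_HOST = os.environ["HDFS_HOST"]
AWS_RDS_HOST = os.environ["AWS_RDS_HOST"]
```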
## Troubleshooting

- Ensure all services (Kafka, Hadoop, RDS) are running
- Check network connectivity
- Verify API access
- Review the logs for detailed error information
## Contributing

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

- MIT
## Acknowledgments

- ACT Government for Open Data
- Open Source Community
## Contact

- Placide - ndiplacide7@gmail.com
- Project Link: https://github.com/ndiplacide7/air_quality_monitor