Student Dropout Prediction

Data Science Projects - Weeks 1 & 2

Overview

This repository contains data science projects covering data analytics, machine learning, and predictive modeling. The case study for Weeks 1 and 2 is predicting student dropout with machine learning models, using a structured approach to data collection, wrangling, exploration, and visualization.

Week 1: Data Collection and Wrangling

Objective

The goal is to prepare the data for predictive modeling by setting up the environment, collecting data, and performing initial data cleaning and transformation.

Tasks

Environment Setup
- Install Python, TensorFlow/PyTorch, and Streamlit.
- Set up version control with Git.
- Create a virtual environment for the project.
Data Import and Initial Exploration
- Download the dataset.
- Load the data into a pandas DataFrame.
- Explore the data (shape, types, summary statistics).
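The loading and exploration steps above might look like the following minimal sketch. The column names and the inline sample CSV are hypothetical stand-ins for the real dataset, and `load_and_summarize` is an illustrative helper, not part of this repository.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the downloaded file. With the real
# dataset, pass its path instead, e.g. load_and_summarize("data/<file>.csv").
SAMPLE_CSV = """age,admission_grade,scholarship_holder,target
19,142.5,1,Graduate
22,118.0,0,Dropout
20,,0,Enrolled
19,131.2,1,Graduate
"""


def load_and_summarize(source):
    """Load a CSV and collect the basic facts needed for initial exploration."""
    df = pd.read_csv(source)
    summary = {
        "shape": df.shape,                          # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),  # inferred type per column
        "missing": df.isna().sum().to_dict(),       # missing values per column
        "describe": df.describe(include="all"),     # summary statistics
    }
    return df, summary


df, summary = load_and_summarize(io.StringIO(SAMPLE_CSV))
print(summary["shape"])
print(summary["missing"])
print(summary["describe"])
```

Collecting the summary in a dictionary keeps the exploration results easy to reuse in the preprocessing report.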
Data Cleaning and Validation
- Handle missing values.
- Identify and handle outliers.
- Correct data types and remove duplicates.
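One possible way to carry out the cleaning steps above is sketched below. Median imputation and clipping at 1.5 times the interquartile range are assumptions chosen for illustration; the project does not prescribe a specific strategy, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def clean(df, numeric_cols):
    """Drop duplicate rows, fix numeric types, impute gaps, and clip outliers."""
    out = df.drop_duplicates().copy()
    for col in numeric_cols:
        # Correct the data type; unparseable entries become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Assumption: fill missing values with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Assumption: treat values beyond 1.5 * IQR as outliers and clip them.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out.reset_index(drop=True)


# Hypothetical raw data: ages stored as text, one duplicate row,
# one missing grade, and one implausible age.
raw = pd.DataFrame({
    "age": ["19", "22", "20", "19", "95", "21"],
    "admission_grade": [135.0, 118.0, np.nan, 135.0, 125.0, 130.0],
})
cleaned = clean(raw, ["age", "admission_grade"])
print(cleaned)
```

Whether to clip, remove, or keep outliers depends on the variable, so the chosen rule is worth recording in the Data Preprocessing Report.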
Data Transformation
- Normalize numerical features.
- Encode categorical variables.
- Create derived features.
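The transformation steps above can be sketched as follows. Min-max scaling and one-hot encoding are only one reasonable choice, and the columns (`units_enrolled`, `units_approved`, `course`) and the derived `pass_rate` feature are hypothetical examples.

```python
import pandas as pd


def transform(df):
    """Add a derived feature, scale numeric columns to [0, 1], and one-hot encode."""
    out = df.copy()

    # Derived feature: share of enrolled units that were passed.
    # Rows with zero enrolled units get a pass rate of 0.
    enrolled = out["units_enrolled"].where(out["units_enrolled"] > 0)
    out["pass_rate"] = (out["units_approved"] / enrolled).fillna(0.0)

    # Min-max normalization; a constant column is mapped to 0.
    for col in ["admission_grade", "pass_rate"]:
        lo, hi = out[col].min(), out[col].max()
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0

    # One-hot encode the categorical column as 0/1 integer columns.
    return pd.get_dummies(out, columns=["course"], dtype=int)


sample = pd.DataFrame({
    "admission_grade": [100.0, 150.0, 125.0],
    "units_enrolled": [6, 0, 5],
    "units_approved": [3, 0, 5],
    "course": ["Nursing", "Design", "Nursing"],
})
transformed = transform(sample)
print(transformed)
```

If the features later feed a model, the scaling parameters should be computed on the training split only and then reused for the test split.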
Statistical Analysis
- Perform descriptive statistics.
- Conduct correlation analysis.
- Perform hypothesis testing (t-tests, chi-square tests).
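As a rough illustration of the statistical steps above, the sketch below computes descriptive statistics, a correlation matrix, Welch's t statistic, and the Pearson chi-square statistic with pandas and NumPy only. The data and column names are made up. In practice `scipy.stats.ttest_ind` and `scipy.stats.chi2_contingency` would typically be used because they also return p-values; note that `chi2_contingency` applies a continuity correction to 2x2 tables by default, so its statistic can differ from the uncorrected one here.

```python
import numpy as np
import pandas as pd


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def chi_square(table):
    """Uncorrected Pearson chi-square statistic and degrees of freedom."""
    obs = np.asarray(table, dtype=float)
    row_totals = obs.sum(axis=1, keepdims=True)
    col_totals = obs.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / obs.sum()
    stat = ((obs - expected) ** 2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return stat, dof


# Hypothetical data: 1 in "dropout" marks a student who dropped out.
data = pd.DataFrame({
    "admission_grade": [140, 135, 150, 128, 110, 118, 105, 122],
    "scholarship_holder": [1, 1, 1, 0, 0, 0, 0, 1],
    "dropout": [0, 0, 0, 0, 1, 1, 1, 1],
})

print(data["admission_grade"].describe())  # descriptive statistics
print(data.corr())                         # Pearson correlation matrix

stayed = data.loc[data["dropout"] == 0, "admission_grade"]
left = data.loc[data["dropout"] == 1, "admission_grade"]
t_stat = welch_t(stayed, left)

contingency = pd.crosstab(data["scholarship_holder"], data["dropout"])
chi2_stat, dof = chi_square(contingency)
print(t_stat, chi2_stat, dof)
```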

Deliverables

Jupyter Notebook: Contains all data wrangling steps and code.
Cleaned Dataset: Exported as CSV.
Data Preprocessing Report (PDF): Details data cleaning steps and quality issues.
Statistical Analysis Report (PDF): Includes descriptive statistics, correlation heatmaps, and hypothesis test results.

Week 2: Data Exploration and Visualization

Objective

Explore the dataset through univariate, bivariate, and multivariate analysis, and develop interactive visualizations to derive insights.

Tasks

Univariate Analysis
- Create histograms and box plots for numerical variables.
- Create bar charts for categorical variables.
- Compute and visualize descriptive statistics.
Bivariate Analysis
- Generate scatter plots for numerical variable pairs.
- Create box plots grouped by categorical variables.
- Perform correlation analysis and chi-square tests.
Multivariate Analysis
- Create pair plots.
- Perform and visualize Principal Component Analysis (PCA).
- Generate parallel coordinate plots.
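For the PCA step, a minimal NumPy sketch is shown below. It standardizes the features and uses a singular value decomposition; this is an illustrative stand-in, and a library implementation such as scikit-learn's `PCA` would normally be used instead. The random input data is a placeholder for the numerical columns of the dataset.

```python
import numpy as np


def pca(X, n_components=2):
    """Project standardized data onto its first principal components.

    Returns the component scores and the explained variance ratio
    of the retained components.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0                       # leave constant columns unscaled
    Z = centered / std

    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / (S**2).sum()
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4))           # placeholder for numeric columns
scores, ratio = pca(features, n_components=2)
print(scores.shape, ratio)
```

The two score columns can then be drawn as a scatter plot, colored by dropout status, to visualize the result.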
Advanced Visualization
- Build interactive visualizations using Plotly or Bokeh.
- Develop a dashboard using Streamlit.
Insight Generation
- Identify key patterns and relationships in the data.
- Formulate new hypotheses based on exploratory analysis.

Deliverables

Jupyter Notebook: Contains all exploratory data analysis code and visualizations.
Exploratory Data Analysis Report (PDF): Includes visualizations, detailed interpretations, and key findings.
Streamlit Dashboard: Interactive platform summarizing key insights.
Updated Dataset: Incorporates new features or transformations.

Instructions to Run the Code

Set Up the Environment:
- Ensure Python, TensorFlow/PyTorch, and Streamlit are installed.
- Create a virtual environment:
```
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
```
- Install required packages:
```
pip install -r requirements.txt
```
Run Jupyter Notebook:
- Launch the notebook server:
```
jupyter notebook
```
Run the Streamlit Dashboard:
- Start the dashboard locally to view the interactive visualizations:
```
streamlit run dashboard.py
```

Datasets

The datasets required for these tasks can be downloaded from 3signet.

Hypotheses

Higher socio-economic status correlates with lower dropout rates.
Higher admission grades reduce the likelihood of dropping out.
Financial aid or scholarships lower dropout rates.
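A first look at hypotheses like these could compare dropout rates between groups, as in the sketch below. The data, the column names, and the `dropout_rate_by` helper are hypothetical; a difference in rates is only descriptive and would still need a hypothesis test before drawing conclusions.

```python
import pandas as pd


def dropout_rate_by(df, group_col, target_col="dropout"):
    """Share of students who dropped out within each group of group_col."""
    return df.groupby(group_col)[target_col].mean()


# Hypothetical data: 1 in "dropout" marks a student who dropped out.
students = pd.DataFrame({
    "scholarship_holder": [1, 1, 1, 0, 0, 0, 0, 1],
    "dropout": [0, 0, 0, 0, 1, 1, 1, 1],
})
rates = dropout_rate_by(students, "scholarship_holder")
print(rates)
```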

Ethical Considerations

Check the data and the model's predictions for bias, and ensure that predictive analytics are used responsibly in educational contexts.
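One simple way to start checking predictions for bias is to compare error rates across student groups, as sketched below. The group labels, the column names, and the `group_error_rates` helper are hypothetical, and equal error rates are only one of several possible fairness criteria.

```python
import pandas as pd


def group_error_rates(df, group_col, y_true="dropout", y_pred="predicted"):
    """Per-group flag rate, false positive rate, and false negative rate."""
    rows = {}
    for name, group in df.groupby(group_col):
        negatives = group[group[y_true] == 0]   # students who did not drop out
        positives = group[group[y_true] == 1]   # students who dropped out
        rows[name] = {
            "flag_rate": group[y_pred].mean(),
            "false_positive_rate": (
                negatives[y_pred].mean() if len(negatives) else float("nan")
            ),
            "false_negative_rate": (
                (1 - positives[y_pred]).mean() if len(positives) else float("nan")
            ),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Hypothetical labels and predictions for two student groups.
results = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "dropout": [0, 0, 1, 1, 0, 0, 1, 1],
    "predicted": [0, 0, 1, 1, 1, 0, 1, 0],
})
error_rates = group_error_rates(results, "group")
print(error_rates)
```

Large gaps between groups would be a signal to revisit the features, the training data, or the decision threshold before the model is used to flag students.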

Repository: matidesalegn/Student-Dropout-Prediction
