Skip to content

matidesalegn/Student-Dropout-Prediction

Repository files navigation

Student Dropout Prediction

Student Dropout Prediction To create a README file based on the content from your "Data Science Week 1" and "Data Science Week 2" PDFs, here's a draft that outlines the projects, tasks, and deliverables for each week:


Data Science Projects - Weeks 1 & 2

Overview

This repository contains projects focused on data analytics, machine learning, and predictive modeling. The case study for these weeks is based on predicting student dropouts using machine learning models, with a structured approach to data collection, wrangling, exploration, and visualization.


Week 1: Data Collection and Wrangling

Objective

The goal is to prepare the data for predictive modeling by setting up the environment, collecting data, and performing initial data cleaning and transformation.

Tasks

  1. Environment Setup

    • Install Python, TensorFlow/PyTorch, and Streamlit.
    • Set up version control with Git.
    • Create a virtual environment for the project.
  2. Data Import and Initial Exploration

    • Download the dataset.
    • Load the data into a pandas DataFrame.
    • Explore the data (shape, types, summary statistics).
  3. Data Cleaning and Validation

    • Handle missing values.
    • Identify and handle outliers.
    • Correct data types and remove duplicates.
  4. Data Transformation

    • Normalize numerical features.
    • Encode categorical variables.
    • Create derived features.
  5. Statistical Analysis

    • Perform descriptive statistics.
    • Conduct correlation analysis.
    • Perform hypothesis testing (t-tests, chi-square tests).

Deliverables

  1. Jupyter Notebook: Contains all data wrangling steps and code.
  2. Cleaned Dataset: Exported as CSV.
  3. Data Preprocessing Report (PDF): Details data cleaning steps and quality issues.
  4. Statistical Analysis Report (PDF): Includes descriptive statistics, correlation heatmaps, and hypothesis test results.

Week 2: Data Exploration and Visualization

Objective

Explore the dataset through univariate, bivariate, and multivariate analysis, and develop interactive visualizations to derive insights.

Tasks

  1. Univariate Analysis

    • Create histograms and box plots for numerical variables.
    • Create bar charts for categorical variables.
    • Compute and visualize descriptive statistics.
  2. Bivariate Analysis

    • Generate scatter plots for numerical variable pairs.
    • Create box plots grouped by categorical variables.
    • Perform correlation analysis and chi-square tests.
  3. Multivariate Analysis

    • Create pair plots.
    • Perform and visualize Principal Component Analysis (PCA).
    • Generate parallel coordinate plots.
  4. Advanced Visualization

    • Build interactive visualizations using Plotly or Bokeh.
    • Develop a dashboard using Streamlit.
  5. Insight Generation

    • Identify key patterns and relationships in the data.
    • Formulate new hypotheses based on exploratory analysis.

Deliverables

  1. Jupyter Notebook: Contains all exploratory data analysis code and visualizations.
  2. Exploratory Data Analysis Report (PDF): Includes visualizations, detailed interpretations, and key findings.
  3. Streamlit Dashboard: Interactive platform summarizing key insights.
  4. Updated Dataset: Incorporates new features or transformations.

Instructions to Run the Code

  1. Set Up the Environment:

    • Ensure Python, TensorFlow/PyTorch, and Streamlit are installed.
    • Create a virtual environment:
      python -m venv env
      source env/bin/activate  # On Windows: env\Scripts\activate
    • Install required packages:
      pip install -r requirements.txt
  2. Run Jupyter Notebook:

    • Launch the notebook server:
      jupyter notebook
  3. Deploy Streamlit Dashboard:

    • To view the interactive visualizations:
      streamlit run dashboard.py

Datasets

The datasets required for these tasks can be downloaded from 3signet.


Hypotheses

  • Higher socio-economic status correlates with lower dropout rates.
  • Higher admission grades reduce the likelihood of dropping out.
  • Financial aid or scholarships lower dropout rates.

Ethical Considerations

Ensure that the model addresses biases in the data and its predictions, and that predictive analytics are used responsibly in educational contexts.


About

Student Dropout Prediction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published