Data-driven stroke risk assessment and personalized recommendations, powered by machine learning and an NLU-driven chatbot.

HlexNC/Project-Arepo

Project Apero is a web application, originally developed as a university project, that helps individuals assess their stroke risk and receive personalized recommendations. It combines data analysis, machine learning, and a conversational chatbot interface to provide actionable health insights.


Project Description

Project Apero provides an intuitive interface for users to input personal health metrics (such as age, gender, BMI, and blood glucose level) and receive an estimated probability of experiencing a stroke. It then offers evidence-based recommendations and insights to help reduce stroke risk.

The application is built using:

  • Streamlit for the web front end
  • Scikit-Learn for machine learning model training and prediction
  • Rasa for an integrated chatbot
  • Docker for containerization and ease of deployment

Why "Apero"?
The project name is derived from a playful acronym referencing a healthy approach to “Assessing Personal Event Risks Online,” aiming to make preventative health measures more approachable.


Screenshots

Data Analysis Page and Chatbot Interface


Key Features

  1. Personalized Stroke Risk Assessment

    • Input Health Metrics: Users provide personal data such as age, gender, BMI, and blood glucose level.
    • Risk Probability: The machine learning model calculates a stroke risk score (0.0 to 1.0).
    • Actionable Recommendations: Users receive suggestions tailored to their risk factors.
  2. Interactive Data Analysis & Visualization

    • Data Exploration: Filter and explore health-related metrics to reveal trends and correlations.
    • Visual Insights: Interactive charts, heatmaps, and summary statistics enhance comprehension of stroke risk factors.
  3. Conversational Chatbot Assistance

    • Real-Time Queries: A Rasa-powered chatbot answers questions about stroke risk, data insights, and wellness tips.
    • Guided Interaction: The chatbot can direct users to relevant pages, simplify data analysis steps, and deliver personalized feedback.
  4. Robust Data Handling and Augmentation

    • Outlier Detection: Statistical methods to detect and remove anomalies.
    • Synthetic Data Generation: Augments the dataset by ~30% to improve model performance and generalization.

Installation

Prerequisites

  • Docker
    Make sure Docker is installed and running on your machine.
  • Git
    Required to clone the repository.
  • Python 3.9+ (Optional)
    If you want to run or modify components without Docker.

Steps to Get Started

  1. Clone the Repository

    git clone https://github.com/HlexNC/Project-Arepo.git
    cd Project-Arepo
  2. Build and Run Docker Services

    docker-compose up --build
    • This command builds Docker images, trains machine learning models, and starts all necessary services.
    • The initial startup may take several minutes as the environment sets up.
  3. Access the Web Application

    • Open a web browser and go to: http://localhost:8501.
    • You will see the main page with navigation options for data analysis, personalized recommendations, and chatbot interaction.

Tip

If you want to run or modify components without Docker, refer to the comments in the docker-compose.yml or optional instructions in the docs/ folder (if provided).

Training the Rasa Chatbot (Optional)

If you change the chatbot’s training data, or if the chatbot fails to respond, retrain the model:

  1. Open a terminal inside the Rasa server container

    docker exec -it rasa_server bash
  2. Train the Rasa Model

    rasa train
  3. Restart Services

    exit
    docker-compose down
    docker-compose up --build -d

Data Overview

Dataset

We utilize the publicly available Stroke Prediction Dataset. This dataset includes key health metrics:

  • Age, BMI, Glucose Levels
  • Hypertension, Heart Disease
  • Smoking Status, etc.

Note

This dataset is used strictly for educational and demonstration purposes.

Data Handling and Augmentation

  • Outlier Detection: Uses a Z-score-based method to filter anomalous data points.
  • Synthetic Data Augmentation: Adds roughly 30% synthetic records to improve model robustness and representation.
  • Feature Scaling: Numerical features are normalized or standardized, following the guidance in Google’s ML Crash Course.
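
The three steps above can be sketched with NumPy. This is an illustration under stated assumptions, not the code in data_preprocessor.py or data_augmentation.py: the threshold of 3 standard deviations, the jitter-based way of generating synthetic rows, and the function names are all hypothetical, since this README does not specify them.

```python
import numpy as np


def zscore_mask(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """True for rows whose every column lies within `threshold` std devs of the mean."""
    std = values.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    z = np.abs((values - values.mean(axis=0)) / std)
    return (z < threshold).all(axis=1)


def augment(values: np.ndarray, fraction: float = 0.30,
            noise_scale: float = 0.05, seed: int = 0) -> np.ndarray:
    """Append `fraction` extra rows: resampled originals plus small Gaussian jitter.

    Assumption: jittered resampling stands in for whatever generator the project uses.
    """
    rng = np.random.default_rng(seed)
    n_new = int(round(fraction * len(values)))
    picks = rng.integers(0, len(values), size=n_new)
    noise = rng.normal(0.0, noise_scale * values.std(axis=0),
                       size=(n_new, values.shape[1]))
    return np.vstack([values, values[picks] + noise])


def standardize(values: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance."""
    std = values.std(axis=0)
    std = np.where(std == 0, 1.0, std)
    return (values - values.mean(axis=0)) / std


# Toy data standing in for three numeric columns (age, BMI, glucose level).
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.normal(50, 15, 200),
    rng.normal(28, 5, 200),
    rng.normal(105, 30, 200),
])
data[0, 1] = 95.0  # inject an implausible BMI so the filter has something to catch

mask = zscore_mask(data)
clean = data[mask]
augmented = augment(clean)
scaled = standardize(augmented)
print(f"{len(data)} rows -> {len(clean)} after filtering -> {len(augmented)} after augmentation")
```

In a real pipeline the scaling statistics would normally be computed on the training split only and then reused for validation and prediction inputs.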

Usage

Personalized Stroke Risk Assessment

  1. Navigate to "Personalized Recommendations"
    From the sidebar, select Personalized Recommendations.

  2. Provide Personal Health Metrics
    Enter details such as age, gender, BMI, glucose levels, hypertension status, etc.

  3. Receive Risk Probability & Recommendations
    The system displays a stroke risk (0.0 to 1.0) and personalized tips (e.g., diet, exercise, medical follow-up).
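
A score between 0.0 and 1.0 of this kind is what a scikit-learn classifier returns from `predict_proba`. The sketch below shows the idea on made-up data; the feature set, the toy labels, and the `stroke_risk` helper are hypothetical and do not reflect the model or inputs used in src/web/recommendations.py.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up training data: age, BMI, glucose level, hypertension flag.
rng = np.random.default_rng(42)
n = 500
age = rng.uniform(20, 90, n)
bmi = rng.normal(28, 5, n)
glucose = rng.normal(105, 30, n)
hypertension = rng.integers(0, 2, n)
X = np.column_stack([age, bmi, glucose, hypertension])

# Toy labels in which risk rises with age and hypertension (illustrative only).
logit = 0.08 * (age - 60) + 1.0 * hypertension - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Scaling and the classifier live in one pipeline, so new inputs are scaled the same way.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)


def stroke_risk(age: float, bmi: float, glucose: float, hypertension: int) -> float:
    """Return the predicted probability of the positive class, between 0.0 and 1.0."""
    row = np.array([[age, bmi, glucose, hypertension]], dtype=float)
    return float(model.predict_proba(row)[0, 1])


young = stroke_risk(30, 24, 90, 0)
older = stroke_risk(80, 30, 160, 1)
print(f"younger profile: {young:.2f}, older profile: {older:.2f}")
```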

Interactive Data Analysis

  1. Go to "Data Analysis"
    Choose Data Analysis from the sidebar.

  2. Explore and Visualize

    • Filter or query the dataset to inspect correlations and distributions.
    • View generated charts, tables, or heatmaps.
  3. Model Training & Evaluation
    Train or re-train the underlying machine learning models (Logistic Regression, Random Forest, SVM) within the app. Evaluate performance via metrics like Accuracy, Precision, Recall, etc.
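
As a rough picture of what such a comparison involves, the sketch below trains the three named model families with scikit-learn and reports accuracy, precision, and recall. It uses a synthetic dataset from `make_classification` rather than the stroke data, and the hyperparameters are defaults chosen for illustration, not the app's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced binary classification problem (stand-in for the real dataset).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }

for name, metrics in results.items():
    summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

When the positive class is rare, accuracy alone can look high even if few positive cases are found, which is why precision and recall are reported alongside it.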

Chatbot Assistance

  1. Select "Chatbot"
    Click on Chatbot in the sidebar to open the conversational assistant interface.

  2. Interact with the Rasa-Powered Chatbot

    • Ask questions about stroke risk or how to interpret certain metrics.
    • Get immediate recommendations and clarifications on data analysis results.
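
Outside the web interface, a Rasa server can also be queried over its REST channel. The sketch below assumes that the REST channel is enabled and that the server is reachable on Rasa's default port 5005; neither is confirmed by this README, and the helper names are hypothetical. It is not the code in src/chatbot/rasa_chatbot.py.

```python
import json
import urllib.request

# Assumption: REST channel enabled and published on Rasa's default port.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_request(message: str, sender: str = "demo-user",
                  url: str = RASA_URL) -> urllib.request.Request:
    """Build the POST request that the REST channel expects: a sender ID and a message."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_chatbot(message: str, sender: str = "demo-user",
                url: str = RASA_URL, timeout: float = 10.0) -> list[str]:
    """Send one message and return the text of each reply (non-text replies are skipped)."""
    with urllib.request.urlopen(build_request(message, sender, url), timeout=timeout) as resp:
        replies = json.loads(resp.read().decode("utf-8"))
    return [reply["text"] for reply in replies if "text" in reply]
```

With the services running, a call such as `ask_chatbot("What raises stroke risk?")` would return the bot's text replies as a list of strings.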

Project Structure

Below is a simplified overview of the repository layout:

Project-Arepo/
├── actions/
│   ├── actions.py              # Custom Rasa actions
│   ├── Dockerfile
│   └── requirements-actions.txt
├── data/
│   ├── data_analysis.py
│   ├── data_augmentation.py
│   ├── data_loader.py
│   ├── data_preprocessor.py
│   ├── raw/
│   │   └── healthcare-dataset-stroke-data.csv
│   └── processed/
├── src/
│   ├── app.py                  # Streamlit main entry point
│   ├── chatbot/
│   │   └── rasa_chatbot.py
│   └── web/
│       ├── home.py
│       ├── recommendations.py
│       ├── chatbot_page.py
│       └── data_analysis_page.py
├── models/
├── docs/
│   ├── img/
│   │   └── ASP_Banner.png
│   └── ...
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── LICENSE
└── README.md

Warning

Project Apero is not a substitute for professional medical advice, diagnosis, or treatment.
The stroke risk assessments and recommendations are for educational and research purposes only. Always seek the advice of qualified healthcare professionals for any medical concerns.
Use of this application is entirely at your own risk, and the developers assume no liability for any actions taken based on its output.


Repository Visualization

Below is an automatically generated repository structure diagram. The diagram is updated whenever changes are pushed to the main branch:

Repository Visualization


License

This project is licensed under the GNU General Public License v3.
See the LICENSE file for details.


Contact

For inquiries, feedback, or suggestions:

We welcome contributions! Feel free to open a pull request or start a discussion in our GitHub repository.
