GitHub - HlexNC/Project-Arepo: Data-driven stroke risk assessment & personalized recommendations, powered by machine-learning and an NLU-driven chatbot.

Project Apero is a comprehensive web application—originally developed as a university project—designed to help individuals assess their stroke risk and receive personalized recommendations. It leverages data analysis, machine learning, and a conversational chatbot interface to provide actionable health insights.

Project Description

Project Apero provides an intuitive interface for users to input personal health metrics (such as age, gender, BMI, and blood glucose level) and receive an estimated probability of experiencing a stroke. It then offers evidence-based recommendations and insights to help reduce stroke risk.

The application is built using:

Streamlit for the web front end
Scikit-Learn for machine learning model training and prediction
Rasa for an integrated chatbot
Docker for containerization and ease of deployment

Why "Apero"?
The project name is derived from a playful acronym referencing a healthy approach to “Assessing Personal Event Risks Online,” aiming to make preventative health measures more approachable.

Key Features

Personalized Stroke Risk Assessment
- Input Health Metrics: Users provide personal data such as age, gender, BMI, and blood glucose level.
- Risk Probability: The machine learning model outputs an estimated stroke probability between 0.0 and 1.0.
- Actionable Recommendations: Users receive suggestions tailored to their risk factors.
Interactive Data Analysis & Visualization
- Data Exploration: Filter and explore health-related metrics to reveal trends and correlations.
- Visual Insights: Interactive charts, heatmaps, and summary statistics enhance comprehension of stroke risk factors.
Conversational Chatbot Assistance
- Real-Time Queries: A Rasa-powered chatbot answers questions about stroke risk, data insights, and wellness tips.
- Guided Interaction: The chatbot can direct users to relevant pages, simplify data analysis steps, and deliver personalized feedback.
Robust Data Handling and Augmentation
- Outlier Detection: Statistical methods to detect and remove anomalies.
- Synthetic Data Generation: Augments the dataset by ~30% to improve model performance and generalization.

Installation

Prerequisites

- Docker: Make sure Docker is installed and running on your machine.
- Git: Required to clone the repository.
- Python 3.9+ (optional): Only needed if you want to run or modify components without Docker.

Steps to Get Started

Clone the Repository

```
git clone https://github.com/HlexNC/Project-Arepo.git
cd Project-Arepo
```

Build and Run Docker Services
```
docker-compose up --build
```
- This command builds Docker images, trains machine learning models, and starts all necessary services.
- The initial startup may take several minutes as the environment sets up.
Access the Web Application
- Open a web browser and go to: http://localhost:8501.
- You will see the main page with navigation options for data analysis, personalized recommendations, and chatbot interaction.
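
Because the first startup can take several minutes, it may help to poll the port instead of refreshing the browser. The following is a minimal sketch, not part of the repository: the helper name `wait_for_port` and its default timeouts are hypothetical, and it only checks that something is listening on the port, not that the app has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that a process is listening,
            # not that the Streamlit app is fully initialized.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("localhost", 8501)` after `docker-compose up --build` returns `True` as soon as the port accepts connections, or `False` if it stays closed for the whole timeout.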

Tip

If you want to run or modify components without Docker, refer to the comments in the docker-compose.yml or optional instructions in the docs/ folder (if provided).

Training the Rasa Chatbot (Optional)

If you make changes to the chatbot’s training data or if the chatbot fails to respond:

Open a terminal inside the Rasa server container
```
docker exec -it rasa_server bash
```
Train the Rasa Model
```
rasa train
```

Restart Services

```
exit
docker-compose down
docker-compose up --build -d
```
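
After the services restart, one way to check that the chatbot responds is to send it a message over Rasa's REST channel. This is a hedged sketch rather than the project's own client: it assumes the REST channel is enabled in `credentials.yml` and that Rasa's default port 5005 is published to the host, neither of which is confirmed here. The function names and the `smoke-test` sender ID are hypothetical.

```python
import json
import urllib.request

# Assumption: the REST channel is enabled and port 5005 is reachable from the host.
RASA_REST_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_request(message: str, sender: str = "smoke-test") -> urllib.request.Request:
    """Build a POST request in the JSON shape the REST channel expects."""
    payload = json.dumps({"sender": sender, "message": message}).encode("utf-8")
    return urllib.request.Request(
        RASA_REST_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message: str, timeout: float = 10.0) -> list:
    """Send one message and return the decoded list of bot responses."""
    with urllib.request.urlopen(build_request(message), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the server is up and a model is loaded, `ask("hello")` should return a JSON list of response objects. An empty list or a connection error suggests the model did not load or the port is not exposed.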

Data Overview

Dataset

We utilize the publicly available Stroke Prediction Dataset. This dataset includes key health metrics:

Age, BMI, Glucose Levels
Hypertension, Heart Disease
Smoking Status, etc.

Note

This dataset is used strictly for educational and demonstration purposes.

Data Handling and Augmentation

Outlier Detection: Employs a Z-score based method to filter anomalous data points.
Synthetic Data Augmentation: Adds ~30% realistic synthetic data to boost model robustness and representation.
Feature Scaling: Numerical features are normalized or standardized, aligning with best practices from Google’s ML Crash Course.
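
The three steps above can be sketched in dependency-free Python. This illustrates the general techniques only and is not the code in `data_preprocessor.py` or `data_augmentation.py`. The Z-score threshold of 3, the use of the population standard deviation, and the Gaussian-jitter approach to synthetic data are all assumptions, as are the function names.

```python
import random
import statistics


def zscore_filter(values, threshold=3.0):
    """Keep values whose absolute Z-score is at most `threshold` (assumed: 3)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)  # all values identical, nothing to filter
    return [v for v in values if abs(v - mean) / std <= threshold]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def augment_with_jitter(values, fraction=0.3, noise_scale=0.05, seed=0):
    """Append round(fraction * n) noisy copies of randomly chosen values.

    Gaussian jitter is an assumed stand-in; the project's actual
    augmentation method is not described in this README.
    """
    rng = random.Random(seed)
    std = statistics.pstdev(values)
    n_new = round(fraction * len(values))
    synthetic = [
        rng.choice(values) + rng.gauss(0.0, noise_scale * std)
        for _ in range(n_new)
    ]
    return list(values) + synthetic
```

A typical order would be to filter outliers first, then augment, then scale, so that extreme values do not distort the mean and standard deviation used for scaling.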

Usage

Personalized Stroke Risk Assessment

Navigate to "Personalized Recommendations"
From the sidebar, select Personalized Recommendations.
Provide Personal Health Metrics
Enter details such as age, gender, BMI, glucose level, and hypertension status.
Receive Risk Probability & Recommendations
The system displays an estimated stroke probability (0.0 to 1.0) and personalized tips (e.g., diet, exercise, medical follow-up).
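
To illustrate how a 0.0 to 1.0 risk value can arise, here is a minimal sketch of a logistic regression prediction, one of the model types the app supports. The coefficients and intercept below are hypothetical and carry no clinical meaning; they are not taken from the project's trained model. The feature names are assumed to match columns in the dataset.

```python
import math

# Hypothetical coefficients for illustration only. They are NOT the
# project's trained parameters and have no clinical meaning.
EXAMPLE_WEIGHTS = {"age": 0.07, "hypertension": 0.4, "avg_glucose_level": 0.004}
EXAMPLE_INTERCEPT = -7.5


def stroke_probability(features, weights=EXAMPLE_WEIGHTS, intercept=EXAMPLE_INTERCEPT):
    """Logistic regression: sigmoid of the weighted feature sum, in (0, 1)."""
    z = intercept + sum(weights[name] * features[name] for name in weights)
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

With positive weights, raising any input raises the probability, which is why the output can be read as a score that grows with each risk factor. Random Forest and SVM models reach their probabilities differently, so this sketch covers only the logistic case.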

Interactive Data Analysis

Go to "Data Analysis"
Choose Data Analysis from the sidebar.
Explore and Visualize
- Filter or query the dataset to inspect correlations and distributions.
- View generated charts, tables, or heatmaps.
Model Training & Evaluation
Train or re-train the underlying machine learning models (Logistic Regression, Random Forest, SVM) within the app. Evaluate performance with metrics such as accuracy, precision, and recall.
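
As a reminder of what these metrics measure, the sketch below computes them from binary labels in plain Python. It is illustrative only; the app itself may rely on scikit-learn's `accuracy_score`, `precision_score`, and `recall_score` instead. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = stroke)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Report 0.0 when the denominator is empty rather than dividing by zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

When positive cases are rare, accuracy alone can mislead: a model that never predicts a stroke on data with 5 positives in 100 records scores 0.95 accuracy but 0.0 recall, so precision and recall are worth checking together.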

Chatbot Assistance

Select "Chatbot"
Click on Chatbot in the sidebar to open the conversational assistant interface.
Interact with the Rasa-Powered Chatbot
- Ask questions about stroke risk or how to interpret certain metrics.
- Get immediate recommendations and clarifications on data analysis results.

Project Structure

Below is a simplified overview of the repository layout:

Project-Arepo/
├── actions/
│   ├── actions.py              # Custom Rasa actions
│   ├── Dockerfile
│   └── requirements-actions.txt
├── data/
│   ├── data_analysis.py
│   ├── data_augmentation.py
│   ├── data_loader.py
│   ├── data_preprocessor.py
│   ├── raw/
│   │   └── healthcare-dataset-stroke-data.csv
│   └── processed/
├── src/
│   ├── app.py                  # Streamlit main entry point
│   ├── chatbot/
│   │   └── rasa_chatbot.py
│   └── web/
│       ├── home.py
│       ├── recommendations.py
│       ├── chatbot_page.py
│       └── data_analysis_page.py
├── models/
├── docs/
│   ├── img/
│   │   └── ASP_Banner.png
│   └── ...
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── LICENSE
└── README.md

Warning

Project Apero is not a substitute for professional medical advice, diagnosis, or treatment.
The stroke risk assessments and recommendations are for educational and research purposes only. Always seek the advice of qualified healthcare professionals for any medical concerns.
Use of this application is entirely at your own risk, and the developers assume no liability for any actions taken based on its output.

Repository Visualization

Below is an automatically generated repository structure diagram. The diagram is updated whenever changes are pushed to the main branch:

License

This project is licensed under the GNU General Public License v3.
See the LICENSE file for details.

Contact

For inquiries, feedback, or suggestions:

GitHub Issues: Project Apero Repository
Email: Please refer to the repository maintainers’ profiles.

We welcome contributions! Feel free to open a pull request or start a discussion in our GitHub repository.
