Diabetes Risk Prediction System with Explainable AI and MLOps Pipeline

Diabetes risk prediction system using an optimized MLP, enhanced with Explainable AI (SHAP, LIME), and deployed through a sustainable MLOps pipeline with monitoring and drift detection.

Python, Deep Learning, MLOps, Explainable AI, FastAPI, Optuna

Overview

This project delivers a production-ready machine learning system for predicting diabetes risk from health indicator data. It addresses the challenge of early and accurate diabetes detection by combining advanced neural network modeling, explainable AI, and robust MLOps practices.

Problem Statement

Early detection of diabetes is critical for patient outcomes, but clinical settings require models that are both highly accurate and interpretable. Many existing solutions lack transparency or are not production-ready for real-world deployment.

Solution & Features

Binary classification model for diabetes risk prediction
Optimized Multi-Layer Perceptron (MLP) using Bayesian tuning (Optuna)
Explainable AI integration using SHAP (global) and LIME (local)
Threshold optimization for recall-focused predictions
Full MLOps pipeline with API deployment and monitoring
Data drift detection and automated retraining strategy
Carbon footprint tracking using Green AI principles

Tech Stack

Python
PyTorch
Scikit-learn
Optuna
FastAPI
SQLite
SHAP / LIME
CodeCarbon

Model Architecture

Funnel-shaped MLP: three hidden layers of 128 → 64 → 32 neurons
ReLU activation for hidden layers, Sigmoid for output
Dropout (0.3) for regularization
Adam optimizer with tuned learning rate
Binary Cross-Entropy loss
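
The architecture listed above could be written in PyTorch roughly as follows. This is a minimal sketch, not the project's actual code: the class name DiabetesMLP, the input width N_FEATURES = 21, the placement of dropout after every hidden layer, and the learning rate 1e-3 are all assumptions made for illustration (the tuned learning rate is not stated here).

```python
import torch
from torch import nn


class DiabetesMLP(nn.Module):
    """Funnel MLP: n_features -> 128 -> 64 -> 32 -> 1, sigmoid output."""

    def __init__(self, n_features: int, dropout: float = 0.3):
        super().__init__()
        # Assumption: dropout follows each hidden layer.
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Return one probability per row, shape (batch,).
        return self.net(x).squeeze(-1)


N_FEATURES = 21  # hypothetical number of input health indicators
model = DiabetesMLP(N_FEATURES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder learning rate
loss_fn = nn.BCELoss()  # binary cross-entropy on sigmoid probabilities

# One training step on random stand-in data, just to show the wiring.
x = torch.randn(8, N_FEATURES)
y = torch.randint(0, 2, (8,)).float()
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

A common alternative is to drop the final Sigmoid and train with nn.BCEWithLogitsLoss, which is numerically more stable; the version above keeps the sigmoid output to match the description.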

Key Capabilities

Predicts diabetes risk with high sensitivity (recall of 0.89 at a 0.33 decision threshold)
Provides both global and patient-specific explanations
Detects and adapts to real-world data drift
Logs predictions for continuous monitoring and improvement
Balances model performance with environmental sustainability

Development Highlights

Used Bayesian Optimization (Optuna) for efficient hyperparameter tuning
Tuned the decision threshold to 0.33 (below the conventional 0.5) to reduce false negatives
Achieved a recall of 0.89, prioritizing early detection over precision
Compared deep learning with traditional models (e.g., XGBoost)
Implemented Green AI practices to minimize carbon footprint
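
To illustrate why lowering the decision threshold reduces false negatives, here is a small pure-Python sketch. The scores and labels are made-up toy values, not project data, and the selection rule in pick_threshold (take the largest threshold that still meets a target recall) is an assumption; the description does not say how 0.33 was chosen.

```python
def recall_at(y_true, y_prob, threshold):
    """Recall = TP / (TP + FN), predicting positive when prob >= threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_threshold(y_true, y_prob, target_recall):
    """Largest threshold on a 0.01 grid whose recall meets the target, else None."""
    candidates = [i / 100 for i in range(1, 100)]
    feasible = [t for t in candidates if recall_at(y_true, y_prob, t) >= target_recall]
    return max(feasible) if feasible else None


# Toy validation set (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_prob = [0.92, 0.61, 0.41, 0.355, 0.30, 0.45, 0.10, 0.55, 0.20, 0.05]

recall_default = recall_at(y_true, y_prob, 0.50)  # two of five positives caught
recall_lowered = recall_at(y_true, y_prob, 0.33)  # four of five positives caught
chosen = pick_threshold(y_true, y_prob, target_recall=0.8)
```

On this toy data the lower threshold catches two additional positive cases, at the cost of more false positives. That trade-off is usually acceptable in a screening setting where a missed case is the more expensive error.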

Explainability & Insights

SHAP: Identified key global drivers (BMI, Age, Blood Pressure, General Health)
LIME: Provided local explanations for individual predictions
Revealed bias toward “classic” metabolic diabetes profiles
Enabled transparent, clinically interpretable decision-making

System Architecture (MLOps)

API Layer: FastAPI for serving predictions
Inference Engine: Loads trained model, scaler, and XAI artifacts
Storage: SQLite database for logging predictions
Monitoring Loop: Detects drift (covariate shift, prior shift, concept drift) and triggers retraining
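
The storage and monitoring components above can be sketched together using only the Python standard library. This is an illustrative sketch under several assumptions: the table schema, the single logged feature (bmi), the helper names log_prediction and psi, the use of the Population Stability Index as the covariate drift statistic, and the 0.2 alert level (a commonly quoted rule of thumb) are not taken from the project itself.

```python
import math
import sqlite3

PSI_ALERT = 0.2  # assumed alert level; a common rule of thumb, not a project setting


def log_prediction(conn, bmi, probability, label):
    """Append one served prediction to the (hypothetical) predictions table."""
    conn.execute(
        "INSERT INTO predictions (bmi, probability, label) VALUES (?, ?, ?)",
        (bmi, probability, label),
    )
    conn.commit()


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of one feature, binned on the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the range is zero

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clip out-of-range values
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


conn = sqlite3.connect(":memory:")  # a file path would be used in deployment
conn.execute(
    "CREATE TABLE predictions ("
    "id INTEGER PRIMARY KEY, bmi REAL, probability REAL, label INTEGER)"
)

# Synthetic stand-ins: training-time BMI values and a shifted live population.
reference_bmi = [20 + (i % 20) for i in range(200)]
for i in range(200):
    log_prediction(conn, 28 + (i % 20), 0.5, 1)

logged_bmi = [row[0] for row in conn.execute("SELECT bmi FROM predictions")]
score = psi(reference_bmi, logged_bmi)
needs_retraining = score > PSI_ALERT
```

This covers only covariate shift on a single feature. Prior shift could be checked in a similar way by comparing the logged positive rate against the training rate, while concept drift generally requires ground-truth outcomes to arrive later.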

Impact

Supports early detection of diabetes risk
Enhances trust through interpretable AI decisions
Demonstrates real-world ML deployment beyond experimentation
Balances accuracy, interpretability, and sustainability

Future Improvements

Incorporate additional clinical features (e.g., glucose levels, family history)
Improve detection of atypical diabetes cases
Explore alternative deep learning architectures
Expand dataset for better generalization
Enhance monitoring with real-time dashboards