A machine learning project for predicting early diabetes risk using the PIMA Indians dataset. Includes data preprocessing, model training (Random Forest), SHAP-based explainability, and a Streamlit web app.
Early Diabetes Risk Prediction System
Developed a machine learning application to predict early diabetes risk using the PIMA Indians Diabetes Dataset. The project involved data preprocessing, feature scaling, and handling class imbalance before training. Implemented a Random Forest classifier for risk prediction and integrated SHAP (SHapley Additive exPlanations) to make the model interpretable and to highlight the risk factors that most influence each prediction. Deployed the solution as an interactive Streamlit web app where users enter health parameters and receive a real-time risk assessment with explainable AI insights.
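The training steps described above (scaling, class-imbalance handling, Random Forest) could look roughly like the sketch below. It is not the project's actual code. It assumes synthetic data from `make_classification` as a stand-in with the same shape as the PIMA dataset (768 rows, 8 features, roughly 35% positives), and it assumes `class_weight="balanced"` as the imbalance strategy, since the description does not say which one was used. The hyperparameters are illustrative as well.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in data shaped like the PIMA dataset
# (768 rows, 8 features, about 35% positive cases). Not the real data.
X, y = make_classification(
    n_samples=768,
    n_features=8,
    n_informative=5,
    weights=[0.65, 0.35],
    random_state=0,
)

# Stratify so train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumption: class_weight="balanced" is one possible way to handle the
# imbalance; the project may have used a different technique.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    ),
)
model.fit(X_train, y_train)

# Held-out accuracy and the predicted risk for a single test row.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
risk = model.predict_proba(X_test[:1])[0, 1]

print(f"test accuracy: {test_accuracy:.2f}")
print(f"predicted risk for first test row: {risk:.2f}")
```

Wrapping the scaler and the classifier in one pipeline means the scaler is fitted on the training split only, so no test-set statistics leak into training. Random Forests do not strictly need scaled inputs, but the step is kept here because the description lists feature scaling.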
Tech Stack: Python, Pandas, Scikit-learn, Random Forest, SHAP, Streamlit
Increased model accuracy from 60% to 78%.