Machine Learning Projects
Featured Projects
Insurance Premium Predictor
-
This project compares six regression models (Linear Regression, Lasso Regression, Ridge Regression, Random Forest, XGBoost, and LightGBM) for predicting insurance premium prices.
-
Through hyperparameter tuning and cross-validation, we benchmarked the models on the R² score, while feature importance analysis revealed the key drivers of the predicted premium.
-
These insights help in choosing the most reliable model and give stakeholders a basis for data-driven pricing decisions.
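The benchmarking step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it runs on synthetic data from `make_regression`, the names `models` and `r2_scores` and all hyperparameter values are assumptions, and XGBoost and LightGBM are left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the premium data (assumption: the real features differ).
X, y = make_regression(
    n_samples=300, n_features=6, n_informative=3, noise=10.0, random_state=0
)

# Candidate models; the hyperparameter values are illustrative only.
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated R² for each model.
r2_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
best_name = max(r2_scores, key=r2_scores.get)

# Impurity-based feature importances from a forest fitted on all the data.
forest = models["random_forest"].fit(X, y)
importances = forest.feature_importances_

for name, score in r2_scores.items():
    print(f"{name}: mean R² = {score:.3f}")
print("best model:", best_name)
```

Scoring every model on the same folds keeps the comparison fair. The importances come from a single model and are best read as a rough ranking of features.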
View Notebook
Streamlit App
Delivery Time Prediction
-
Built an end-to-end machine learning regression model to predict delivery time using 175k+ real-world orders, covering data cleaning, feature engineering, modeling, and evaluation.
-
Performed deep EDA and statistical validation (Kruskal-Wallis, Dunn post-hoc, Chi-square tests) to uncover time-of-day, market, and protocol-based delivery patterns.
-
Engineered impactful features from timestamps (delivery sessions, weekday/weekend) and handled outliers using the IQR method to improve model stability.
-
Trained and compared Linear Regression, Random Forest, XGBoost, LightGBM and Neural Networks, selecting LightGBM for its superior generalization.
-
Achieved production-ready performance with RMSE ≈ 1.27 minutes, MAE ≈ 1.14 minutes and R² ≈ 0.97.
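As a rough sketch of the feature engineering, outlier handling, and statistical testing steps listed above: the column names, the session bin edges, and the synthetic data are all assumptions made for illustration, and the Dunn post-hoc and Chi-square tests are omitted.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

rng = np.random.default_rng(0)
n = 500

# Synthetic orders over two weeks (hypothetical columns, not the real schema).
orders = pd.DataFrame({
    "created_at": pd.Timestamp("2024-01-01")
    + pd.to_timedelta(rng.integers(0, 14 * 24 * 60, n), unit="min"),
    "delivery_minutes": rng.gamma(shape=9.0, scale=5.0, size=n),
})

# Timestamp features: delivery session (assumed 6-hour bins) and weekend flag.
hour = orders["created_at"].dt.hour
orders["session"] = pd.cut(
    hour,
    bins=[0, 6, 12, 18, 24],
    labels=["night", "morning", "afternoon", "evening"],
    right=False,
)
orders["is_weekend"] = orders["created_at"].dt.dayofweek >= 5

# IQR rule: keep rows within 1.5 * IQR of the first and third quartiles.
q1, q3 = orders["delivery_minutes"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = orders[orders["delivery_minutes"].between(lower, upper)]

# Kruskal-Wallis test: do delivery times differ across sessions?
groups = [
    g["delivery_minutes"].to_numpy()
    for _, g in clean.groupby("session", observed=True)
]
h_stat, p_value = kruskal(*groups)
print(f"kept {len(clean)} of {len(orders)} rows, H = {h_stat:.2f}, p = {p_value:.3f}")
```

Kruskal-Wallis is rank-based, so it does not assume normally distributed delivery times, which suits skewed duration data.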
View Notebook
Scaler Clustering
-
Built an end-to-end unsupervised machine learning pipeline to segment employees based on compensation patterns using KMeans++ clustering, enabling data-driven workforce and salary insights.
-
Performed extensive data cleaning, outlier handling, and feature engineering on large-scale salary data (~200K records), ensuring model stability and realistic cluster formation.
-
Improved clustering quality by encoding individual job positions instead of broad job categories, reaching a Silhouette Score of 0.81.
-
Identified and labeled three actionable employee segments — Entry / Low CTC, Mid-level Professionals, and High Earners / Leaders — providing clear business interpretation of compensation structures.
-
Designed the solution with production readiness in mind, including scalable preprocessing, clustering inference logic, and a clear path for Flask-based deployment for real-world usage.
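The clustering and inference flow described above might look something like the sketch below. It uses synthetic two-feature data (a hypothetical log-CTC and years of experience), so the features, cluster centers, and resulting score are illustrative assumptions and do not reproduce the project's results.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Three synthetic groups of 200 employees: [log CTC, years of experience].
centers = np.array([[13.0, 2.0], [14.5, 7.0], [16.0, 15.0]])
X = np.vstack([c + rng.normal(scale=[0.2, 1.0], size=(200, 2)) for c in centers])

# Scale the features, then cluster with k-means++ initialisation.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# Silhouette score computed in the same scaled space the clusterer saw.
X_scaled = pipeline.named_steps["standardscaler"].transform(X)
score = silhouette_score(X_scaled, labels)

# Inference: assign a new (hypothetical) employee to an existing segment.
new_label = pipeline.predict(np.array([[16.2, 14.0]]))
print(f"silhouette = {score:.2f}, new employee -> cluster {new_label[0]}")
```

Keeping the scaler and the clusterer in one pipeline means new records are preprocessed the same way as the training data at inference time.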
View Notebook
OLA Drivers Churn Prediction
-
Developed an end-to-end ML churn prediction pipeline including data cleaning, EDA, feature engineering, SMOTE class balancing, and robust model evaluation.
-
Trained and compared 7 classification models (including Logistic Regression, Random Forest, SVM, GBDT, XGBoost, and LightGBM) using cross-validation and hyperparameter tuning (GridSearchCV and RandomizedSearchCV).
-
Selected Gradient Boosting (GBDT) as the final model, achieving Recall = 0.935 and F1-score = 0.886, prioritizing recall so that fewer churning drivers go undetected.
-
Applied SHAP explainability to interpret model predictions and identify key churn drivers such as low quarterly ratings, low business value, and driver grade.
-
Delivered actionable business recommendations, including performance-based incentives, rating system improvements, and city-level retention strategies to reduce driver attrition.
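A minimal sketch of the recall-focused tuning step, assuming a scikit-learn `GradientBoostingClassifier` stands in for the GBDT model. The data is synthetic and imbalanced, the parameter grid is an illustrative assumption, and the SMOTE and SHAP steps are left out so the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic imbalanced data: roughly 30% positive ("churned") class.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the search optimises recall on the churn class.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="recall",
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
y_pred = search.predict(X_test)
test_recall = recall_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
print(search.best_params_, f"recall = {test_recall:.3f}, F1 = {test_f1:.3f}")
```

Selecting on recall favors catching as many churners as possible, so reporting F1 alongside it shows how much precision is traded away.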
View Notebook