Dhayalan's ML Projects

Featured Projects

This project applies machine learning regression models (Linear Regression, Lasso Regression, Ridge Regression, Random Forest, XGBoost, and LightGBM) to predict premium prices accurately.
Using hyperparameter tuning and cross-validation, we benchmarked the models by R² score, while feature importance analysis revealed the key drivers of premium prices.
These insights help identify the most reliable model and enable stakeholders to make data-driven pricing decisions and strategic business improvements.

View Notebook | Streamlit App
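The benchmarking workflow described above (tuning a hyperparameter with cross-validation and comparing models by R²) can be sketched without any ML library. This is a minimal illustration, not the project's code: it tunes only the regularization strength `alpha` of a one-feature Ridge model on hypothetical synthetic data, and the tree-based models are omitted. In practice, scikit-learn's `cross_val_score(model, X, y, cv=5, scoring="r2")` and `GridSearchCV` cover the same steps for every estimator listed.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_ridge_1d(xs, ys, alpha):
    """One-feature Ridge fit on centered data (intercept not penalized).
    alpha = 0 reduces to ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cv_r2(xs, ys, alpha, k=5, seed=0):
    """Mean held-out R² over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], alpha)
        scores.append(r2_score([ys[i] for i in test], [model(xs[i]) for i in test]))
    return mean(scores)


# Hypothetical data: one feature with a linear effect plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 5 + rng.gauss(0, 2) for x in xs]

# Grid search over the regularization strength, scored by mean CV R².
alphas = [0.0, 1.0, 10.0, 100.0]
results = {a: cv_r2(xs, ys, a) for a in alphas}
best_alpha = max(results, key=results.get)
for a, score in results.items():
    print(f"alpha={a:>6}: mean CV R² = {score:.3f}")
print("best alpha:", best_alpha)
```

Setting `alpha = 0` recovers ordinary linear regression, so the same grid also compares plain Linear Regression against Ridge on identical folds.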

Built an end-to-end machine learning regression pipeline to predict delivery time from 175k+ real-world orders, covering data cleaning, feature engineering, modeling, and evaluation.
Performed in-depth EDA and statistical testing (Kruskal-Wallis, Dunn's post-hoc, and chi-square tests) to uncover delivery patterns by time of day, market, and protocol.
Engineered impactful features from timestamps (delivery sessions, weekday/weekend) and handled outliers using the IQR method to improve model stability.
Trained and compared Linear Regression, Random Forest, XGBoost, LightGBM, and neural network models, selecting LightGBM for its superior generalization.
Achieved production-ready performance with RMSE ≈ 1.27 minutes, MAE ≈ 1.14 minutes and R² ≈ 0.97.
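The timestamp features and IQR outlier rule mentioned above can be sketched with the Python standard library. The 1.5 × IQR fences follow the conventional Tukey rule; the session boundaries, the function names, and the sample durations are hypothetical and not taken from the project.

```python
from datetime import datetime
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only values inside the IQR fences."""
    lo, hi = iqr_bounds(values, k)
    return [v for v in values if lo <= v <= hi]


def delivery_session(ts):
    """Hypothetical session buckets by hour of day (assumed cut-offs)."""
    h = ts.hour
    if 6 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 22:
        return "evening"
    return "night"


def time_features(ts):
    """Derive simple calendar features from an order timestamp."""
    return {
        "hour": ts.hour,
        "session": delivery_session(ts),
        "is_weekend": ts.weekday() >= 5,  # Monday=0 ... Saturday=5, Sunday=6
    }


# Hypothetical delivery durations in minutes; 120 is an obvious outlier.
durations = [38, 41, 45, 39, 47, 43, 40, 44, 120, 42]
print(remove_outliers(durations))  # [38, 41, 45, 39, 47, 43, 40, 44, 42]
print(time_features(datetime(2015, 2, 7, 19, 30)))  # a Saturday evening
```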

Built an end-to-end unsupervised machine learning pipeline to segment employees based on compensation patterns using KMeans++ clustering, enabling data-driven workforce and salary insights.
Performed extensive data cleaning, outlier handling, and feature engineering on large-scale salary data (~200K records), ensuring model stability and realistic cluster formation.
Improved clustering quality by refining feature selection, reaching a Silhouette Score of 0.81 by encoding job position rather than broad job category.
Identified and labeled three actionable employee segments (Entry / Low CTC, Mid-level Professionals, and High Earners / Leaders), giving a clear business interpretation of compensation structures.
Designed the solution with production in mind, including scalable preprocessing, clustering inference logic, and a clear path to Flask-based deployment.
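k-means++ seeding and the Silhouette Score can be illustrated in plain Python. This is a small sketch on hypothetical data (three well-separated 2-D blobs standing in for scaled compensation features), not the project's pipeline. The restart loop keeps the lowest-inertia run, mirroring the `n_init` parameter of scikit-learn's `KMeans`, and the silhouette follows the standard definition s(i) = (b - a) / max(a, b).

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: after a uniformly random first center, each new
    center is drawn with probability proportional to D(x)^2, the squared
    distance from x to its nearest already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the center to its cluster mean; keep it if the cluster is empty.
            new_centers.append(tuple(sum(d) / len(members) for d in zip(*members)) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def inertia(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def best_kmeans(points, k, n_init=10):
    """Run several seeds and keep the lowest-inertia solution."""
    runs = [kmeans(points, k, seed=s) for s in range(n_init)]
    return min(runs, key=lambda run: inertia(points, *run))


def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b): a = mean intra-cluster distance,
    b = lowest mean distance to another cluster. Singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: three well-separated 2-D blobs standing in for scaled features.
rng = random.Random(7)
blobs = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in blobs for _ in range(30)]

labels, centers = best_kmeans(points, k=3)
print(f"silhouette = {silhouette(points, labels):.2f}")
```

On real salary data, features are usually scaled before clustering, since k-means relies on Euclidean distance and a large-magnitude column such as CTC would otherwise dominate.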

Developed an end-to-end ML churn prediction pipeline including data cleaning, EDA, feature engineering, SMOTE class balancing, and robust model evaluation.
Trained and compared seven classification models, including Logistic Regression, Random Forest, SVM, GBDT, XGBoost, and LightGBM, using cross-validation and hyperparameter tuning (GridSearchCV and RandomizedSearchCV).
Selected Gradient Boosting (GBDT) as the final model, achieving Recall = 0.935 and F1-score = 0.886, prioritizing recall so that as few churning drivers as possible go undetected.
Applied SHAP explainability to interpret model predictions and identify key churn drivers such as low quarterly ratings, low business value, and driver grade.
Delivered actionable business recommendations, including performance-based incentives, rating system improvements, and city-level retention strategies to reduce driver attrition.

View Notebook
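The class-balancing and evaluation steps of the churn project can be sketched as follows: SMOTE-style oversampling of the minority (churned) class, plus the recall and F1 definitions used to pick the final model. This is a dependency-free illustration on assumed toy data; the project may well have used a library implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the line segment
    between a random minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


def recall_f1(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN); F1 = harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Hypothetical 2-D minority class (e.g. churned drivers) to be oversampled.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.5), (2.2, 2.0)]
new_points = smote(minority, n_new=10, k=3)

# Toy labels: 1 = churned, 0 = retained.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(recall_f1(y_true, y_pred))  # (0.75, 0.75)
```

Oversampling should be applied only to the training folds inside cross-validation (for example with imbalanced-learn's `Pipeline`), so synthetic points never leak into the data used to measure recall and F1.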