EDA - Python Projects
Featured Projects
Insurance Premium Pricing Analysis
- This project began with an in-depth exploratory data analysis (EDA) to uncover trends, patterns, and relationships in the insurance premium dataset.
- Examined distributions, correlations, and outliers to understand the factors driving premium variation.
- Building on these insights, applied machine learning models (Random Forest, XGBoost, and LightGBM) and compared their predictive performance.
- Combining EDA-driven understanding with model evaluation supported both premium prediction and business insights for pricing strategy.
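
The outlier and correlation checks described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the helper names (`iqr_outliers`, `pearson`), the 1.5 x IQR rule, and the toy `ages` and `premiums` values are all assumptions made for the example.

```python
from math import sqrt
from statistics import fmean, quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy records, not taken from the project's dataset.
ages = [23, 31, 38, 45, 52, 60, 29, 41]
premiums = [15000, 18000, 21000, 24000, 27500, 31000, 17000, 90000]

outliers = iqr_outliers(premiums)
r_age_premium = pearson(ages, premiums)
print("Outlying premiums:", outliers)
print("Age vs. premium correlation:", round(r_age_premium, 3))
```

A single extreme premium can noticeably change the correlation, which is one reason to inspect outliers before fitting models.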
View Notebook
ECommerce Data Analysis
- Analyzed 1M transaction records using SQL & Python (Pandas, NumPy, Matplotlib, Seaborn).
- Conducted RFM analysis for customer segmentation (loyal, at-risk, lost customers), enabling targeted marketing.
- Implemented statistical tests (Chi-Square, Kruskal-Wallis, Shapiro-Wilk) to validate sales trends, payment method preferences, and customer purchasing behavior.
- Provided data-driven recommendations to increase revenue, optimize store performance & enhance product offerings.
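
For readers unfamiliar with RFM (recency, frequency, monetary value), the segmentation step could look roughly like the sketch below. It uses only the standard library so it runs without the project's dependencies. The function names, the threshold values, the "other" fallback segment, and the toy transactions are hypothetical and are not taken from the notebook.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Aggregate (customer_id, order_date, amount) rows into per-customer RFM values."""
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in transactions:
        by_customer[customer_id].append((order_date, amount))

    table = {}
    for customer_id, rows in by_customer.items():
        last_order = max(order_date for order_date, _ in rows)
        table[customer_id] = {
            "recency_days": (snapshot - last_order).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return table


def segment(rfm, recent_days=30, lapsed_days=180, loyal_orders=3):
    """Label a customer using assumed rule-of-thumb thresholds."""
    if rfm["recency_days"] > lapsed_days:
        return "lost"
    if rfm["recency_days"] > recent_days:
        return "at-risk"
    if rfm["frequency"] >= loyal_orders:
        return "loyal"
    return "other"


# Hypothetical toy transactions for illustration only.
transactions = [
    ("C1", date(2024, 5, 20), 40.0),
    ("C1", date(2024, 6, 10), 55.0),
    ("C1", date(2024, 6, 25), 30.0),
    ("C2", date(2024, 3, 1), 120.0),
    ("C3", date(2023, 9, 15), 80.0),
]
snapshot = date(2024, 7, 1)

table = rfm_table(transactions, snapshot)
segments = {customer_id: segment(values) for customer_id, values in table.items()}
print(segments)
```

In practice the thresholds are often replaced by quantile-based scores computed across all customers; fixed cut-offs are used here only to keep the example short.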
View Notebook
NYC Buildings Energy Consumption Survey
- Conducted an in-depth analysis of energy consumption data for 13,223 buildings in New York City using Python.
- Identified patterns in energy consumption across building types and geographies, providing recommendations to improve energy efficiency.
- Created compelling visualizations using Matplotlib and Seaborn to communicate trends and findings effectively to a non-technical audience.
- Highlighted inefficiencies in energy use patterns, proposing optimization strategies for cost reduction and improved sustainability.
View Notebook
Walmart – Confidence Interval & Central Limit Theorem Analysis
- Data Analysis & Visualization: Conducted exploratory data analysis (EDA) on Walmart’s customer spending patterns using Python, Pandas, Matplotlib, and Seaborn.
- Statistical Modeling & Hypothesis Testing: Applied T-tests, Chi-Square tests, and Kruskal-Wallis tests to compare customer segments and validate statistical significance.
- Confidence Interval & Central Limit Theorem: Estimated 90%, 95%, and 99% confidence intervals to analyze spending behavior and leveraged CLT-based simulations for predictive insights.
- Business Insights & Decision Making: Identified key gender- and age-based spending trends, providing actionable recommendations for targeted promotions, inventory optimization, and pricing strategies.
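
The confidence-interval and CLT steps can be illustrated with a short standard-library sketch. It assumes a normal-approximation interval for the mean and a bootstrap resampling of sample means. The function names and the synthetic, right-skewed `purchases` data are hypothetical and are not the Walmart dataset or the notebook's code.

```python
import random
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation (CLT-based) confidence interval for the mean."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / n ** 0.5
    center = fmean(sample)
    return center - margin, center + margin


def bootstrap_sample_means(sample, draws=1000, seed=0):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [fmean(rng.choices(sample, k=n)) for _ in range(draws)]


# Hypothetical right-skewed purchase amounts generated for illustration.
data_rng = random.Random(42)
purchases = [data_rng.expovariate(1 / 9000) for _ in range(500)]

intervals = {
    level: mean_confidence_interval(purchases, level)
    for level in (0.90, 0.95, 0.99)
}
sample_means = bootstrap_sample_means(purchases)

for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI for mean spend: ({low:.0f}, {high:.0f})")
print("Spread of resampled means:", round(stdev(sample_means), 1))
```

Even though the individual purchase amounts are skewed, the resampled means cluster tightly around the sample mean, which is the behavior the Central Limit Theorem describes. Higher confidence levels yield wider intervals.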
View Notebook