
EDA - Python Projects

Featured Projects

Insurance Premium Pricing Analysis

  • Began with an in-depth Exploratory Data Analysis (EDA) to uncover trends, patterns, and relationships in the insurance premium dataset.
  • Examined distributions, correlations, and outliers to understand the factors driving variation in premiums.
  • Building on these insights, trained machine learning models (Random Forest, XGBoost, and LightGBM) and compared their predictive performance.
  • Combining EDA-driven understanding with model evaluation enabled accurate premium predictions and yielded business insights for pricing strategy.

View Notebook

ECommerce Data Analysis

  • Analyzed 1M transaction records using SQL & Python (Pandas, NumPy, Matplotlib, Seaborn).
  • Conducted RFM analysis for customer segmentation (loyal, at-risk, lost customers), enabling targeted marketing.
  • Applied statistical tests (Chi-Square, Kruskal-Wallis, Shapiro-Wilk) to check normality assumptions and test for differences in sales trends, payment method preferences, and customer purchasing behavior.
  • Provided data-driven recommendations to increase revenue, optimize store performance & enhance product offerings.

View Notebook
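The RFM segmentation mentioned above can be sketched as follows, assuming transactions arrive as `(customer_id, order_date, amount)` tuples. Everything here is illustrative: the toy transactions, the snapshot date, the rank-tertile scoring in `tertile_scores`, and the thresholds in the hypothetical `segment` rule are all assumptions, not the notebook's actual criteria.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Aggregate (customer_id, order_date, amount) rows into (recency_days, frequency, monetary)."""
    last_seen, freq, spend = {}, defaultdict(int), defaultdict(float)
    for cust, order_date, amount in transactions:
        last_seen[cust] = max(last_seen.get(cust, order_date), order_date)
        freq[cust] += 1
        spend[cust] += amount
    return {c: ((snapshot - last_seen[c]).days, freq[c], spend[c]) for c in last_seen}


def tertile_scores(metric, higher_is_better=True):
    """Score customers 1-3 by rank tertile (3 = best); ties are broken by input order."""
    ordered = sorted(metric, key=metric.get, reverse=not higher_is_better)
    n = len(ordered)
    return {c: 1 + (i * 3) // n for i, c in enumerate(ordered)}


def segment(r, f, m):
    """Hypothetical segmentation rule built on the R, F, M scores."""
    if r == 1:
        return "lost"      # longest time since last purchase
    if r == 3 and f >= 2:
        return "loyal"     # recent, repeat buyer
    if r == 2 and f + m >= 4:
        return "at-risk"   # strong history, but recency is slipping
    return "other"


def rfm_segments(transactions, snapshot):
    table = rfm_table(transactions, snapshot)
    r = tertile_scores({c: v[0] for c, v in table.items()}, higher_is_better=False)
    f = tertile_scores({c: v[1] for c, v in table.items()})
    m = tertile_scores({c: v[2] for c, v in table.items()})
    return {c: segment(r[c], f[c], m[c]) for c in table}


# Hypothetical toy transactions
transactions = [
    ("A", date(2023, 12, 20), 120.0),
    ("A", date(2023, 12, 28), 80.0),
    ("A", date(2023, 11, 5), 60.0),
    ("B", date(2023, 10, 1), 300.0),
    ("B", date(2023, 9, 15), 250.0),
    ("C", date(2023, 3, 2), 40.0),
]
print(rfm_segments(transactions, snapshot=date(2024, 1, 1)))
# {'A': 'loyal', 'B': 'at-risk', 'C': 'lost'}
```

Rank-based scores keep the sketch independent of currency and time scale. On a dataset of 1M rows, quantile-based binning in pandas would be the more typical choice.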

NYC Buildings Energy Consumption Survey

  • Conducted an in-depth analysis of energy consumption data for 13,223 buildings in New York City using Python.
  • Identified patterns in energy consumption across building types and geographies, providing recommendations to improve energy efficiency.
  • Created Matplotlib and Seaborn visualizations to communicate trends and findings clearly to a non-technical audience.
  • Highlighted inefficiencies in energy use patterns, proposing optimization strategies for cost reduction and improved sustainability.

View Notebook

Walmart – Confidence Interval & Central Limit Theorem Analysis

  • Data Analysis & Visualization: Conducted exploratory data analysis (EDA) on Walmart’s customer spending patterns using Python, Pandas, Matplotlib, and Seaborn.
  • Statistical Modeling & Hypothesis Testing: Applied t-tests, Chi-Square tests, and Kruskal-Wallis tests to compare customer segments and assess whether observed differences were statistically significant.
  • Confidence Interval & Central Limit Theorem: Estimated 90%, 95%, and 99% confidence intervals for customer spending and used CLT-based simulations of the sampling distribution of the mean to support these estimates.
  • Business Insights & Decision Making: Identified key gender- and age-based spending trends, providing actionable recommendations for targeted promotions, inventory optimization, and pricing strategies.

View Notebook
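The confidence-interval and CLT work described above can be sketched as two computations. The first is the normal-approximation interval x̄ ± z·s/√n. The second is a percentile bootstrap interval built from resampled means, which serves as a simulation check on the first. The spending values here are hypothetical: they are drawn from an exponential distribution to mimic right-skewed purchase amounts, and the helper names and `n_boot` default are illustrative choices.

```python
import random
from statistics import NormalDist, mean, stdev


def clt_interval(sample, confidence):
    """CLT / normal-approximation CI for the mean: x̄ ± z * s / sqrt(n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 0.95
    half_width = z * stdev(sample) / n ** 0.5
    center = mean(sample)
    return center - half_width, center + half_width


def bootstrap_interval(sample, confidence, n_boot=2000, seed=0):
    """Percentile CI from the simulated sampling distribution of the mean."""
    rng = random.Random(seed)
    boot_means = sorted(
        mean(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    )
    alpha = 1 - confidence
    lo = boot_means[int(alpha / 2 * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical, right-skewed spending amounts for 500 customers
rng = random.Random(42)
spend = [rng.expovariate(1 / 9000) for _ in range(500)]

for conf in (0.90, 0.95, 0.99):
    print(conf, clt_interval(spend, conf), bootstrap_interval(spend, conf))
```

Even though individual amounts are skewed, the CLT makes the sample mean approximately normal at this sample size. The two intervals should therefore be close, and they widen as the confidence level rises from 90% to 99%.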