I'm a self-taught Data Scientist and AI Engineer based in Navi Mumbai, focused on building machine learning and AI systems that are reliable, explainable, and useful in real-world decision making.
My work spans predictive modeling, uncertainty quantification, retrieval-augmented generation (RAG), and memory-augmented AI systems.
- Probabilistic modeling: calibrated probabilities over hard classifications
- Uncertainty quantification: prediction intervals, confidence estimation
- Explainability: SHAP-based model transparency for regulated domains
- Time-series forecasting: demand forecasting with feature engineering
- Simulation: Monte Carlo methods for season-level uncertainty
- Applied AI systems: retrieval-augmented generation (RAG), semantic search, vector databases, and long-term memory architectures
- Retrieval-augmented generation (RAG) systems
- Long-term memory architectures
- Vector databases and semantic search
- AI evaluation and model routing
- Applied machine learning systems
End-to-end credit risk pipeline predicting loan default probability on the Home Credit dataset.
- Platt Scaling calibration reducing expected calibration error (ECE) from 0.346 to 0.001, a 99.7% reduction
- Risk bucketing (Low / Medium / High / Very High) aligned with lending policy
- SHAP explainability for individual applicant decisions and regulatory transparency
Python · Scikit-learn · XGBoost · SHAP
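
The calibration step can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the synthetic data, the gradient-descent fit, the learning rate, and the 10-bin ECE are all assumptions chosen for illustration. It fits a Platt-style sigmoid `p = sigmoid(a * score + b)` on held-out scores from a deliberately overconfident model and compares ECE before and after.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=500):
    """Fit p = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean |observed rate - mean predicted prob| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_p = sum(p for p, _ in bucket) / len(bucket)
        frac_pos = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(frac_pos - avg_p)
    return ece


random.seed(0)
# Synthetic data (assumption): the raw model is overconfident and emits
# logit 3*z when the true logit is z.
z = [random.gauss(0.0, 1.0) for _ in range(2000)]
labels = [1 if random.random() < sigmoid(v) else 0 for v in z]
raw_logits = [3.0 * v for v in z]

# Fit the calibrator on one half and evaluate on the other half.
a, b = fit_platt(raw_logits[:1000], labels[:1000])
test_logits, test_labels = raw_logits[1000:], labels[1000:]
ece_before = expected_calibration_error(
    [sigmoid(s) for s in test_logits], test_labels
)
ece_after = expected_calibration_error(
    [sigmoid(a * s + b) for s in test_logits], test_labels
)
print(f"a={a:.3f} b={b:.3f} ECE before={ece_before:.3f} after={ece_after:.3f}")
```

In a real pipeline the same idea is usually available off the shelf (for example, sigmoid calibration in scikit-learn); the hand-rolled fit here only makes the mechanics visible.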
Conversational AI assistant featuring persistent memory, multi-session chat management, and real-time web retrieval.
- Four-layer memory architecture combining session memory, semantic vector retrieval, structured fact memory, and conversation summarization
- Automatic user profiling with semantic deduplication and memory conflict resolution to maintain accurate long-term user profiles
- Running conversation summaries to reduce prompt growth and preserve long-term context
- Intelligent model routing using lightweight and large LLMs to balance latency, cost, and response quality
- Real-time web search integration using Tavily, LangChain agents, and tool-augmented reasoning
- Streamlit deployment with caching, session persistence, and automated chat organization
- Centralized debug logging and modular memory architecture
Python · LangChain · PostgreSQL · pgvector · Groq · Streamlit
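
Semantic deduplication with conflict resolution can be sketched roughly as follows. Everything here is hypothetical: the `upsert_fact` helper, the 0.9 cosine threshold, the "newest fact wins" policy, and the hand-written 3-dimensional vectors standing in for real embeddings. The assistant itself stores vectors in pgvector; this sketch uses an in-memory list to stay self-contained.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def upsert_fact(store, text, embedding, threshold=0.9):
    """Insert a fact, or overwrite its closest near-duplicate (newest wins)."""
    best_idx, best_sim = None, threshold
    for i, (_, existing) in enumerate(store):
        sim = cosine_similarity(embedding, existing)
        if sim >= best_sim:
            best_idx, best_sim = i, sim
    if best_idx is None:
        store.append((text, embedding))
        return "inserted"
    store[best_idx] = (text, embedding)
    return "updated"


# Toy vectors (assumption): a real system would call an embedding model.
facts = []
print(upsert_fact(facts, "User prefers Python", [0.9, 0.1, 0.0]))
print(upsert_fact(facts, "User follows football", [0.0, 0.2, 0.95]))
print(upsert_fact(facts, "User prefers Python and Rust", [0.88, 0.15, 0.02]))
print([text for text, _ in facts])
```

The third call lands close to the first fact in vector space, so it replaces that entry instead of adding a second, conflicting one. Other policies (keeping both with timestamps, or asking an LLM to merge them) would fit the same structure.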
Hourly electricity demand forecasting on real EIA grid data (Texas, 2018-2023).
- Time-series feature engineering: lag features, rolling stats, cyclical encoding
- XGBoost achieving 2.40% MAPE, a 48% improvement over a seasonal naive baseline
- Weather integration via Open-Meteo API
Python · XGBoost · Scikit-learn · Pandas
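
The feature engineering and the baseline comparison can be sketched in plain Python. This is an illustrative sketch, not the project's pipeline: the lag set (1, 24, 168 hours), the 24-hour rolling window, the helper names, and the synthetic load series are assumptions, and the real project works with pandas on EIA data.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value (e.g. hour of day) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def make_features(load, hour_of_day, lags=(1, 24, 168), window=24):
    """Build one feature row per timestep that has full history available."""
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(load)):
        row = {f"lag_{k}": load[t - k] for k in lags}
        past = load[t - window:t]  # strictly before t, so the target does not leak
        row["roll_mean"] = sum(past) / window
        row["hour_sin"], row["hour_cos"] = cyclical_encode(hour_of_day[t], 24)
        rows.append(row)
    return rows, load[start:]


def seasonal_naive(load, season=24):
    """Forecast each hour with the value one season earlier."""
    return load[season:], load[:-season]  # (actual, forecast)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Synthetic two-week hourly series (assumption): daily cycle plus a slow trend.
hours = list(range(24 * 14))
load = [100.0 + 0.01 * t + 20.0 * math.sin(2.0 * math.pi * t / 24.0) for t in hours]
hour_of_day = [t % 24 for t in hours]

X, y = make_features(load, hour_of_day)
actual, forecast = seasonal_naive(load, season=24)
print(len(X), "feature rows; seasonal naive MAPE (%):", round(mape(actual, forecast), 3))
```

Cyclical encoding places hour 23 next to hour 0, which a raw integer hour feature does not. A seasonal naive MAPE computed this way is the reference that a model's MAPE would be compared against.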
Probabilistic match outcome modeling with explicit focus on draw modeling.
- Calibrated Home / Draw / Away probabilities using Platt Scaling
- Expected Points (xPts) league table from match-level probabilities
- 10,000 Monte Carlo season simulations for title, top-4, and relegation probabilities
Python · XGBoost · Monte Carlo Simulation
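
A minimal sketch of how match-level probabilities might feed both an xPts table and a season simulation. The three-team league, its probabilities, the helper names, and the random tie-break are all hypothetical; a real league would break ties on goal difference and use model outputs in place of the hand-written numbers.

```python
import random


def expected_points(p_home, p_draw, p_away):
    """Return (home xPts, away xPts) with 3 points for a win and 1 for a draw."""
    return 3 * p_home + p_draw, 3 * p_away + p_draw


def simulate_season(fixtures, teams, rng):
    """Sample one outcome per fixture and return total points per team."""
    points = {team: 0 for team in teams}
    for home, away, (p_home, p_draw, _p_away) in fixtures:
        u = rng.random()
        if u < p_home:
            points[home] += 3
        elif u < p_home + p_draw:
            points[home] += 1
            points[away] += 1
        else:
            points[away] += 3
    return points


def title_probabilities(fixtures, n_sims=10_000, seed=0):
    """Estimate each team's title probability as its share of simulated wins."""
    rng = random.Random(seed)
    teams = sorted({t for home, away, _ in fixtures for t in (home, away)})
    titles = {team: 0 for team in teams}
    for _ in range(n_sims):
        points = simulate_season(fixtures, teams, rng)
        best = max(points.values())
        leaders = [t for t in teams if points[t] == best]
        titles[rng.choice(leaders)] += 1  # assumption: ties broken at random
    return {team: titles[team] / n_sims for team in teams}


# Hypothetical fixtures: (home, away, (P(home win), P(draw), P(away win))).
fixtures = [
    ("A", "B", (0.60, 0.25, 0.15)),
    ("A", "C", (0.70, 0.20, 0.10)),
    ("B", "A", (0.25, 0.30, 0.45)),
    ("B", "C", (0.50, 0.30, 0.20)),
    ("C", "A", (0.15, 0.25, 0.60)),
    ("C", "B", (0.25, 0.30, 0.45)),
]

xpts = {}
for home, away, probs in fixtures:
    x_home, x_away = expected_points(*probs)
    xpts[home] = xpts.get(home, 0.0) + x_home
    xpts[away] = xpts.get(away, 0.0) + x_away

title_probs = title_probabilities(fixtures)
print("xPts:", {team: round(v, 2) for team, v in sorted(xpts.items())})
print("title probabilities:", title_probs)
```

The same loop extends to top-4 and relegation probabilities by counting how often each team finishes inside the relevant rank range instead of only counting first place.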