pratiksingh1296/README.md

Hi, I'm Pratik 👋

I'm a self-taught Data Scientist and AI Engineer based in Navi Mumbai, focused on building machine learning and AI systems that are reliable, explainable, and useful in real-world decision making.

My work spans predictive modeling, uncertainty quantification, retrieval-augmented generation (RAG), and memory-augmented AI systems.


What I Work On

  • Probabilistic modeling: calibrated probabilities over hard classifications
  • Uncertainty quantification: prediction intervals, confidence estimation
  • Explainability: SHAP-based model transparency for regulated domains
  • Time-series forecasting: demand forecasting with feature engineering
  • Simulation: Monte Carlo methods for season-level uncertainty
  • Applied AI systems β€” retrieval-augmented generation (RAG), semantic search, vector databases, and long-term memory architectures

Currently Exploring

  • Retrieval-augmented generation (RAG) systems
  • Long-term memory architectures
  • Vector databases and semantic search
  • AI evaluation and model routing
  • Applied machine learning systems

📂 Featured Projects

End-to-end credit risk pipeline predicting loan default probability on the Home Credit dataset.

  • Platt Scaling calibration reducing ECE from 0.346 → 0.001 (99.7% improvement)
  • Risk bucketing (Low / Medium / High / Very High) aligned with lending policy
  • SHAP explainability for individual applicant decisions and regulatory transparency
  • Python, Scikit-learn, XGBoost, SHAP
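
For readers unfamiliar with the calibration metric above, here is a minimal sketch of how Expected Calibration Error (ECE) and probability-based risk bucketing might be computed. The bin count, the bucket thresholds, and the function names are illustrative assumptions, not values or code taken from the project.

```python
from bisect import bisect_right
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: weighted mean gap between the average predicted
    probability and the observed positive rate in equal-width bins."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece


# Hypothetical cut-offs; a real lending policy would define its own.
BUCKET_THRESHOLDS = (0.05, 0.15, 0.30)
BUCKET_NAMES = ("Low", "Medium", "High", "Very High")


def risk_bucket(default_prob: float) -> str:
    """Map a calibrated default probability to a named risk bucket."""
    return BUCKET_NAMES[bisect_right(BUCKET_THRESHOLDS, default_prob)]
```

Bucketing only makes sense on calibrated probabilities, which is why the calibration step comes first: a threshold such as 0.15 is meant to correspond to an actual 15% default rate.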

Conversational AI assistant featuring persistent memory, multi-session chat management, and real-time web retrieval.

  • Four-layer memory architecture combining session memory, semantic vector retrieval, structured fact memory, and conversation summarization
  • Automatic user profiling with semantic deduplication and memory conflict resolution to maintain accurate long-term user profiles
  • Running conversation summaries to reduce prompt growth and preserve long-term context
  • Intelligent model routing using lightweight and large LLMs to balance latency, cost, and response quality
  • Real-time web search integration using Tavily, LangChain agents, and tool-augmented reasoning
  • Streamlit deployment with caching, session persistence, and automated chat organization
  • Centralized debug logging and modular memory architecture
  • Python, LangChain, PostgreSQL, pgvector, Groq, Streamlit
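
The routing idea can be illustrated with a small heuristic sketch. How the assistant actually decides between models is not described here, so the rule below (query length plus a few reasoning keywords) and the model names `light-model` and `large-model` are placeholders chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # placeholder model name, not a real model identifier
    reason: str  # short explanation, handy for debug logging


# Hypothetical hints that a query needs deeper reasoning.
REASONING_HINTS = ("explain", "compare", "analyze", "step by step", "why")


def route_query(query: str, max_light_words: int = 30) -> Route:
    """Send short, simple queries to a lightweight model and
    long or reasoning-heavy queries to a larger one."""
    text = query.lower()
    if len(text.split()) > max_light_words:
        return Route("large-model", "long query")
    if any(hint in text for hint in REASONING_HINTS):
        return Route("large-model", "reasoning keyword")
    return Route("light-model", "short, simple query")
```

Returning the reason alongside the model name keeps routing decisions inspectable, which fits naturally with centralized logging.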

Hourly electricity demand forecasting on real EIA grid data (Texas, 2018–2023).

  • Time-series feature engineering: lag features, rolling stats, cyclical encoding
  • XGBoost achieving 2.40% MAPE (48% improvement over seasonal naive baseline)
  • Weather integration via Open-Meteo API
  • Python, XGBoost, Scikit-learn, Pandas
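
The feature engineering and evaluation terms above can be sketched in plain Python. This is a simplified illustration under assumed conventions (hourly data with a 24-hour cycle, MAPE expressed in percent); the helper names are hypothetical and the project itself may implement these steps differently, for example with Pandas.

```python
import math
from typing import List, Sequence, Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Encode a periodic value (e.g. hour of day) as (sin, cos) so that
    the end of one cycle sits next to the start of the next."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag_features(series: Sequence[float], lags: Sequence[int]) -> List[List[float]]:
    """One row per time step t >= max(lags), holding series[t - lag] per lag."""
    start = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(start, len(series))]


def seasonal_naive(series: Sequence[float], season: int) -> List[float]:
    """Forecast each point as the value one season earlier.
    The result aligns with series[season:]."""
    return list(series[:-season])


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actuals must be non-zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Percentage reduction in error relative to a baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

With the sine and cosine pair, hour 23 and hour 0 end up close together, whereas a raw integer encoding would place them 23 units apart.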

Probabilistic match outcome modeling with explicit focus on draw modeling.

  • Calibrated Home / Draw / Away probabilities using Platt Scaling
  • Expected Points (xPts) league table from match-level probabilities
  • 10,000 Monte Carlo season simulations for title, top-4, and relegation probabilities
  • Python, XGBoost, Monte Carlo Simulation
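
As a rough sketch of how match-level probabilities can feed an expected points table and a season simulation, the example below uses the standard 3/1/0 points system. The fixture format, the random tie-break for the title, and all function names are assumptions made for illustration; the project's own simulation may handle ties, goal difference, and other details differently.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# (home team, away team, P(home win), P(draw), P(away win))
Fixture = Tuple[str, str, float, float, float]


def expected_points(p_win: float, p_draw: float) -> float:
    """Expected points from one match: 3 for a win, 1 for a draw."""
    return 3.0 * p_win + 1.0 * p_draw


def xpts_table(fixtures: List[Fixture]) -> Dict[str, float]:
    """Sum expected points per team over all fixtures."""
    table: Dict[str, float] = defaultdict(float)
    for home, away, p_home, p_draw, p_away in fixtures:
        table[home] += expected_points(p_home, p_draw)
        table[away] += expected_points(p_away, p_draw)
    return dict(table)


def simulate_title_probs(
    fixtures: List[Fixture], n_sims: int = 10_000, seed: int = 0
) -> Dict[str, float]:
    """Estimate each team's title probability by sampling every match
    outcome from its probabilities, n_sims times."""
    rng = random.Random(seed)
    teams = sorted({team for fixture in fixtures for team in fixture[:2]})
    titles = dict.fromkeys(teams, 0)
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        for home, away, p_home, p_draw, _p_away in fixtures:
            r = rng.random()
            if r < p_home:
                points[home] += 3
            elif r < p_home + p_draw:
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
        best = max(points.values())
        leaders = [team for team in teams if points[team] == best]
        titles[rng.choice(leaders)] += 1  # assumption: ties broken at random
    return {team: count / n_sims for team, count in titles.items()}
```

Top-4 and relegation probabilities follow the same pattern: sort the simulated table in each run and count how often a team lands in the relevant positions.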

🛠️ Tech Stack

Languages & Core

Python, SQL

Machine Learning

Scikit-learn, XGBoost, Pandas, NumPy, Matplotlib, SHAP

AI & LLM Engineering

LangChain, Groq, Llama, SentenceTransformers, pgvector, Tavily

Databases

PostgreSQL, SQLite

Tools & Deployment

Streamlit, Git, Jupyter


📫 Connect

LinkedIn, GitHub

Pinned

  1. credit-risk-modeling (Public)

    End-to-end credit risk modeling pipeline with probability calibration, risk bucketing, and SHAP explainability.

    Jupyter Notebook

  2. premier-league-forecasting (Public)

    Probabilistic Premier League match forecasting with calibrated predictions and Monte Carlo season simulations.

    Jupyter Notebook

  3. electricity-demand-forecasting (Public)

    Electricity demand forecasting using time-series feature engineering and ML models (Linear Regression, Random Forest, XGBoost) with strong baseline comparison.

    Jupyter Notebook

  4. context-aware-ai-assistant (Public)

    Context-aware AI assistant with persistent semantic memory, multi-session chat management, and real-time web retrieval built using LangChain, ChromaDB, Groq, and Streamlit.

    Python 1