4bdu114h/hackx.ai

SentinelFL — AI-Powered Federated Due Diligence Platform

Detect financial fraud in startups using federated graph intelligence and AI-generated audit reports.

SentinelFL enables investment firms to collaboratively identify fraud — circular transactions, shell companies, inflated revenue — without sharing confidential data. Upload financials, see the fraud network, get an AI-generated audit report.

Architecture

┌────────────────────────────────────────────────────────────────┐
│  React + Vite + Tailwind CSS                                   │
│  Dashboard · Companies · Graph Explorer · Reports · FL Status  │
└──────────────────────┬─────────────────────────────────────────┘
                       │ REST API (proxied via Vite)
┌──────────────────────▼─────────────────────────────────────────┐
│  FastAPI Backend                                                │
│  ┌──────────┐ ┌──────────────┐ ┌───────────┐ ┌──────────────┐ │
│  │  Graph    │ │  Fraud       │ │  ML       │ │  Federated   │ │
│  │  Builder  │ │  Detector    │ │  Models   │ │  Learning    │ │
│  │ NetworkX  │ │ Rules Engine │ │ IsoForest │ │ FedAvg + DP  │ │
│  └──────────┘ └──────────────┘ └───────────┘ └──────────────┘ │
│  ┌──────────────────┐ ┌─────────────────┐ ┌────────────────┐  │
│  │  Risk Scorer      │ │  GenAI Reports  │ │  Explainer     │  │
│  │ Weighted Formula  │ │ OpenAI/Template │ │  SHAP-like     │  │
│  └──────────────────┘ └─────────────────┘ └────────────────┘  │
└────────────────────────────────────────────────────────────────┘

Investment bank mental model

| Concept | In SentinelFL |
| --- | --- |
| Graph nodes | Startup / portfolio companies (company_id in companies.csv). |
| Transaction history | transactions.csv → aggregated stats, graph edges, anomaly detection. |
| Company reports | Structured fields in companies.csv (revenue vs. bank inflow, sector, status). NLP + rules treat these as “report text” proxies; extend with real PDFs via NLPProcessor later. |
| Personal / director reports | directors.csv (overlaps, other_directorships) → network and overlap features. |
| NLP → fixed attributes | NLPProcessor + feature_schema.py produce one standardized row per company (transactions + company + director + graph + keyword/NLP signals). See GET /features and GET /features/{id}. |
| Fraud model | FraudModelTrainer (MLP) learns from pseudo-labels derived from red-flag features. |
| Federated learning | FederatedEngine groups companies into notional fund clients, runs FedAvg + DP on weights, produces a global model. |
| Score every company on the global model | fl_engine.score_companies() → fl_fraud_probability per company; blended into composite risk as historical_pattern_score (15%). |
| Bank-wide view | GET /portfolio/summary: all companies sorted by risk, counts, FL status. Per-company narrative: POST /generate-report. |
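
The "FedAvg + DP on weights" step in the table can be illustrated with a minimal, pure-Python sketch. This is not the code in FederatedEngine; the function names (clip, fedavg_dp) and parameters (max_norm, noise_std) are hypothetical, and the noise scale here is not calibrated to any formal privacy budget.

```python
import math
import random


def clip(vector, max_norm):
    """Scale a weight vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(w * w for w in vector))
    if norm <= max_norm or norm == 0.0:
        return list(vector)
    scale = max_norm / norm
    return [w * scale for w in vector]


def fedavg_dp(client_weights, client_sizes, max_norm=1.0, noise_std=0.0, rng=None):
    """Size-weighted average of clipped client vectors, plus Gaussian noise.

    Assumption: every client sends a flat list of floats of equal length.
    """
    rng = rng or random.Random()
    total = sum(client_sizes)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        clipped = clip(weights, max_norm)  # bound each client's influence
        for i, w in enumerate(clipped):
            averaged[i] += w * size / total
    # Gaussian noise masks individual contributions in the global model.
    return [w + rng.gauss(0.0, noise_std) for w in averaged]
```

Clipping before averaging bounds how much any single fund client can move the global model, which is what makes the added noise meaningful as a privacy measure.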

End-to-end flow in code: run_initial_analysis() in backend/main.py (graphs → anomalies → NLP features → FL train → fl_scores → RiskScorer + FraudDetector).

Quick Start

Prerequisites

  • Python 3.10+
  • Node.js 18+

1. Install backend dependencies

pip install -r requirements.txt

2. Start the backend server

uvicorn backend.main:app --host 127.0.0.1 --port 8000

The API will load sample data from data/ automatically and run initial fraud analysis on all 25 companies.

3. Install and start the frontend

cd frontend
npm install
npm run dev

4. Open the dashboard

Navigate to http://localhost:5173 in your browser.

Project Structure

sentinelfl/
├── backend/
│   ├── main.py                # FastAPI app with DataStore & startup analysis
│   ├── models.py              # Pydantic data models
│   ├── routes/
│   │   ├── upload.py          # POST /upload-data
│   │   ├── company.py         # GET /companies, /company/{id}
│   │   ├── fraud.py           # GET /fraud-analysis/{id}
│   │   ├── graph.py           # GET /graph, /graph/{id}
│   │   ├── report.py          # POST /generate-report, GET /report/{id}
│   │   ├── federated.py       # GET /federated/status, POST /federated/simulate
│   │   ├── features.py        # GET /features, /features/{id} — standardized dataset
│   │   ├── portfolio.py       # GET /portfolio/summary — bank portfolio view
│   │   └── explain.py         # GET /explain/{entity_id}
│   ├── services/
│   │   ├── fraud_detector.py  # Rule-based fraud detection (6 detection methods)
│   │   ├── nlp_processor.py   # NLP + structured features → fixed attribute rows
│   │   ├── feature_schema.py  # Feature names and metadata for the ML dataset
│   │   └── risk_scorer.py     # Composite risk scoring (includes FL probability)
│   ├── ml/
│   │   ├── anomaly_detector.py    # Isolation Forest anomaly detection
│   │   ├── gnn_model.py          # Graph-based risk scoring (PageRank, centrality)
│   │   ├── isolation_forest.py   # Extended anomaly detector with feature extraction
│   │   └── explainer.py          # SHAP-like feature importance explanations
│   ├── graph/
│   │   └── graph_builder.py   # NetworkX graph construction from CSV data
│   ├── federated/
│   │   ├── simulator.py       # FedAvg simulation with 5 financial institutions
│   │   ├── fedavg.py          # Full federated learning with PyTorch MLP
│   │   └── dp_wrapper.py      # Differential privacy noise injection
│   └── genai/
│       └── report_generator.py # GenAI audit reports (OpenAI or template fallback)
├── frontend/
│   ├── src/
│   │   ├── App.jsx            # Sidebar layout with routing
│   │   ├── pages/
│   │   │   ├── Dashboard.jsx      # Overview with stats, charts, top risks
│   │   │   ├── Companies.jsx      # Searchable company listing with risk bars
│   │   │   ├── CompanyDetail.jsx  # Deep-dive: risk gauge, signals, graph, SHAP
│   │   │   ├── UploadData.jsx     # CSV drag-and-drop upload
│   │   │   ├── GraphExplorer.jsx  # Interactive force-directed entity graph
│   │   │   ├── FederatedLearning.jsx  # FL metrics, convergence chart, clients
│   │   │   └── Report.jsx        # AI-generated audit report viewer
│   │   └── services/
│   │       └── api.js         # API client for all endpoints
│   └── vite.config.js         # Vite + Tailwind v4 + API proxy
├── data/
│   ├── transactions.csv       # 10,000 transactions with planted fraud
│   ├── companies.csv          # 25 companies (3 shell clusters, 2 inflated)
│   ├── directors.csv          # 40 directors (5 controlling multiple entities)
│   └── generate_data.py       # Dataset generation script
└── requirements.txt

Fraud Detection Methods

| Method | Algorithm | What It Detects |
| --- | --- | --- |
| Circular Transactions | DFS cycle detection | A → B → C → A money flows |
| Shared Directors | Director-company mapping | Shell company networks |
| Revenue Inflation | Revenue vs. bank inflow comparison | Fabricated growth |
| Fund Diversion | Egocentric network analysis | Money siphoned to personal accounts |
| Anomaly Detection | Isolation Forest (ML) | Statistically unusual transactions |
| Graph Risk | PageRank + Degree Centrality | Suspicious network positions |
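
As an illustration of the first row, here is a minimal sketch of DFS cycle detection on a directed payment graph. It is a generic textbook version, not the implementation in fraud_detector.py; the function name find_cycle and the edge-list input format are assumptions.

```python
def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic.

    edges is an iterable of (payer, payee) pairs.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:
                # Back edge: the cycle is the path from nxt to here, closed on nxt.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None


flows = [("Alpha Corp", "Beta Ltd"), ("Beta Ltd", "Gamma Inc"), ("Gamma Inc", "Alpha Corp")]
print(find_cycle(flows))
```

The recursive form is fine for a graph of a few dozen companies; a much larger graph would likely need an iterative DFS to avoid Python's recursion limit.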

Risk Score Formula

Risk Score = (0.30 × Anomaly) + (0.30 × Graph Risk) + (0.25 × Rule Violations) + (0.15 × Historical)

| Score Range | Level | Action |
| --- | --- | --- |
| 0–30 | Low | Standard due diligence |
| 30–70 | Medium | Enhanced monitoring |
| 70–100 | High | Escalate to forensic audit |
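
The formula and bands can be sketched in a few lines of Python. Two things here are assumptions rather than documented behavior: each component is taken to be on a 0–100 scale, and the overlapping boundaries are resolved so that a score of exactly 30 is Medium and exactly 70 is High. The function names are hypothetical and this is not the code in risk_scorer.py.

```python
# Weights taken from the formula above.
WEIGHTS = {
    "anomaly": 0.30,
    "graph_risk": 0.30,
    "rule_violations": 0.25,
    "historical": 0.15,
}


def composite_risk(anomaly, graph_risk, rule_violations, historical):
    """Weighted sum of four component scores, each assumed to lie in 0..100."""
    components = {
        "anomaly": anomaly,
        "graph_risk": graph_risk,
        "rule_violations": rule_violations,
        "historical": historical,
    }
    for name, value in components.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0..100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())


def risk_level(score):
    """Map a score to a band; boundary handling is an assumption."""
    if score < 30:
        return "Low"
    if score < 70:
        return "Medium"
    return "High"


score = composite_risk(anomaly=80, graph_risk=60, rule_violations=40, historical=20)
print(round(score, 2), risk_level(score))
```

With these inputs the weighted terms are 24 + 18 + 10 + 3, giving a score of 55 in the Medium band.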

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /upload-data | POST | Upload CSV files for analysis |
| /companies | GET | List all companies with risk scores |
| /company/{id} | GET | Company profile with directors and flags |
| /fraud-analysis/{id} | GET | Full fraud analysis breakdown |
| /graph/{id} | GET | Entity graph data (ego network) |
| /graph | GET | Full entity graph |
| /generate-report | POST | Generate AI audit report |
| /report/{id} | GET | Retrieve generated report |
| /federated/status | GET | Federation metrics and accuracy |
| /federated/simulate | POST | Run a new FL training round |
| /features | GET | Standardized NLP/feature dataset for all companies |
| /features/{id} | GET | Feature vector + fl_fraud_probability for one company |
| /portfolio/summary | GET | Investment-bank view: all startups, risk mix, FL status |
| /explain/{id} | GET | SHAP-like feature importances |
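
A small standard-library client for the GET endpoints might look like the sketch below. The base URL comes from the uvicorn command in Quick Start; the helper names (endpoint_url, get_json) are hypothetical, and the shape of the JSON responses is not documented here, so the code only decodes them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host and port from the uvicorn command in Quick Start.
BASE_URL = "http://127.0.0.1:8000"


def endpoint_url(template, **params):
    """Fill a path template such as '/company/{id}' and prefix the base URL."""
    escaped = {key: quote(str(value), safe="") for key, value in params.items()}
    return BASE_URL + template.format(**escaped)


def get_json(template, **params):
    """GET an endpoint and decode its JSON body. Requires the backend to be running."""
    with urlopen(endpoint_url(template, **params), timeout=10) as response:
        return json.load(response)


# Example calls, with the backend running ("C001" is a made-up company id):
#   companies = get_json("/companies")
#   analysis = get_json("/fraud-analysis/{id}", id="C001")
```

The POST endpoints are left out because their request bodies are not specified in this README.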

Sample Data (Planted Fraud)

The dataset includes deliberately planted fraud patterns:

  • Circular transactions: Alpha Corp → Beta Ltd → Gamma Inc → Alpha Corp (30 rounds)
  • Shell company clusters: Zeta/Eta/Theta, Rho/Sigma/Tau (dormant companies with 10x revenue inflation)
  • Revenue inflation: Xi Global (100Cr reported, 30Cr actual), Upsilon Corp (85Cr reported, 22Cr actual)
  • Fund diversion: Omicron Labs funneling money to personal accounts
  • Shared directors: 5 directors controlling multiple entities across fraud clusters
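
The revenue inflation figures above can be checked with a short worked example. It assumes the "actual" figure is the bank inflow used in the revenue vs. bank inflow comparison, and the 1.5x flagging threshold is a made-up value for illustration, not one taken from the detector.

```python
def inflation_ratio(reported, bank_inflow):
    """Reported revenue divided by observed bank inflow (same currency unit)."""
    if bank_inflow <= 0:
        raise ValueError("bank_inflow must be positive")
    return reported / bank_inflow


def is_inflated(reported, bank_inflow, threshold=1.5):
    """Flag a company whose reported revenue exceeds inflow by the threshold factor."""
    return inflation_ratio(reported, bank_inflow) > threshold


# (reported Cr, actual Cr) from the planted fraud list above.
samples = {"Xi Global": (100, 30), "Upsilon Corp": (85, 22)}
for name, (reported, actual) in samples.items():
    print(f"{name}: {inflation_ratio(reported, actual):.2f}x")
# Xi Global: 3.33x
# Upsilon Corp: 3.86x
```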

Optional: Enable GenAI Reports

Set your OpenAI API key for AI-generated audit reports:

export OPENAI_API_KEY=your-key-here

Without the API key, the system generates comprehensive template-based reports.
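
The fallback described above amounts to a check on the environment variable. The sketch below shows one way such a switch could be written; report_mode is a hypothetical helper, and the actual selection logic in report_generator.py may differ.

```python
import os


def report_mode(env=None):
    """Return 'openai' when OPENAI_API_KEY is set and non-empty, else 'template'."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return "openai" if key else "template"


print(report_mode())
```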

Tech Stack

| Component | Technology |
| --- | --- |
| Backend | FastAPI, Python 3.10+ |
| Frontend | React 19, Vite 8, Tailwind CSS v4 |
| Charts | Recharts |
| Icons | Lucide React |
| Graph Engine | NetworkX (in-memory) |
| ML | scikit-learn (Isolation Forest) |
| Federated Learning | Simulated FedAvg with DP |
| Reports | OpenAI GPT-4o-mini / Template engine |
