GitHub - pratyushuniv2022-code/FrontLine-AI: Autonomous AI-powered intelligence pipeline that scrapes, analyzes, and summarizes geopolitical developments related to India. Integrates Exa.ai search with extractive summarization to produce structured threat insights for real-time visualization.

🚨 Project Title 🇮🇳 India-Focused Threat Intelligence & Safety Monitoring Dashboard

YouTube video of the live presentation: https://youtu.be/oE9aobRJSOo

Deployed AI agent: https://autonomous-scraper.fly.dev/ (try it out and suggest any improvements).

🧠 Objective

To build an autonomous, LLM-assisted threat analysis dashboard that continuously:

🛰️ Scrapes and aggregates open-source intelligence (OSINT) related to India (from GitHub datasets, Stratfor feeds, or APIs).

🧩 Summarizes and analyzes the latest geopolitical, security, and economic developments.

🔐 Detects and flags potentially unsafe or malicious content in user queries (AI safety layer).

📊 Visualizes all processed insights in an interactive Streamlit dashboard for analysts or researchers.

⚙️ Integrates automation via Zapier and Hook0 for triggers, and Docker + Fly.io for deployment.

🚀 Note: The deployed version of this dashboard will be added soon once final testing and optimization are complete.

🛰️ Data Sources

This project uses two key data streams — one for real-time geopolitical intelligence and another for crime analytics. Both are processed through specialized AI agents within the system.

🌍 1. Geopolitical Data (Threat Intelligence)

Source: Stratfor Worldview

Purpose: Used as the primary data feed for analyzing India’s geopolitical, economic, and security-related developments.

Process Overview:

The scraper fetches the latest news briefs, situation reports, and articles related to India from Stratfor.

The Insight Agent summarizes the current geo-financial scenario of the country.

The Scoring & Keyword Agent ranks the summarized items by criticality and highlights key topics, regions, and phrases.

Example Outputs:

📜 Analyst brief summarizing regional trends

🔑 Top keywords (e.g., “trade route,” “defense cooperation,” “energy policy”)

🧩 Automated categorization into economic / military / cyber / policy / social dimensions
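The categorization and criticality ranking described above could be sketched with plain keyword rules. This is only an illustration: the keyword sets, the weights, and the function names are assumptions made for the example, not the Scoring & Keyword Agent's actual logic.

```python
import re

# Hypothetical keyword sets for the five dimensions named in the README.
CATEGORY_KEYWORDS = {
    "economic": {"trade", "energy", "tariff", "inflation"},
    "military": {"defense", "border", "troops", "missile"},
    "cyber": {"cyber", "malware", "breach"},
    "policy": {"policy", "treaty", "legislation"},
    "social": {"protest", "unrest", "migration"},
}

# Hypothetical weights: higher means more critical for ranking purposes.
CATEGORY_WEIGHTS = {"military": 3, "cyber": 3, "economic": 2, "policy": 1, "social": 1}


def categorize(text):
    """Return the sorted list of categories whose keywords appear in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws)


def criticality(text):
    """Sum the weights of every category the text falls into."""
    return sum(CATEGORY_WEIGHTS[cat] for cat in categorize(text))


def rank_items(items):
    """Order items from most to least critical."""
    return sorted(items, key=criticality, reverse=True)
```

For example, `categorize("India signs energy policy deal")` yields both the economic and policy dimensions, and `rank_items` places an item touching military and cyber keywords ahead of it.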

🧠 2. Crime Data (Domestic Safety Intelligence)

Source: Indian Crimes Dataset – Kaggle

Purpose: Used by the Crime Intelligence Agent to study patterns of crime across Indian states and districts, supporting analysis on public safety, law enforcement trends, and crime reduction strategies.

Process Overview:

The dataset is loaded into the system for profiling, trend analysis, and visualization.

The Crime Agent identifies hotspots, frequency patterns, and time-based trends.

Includes a chat-based analyzer where users can ask:

“Which cities have the highest crime rates?”

“Is there a seasonal trend in offenses?”

“What type of crimes are rising in urban regions?”

Example Outputs:

🥧 Pie charts for top crime types and regions

📈 Time-series charts for daily/weekly trends

🤖 LLM-based narrative insights about emerging crime clusters
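As a rough sketch of the hotspot and time-trend analysis described above, the counts behind the charts can be computed as follows. The field names (`city`, `crime_type`, `date`) and the sample rows are assumptions for illustration and are not taken from the Kaggle dataset. The project lists pandas for data handling; this version uses only the standard library to stay dependency-free.

```python
from collections import Counter

# Illustrative rows only; the real dataset's columns may be named differently.
SAMPLE_RECORDS = [
    {"city": "Delhi", "crime_type": "theft", "date": "2023-01-14"},
    {"city": "Delhi", "crime_type": "fraud", "date": "2023-01-20"},
    {"city": "Mumbai", "crime_type": "theft", "date": "2023-02-03"},
    {"city": "Delhi", "crime_type": "theft", "date": "2023-02-11"},
]


def hotspots(records, top_n=3):
    """Return the top_n cities by number of recorded incidents."""
    return Counter(r["city"] for r in records).most_common(top_n)


def top_crime_types(records, top_n=3):
    """Return the top_n crime types by frequency (pie-chart input)."""
    return Counter(r["crime_type"] for r in records).most_common(top_n)


def monthly_trend(records):
    """Count incidents per month, keyed by 'YYYY-MM' in chronological order."""
    counts = Counter(r["date"][:7] for r in records)
    return dict(sorted(counts.items()))
```

The resulting counts could feed the pie and time-series charts, or be passed to an LLM as context for narrative insights.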

🔒 Data Ethics & Attribution

All scraped data from Stratfor is used strictly for educational and research purposes under fair use, with no redistribution of raw content.

The Kaggle crime dataset remains governed by its original open license and attribution terms.

No private, confidential, or personally identifiable information (PII) is collected or stored.

🧩 Combined Use in the Project

| Data Source | Responsible Agent | Purpose |
|---|---|---|
| Stratfor Worldview | Insight & Scoring Agents | Summarization, risk scoring, keyword extraction |
| Indian Crimes Dataset (Kaggle) | Crime Intelligence Agent | Crime profiling, hotspot detection, safety analysis |

⚙️ System Architecture

1️⃣ Data Collection / Scraping Layer

Periodically fetches updated JSON/Excel feeds from:

GitHub repositories

Web pipelines (e.g., latest_summary.json)

Stratfor-style public intelligence feeds

Employs retry-based requests (requests + urllib3 Retry Adapter) for stability.

Cleans data to remove boilerplate, PGP blocks, and non-relevant text.

💡 Ensures fresh geopolitical summaries are available every 12 hours without manual refresh.
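A minimal sketch of the retry-based fetching and PGP cleanup described in this layer, assuming `requests` and `urllib3` are installed. The retry counts, status codes, and function names are illustrative choices, not the project's actual settings.

```python
import re

# Matches ASCII-armored PGP blocks whose BEGIN and END labels agree,
# e.g. a SIGNATURE or PUBLIC KEY BLOCK section.
PGP_BLOCK = re.compile(
    r"-----BEGIN PGP ([A-Z ]+)-----.*?-----END PGP \1-----",
    re.DOTALL,
)


def build_session(total_retries=3, backoff=0.5):
    """Return a requests.Session that retries transient HTTP failures."""
    # Imported lazily so clean_text() stays usable without the HTTP stack.
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def clean_text(raw):
    """Drop PGP blocks, then collapse runs of whitespace."""
    without_pgp = PGP_BLOCK.sub(" ", raw)
    return re.sub(r"\s+", " ", without_pgp).strip()
```

A feed would then be fetched with `build_session().get(url, timeout=10)` and the response text passed through `clean_text` before summarization.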

2️⃣ NLP & Summarization Layer

Uses RAKE (Rapid Automatic Keyword Extraction) for initial keyword extraction.

Generates WordClouds and keyword bar charts for visual insights.

Optionally uses LLMs (Cerebras or OpenAI) for:

Cleaning and deduplicating extracted phrases.

Categorizing into geo / policy / military / economic / cyber / social classes.

Ranking by weighted relevance to search queries.

Models Used:

🧠 llama-3.3-70b (Cerebras)

⚡ gpt-4o-mini (OpenAI fallback)
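To show what the RAKE step at the start of this layer computes, here is a simplified, dependency-free sketch of RAKE-style scoring. It is not the library the project uses: the stopword list is a tiny illustrative set (the project loads NLTK stopwords), and the function names are assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real run would use a full one.
STOPWORDS = {
    "a", "an", "and", "the", "of", "in", "on", "to", "for", "with",
    "is", "are", "its", "as", "by", "at", "from", "that", "this",
}


def candidate_phrases(text):
    """Split text into word runs delimited by punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text):
    """Score each phrase as the sum of degree/frequency over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Longer phrases made of rarely repeated words score highest, which is why multi-word terms such as "trade route security" tend to surface ahead of single words.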

3️⃣ Automated Analyst Brief (LLM Summarizer)

Creates concise 150-word briefings summarizing India’s:

Geopolitical posture (China, Pakistan, IOR region)

Economic and energy security trends

Domestic risks (cyber threats, unrest, natural disasters)

Auto-refreshes every 12 hours (via Streamlit caching / Zapier webhook).

Displays a safe message (“No LLM API key configured…”) if keys are unavailable.
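The provider fallback and the missing-key message could be wired roughly as below. The model names and key names come from this README; the function names and the injected `call_llm` callable are hypothetical stand-ins for whatever client code the project actually uses.

```python
import os

# Preferred provider first, fallback second; key names match the
# secrets listed in the deployment section of this README.
PROVIDERS = [
    ("cerebras", "CEREBRAS_API_KEY", "llama-3.3-70b"),
    ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
]

# Message text as quoted in the README (the ellipsis is in the original).
NO_KEY_MESSAGE = "No LLM API key configured…"


def pick_provider(env=None):
    """Return the first provider with a non-empty API key, or None."""
    env = os.environ if env is None else env
    for name, key_var, model in PROVIDERS:
        if env.get(key_var):
            return {"provider": name, "model": model}
    return None


def analyst_brief(items, call_llm, env=None, max_words=150):
    """Build a brief via call_llm, or return the safe message without keys.

    call_llm is a hypothetical callable taking (provider, model, prompt).
    """
    choice = pick_provider(env)
    if choice is None:
        return NO_KEY_MESSAGE
    prompt = (
        f"Summarize India's geopolitical, economic, and domestic risk "
        f"outlook in at most {max_words} words:\n" + "\n".join(items)
    )
    return call_llm(choice["provider"], choice["model"], prompt)
```

Passing the environment as a parameter keeps the selection logic testable without touching real secrets.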

4️⃣ Safety & Threat Monitoring System

Built-in Safety Monitor inspects user queries in real time.

Detects risky or violent intent using:

Regex rule-based filters (e.g., bomb, attack, murder, explosive)

Optional LLM-based intent classifier for nuanced euphemisms.

Classifies queries as Low / Medium / High Risk.

Logs flagged events with:

Timestamp

IP & geolocation (ipapi.co)

Triggered terms and AI decision

Sends instant push notifications (via Pushover) for high-risk alerts.

✅ Ensures the system remains LLM-safe and OSINT-compliant.
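A minimal sketch of the rule-based half of the Safety Monitor, using the example terms listed above. The thresholds (one distinct term is Medium, two or more is High) and the record layout are assumptions for illustration; the IP and geolocation lookup is left out.

```python
import re
from datetime import datetime, timezone

# Example terms from the README; a real deployment would maintain a longer list.
RISK_TERMS = ["bomb", "attack", "murder", "explosive"]

# Match each term plus simple suffixes such as "bombs" or "attacking".
RISK_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, RISK_TERMS)) + r")\w*\b",
    re.IGNORECASE,
)


def assess_query(query):
    """Classify a query and return a log-ready record."""
    hits = sorted({m.group(1).lower() for m in RISK_PATTERN.finditer(query)})
    if len(hits) >= 2:
        level = "High"    # assumed threshold: two or more distinct terms
    elif len(hits) == 1:
        level = "Medium"  # assumed threshold: exactly one term
    else:
        level = "Low"
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "risk": level,
        "triggered_terms": hits,
    }
```

Rules like these produce false positives on benign phrases such as "heart attack", which is one reason to pair them with the optional LLM intent classifier.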

5️⃣ Visualization & Dashboard Layer

Built entirely in Streamlit, featuring:

🧭 Sidebar Filters – search text, regex search, source filters

📊 Interactive Visualizations – Plotly bar charts, WordClouds, KPI counters

🧠 LLM Refinement Toggle – switch between rule-based and AI-cleaned results

📘 Excel Analyzer (Beta) – upload or connect GitHub Excel datasets to:

Profile columns

Detect outliers

Suggest improvements

Generate AI-driven summaries (“Ask your dataset”)

The UI auto-updates every 12 hours and hides sensitive admin-only details.
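The 12-hour refresh with a fallback cache can be illustrated with a small standard-library class. The README attributes the real behavior to Streamlit caching and webhooks; the class below is a stand-in sketch, and its name and parameters are assumptions.

```python
import time


class RefreshingCache:
    """Cache one value, refetch it after ttl_seconds, and keep serving
    the last good value if a refresh attempt fails."""

    def __init__(self, fetch, ttl_seconds=12 * 60 * 60, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            try:
                self._value = self._fetch()
                self._fetched_at = now
            except Exception:
                if self._fetched_at is None:
                    raise  # nothing cached yet, so surface the error
                # Otherwise fall back to the previously cached value.
        return self._value
```

Wrapping the summary loader this way means a failed scrape degrades to slightly stale data instead of an empty dashboard.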

6️⃣ Automation & Integration Layer

Zapier + Hook0 Integration for:

Webhook-based data refresh triggers

Sending summaries to Slack/Email

Remote cache updates

GitHub-hosted NLTK Data:

To keep the Docker image lightweight, all NLTK corpora (stopwords, punkt, punkt_tab) are hosted on GitHub and downloaded during container startup.
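A webhook-based refresh trigger of the kind listed for Zapier and Hook0 might be built as follows. The URL is a placeholder, and the payload fields (`event`, `source`, `summary`) are hypothetical; the real endpoints and schema are not given in this README.

```python
import json
import urllib.request

# Hypothetical placeholder; a real Zapier or Hook0 endpoint would go here.
WEBHOOK_URL = "https://example.invalid/hooks/refresh"


def build_refresh_request(summary, source="stratfor", url=WEBHOOK_URL):
    """Build a JSON POST request announcing a refreshed summary."""
    payload = {"event": "summary.refreshed", "source": source, "summary": summary}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.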

7️⃣ Deployment Architecture

Containerized using Docker

Base: python:3.11-slim

Optimized environment (PYTHONOPTIMIZE=2, limited threads)

Uses docker_entrypoint.sh for runtime setup

Deployed on Fly.io with:

Persistent logs

Secret management (OPENAI_API_KEY, CEREBRAS_API_KEY, PUSHOVER_USER_KEY, etc.)

Built-in health checks (auto-restart if Streamlit is unresponsive)

💡 Core Innovations

| Component | Innovation |
|---|---|
| Scraping → Summarization → Visualization | Fully automated 12-hour pipeline with fallback cache |
| LLM Curation | Cleans, categorizes, and ranks key phrases intelligently |
| Safety Monitor | Hybrid rule + LLM model for real-time risk classification |
| Push Alerts | Instant Pushover alerts for high-risk detections |
| Excel Analyzer | ML-assisted profiling and chat interface for structured datasets |
| Lightweight Deployment | Docker + Fly.io + GitHub hosting for efficient deployment |

🧰 Tech Stack

| Layer | Tools / Libraries |
|---|---|
| Language & Environment | Python 3.11, Docker |
| Frameworks | Streamlit, Plotly, Matplotlib |
| NLP & AI | NLTK, RAKE, WordCloud, LLM APIs (Cerebras/OpenAI) |
| Networking | requests, urllib3.Retry |
| Automation | Zapier, Hook0, Fly.io |
| Data Handling | pandas, openpyxl |
| Alerting | Pushover API |
| Deployment | Fly.io, Docker Hub |
| Version Control | GitHub (auto-updating datasets) |

🧭 Conceptual Summary

This project acts as a 24/7 digital analyst that:

Continuously scans, summarizes, and interprets open-source data.

Scores and classifies risks in real time.

Monitors for threats, ensuring AI safety compliance.

Visualizes all findings through an elegant, interactive dashboard.

It demonstrates:

Autonomous LLM pipelines for real-time intelligence.

End-to-end integration: data ingestion → NLP → visualization → safety → deployment.

Scalable, containerized design ready for real-world use.

🚀 Future Extensions

🔗 Integrate LangChain for multi-source contextual summarization

🧮 Add Neo4j or DuckDB for historical trend tracking

🧠 Implement LLM-driven clustering for theme evolution detection

🌏 Extend to multi-country threat comparison dashboards

🤖 Introduce a multi-agent MCP (Model Context Protocol) layer for asynchronous coordination between:

Scraper Agent

Summarizer Agent

Safety Agent

Dashboard Agent

Crime Intelligence Agent

🧩 Deployment Note

🌐 Deployed version of the dashboard will be added soon after final optimization and container testing.

Folders and files (129 commits):

.devcontainer
.github/workflows
crime_agent_logs
data
nltk_data
safety_logs
.dockerignore
.env
.gitignore
Dockerfile
Presentation1.pptx
README.md
fly.toml
nltk_file.py
requirements-prod.txt
requirements.txt
stratfor_india_agent.py
summarization_stratfor.py
threat_dashboard_with_safety.py
trial_excel_chat.py
