
Real-time Fraud Detection on AWS

Detecting potential fraud in financial systems is a major challenge for organizations worldwide. Building robust solutions that enable real-time actions is essential for companies aiming to provide greater security to their customers during financial transactions.

This repository demonstrates a complete machine learning pipeline for credit card fraud detection using the Kaggle Credit Card Fraud Detection dataset, which contains 284,807 European cardholder transactions from 2013 (including 492 fraudulent cases) with 28 PCA-transformed features plus original Amount and Time variables.
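
As a quick orientation before touching any AWS resources, the class imbalance can be checked locally. This is a minimal sketch, assuming the Kaggle CSV has been downloaded to dataset/creditcard.csv as in the project layout further down.

import pandas as pd

# Load the Kaggle dataset and inspect the fraud/legitimate split.
df = pd.read_csv("dataset/creditcard.csv")
counts = df["Class"].value_counts()  # Class: 1 = fraud, 0 = legitimate
print(counts)
print(f"Fraud ratio: {counts[1] / len(df):.4%}")  # roughly 0.17% of transactions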

The project showcases an end-to-end streaming architecture on AWS, from model training to real-time inference and monitoring. The complete solution includes:

  • Training of supervised and unsupervised ML models and deployment to managed endpoints with Amazon SageMaker
  • REST API deployment via Chalice (Lambda + API Gateway)
  • Streaming data pipeline (Kinesis → Spark/Glue → RDS)
  • (Optional) Interactive dashboard for real-time fraud monitoring and analysis.

Architecture overview:

(Architecture diagram available in assets/.)


Project structure

aws-realtime-fraud-detection/
├── app/                 
│   ├── chalice/                  # Serverless API (Chalice)
│   └── streamlit/                # Dashboard (Streamlit)
├── assets/                       # Images, diagrams
├── devops/infra/                 # Infrastructure-as-Code (Terraform, etc.)
├── docs/                         # Documentation
├── scripts/                      # Data generation (client simulator)
├── src/fraudit/                  # Streaming pipeline & utilities
│   ├── jobs/elt/                 # Schema, transformations, loading
│   └── utils/                    # PostgreSQL DDL, logging, etc.
├── dataset/                      # Local datasets (e.g., creditcard.csv)
├── docker-compose.yml            # Launching the dashboard (optional)
└── pyproject.toml                # Package configuration (single source of truth)

Prerequisites

The commands below assume you have an AWS account with sufficient permissions, plus the AWS CLI, Terraform, make, uv, and Python 3 installed locally (Docker is only needed for the optional dashboard).

Quick start

  • Set up your virtual environment and install the required packages
$ uv sync
  • Set up your AWS credentials
$ aws configure
  • Provision AWS resources
$ make tf.init
$ make tf.plan
$ make tf.apply

Set up the Chalice configuration file app/chalice/.chalice/config.json using the Lambda execution role ARN from the Terraform output. You can then provision Lambda and API Gateway and deploy the API app on Lambda.

$ make chalice.deploy
  • Build and deploy the fraudit package wheel and Glue job artifacts to S3 for consumption by the AWS Glue job
$ make deploy.glue

AWS SageMaker

SageMaker is used to train and deploy the ML models. The training and deployment notebooks are located in the sagemaker/ folder.
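
Once the endpoints are up, they can be invoked directly from Python for a quick sanity check. This is a minimal sketch using boto3; the endpoint name and the text/csv content type are assumptions, so check the SageMaker notebooks for the values this project actually uses.

import boto3

# Invoke a deployed SageMaker endpoint with one transaction's features.
runtime = boto3.client("sagemaker-runtime", region_name="eu-west-1")

features_csv = "0.12, 50.3, 1, 0, 3"  # comma-separated feature values for one transaction

response = runtime.invoke_endpoint(
    EndpointName="fraud-detection-xgb",  # hypothetical endpoint name
    ContentType="text/csv",
    Body=features_csv,
)
print(response["Body"].read().decode("utf-8"))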

Inference API (Chalice)

  • Route: POST /predict

Setup

  1. Set up the Chalice configuration file: app/chalice/.chalice/config.json
{
    "version": "2.0",
    "app_name": "ml-inference-api",
    "stages": {
        "dev": {
            "api_gateway_stage": "api",
            "manage_iam_role": false,
            "iam_role_arn": "<terraform_lambda_exec_role_arn_output>",
            "environment_variables": {
                "solution_prefix": "fraud-detection",
                "stream_name": "fraud-predictions-stream",
                "aws_region": "eu-west-1"
            }
        }
    }
}
  2. (Optional) Test the Chalice deployment locally
$ chalice local --port 8000   # optional, serves the API at http://localhost:8000/
  3. Deploy the Chalice app to AWS Lambda (a sketch of the handler follows below)
$ cd app/chalice
$ chalice deploy
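
For orientation, here is a minimal sketch of what the handler behind POST /predict could look like. The endpoint name, response parsing, and the Kinesis record shape are assumptions; the actual implementation lives in app/chalice/.

import json
import os

import boto3
from chalice import Chalice

app = Chalice(app_name="ml-inference-api")
runtime = boto3.client("sagemaker-runtime", region_name=os.environ.get("aws_region", "eu-west-1"))
kinesis = boto3.client("kinesis", region_name=os.environ.get("aws_region", "eu-west-1"))


@app.route("/predict", methods=["POST"])
def predict():
    body = app.current_request.json_body
    features = body["data"]  # CSV string of features, as in the example below

    resp = runtime.invoke_endpoint(
        EndpointName="fraud-detection-xgb",  # hypothetical endpoint name
        ContentType="text/csv",
        Body=features,
    )
    proba = float(resp["Body"].read())  # assumes the endpoint returns a single probability

    result = {"fraud_classifier": {"pred_proba": proba, "prediction": int(proba > 0.5)}}

    # Forward the scored transaction to Kinesis for the streaming pipeline.
    kinesis.put_record(
        StreamName=os.environ["stream_name"],
        Data=json.dumps({"metadata": body.get("metadata"), **result}),
        PartitionKey=body.get("metadata", {}).get("user_id", "unknown"),
    )
    return result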

Minimal transaction example

  • JSON input (minimal example):
{
  "metadata": {
    "timestamp": "2025-08-21T17:45:00Z",
    "user_id": "u_123",
    "source": "checkout",
    "device_info": {"device_type": "mobile", "os_version": "iOS 17", "app_version": "2.4.1"},
    "ip_address": "203.0.113.10",
    "geo": {"country": "fr", "region": "IDF", "city": "Paris", "latitude": 48.85, "longitude": 2.35}
  },
  "data": "0.12, 50.3, 1, 0, 3, ..."
}
  • API Output (excerpt):
{
  "anomaly_detector": {"score": 0.02},
  "fraud_classifier": {"pred_proba": 0.13, "prediction": 0}
}
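
The deployed route can be exercised with a short Python client, for example (assuming CHALICE_API_URL is the base URL of the deployed API, e.g. the URL printed by chalice deploy):

import os

import requests

payload = {
    "metadata": {"timestamp": "2025-08-21T17:45:00Z", "user_id": "u_123", "source": "checkout"},
    "data": "0.12, 50.3, 1, 0, 3",
}

# Adjust if CHALICE_API_URL already includes the /predict route.
url = os.environ["CHALICE_API_URL"].rstrip("/") + "/predict"

resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. {"anomaly_detector": {...}, "fraud_classifier": {...}}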

Environment variables

Create a .env file at the repo root (do not commit secrets). Tip: keep a .env.example without secrets in the repo and your real .env locally.
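
The variable names used across the rest of this README can be validated up front. A minimal sketch, assuming python-dotenv is available (the exact list of variables your setup needs may differ):

import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file at the repo root

REQUIRED = [
    "CHALICE_API_URL",         # inference API base URL
    "KINESIS_STREAM",          # stream read by the Spark job
    "KINESIS_CONNECTOR_PATH",  # path to the Spark Kinesis connector JAR (local runs)
    "POSTGRES_HOST", "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_PORT",
]

missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")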

Spark Streaming job

Glue Job Deployment

  1. Install the build package
$ python3 -m pip install build
  2. Package the project (wheel)
$ python3 -m build

This will result in a wheel file fraudit-0.0.1-py3-none-any.whl in the dist/ directory.

  3. Upload the job script, the wheel, and the Spark Kinesis connector to their respective S3 locations for Glue

    Tip: see the --additional-python-modules and --extra-jars arguments in devops/infra/main/glue.tf for more details.

$ aws s3 cp dist/fraudit-0.0.1-py3-none-any.whl s3://credit-card-fraud-detection-spark-streaming-bucket/wheel/fraudit-0.0.1-py3-none-any.whl
$ aws s3 cp src/fraudit/glue_job.py s3://credit-card-fraud-detection-spark-streaming-bucket/spark-jobs/
$ aws s3 cp src/resources/spark-streaming-sql-kinesis-connector_2.12-1.0.0.jar s3://credit-card-fraud-detection-spark-streaming-bucket/jars/spark-streaming-sql-kinesis-connector_2.12-1.0.0.jar
  4. Once the artifacts are uploaded, you can start the Glue job from the console, ensuring the default arguments defined in glue.tf are set.

Local Job Running

  1. Download and set up Apache Spark locally; refer to the Spark installation guide.
  2. Make sure environment variables are set in .env.
  3. Run the job
$ python -m fraudit.main

The job reads the Kinesis stream (KINESIS_STREAM), transforms the data (src/fraudit/jobs/elt/transform.py), and appends the results to the fraud_predictions table.
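
As a rough sketch of the shape of this job (the real schema, transformations, and loading logic live in src/fraudit/jobs/elt/): the Kinesis source format and option names below depend on the connector JAR and are assumptions, while the foreachBatch JDBC sink is standard Spark.

import os

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("fraudit-streaming")
    .config("spark.jars", os.environ["KINESIS_CONNECTOR_PATH"])  # local runs need the connector JAR
    .getOrCreate()
)

raw = (
    spark.readStream
    .format("aws-kinesis")  # assumption: format/option names vary by connector version
    .option("kinesis.streamName", os.environ["KINESIS_STREAM"])
    .option("kinesis.region", "eu-west-1")
    .option("kinesis.startingPosition", "LATEST")
    .load()
)

# The real job parses the payload with from_json(...) using the schema in jobs/elt/
# and applies transform.py; here the raw record is only cast to a string.
records = raw.selectExpr("CAST(data AS STRING) AS json")


def write_batch(batch_df, _batch_id):
    # Append each micro-batch to PostgreSQL (requires the JDBC driver on the classpath).
    (
        batch_df.write.format("jdbc")
        .option("url", f"jdbc:postgresql://{os.environ['POSTGRES_HOST']}:{os.environ['POSTGRES_PORT']}/{os.environ['POSTGRES_DB']}")
        .option("dbtable", "fraud_predictions")
        .option("user", os.environ["POSTGRES_USER"])
        .option("password", os.environ["POSTGRES_PASSWORD"])
        .option("driver", "org.postgresql.Driver")
        .mode("append")
        .save()
    )


records.writeStream.foreachBatch(write_batch).start().awaitTermination()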

Simulated data generation

Prerequisites: .env with CHALICE_API_URL and dataset/creditcard.csv present.

$ python -m pip install -e .[scripts]
$ python scripts/generate_data.py
  • PARALLEL_INVOCATION in scripts/generate_data.py controls how many requests are sent in parallel.
  • Adjust max_requests according to the desired throughput (a simplified sketch of the generator follows).
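
In essence, the generator replays rows from the CSV against the inference API. The sketch below is a simplified, assumed version of scripts/generate_data.py; the payload shape and column handling may differ from the real script.

import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
import requests

PARALLEL_INVOCATION = 8  # number of concurrent requests
max_requests = 1000      # adjust to the desired throughput

df = pd.read_csv("dataset/creditcard.csv").head(max_requests)
url = os.environ["CHALICE_API_URL"].rstrip("/") + "/predict"  # adjust if the route is already included


def send(row):
    payload = {
        "metadata": {"source": "simulator"},
        "data": ", ".join(str(v) for v in row.drop("Class").values),  # drop the label before sending
    }
    return requests.post(url, json=payload, timeout=10).status_code


with ThreadPoolExecutor(max_workers=PARALLEL_INVOCATION) as pool:
    statuses = list(pool.map(send, (row for _, row in df.iterrows())))

print(f"Sent {len(statuses)} requests")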

Dashboard (Streamlit)

  • Via Docker:
$ docker compose up dashboard
  • Or locally:
$ cd app/streamlit
$ pip install -r requirements.txt
$ streamlit run app.py

Ensure POSTGRES_HOST/DB/USER/PASSWORD/PORT are configured.
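
For reference, a minimal sketch of a dashboard page reading recent predictions from PostgreSQL (table and column names are assumptions, and SQLAlchemy/psycopg2 are used here for convenience; the real app in app/streamlit/ may differ):

import os

import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

# Build the connection from the POSTGRES_* variables mentioned above.
engine = create_engine(
    f"postgresql://{os.environ['POSTGRES_USER']}:{os.environ['POSTGRES_PASSWORD']}"
    f"@{os.environ['POSTGRES_HOST']}:{os.environ['POSTGRES_PORT']}/{os.environ['POSTGRES_DB']}"
)

st.title("Real-time Fraud Monitoring")

df = pd.read_sql("SELECT * FROM fraud_predictions LIMIT 500", engine)
st.metric("Transactions scored", len(df))
st.dataframe(df)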

Clean up

  • Destroy the infrastructure:
$ cd devops/infra/main && terraform destroy
  • Delete the Chalice API:
$ cd app/chalice && chalice delete

Troubleshooting

  • Error "Missing required environment variables" when starting locally: check your .env (see variables above).
  • Kinesis connector not found: set KINESIS_CONNECTOR_PATH to the JAR.
  • API 4xx/5xx during generation: check CHALICE_API_URL and quotas; reduce PARALLEL_INVOCATION.
  • Do not commit secrets in .env.

License

Educational/demo project. Adapt before production use.