Detecting potential fraud in financial systems is a major challenge for organizations worldwide. Building robust solutions that enable real-time actions is essential for companies aiming to provide greater security to their customers during financial transactions.
This repository demonstrates a complete machine learning pipeline for credit card fraud detection using the Kaggle Credit Card Fraud Detection dataset, which contains 284,807 European cardholder transactions from 2013 (including 492 fraudulent cases) with 28 PCA-transformed features plus original Amount and Time variables.
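For a feel of how skewed the problem is (492 frauds in 284,807 rows is roughly 0.17%), here is a minimal sketch, assuming the CSV has been downloaded to dataset/creditcard.csv as in the repo layout:

```python
# Inspect the class imbalance of the Kaggle dataset
# (assumes it was downloaded to dataset/creditcard.csv).
import pandas as pd

df = pd.read_csv("dataset/creditcard.csv")
print(df.shape)  # (284807, 31): Time, V1..V28, Amount, Class
print(f"Fraud rate: {df['Class'].mean():.4%}")  # Class == 1 marks fraud, ~0.17%
```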
The project showcases a production-ready streaming architecture built on AWS. The complete solution includes:
- Training of supervised and unsupervised ML models and their deployment to managed endpoints with Amazon SageMaker
- REST API deployment via Chalice (Lambda + API Gateway)
- Streaming data pipeline (Kinesis → Spark/Glue → RDS)
- (Optional) Interactive dashboard for real-time fraud monitoring and analysis.
Architecture overview:
Useful links:
- Our detailed architecture documentation
- AWS Chalice documentation
aws-realtime-fraud-detection/
├── app/
│ ├── chalice/ # Serverless API (Chalice)
│ └── streamlit/ # Dashboard (Streamlit)
├── assets/ # Images, diagrams
├── devops/infra/ # Infrastructure-as-Code (Terraform, etc.)
├── docs/ # Documentation
├── scripts/ # Data generation (client simulator)
├── src/fraudit/ # Streaming pipeline & utilities
│ ├── jobs/elt/ # Schema, transformations, loading
│ └── utils/ # PostgreSQL DDL, logging, etc.
├── dataset/ # Local datasets (e.g., creditcard.csv)
├── docker-compose.yml # Launching the dashboard (optional)
└── pyproject.toml # Package configuration (single source of truth)
- Python 3.10
- AWS CLI configured with your AWS credentials
- Terraform
- Set up your virtual environment and install the required packages:
$ uv sync
- Set up your AWS credentials:
$ aws configure
- Provision AWS resources:
$ make tf.init
$ make tf.plan
$ make tf.apply
- Set up the Chalice configuration file app/chalice/.chalice/config.json using the Lambda execution role ARN from the Terraform output (see the example config below). After that you can provision Lambda and API Gateway and deploy your API app on Lambda:
$ make chalice.deploy
- Deployment
Build and deploy the fraudit package wheel and Glue job artifacts to S3 for AWS Glue job consumption:
$ make deploy.glue
SageMaker is used to train and deploy the ML models. The training and deployment notebooks are located in the sagemaker/ folder.
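The models themselves are defined in those notebooks; purely as an illustration, here is a hedged sketch of the usual SageMaker SDK train-and-deploy pattern for an unsupervised anomaly detector. RandomCutForest, the role ARN, and the instance types are assumptions, not necessarily the notebooks' actual choices:

```python
# Illustrative sketch only: the real estimators live in the sagemaker/ notebooks.
# RandomCutForest, the role ARN, and instance types below are assumptions.
import numpy as np
from sagemaker import RandomCutForest

role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"  # placeholder

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    num_samples_per_tree=512,
    num_trees=50,
)

# Placeholder for the 30 numeric feature columns of creditcard.csv.
features = np.random.rand(1000, 30).astype("float32")
rcf.fit(rcf.record_set(features))

# Deploy as a managed endpoint; the Chalice API later invokes it by name.
predictor = rcf.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```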
- Route: POST /predict
- Set up the Chalice configuration file app/chalice/.chalice/config.json:
{
"version": "2.0",
"app_name": "ml-inference-api",
"stages": {
"dev": {
"api_gateway_stage": "api",
"manage_iam_role": false,
"iam_role_arn": "<terraform_lambda_exec_role_arn_output>",
"environment_variables": {
"solution_prefix": "fraud-detection",
"stream_name": "fraud-predictions-stream",
"aws_region": "eu-west-1"
}
}
}
}
- (Optional) Test the Chalice deployment locally:
$ chalice local --port 8000  # serves the API at http://localhost:8000/
- Deploy the Chalice app to AWS Lambda:
$ cd app/chalice
$ chalice deploy
- JSON input (minimal example):
{
"metadata": {
"timestamp": "2025-08-21T17:45:00Z",
"user_id": "u_123",
"source": "checkout",
"device_info": {"device_type": "mobile", "os_version": "iOS 17", "app_version": "2.4.1"},
"ip_address": "203.0.113.10",
"geo": {"country": "fr", "region": "IDF", "city": "Paris", "latitude": 48.85, "longitude": 2.35}
},
"data": "0.12, 50.3, 1, 0, 3, ..."
}
- API output (excerpt):
{
"anomaly_detector": {"score": 0.02},
"fraud_classifier": {"pred_proba": 0.13, "prediction": 0}
}
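For orientation, here is a minimal sketch of what the handler behind POST /predict could look like. The endpoint name and the 0.5 decision threshold are assumptions; the real implementation lives in app/chalice/app.py:

```python
# Hypothetical sketch of the POST /predict route; the real handler lives in
# app/chalice/app.py. The endpoint name and 0.5 threshold are placeholders.
import json
import os

import boto3
from chalice import Chalice

app = Chalice(app_name="ml-inference-api")
runtime = boto3.client("sagemaker-runtime")
kinesis = boto3.client("kinesis", region_name=os.environ["aws_region"])


@app.route("/predict", methods=["POST"])
def predict():
    body = app.current_request.json_body
    features = body["data"]  # CSV string of model features

    # Invoke the supervised classifier endpoint (the anomaly detector
    # endpoint would be invoked the same way).
    resp = runtime.invoke_endpoint(
        EndpointName=f"{os.environ['solution_prefix']}-classifier",  # placeholder
        ContentType="text/csv",
        Body=features,
    )
    proba = float(resp["Body"].read())
    result = {"fraud_classifier": {"pred_proba": proba, "prediction": int(proba > 0.5)}}

    # Fan the enriched prediction out to Kinesis for the streaming pipeline.
    kinesis.put_record(
        StreamName=os.environ["stream_name"],
        Data=json.dumps({**body["metadata"], **result}).encode("utf-8"),
        PartitionKey=body["metadata"]["user_id"],
    )
    return result
```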
Create a .env file at the repo root (do not commit secrets).
Tip: keep a .env.example without secrets in the repo and your real .env locally.
- Install the build package:
$ python3 -m pip install build
- Package the project (wheel):
$ python3 -m build
This results in a wheel file fraudit-0.0.1-py3-none-any.whl in the dist/ directory.
- Deploy the job, wheel, and Kinesis connector for Spark to their respective S3 locations for Glue.
Tip: see the --additional-python-modules and --extra-jars Terraform options in devops/infra/main/glue.tf for more details.
- Download the Kinesis connector JAR for Spark: https://github.com/awslabs/spark-sql-kinesis-connector
- Upload the wheel, job and Kinesis connector to S3:
$ aws s3 cp dist/fraudit-0.0.1-py3-none-any.whl s3://credit-card-fraud-detection-spark-streaming-bucket/wheel/fraudit-0.0.1-py3-none-any.whl
$ aws s3 cp src/fraudit/glue_job.py s3://credit-card-fraud-detection-spark-streaming-bucket/spark-jobs/
$ aws s3 cp src/resources/spark-streaming-sql-kinesis-connector_2.12-1.0.0.jar s3://credit-card-fraud-detection-spark-streaming-bucket/jars/spark-streaming-sql-kinesis-connector_2.12-1.0.0.jar
- Once the artifacts are uploaded, you can start the Glue job from the console, ensuring the default arguments defined in glue.tf are set; alternatively, start it programmatically as sketched below.
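A sketch of starting the job with boto3; the job name and argument values are assumptions and should mirror whatever glue.tf defines:

```python
# Sketch: start the Glue streaming job from code instead of the console.
# The job name and argument values are placeholders; mirror glue.tf.
import boto3

glue = boto3.client("glue", region_name="eu-west-1")
run = glue.start_job_run(
    JobName="fraud-detection-streaming-job",  # placeholder; see glue.tf
    Arguments={
        "--additional-python-modules": "s3://credit-card-fraud-detection-spark-streaming-bucket/wheel/fraudit-0.0.1-py3-none-any.whl",
        "--extra-jars": "s3://credit-card-fraud-detection-spark-streaming-bucket/jars/spark-streaming-sql-kinesis-connector_2.12-1.0.0.jar",
    },
)
print(run["JobRunId"])
```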
- Download and set up Apache Spark locally; refer to the Spark installation guide.
- Make sure the environment variables are set in .env.
- Download the Kinesis connector JAR for Spark: https://github.com/awslabs/spark-sql-kinesis-connector
- Place the JAR and/or set KINESIS_CONNECTOR_PATH to:
src/resources/spark-streaming-sql-kinesis-connector_2.12-1.0.0.jar
- Run the job:
$ python -m fraudit.main
The job reads the Kinesis stream (KINESIS_STREAM), transforms the data (src/fraudit/jobs/elt/transform.py), and appends the results to the fraud_predictions table, as sketched below.
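In outline, the job follows the standard Structured Streaming pattern. This is a simplified sketch with an assumed minimal schema; the real schema and transform logic live under src/fraudit/jobs/elt/, and the connector option names should be checked against the awslabs README for your version:

```python
# Simplified sketch of the streaming job; real schema/transforms live under
# src/fraudit/jobs/elt/. Connector option names follow the awslabs
# spark-sql-kinesis-connector README (verify casing for your version), and
# the JDBC sink assumes the PostgreSQL driver JAR is on the classpath.
import os

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("fraudit-streaming").getOrCreate()

# Minimal stand-in schema; the real one is defined in jobs/elt.
prediction_schema = StructType([
    StructField("user_id", StringType()),
    StructField("pred_proba", DoubleType()),
])

raw = (
    spark.readStream.format("aws-kinesis")
    .option("kinesis.streamName", os.environ["KINESIS_STREAM"])
    .option("kinesis.region", os.environ.get("AWS_REGION", "eu-west-1"))
    .option("kinesis.startingPosition", "LATEST")
    .load()
)

# Each Kinesis record carries a JSON payload in the binary `data` column.
events = (
    raw.select(from_json(col("data").cast("string"), prediction_schema).alias("e"))
    .select("e.*")
)

def write_batch(batch_df, _batch_id):
    # Append each micro-batch to the fraud_predictions table over JDBC.
    (
        batch_df.write.format("jdbc")
        .option("url", f"jdbc:postgresql://{os.environ['POSTGRES_HOST']}:{os.environ.get('POSTGRES_PORT', '5432')}/{os.environ['POSTGRES_DB']}")
        .option("dbtable", "fraud_predictions")
        .option("user", os.environ["POSTGRES_USER"])
        .option("password", os.environ["POSTGRES_PASSWORD"])
        .mode("append")
        .save()
    )

events.writeStream.foreachBatch(write_batch).start().awaitTermination()
```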
Prerequisites: .env with CHALICE_API_URL and dataset/creditcard.csv present.
$ python -m pip install -e .[scripts]
$ python scripts/generate_data.py
- PARALLEL_INVOCATION in scripts/generate_data.py controls how many requests are sent in parallel (see the sketch after this list).
- Adjust max_requests according to desired throughput.
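Each generated request is essentially a POST of one creditcard.csv row to the API. A hedged sketch of the pattern, with field names taken from the JSON examples above (the real logic is in scripts/generate_data.py):

```python
# Sketch of one generator request; scripts/generate_data.py is the real thing.
import os

import pandas as pd
import requests

API_URL = os.environ["CHALICE_API_URL"]
row = pd.read_csv("dataset/creditcard.csv", nrows=1).iloc[0]

payload = {
    "metadata": {
        "timestamp": "2025-08-21T17:45:00Z",
        "user_id": "u_123",
        "source": "checkout",
    },
    # Feature vector serialized as CSV, matching the API input example above.
    "data": ",".join(str(v) for v in row.drop("Class").values),
}
resp = requests.post(f"{API_URL}/predict", json=payload, timeout=10)
print(resp.status_code, resp.json())
```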
- Via Docker:
$ docker compose up dashboard
- Or locally:
$ cd app/streamlit
$ pip install -r requirements.txt
$ streamlit run app.py
Ensure POSTGRES_HOST/DB/USER/PASSWORD/PORT are configured (a minimal sketch of the dashboard's database access follows below).
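A minimal sketch of how the dashboard might query the fraud_predictions table; the column names are assumptions, and the actual app lives in app/streamlit/app.py:

```python
# Illustrative sketch; the actual dashboard lives in app/streamlit/app.py.
# Requires a PostgreSQL driver (e.g. psycopg2) alongside SQLAlchemy.
import os

import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://{user}:{pw}@{host}:{port}/{db}".format(
        user=os.environ["POSTGRES_USER"],
        pw=os.environ["POSTGRES_PASSWORD"],
        host=os.environ["POSTGRES_HOST"],
        port=os.environ.get("POSTGRES_PORT", "5432"),
        db=os.environ["POSTGRES_DB"],
    )
)

st.title("Real-time fraud monitoring")
df = pd.read_sql("SELECT * FROM fraud_predictions LIMIT 500", engine)
st.metric("Flagged as fraud", int((df["prediction"] == 1).sum()))  # column assumed
st.dataframe(df)
```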
- Destroy the infrastructure:
$ cd devops/infra/main && terraform destroy
- Delete the Chalice API:
$ cd app/chalice && chalice delete
- Error "Missing required environment variables" when starting locally: check your .env (see the variables above).
- Kinesis connector not found: set KINESIS_CONNECTOR_PATH to the JAR.
- API 4xx/5xx during generation: check CHALICE_API_URL and quotas; reduce PARALLEL_INVOCATION.
- Do not commit secrets in .env.
Educational/demo project. Adapt before production use.
