Promote fraud-detection example (IEEE-CIS) to main #154

Merged: ZhengyaoJiang merged 1 commit into main from dev on Jun 8, 2026

@ZhengyaoJiang (Contributor)

Promotes the fraud-detection example (PR #140) from dev to main.

This is an example-only change: there is no pyproject.toml version bump, so the Release
workflow will detect no version change and skip the PyPI publish (release_needed=false).

End-to-end validated twice (two independent synthetic-data fixtures) on the merged
dev code: prepare_data.py → evaluate.py runs the full pipeline (time-split, V-corr
pruning, label-encode, stratified 100K/25K subsample, LightGBM) and emits a parseable
auc_roc: line for both the strict and loose variants. Lint (ruff) green.
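A parseable `auc_roc:` line is what lets a run be scored automatically. The sketch below shows one way such a line could be emitted and read back. The exact format (`auc_roc: <float>` on its own line), the six-decimal precision, and the helper names `format_metric` and `parse_metric` are assumptions for illustration, not taken from evaluate.py.

```python
import re
from typing import Optional

# Assumed line format (hypothetical): "auc_roc: 0.912300" on a line of its own.
METRIC_RE = re.compile(r"^auc_roc:\s*([0-9]*\.?[0-9]+)\s*$", re.MULTILINE)


def format_metric(auc: float) -> str:
    """Render the metric as a single line an optimizer could scan for."""
    return f"auc_roc: {auc:.6f}"


def parse_metric(stdout: str) -> Optional[float]:
    """Return the last auc_roc value found in the captured output, or None."""
    matches = METRIC_RE.findall(stdout)
    return float(matches[-1]) if matches else None


if __name__ == "__main__":
    log = "loading data...\n" + format_metric(0.9123) + "\ndone\n"
    print(parse_metric(log))
```

Taking the last match rather than the first is a design choice in this sketch: if intermediate steps also print a score, the final line wins.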

Reproducible Weco example on the IEEE-CIS Fraud Detection Kaggle dataset
(real Vesta payment transactions), mirroring the published case study.

- examples/fraud-detection/: strict fit/transform API (FeatureBuilder + train_and_evaluate) that makes train/val leakage impossible by construction.
- examples/fraud-detection-loose/: earlier single-file build_features(train_df, val_df) API, kept for comparison.
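
The fit/transform idea behind the strict variant can be sketched as follows. This is not the repository's `FeatureBuilder`: the class `LabelEncodingBuilder`, its methods, and the `-1` code for unseen categories are hypothetical stand-ins. The point is that `transform` only ever uses statistics learned from training rows, so validation rows cannot influence the features.

```python
from typing import Dict, Hashable, List, Sequence


class LabelEncodingBuilder:
    """Hypothetical stand-in for a strict fit/transform feature builder."""

    def __init__(self) -> None:
        self._codes: Dict[Hashable, int] = {}
        self._fitted = False

    def fit(self, train_values: Sequence[Hashable]) -> "LabelEncodingBuilder":
        # Learn the category -> integer mapping from training rows only.
        self._codes = {}
        for value in train_values:
            self._codes.setdefault(value, len(self._codes))
        self._fitted = True
        return self

    def transform(self, values: Sequence[Hashable]) -> List[int]:
        # Apply the frozen mapping; categories unseen in training become -1.
        if not self._fitted:
            raise RuntimeError("call fit() on training data before transform()")
        return [self._codes.get(value, -1) for value in values]


if __name__ == "__main__":
    builder = LabelEncodingBuilder().fit(["visa", "mastercard", "visa"])
    print(builder.transform(["mastercard", "amex", "visa"]))
```

By contrast, a single function that receives both `train_df` and `val_df` (as in the loose variant) leaves it to the author to avoid mixing the two.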

End-to-end validated: prepare_data -> evaluate emits a parseable `auc_roc:` line through the full pipeline (time-split, V-corr pruning, label-encode, stratified 100K/25K subsample, LightGBM). Lint (ruff) green.

@ZhengyaoJiang merged commit 58b4b01 into main on Jun 8, 2026
2 checks passed

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1a4588c10

)
return 1

auc = train_and_evaluate(X_train_t, y_train, X_val_t, y_val)

P1: Keep validation labels out of editable model code

When users run the documented Model-only or Full-pipeline scopes, model.py is one of the files Weco rewrites, but this call gives that editable code y_val before it produces predictions/AUC. In that context the strict API does not actually prevent validation-label leakage: a candidate can train/tune on X_val, y_val or directly return an inflated score, so the reported optimization result can be invalid despite the evaluator being frozen.
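
One way to address this is to have the editable code return validation scores only, and let the frozen evaluator compute the metric. The sketch below illustrates that split under stated assumptions: `fit_predict`, `evaluate`, and the pure-Python `auc_roc` are hypothetical names, not the example's actual `train_and_evaluate` signature, and the toy one-feature model stands in for LightGBM.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
FitPredict = Callable[[Sequence[Row], Sequence[int], Sequence[Row]], List[float]]


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_predict(X_train: Sequence[Row], y_train: Sequence[int],
                X_val: Sequence[Row]) -> List[float]:
    """Hypothetical editable model code. Note there is no y_val parameter."""
    pos = [x[0] for x, y in zip(X_train, y_train) if y == 1]
    neg = [x[0] for x, y in zip(X_train, y_train) if y == 0]
    # Toy model: score by the first feature, oriented using training labels.
    direction = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return [direction * x[0] for x in X_val]


def evaluate(model: FitPredict, X_train: Sequence[Row], y_train: Sequence[int],
             X_val: Sequence[Row], y_val: Sequence[int]) -> float:
    """Frozen evaluator: the only place where y_val is read."""
    scores = model(X_train, y_train, X_val)
    if len(scores) != len(y_val):
        raise ValueError("model must return exactly one score per validation row")
    return auc_roc(y_val, scores)
```

With this shape the model can still see `X_val`, so it does not rule out every form of leakage, but it cannot train on validation labels or return a score of its own choosing.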

Comment thread examples/README.md
- **Run**:
```bash
cd examples/fraud-detection
weco run --source train.py \
```

P2: Point the quickstart at the strict example sources

This quickstart changes into examples/fraud-detection, but that directory has no train.py (it has features.py and model.py), and the CLI validation rejects missing source files. Users following the top-level README will fail before any evaluation runs; this should mirror the example README’s --sources features.py model.py command or point at fraud-detection-loose if train.py is intended.
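
A documentation check along these lines could catch the mismatch before users do. The sketch below is not Weco's CLI validation; `missing_sources` is a hypothetical helper, and the temporary directory only imitates the layout described above (features.py and model.py present, no train.py).

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_sources(example_dir: str, sources: Iterable[str]) -> List[str]:
    """Return the source file names that do not exist under example_dir."""
    base = Path(example_dir)
    return [name for name in sources if not (base / name).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Imitate the strict example's layout as described in the review.
        for name in ("features.py", "model.py"):
            (Path(tmp) / name).write_text("# placeholder\n")
        print(missing_sources(tmp, ["train.py"]))
        print(missing_sources(tmp, ["features.py", "model.py"]))
```

Running such a check over every command in the README would flag `train.py` for the strict example while passing the two-file form.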
