Tutorial 1 — End-to-end fraud classifier (BFSI)¶
A realistic BFSI scenario from scratch. We'll generate a synthetic credit-card transactions dataset, run it through the default_ml_pipeline with the rbi_free_ai compliance profile, shadow-test against a champion, promote, and inspect the regulator-format audit PDF.
Time: ~30 minutes interactive (30 seconds if you just read the rendered output).
Assumes: swarm running at http://localhost:8000, ANTHROPIC_API_KEY or OPENAI_API_KEY set. See Quickstart to get there.
0. Setup¶
We'll use the swarm CLI + REST API from inside Python. The swarm CLI is installed via pip install -e ml_team/.
import os
import time
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
SWARM_API = os.environ.get('SWARM_API', 'http://localhost:8000')
SWARM_TOKEN = os.environ['SWARM_TOKEN'] # see: swarm login
print(f'Using swarm at {SWARM_API}')
1. Generate a realistic-looking fraud dataset¶
Rather than hand-waving with sklearn defaults, we'll build something that looks like real credit-card data — skewed amounts, time-of-day patterns, merchant category fields, and a ~2% fraud prior (realistic BFSI rate).
This is synthetic and will not leave your machine.
np.random.seed(42)
n_samples = 50_000
# Base numerical features via sklearn — gives us signal without leaking label perfectly
X, y = make_classification(
    n_samples=n_samples,
    n_features=15,
    n_informative=8,
    n_redundant=3,
    weights=[0.98, 0.02],  # ~2% fraud prior
    class_sep=0.8,
    flip_y=0.01,  # a little label noise
    random_state=42,
)
# Dress up the DataFrame to look like real BFSI data
df = pd.DataFrame(X, columns=[
    'amount_zscore', 'velocity_zscore', 'merchant_risk',
    'hour_of_day', 'day_of_week', 'card_age_months',
    'prev_7d_txn_count', 'prev_7d_amount_sum',
    'foreign_txn_flag', 'cnp_flag',  # card-not-present
    'geo_risk_score', 'device_risk_score',
    'recipient_new_flag', 'pin_tries', 'chargeback_prev_90d',
])
df['is_fraud'] = y
# Add a protected attribute to demonstrate the fairness audit (synthetic!)
df['cardholder_region'] = np.random.choice(
    ['north', 'south', 'east', 'west', 'metro'],
    size=n_samples,
    p=[0.22, 0.22, 0.15, 0.15, 0.26],
)
print(f'Dataset: {len(df):,} rows, fraud rate: {df.is_fraud.mean():.2%}')
df.head()
# Save to the swarm data directory
data_path = Path('ml_team/data/fraud_synthetic.csv')
data_path.parent.mkdir(exist_ok=True, parents=True)
df.to_csv(data_path, index=False)
print(f'Saved to {data_path}')
2. Kick off the pipeline with RBI FREE-AI profile¶
We request rbi_free_ai at pipeline start so the fairness auditor + SHAP explainer + audit PDF template activate.
import httpx
resp = httpx.post(
    f'{SWARM_API}/api/v1/pipelines',
    headers={'Authorization': f'Bearer {SWARM_TOKEN}'},
    json={
        'problem_statement': (
            'Classify credit-card transactions as fraud (1) or legitimate (0). '
            'Target column is is_fraud. Protected attribute is cardholder_region. '
            'Optimise for PR-AUC; we care about catching fraud without frustrating legit users.'
        ),
        'dataset_path': 'fraud_synthetic.csv',
        'template': 'default_ml_pipeline',
        'compliance_profile': 'rbi_free_ai',
        'name': 'fraud-v1-tutorial',
    },
    timeout=30,
)
resp.raise_for_status()
run_id = resp.json()['run_id']
print(f'Pipeline started: run_id={run_id}')
3. Watch it run¶
Poll for status. Expect ~15-30 minutes on the default_ml_pipeline — it does hyperparameter tuning, cross-validation, the fairness audit, SHAP, and model-card generation.
while True:
    status = httpx.get(
        f'{SWARM_API}/api/v1/pipelines/{run_id}',
        headers={'Authorization': f'Bearer {SWARM_TOKEN}'},
    ).json()
    phase = status.get('phase', '?')
    agent = status.get('active_agent', '?')
    print(f'{time.strftime("%H:%M:%S")} phase={phase} agent={agent}')
    if status.get('state') in {'completed', 'failed', 'cancelled'}:
        break
    time.sleep(30)
print(f'\nFinal state: {status["state"]}')
4. Inspect the outputs¶
The run_dir contains model artefacts, per-agent JSONL journals, compliance reports, and — because we enabled rbi_free_ai — an audit PDF.
run_dir = Path(f'pipeline_runs/{run_id}')
for p in sorted(run_dir.rglob('*')):
    if p.is_file():
        rel = p.relative_to(run_dir)
        size_kb = p.stat().st_size / 1024
        print(f'{size_kb:>8.1f} KB {rel}')
# Read the model card
print((run_dir / 'reports' / 'model_card.md').read_text()[:3000])
# Check the fairness audit
import json
fairness = json.loads((run_dir / 'reports' / 'fairness_audit.json').read_text())
print(json.dumps(fairness, indent=2)[:2000])
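If you want a quick pass/fail signal from the audit rather than reading raw JSON, the demographic parity ratio is easy to compute yourself. The per-group selection rates below are illustrative values, and the exact key layout of fairness_audit.json may differ — adapt the dictionary construction to the real schema you printed above:

```python
def parity_ratio(selection_rates):
    """Demographic parity ratio: lowest group selection rate divided by the
    highest. Near 1.0 means similar flag rates across groups; a common rule
    of thumb treats anything below 0.8 as worth investigating."""
    rates = list(selection_rates.values())
    return min(rates) / max(rates)

# Hypothetical per-region selection rates — pull the real ones out of the
# fairness_audit.json structure you inspected above:
rates = {'north': 0.021, 'south': 0.019, 'east': 0.023,
         'west': 0.020, 'metro': 0.022}
ratio = parity_ratio(rates)
print(f'demographic parity ratio: {ratio:.2f}')
```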
5. Inspect the audit PDF¶
Regulator-format PDF in audit/audit_report.pdf.
# Open in the system PDF viewer ('open' is macOS; Linux desktops use 'xdg-open')
import subprocess
import sys
opener = 'open' if sys.platform == 'darwin' else 'xdg-open'
subprocess.run([opener, str(run_dir / 'audit' / 'audit_report.pdf')])
See Reading the audit PDF for a section-by-section walkthrough.
6. Deploy — shadow traffic + champion-challenger¶
Package the model into a container, register as a challenger (there's no champion yet; first deploy becomes champion automatically):
package_resp = httpx.post(
    f'{SWARM_API}/api/v1/deployments/package',
    headers={'Authorization': f'Bearer {SWARM_TOKEN}'},
    json={'run_id': run_id, 'name': 'fraud_classifier', 'version': 'v1'},
    timeout=180,  # Docker builds take a bit
)
package_resp.raise_for_status()
print(package_resp.json())
7. Iterate — train a challenger and compare¶
Train a second model with a different template (e.g. fast_prototype with CatBoost instead of XGBoost). Package as v2. Shadow against v1 for 24 hours. Promote if it wins.
# Pipeline v2
swarm pipelines run \
    --problem "...(same problem statement)..." \
    --dataset fraud_synthetic.csv \
    --template fast_prototype \
    --compliance rbi_free_ai \
    --name fraud-v2

# Package v2
swarm deployments package --run-id <v2_run_id> --name fraud_classifier --version v2

# Shadow v2 against champion v1
swarm deployments shadow-start \
    --model fraud_classifier --challenger v2 --champion v1 \
    --sample-rate 0.1 --duration 24h

# After 24h: compare
swarm deployments compare --model fraud_classifier --champion v1 --challenger v2

# If challenger wins, promote (HITL gate)
swarm deployments promote --model fraud_classifier --challenger v2
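The compare step runs server-side, but if you export the shadow logs and want to sanity-check the statistics yourself, a bootstrap confidence interval on the PR-AUC delta is a reasonable way to do it. This sketch is illustrative, not swarm's actual comparison logic, and it assumes you have ground-truth labels plus both models' scores for the same shadowed transactions:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def bootstrap_pr_auc_delta(y_true, p_champion, p_challenger,
                           n_boot=1000, seed=0):
    """95% bootstrap CI for PR-AUC(challenger) - PR-AUC(champion) on the
    same shadow traffic. A lower bound above 0 means the challenger is
    reliably better on this sample."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    p_champion = np.asarray(p_champion)
    p_challenger = np.asarray(p_challenger)
    n = len(y_true)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].sum() == 0:  # skip resamples with no fraud cases
            continue
        deltas.append(average_precision_score(y_true[idx], p_challenger[idx])
                      - average_precision_score(y_true[idx], p_champion[idx]))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return lo, hi
```

With a ~2% fraud rate, resampling whole transactions (not just frauds) keeps the class imbalance realistic in every bootstrap replicate.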
8. Summary — what you just got¶
- A trained fraud classifier — joblib model in pipeline_runs/<id>/models/
- A model card — human-readable, fit for technical handoff
- A fairness audit — demographic parity + equalized odds on cardholder_region
- SHAP explanations — global + per-prediction feature importances
- A drift baseline — pinned for future monitoring
- A regulator-format audit PDF — tamper-evident, ready for RBI / internal audit
- Per-agent conversation logs — full trail of who did what
- A deployed container — ready for shadow traffic
All in one pipeline run. The alternative (by-hand):
- Week 1: data profiling + algorithm selection spreadsheet
- Week 2: training loop + hyperparam search + evaluation
- Week 3: fairness audit + SHAP + document writing
- Week 4: compliance review with CRO's team
- Week 5: deploy + shadow setup
That's the pitch.
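A word on the drift baseline, since the nightly check will eventually lean on it: the standard BFSI drift metric is the Population Stability Index (PSI), which compares the current distribution of a feature against the pinned baseline, bin by bin. This sketch shows the idea — it is illustrative, not swarm's implementation:

```python
import numpy as np

def psi(baseline, current, n_bins=10):
    """Population Stability Index between two samples of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range current values
    b = np.histogram(baseline, bins=edges)[0] / len(baseline)
    c = np.histogram(current, bins=edges)[0] / len(current)
    b, c = np.clip(b, 1e-6, None), np.clip(c, 1e-6, None)  # avoid log(0)
    return float(np.sum((c - b) * np.log(c / b)))
```

Binning by baseline quantiles (rather than fixed-width bins) keeps each bin equally populated at baseline, which makes the index comparable across features with very different scales.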
Next¶
- Drift investigation tutorial — what to do when this model's nightly drift check fires in 3 months
- Plugin authoring tutorial — extend swarm with your own surfaces
- Compliance: RBI FREE-AI — the regulatory context
