Deploy a model#
Goal: take a trained model, run it against shadow traffic, compare to the current champion, and promote it. Every step produces audit evidence.
Time: ~15 minutes. Prerequisite: you have a trained model from Train your first classifier.
The deployment pipeline#
Trained model (joblib)
│
▼
Package (Docker image + metadata)
│
▼
Shadow traffic (parallel prediction, no user impact)
│
▼
Compare to champion (fairness, accuracy, latency delta)
│
▼
Promote or retire (HITL approval gate)
│
▼
New champion serving production traffic
1. Package the model#
What happens:
- model_packager agent builds an OCI image ghcr.io/yourorg/iris_classifier:v2
- Metadata (dependencies, training data hash, model card) pinned in the image labels
- Model registered in the registry as iris_classifier:v2 (status: packaged)
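The training-data hash pinned into the image labels is what makes the package reproducible and auditable. Here is a minimal sketch of how such a hash and label set could be computed; the label keys (`ai.swarm.*`) and the canonicalization scheme are illustrative assumptions, not the packager's actual implementation:

```python
import hashlib
import json

def training_data_hash(rows: list[dict]) -> str:
    """Deterministic SHA-256 over training rows: canonical JSON, order-independent."""
    canonical_rows = sorted(rows, key=lambda r: json.dumps(r, sort_keys=True))
    blob = json.dumps(canonical_rows, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def image_labels(model_name: str, version: str, data_hash: str) -> dict:
    # Hypothetical label keys; the real packager may use different ones.
    return {
        "org.opencontainers.image.title": f"{model_name}:{version}",
        "ai.swarm.training_data_sha256": data_hash,
    }

h1 = training_data_hash([{"sepal_len": 5.1, "label": 0}, {"sepal_len": 6.0, "label": 1}])
h2 = training_data_hash([{"sepal_len": 6.0, "label": 1}, {"sepal_len": 5.1, "label": 0}])
labels = image_labels("iris_classifier", "v2", h1)
```

Because the rows are sorted before hashing, the same dataset always yields the same digest regardless of row order.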
2. Spin up shadow traffic#
swarm deployments shadow-start \
--model iris_classifier \
--challenger v2 \
--champion v1 \
--sample-rate 0.1 \
--duration 24h
What happens:
- 10% of incoming prediction requests are duplicated to v2; user gets v1's response
- Both predictions logged to shadow_predictions table with correlation id
- Runs for 24 hours (or until manually stopped)
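The shadow mechanism above can be sketched in a few lines: sample a fraction of requests, call both models, log the pair under a correlation id, and always return the champion's answer. This is an illustrative model of the behavior, not the serving layer's actual code:

```python
import random
import uuid

def handle_request(request, champion, challenger, sample_rate, rng, log):
    """Serve the champion's prediction; shadow a sampled fraction to the challenger."""
    champion_pred = champion(request)
    if rng.random() < sample_rate:
        log.append({
            "correlation_id": str(uuid.uuid4()),  # ties the two predictions together
            "champion": champion_pred,
            "challenger": challenger(request),
        })
    return champion_pred  # the user always gets the champion's response

# Demo with toy models and a fixed seed for reproducibility.
rng = random.Random(0)
log = []
champion = lambda x: x % 2
challenger = lambda x: x % 3
served = [handle_request(i, champion, challenger, 0.1, rng, log) for i in range(1000)]
```

Note that the challenger is only ever invoked on the shadowed copy, so a slow or crashing challenger cannot affect what the user sees.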
During shadow, reports include:
- Request count matched
- Agreement rate (both models gave the same prediction), typically 90-99%
- Disagreement breakdown by class
- Latency delta (p50, p99)
- Fairness delta on protected attributes (if configured)
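Agreement rate and the per-class disagreement breakdown fall straight out of the logged prediction pairs. A minimal sketch, assuming the logged rows carry `champion` and `challenger` fields as above:

```python
from collections import Counter

def shadow_report(rows: list[dict]) -> dict:
    """Summarize logged shadow pairs: agreement rate and disagreement breakdown."""
    total = len(rows)
    agree = sum(r["champion"] == r["challenger"] for r in rows)
    disagreements = Counter(
        (r["champion"], r["challenger"])
        for r in rows if r["champion"] != r["challenger"]
    )
    return {
        "requests": total,
        "agreement_rate": agree / total if total else None,
        "disagreement_by_class": dict(disagreements),  # (champion, challenger) -> count
    }

rows = [
    {"champion": "setosa", "challenger": "setosa"},
    {"champion": "setosa", "challenger": "versicolor"},
    {"champion": "virginica", "challenger": "virginica"},
    {"champion": "virginica", "challenger": "virginica"},
]
report = shadow_report(rows)
```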
3. Compare champion vs challenger#
At any point during or after the shadow window:
Reports:
| Metric | v1 (champion) | v2 (challenger) | Δ | Acceptable? |
|---|---|---|---|---|
| Accuracy | 0.9412 | 0.9667 | +2.55 pp | ✅ |
| Macro F1 | 0.9389 | 0.9674 | +2.85 pp | ✅ |
| p99 latency | 22ms | 19ms | −14% | ✅ |
| Fairness (ΔDP on protected attr) | 0.021 | 0.018 | −14% | ✅ |
| Agreement | — | 0.962 | — | — |
serving_monitor agent writes this as reports/champion_challenger_v1_v2.json.
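The "Acceptable?" column encodes a gate: the challenger must not regress on accuracy, latency, or fairness beyond some budget. A minimal sketch of such a rule, using the numbers from the table; the threshold values and field names here are illustrative assumptions:

```python
def gate(champion: dict, challenger: dict,
         max_latency_regress: float = 0.10,
         max_fairness_dp: float = 0.05) -> tuple[bool, dict]:
    """Hypothetical acceptance rule mirroring the comparison table."""
    checks = {
        "accuracy_no_worse": challenger["accuracy"] >= champion["accuracy"],
        "latency_within_budget":
            challenger["p99_ms"] <= champion["p99_ms"] * (1 + max_latency_regress),
        "fairness_within_budget": challenger["dp_delta"] <= max_fairness_dp,
    }
    return all(checks.values()), checks

v1 = {"accuracy": 0.9412, "p99_ms": 22, "dp_delta": 0.021}
v2 = {"accuracy": 0.9667, "p99_ms": 19, "dp_delta": 0.018}
ok, checks = gate(v1, v2)
# A challenger whose accuracy drops should fail the gate.
rejected, _ = gate(v1, dict(v2, accuracy=0.90))
```

Returning the per-check breakdown alongside the boolean makes the failure reason visible in the report rather than a bare reject.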
4. Promotion gate (HITL)#
Promotion requires operator approval because it changes which model serves production traffic:
The promote_challenger tool call raises ApprovalRequired. Approve via:
Deployments → Approvals pending → [gate] → review → Approve.
On approval, swarm:
1. Re-runs the promote_challenger tool call
2. Marks v2 as the new champion in the registry
3. Routes 100% of traffic to v2
4. Retains v1 in "retired" state for N days (retention policy)
5. Writes an audit PDF documenting the whole promotion
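Steps 2-4 above are essentially registry bookkeeping. A minimal sketch of the state transition, assuming a dict-shaped registry entry; the field names are illustrative, not the registry's actual schema:

```python
def promote(registry: dict, model: str, challenger: str, approver: str) -> dict:
    """Sketch of the registry state change behind a promotion."""
    entry = registry[model]
    previous = entry["champion"]
    entry["retired"].append(previous)        # keep old champion for rollback
    entry["champion"] = challenger           # mark challenger as new champion
    entry["traffic"] = {challenger: 1.0}     # route 100% of traffic to it
    entry["audit"].append({                  # evidence for the audit PDF
        "event": "promote", "from": previous,
        "to": challenger, "approver": approver,
    })
    return entry

registry = {"iris_classifier": {
    "champion": "v1", "retired": [], "traffic": {"v1": 1.0}, "audit": [],
}}
entry = promote(registry, "iris_classifier", "v2", "alice@yourorg")
```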
5. Generate the promotion audit PDF#
Produces reports/promotion_v1_to_v2_audit.pdf containing:
- Shadow-traffic evidence (counts, disagreement, latency delta)
- Champion-challenger comparison table
- Fairness delta on protected attributes
- Promotion approver identity + timestamp + comment
- Tamper-evident SHA-256 manifest
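A tamper-evident manifest is just a per-artifact SHA-256 digest: recompute the digests later and compare. A minimal sketch, assuming the artifacts are available as bytes; the artifact names are illustrative:

```python
import hashlib

def manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """One SHA-256 digest per embedded artifact, keyed by name."""
    return {name: hashlib.sha256(blob).hexdigest()
            for name, blob in sorted(artifacts.items())}

def verify(artifacts: dict[str, bytes], recorded: dict[str, str]) -> bool:
    """True only if every artifact still matches its recorded digest."""
    return manifest(artifacts) == recorded

artifacts = {
    "comparison.json": b'{"accuracy": 0.9667}',
    "approval.json": b'{"approver": "alice"}',
}
recorded = manifest(artifacts)
# Any byte-level change to an artifact breaks verification.
tampered = dict(artifacts, **{"comparison.json": b'{"accuracy": 0.99}'})
```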
6. Retire the old champion#
Status changes to retired. The image is not deleted (it is needed for rollback) but is removed from routing. After the retention period (default 90 days), the image is garbage-collected by a cron job.
7. Rollback (if needed)#
Instant. No HITL gate (because rollback is a safety action, not a deploy).
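Rollback can be instant because the retired champion's image is still in the registry; reverting is just restoring it as champion and flipping the routing pointer. A minimal sketch, using the same illustrative entry shape as above:

```python
def rollback(entry: dict) -> dict:
    """Instant rollback: restore the most recently retired version as champion."""
    previous = entry["retired"].pop()        # most recently retired champion
    entry["retired"].append(entry["champion"])
    entry["champion"] = previous
    entry["traffic"] = {previous: 1.0}       # routing flip only; no approval gate
    return entry

entry = {"champion": "v2", "retired": ["v1"], "traffic": {"v2": 1.0}}
entry = rollback(entry)
```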
Champion-challenger with compliance profile#
If the pipeline ran with --compliance rbi_free_ai:
- Fairness audit is required before promotion (not optional)
- explain_model must have produced SHAP values
- Audit PDF template uses the RBI format
- The accountability agent auto-attributes the promotion approver to the audit record
Next#
- Generate an audit PDF — the regulator-format output
- Schedule a drift check — post-deploy monitoring
- Reading the audit PDF
Troubleshooting#
Shadow traffic shows 100% agreement
Either the challenger is identical to the champion (verify v1_hash vs v2_hash), or your traffic is homogeneous (all requests yield the same prediction). Increase the sample rate or wait longer.
Fairness delta is high (>5%)
Don't promote. Open an investigation. A high delta usually means one of:
- Training data has shifted against the protected attribute
- A hyperparameter change had disparate impact
- A bug in feature engineering
Use swarm pipelines run --problem "Investigate fairness drift on iris_classifier" to have the eval team look at it.
Approval gate fires but I'm the only operator
Approve yourself. The audit trail records you approved your own change; that's fine for design-partner phase. Production BFSI enforces separation of duties via RBAC policy.