Deploy a model#
Goal: take a trained model, run it against shadow traffic, compare to the current champion, and promote it. Every step produces audit evidence.
Time: ~15 minutes. Prerequisite: you have a trained model from Train your first classifier.
The deployment pipeline#
Trained model (joblib)
│
▼
Package (Docker image + metadata)
│
▼
Shadow traffic (parallel prediction, no user impact)
│
▼
Compare to champion (fairness, accuracy, latency delta)
│
▼
Promote or retire (HITL approval gate)
│
▼
New champion serving production traffic
1. Package the model#
What happens:
- model_packager agent builds an OCI image ghcr.io/yourorg/iris_classifier:v2
- Metadata (dependencies, training data hash, model card) pinned in the image labels
- Model registered in the registry as iris_classifier:v2 (status: packaged)
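The training-data hash pinned into the image labels is what makes the package reproducible and auditable. Here is a minimal sketch of how such a hash and label set could be computed; the label keys (`ai.swarm.*`) and the canonicalization scheme are illustrative assumptions, not the packager's actual implementation:

```python
import hashlib
import json

def training_data_hash(rows: list[dict]) -> str:
    """Deterministic SHA-256 over training rows: canonical JSON, order-independent."""
    canonical_rows = sorted(rows, key=lambda r: json.dumps(r, sort_keys=True))
    blob = json.dumps(canonical_rows, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def image_labels(model_name: str, version: str, data_hash: str) -> dict:
    # Hypothetical label keys; the real packager may use different ones.
    return {
        "org.opencontainers.image.title": f"{model_name}:{version}",
        "ai.swarm.training_data_sha256": data_hash,
    }

h1 = training_data_hash([{"sepal_len": 5.1, "label": 0}, {"sepal_len": 6.0, "label": 1}])
h2 = training_data_hash([{"sepal_len": 6.0, "label": 1}, {"sepal_len": 5.1, "label": 0}])
labels = image_labels("iris_classifier", "v2", h1)
```

Because the rows are sorted before hashing, the same dataset always yields the same digest regardless of row order.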
2. Spin up shadow traffic#
swarm deployments shadow-start \
--model iris_classifier \
--challenger v2 \
--champion v1 \
--sample-rate 0.1 \
--duration 24h
What happens:
- 10% of incoming prediction requests are duplicated to v2; user gets v1's response
- Both predictions logged to shadow_predictions table with correlation id
- Runs for 24 hours (or until manually stopped)
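The shadow mechanism above can be sketched in a few lines: sample a fraction of requests, call both models, log the pair under a correlation id, and always return the champion's answer. This is an illustrative model of the behavior, not the serving layer's actual code:

```python
import random
import uuid

def handle_request(request, champion, challenger, sample_rate, rng, log):
    """Serve the champion's prediction; shadow a sampled fraction to the challenger."""
    champion_pred = champion(request)
    if rng.random() < sample_rate:
        log.append({
            "correlation_id": str(uuid.uuid4()),  # ties the two predictions together
            "champion": champion_pred,
            "challenger": challenger(request),
        })
    return champion_pred  # the user always gets the champion's response

# Demo with toy models and a fixed seed for reproducibility.
rng = random.Random(0)
log = []
champion = lambda x: x % 2
challenger = lambda x: x % 3
served = [handle_request(i, champion, challenger, 0.1, rng, log) for i in range(1000)]
```

Note that the challenger is only ever invoked on the shadowed copy, so a slow or crashing challenger cannot affect what the user sees.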
During shadow, reports include:
- Request count matched
- Agreement rate (both models gave the same prediction), typically 90-99%
- Disagreement breakdown by class
- Latency delta (p50, p99)
- Fairness delta on protected attributes (if configured)
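Agreement rate and the per-class disagreement breakdown fall straight out of the logged prediction pairs. A minimal sketch, assuming the logged rows carry `champion` and `challenger` fields as above:

```python
from collections import Counter

def shadow_report(rows: list[dict]) -> dict:
    """Summarize logged shadow pairs: agreement rate and disagreement breakdown."""
    total = len(rows)
    agree = sum(r["champion"] == r["challenger"] for r in rows)
    disagreements = Counter(
        (r["champion"], r["challenger"])
        for r in rows if r["champion"] != r["challenger"]
    )
    return {
        "requests": total,
        "agreement_rate": agree / total if total else None,
        "disagreement_by_class": dict(disagreements),  # (champion, challenger) -> count
    }

rows = [
    {"champion": "setosa", "challenger": "setosa"},
    {"champion": "setosa", "challenger": "versicolor"},
    {"champion": "virginica", "challenger": "virginica"},
    {"champion": "virginica", "challenger": "virginica"},
]
report = shadow_report(rows)
```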
3. Compare champion vs challenger#
At any point during or after the shadow window:
Reports:
| Metric | v1 (champion) | v2 (challenger) | Δ | Acceptable? |
|---|---|---|---|---|
| Accuracy | 0.9412 | 0.9667 | +2.55 pp | ✅ |
| Macro F1 | 0.9389 | 0.9674 | +2.85 pp | ✅ |
| p99 latency | 22ms | 19ms | −14% | ✅ |
| Fairness (ΔDP on protected attr) | 0.021 | 0.018 | −14% | ✅ |
| Agreement | — | 0.962 | — | — |
serving_monitor agent writes this as reports/champion_challenger_v1_v2.json.
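The "Acceptable?" column encodes a gate: the challenger must not regress on accuracy, latency, or fairness beyond some budget. A minimal sketch of such a rule, using the numbers from the table; the threshold values and field names here are illustrative assumptions:

```python
def gate(champion: dict, challenger: dict,
         max_latency_regress: float = 0.10,
         max_fairness_dp: float = 0.05) -> tuple[bool, dict]:
    """Hypothetical acceptance rule mirroring the comparison table."""
    checks = {
        "accuracy_no_worse": challenger["accuracy"] >= champion["accuracy"],
        "latency_within_budget":
            challenger["p99_ms"] <= champion["p99_ms"] * (1 + max_latency_regress),
        "fairness_within_budget": challenger["dp_delta"] <= max_fairness_dp,
    }
    return all(checks.values()), checks

v1 = {"accuracy": 0.9412, "p99_ms": 22, "dp_delta": 0.021}
v2 = {"accuracy": 0.9667, "p99_ms": 19, "dp_delta": 0.018}
ok, checks = gate(v1, v2)
# A challenger whose accuracy drops should fail the gate.
rejected, _ = gate(v1, dict(v2, accuracy=0.90))
```

Returning the per-check breakdown alongside the boolean makes the failure reason visible in the report rather than a bare reject.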
4. Promotion gate (HITL)#
Promotion requires operator approval because it changes which model serves production traffic:
The promote_challenger tool call raises ApprovalRequired. Approve via:
Deployments → Approvals pending → [gate] → review → Approve.
On approval, swarm:
1. Re-runs the promote_challenger tool call
2. Marks v2 as the new champion in the registry
3. Routes 100% of traffic to v2
4. Retains v1 in "retired" state for N days (retention policy)
5. Writes an audit PDF documenting the whole promotion
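Steps 2-4 above are essentially registry bookkeeping. A minimal sketch of the state transition, assuming a dict-shaped registry entry; the field names are illustrative, not the registry's actual schema:

```python
def promote(registry: dict, model: str, challenger: str, approver: str) -> dict:
    """Sketch of the registry state change behind a promotion."""
    entry = registry[model]
    previous = entry["champion"]
    entry["retired"].append(previous)        # keep old champion for rollback
    entry["champion"] = challenger           # mark challenger as new champion
    entry["traffic"] = {challenger: 1.0}     # route 100% of traffic to it
    entry["audit"].append({                  # evidence for the audit PDF
        "event": "promote", "from": previous,
        "to": challenger, "approver": approver,
    })
    return entry

registry = {"iris_classifier": {
    "champion": "v1", "retired": [], "traffic": {"v1": 1.0}, "audit": [],
}}
entry = promote(registry, "iris_classifier", "v2", "alice@yourorg")
```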
5. Generate the promotion audit PDF#
Produces reports/promotion_v1_to_v2_audit.pdf containing:
- Shadow-traffic evidence (counts, disagreement, latency delta)
- Champion-challenger comparison table
- Fairness delta on protected attributes
- Promotion approver identity + timestamp + comment
- Tamper-evident SHA-256 manifest
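A tamper-evident manifest is just a per-artifact SHA-256 digest: recompute the digests later and compare. A minimal sketch, assuming the artifacts are available as bytes; the artifact names are illustrative:

```python
import hashlib

def manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """One SHA-256 digest per embedded artifact, keyed by name."""
    return {name: hashlib.sha256(blob).hexdigest()
            for name, blob in sorted(artifacts.items())}

def verify(artifacts: dict[str, bytes], recorded: dict[str, str]) -> bool:
    """True only if every artifact still matches its recorded digest."""
    return manifest(artifacts) == recorded

artifacts = {
    "comparison.json": b'{"accuracy": 0.9667}',
    "approval.json": b'{"approver": "alice"}',
}
recorded = manifest(artifacts)
# Any byte-level change to an artifact breaks verification.
tampered = dict(artifacts, **{"comparison.json": b'{"accuracy": 0.99}'})
```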
6. Retire the old champion#
Status changes to retired. The image is not deleted (it is needed for rollback) but is removed from routing. After the retention period (default 90 days), the image is garbage-collected by a cron job.
7. Rollback (if needed)#
Instant. No HITL gate (because rollback is a safety action, not a deploy).
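Rollback can be instant because the retired champion's image is still in the registry; reverting is just restoring it as champion and flipping the routing pointer. A minimal sketch, using the same illustrative entry shape as above:

```python
def rollback(entry: dict) -> dict:
    """Instant rollback: restore the most recently retired version as champion."""
    previous = entry["retired"].pop()        # most recently retired champion
    entry["retired"].append(entry["champion"])
    entry["champion"] = previous
    entry["traffic"] = {previous: 1.0}       # routing flip only; no approval gate
    return entry

entry = {"champion": "v2", "retired": ["v1"], "traffic": {"v2": 1.0}}
entry = rollback(entry)
```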
Champion-challenger with compliance profile#
If the pipeline ran with --compliance rbi_free_ai:
- Fairness audit is required before promotion (not optional)
- explain_model must have produced SHAP values
- Audit PDF template uses the RBI format
- The accountability agent auto-attributes the promotion approver to the audit record
Next#
- Generate an audit PDF — the regulator-format output
- Schedule a drift check — post-deploy monitoring
- Reading the audit PDF
Troubleshooting#
Shadow traffic shows 100% agreement
Either the challenger is identical to the champion (verify v1_hash vs v2_hash), or your traffic is homogeneous (all requests yield the same prediction). Increase the sample rate or wait longer.
Fairness delta is high (>5%)
Don't promote. Open an investigation. A high delta usually means one of:
- Training data has shifted against the protected attribute
- A hyperparameter change had disparate impact
- A bug in feature engineering
Use swarm pipelines run --problem "Investigate fairness drift on iris_classifier" to have the eval team look at it.
Approval gate fires but I'm the only operator
Approve yourself. The audit trail records you approved your own change; that's fine for design-partner phase. Production BFSI enforces separation of duties via RBAC policy.