# Reading the audit PDF
A section-by-section walkthrough. Every pipeline run with a compliance profile produces `pipeline_runs/<run_id>/audit/audit_report.pdf`. This page explains what each section means and how to verify its integrity.
## Structure at a glance
| Section | Pages | Who reads this? |
|---|---|---|
| 1. Cover | 1 | Everyone |
| 2. Executive summary | 1 | CRO, Model Risk Committee |
| 3. Model card | 2-3 | ML team, compliance |
| 4. Data governance | 1-2 | Data governance, compliance |
| 5. Fairness audit | 1-2 | Compliance, model risk |
| 6. Explainability | 2-4 | Model risk, product |
| 7. Drift baseline | 1 | MLOps, compliance |
| 8. Agent trail | 2-4 | Auditors |
| 9. Permission denials | 1-2 | Security, auditors |
| 10. Tamper-evident manifest | 1 | Auditors |
Total: 12-24 pages depending on model complexity and profile.
## Section 1 — Cover
Contains:
- Run ID (UUID, human-readable short form)
- Pipeline template + compliance profile used
- Model name + version
- Pipeline start + end timestamps (ISO-8601 with timezone)
- Compliance profile + version
- swarm version
- SHA-256 of the PDF itself (pinned in the footer)
- QR code linking to audit verification URL
## Section 2 — Executive summary
One paragraph, non-technical. Example:
This audit report covers the training and initial deployment of model
This audit report covers the training and initial deployment of model `fraud_classifier_v2` by the swarm platform on 2026-04-15. The model was trained on 50,000 synthetic transaction records using a logistic regression algorithm with elastic-net regularization, achieving 96.7% accuracy and 0.94 ROC-AUC on the held-out test set. Fairness metrics across the protected attribute `cardholder_region` showed a demographic-parity delta of 0.021, well within the 0.05 threshold. SHAP explainability analysis was generated. The model is recommended for shadow-traffic deployment pending CRO sign-off.
Keep it to one paragraph, readable by a CRO.
## Section 3 — Model card
From `reports/model_card.md`. Full sections:
- Intended use — primary use case + foreseeable misuse
- Model details — algorithm, framework, version
- Training data — source, size, statistics, date, hash
- Performance metrics — accuracy, precision/recall, F1, AUC, per-class
- Evaluation data — how the held-out set was constructed; representativeness
- Ethical considerations — protected attributes considered; known limitations
- Caveats + recommendations — what this model should NOT be used for
For RBI / EU AI Act profiles: additional sections for Sutra 7 / Article 13 requirements.
## Section 4 — Data governance
- Data source lineage — `dataset_path`, SHA-256 of the file at ingest, upstream source if applicable
- Columns — name, type, cardinality, missing-ness
- Protected attributes — flagged with legal basis for use
- Data quality score — from the `data_validator` agent
- Transformations applied — every step from raw data to features, in order
- PII/PHI handling — what was masked, hashed, or excluded (HIPAA profile)
## Section 5 — Fairness audit
From `reports/fairness_audit.json`. Rendered as:
- Protected attribute table — per-group accuracy, precision, recall, F1
- Metric deltas — demographic parity, equalized odds, equal opportunity; flagged green/yellow/red vs thresholds
- Mitigation actions taken — if any (e.g., reweighting, threshold adjustment per group)
- Residual disparity justification — if any metric is out of threshold, why it was accepted (written by the approver)
For RBI profile (Sutra 4): flags requiring Model Risk Committee sign-off.
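The demographic-parity delta reported in the metric table is the spread in positive-prediction rate across groups. A minimal pure-Python sketch of the computation (group labels are made up; this is not the platform's actual implementation):

```python
from collections import defaultdict

def demographic_parity_delta(groups, predictions):
    """Max difference in positive-prediction rate between any two groups."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(pred)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

groups = ["north", "north", "south", "south", "south", "south"]
preds  = [1, 0, 1, 0, 0, 0]
delta = demographic_parity_delta(groups, preds)  # 0.50 vs 0.25 -> 0.25
```

A delta of 0.25 would render red against the 0.05 threshold used in the example summary above.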
## Section 6 — Explainability
From `reports/shap_explanation.json`. Includes:
- Global feature importance chart — top-20 features by mean absolute SHAP value
- Dependence plots — for the top-5 features
- Per-class importance — if multi-class classification
- Surrogate model — if the primary model is black-box (gradient boosting), a trained decision tree surrogate approximates it for regulator comprehension
- Local explanation example — one prediction walked through (anonymized)
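The global importance chart is simply features ranked by mean absolute SHAP value. Assuming `shap_explanation.json` stores a per-row matrix of SHAP values (a guess at the schema), the ranking reduces to:

```python
def top_features_by_shap(shap_values, feature_names, k=20):
    """Rank features by mean |SHAP| across rows (global importance)."""
    n_rows = len(shap_values)
    means = []
    for j, name in enumerate(feature_names):
        mean_abs = sum(abs(row[j]) for row in shap_values) / n_rows
        means.append((name, mean_abs))
    return sorted(means, key=lambda item: item[1], reverse=True)[:k]

shap_values = [[0.2, -0.5, 0.1], [-0.3, 0.4, 0.0]]
names = ["amount", "merchant_risk", "hour_of_day"]
ranking = top_features_by_shap(shap_values, names, k=3)
# merchant_risk (0.45) > amount (0.25) > hour_of_day (0.05)
```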
## Section 7 — Drift baseline
Pinned feature distributions for future monitoring:
- Baseline window — date range used
- Per-feature statistics — mean, std, quantiles, unique-count if categorical
- Reference hash — `baseline_run_id` for downstream drift checks to compare against
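The per-feature baseline is plain summary statistics pinned at training time. A sketch of what one numeric entry might contain (the dictionary schema here is an assumption, not the platform's actual format):

```python
import statistics

def baseline_stats(values):
    """Summary statistics pinned for a numeric feature at baseline time."""
    q25, q50, q75 = statistics.quantiles(values, n=4)  # quartiles
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "q25": q25, "q50": q50, "q75": q75,
        "count": len(values),
    }

stats = baseline_stats([10.0, 12.0, 11.0, 13.0, 14.0])
```

A future drift check compares the live window's statistics against these pinned values.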
## Section 8 — Agent trail
Compressed narrative of the pipeline run:
- Agents invoked — order + role
- Major decisions — what each agent decided + why
- Tool calls per agent — count + categorical breakdown
- Approval gates — who approved, when, with comment
- LLM call summary — provider, token counts, cost
This is not the full conversation (that lives in the evidence bundle under `conversations/*.jsonl`); the agent trail is the readable summary.
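The per-agent tool-call counts in this section are derivable from the raw event log. Assuming `run_events.jsonl` records have `agent` and `event` fields (field names are a guess at the schema), the summary is a small fold over the stream:

```python
import json
from collections import Counter

def tool_calls_per_agent(jsonl_lines):
    """Count tool_call events per agent from a run_events.jsonl stream."""
    counts = Counter()
    for line in jsonl_lines:
        event = json.loads(line)
        if event.get("event") == "tool_call":
            counts[event["agent"]] += 1
    return dict(counts)

lines = [
    '{"agent": "data_cleaner", "event": "tool_call", "tool": "read_csv"}',
    '{"agent": "data_cleaner", "event": "tool_call", "tool": "drop_nulls"}',
    '{"agent": "trainer", "event": "tool_call", "tool": "fit_model"}',
    '{"agent": "trainer", "event": "decision", "detail": "chose logreg"}',
]
summary = tool_calls_per_agent(lines)  # {"data_cleaner": 2, "trainer": 1}
```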
## Section 9 — Permission denials
Every DENY decision during the run. Columns:
| Tool | Agent | Rule source | Reason | Timestamp |
|---|---|---|---|---|
| `export_raw_data` | `data_cleaner` | policy:hipaa | HIPAA: raw PHI export requires de-id workflow | 2026-04-15 14:22:03 |
| `deploy_serving` | — | rbac:operator | Operator role cannot deploy to prod | 2026-04-15 14:25:17 |
Usually short. A clean run has 0-2 entries.
## Section 10 — Tamper-evident manifest
The most important page for auditors:
- PDF SHA-256 — hash of this PDF document
- Manifest SHA-256 — hash of the bundled `audit_report.sig` file
- Artefact manifest — per-file SHA-256 of:
  - `model.joblib`
  - `model_card.md`
  - `fairness_audit.json`
  - `shap_explanation.json`
  - `drift_baseline.json`
  - `run_events.jsonl`
  - `conversations/*.jsonl`
- swarm version + commit SHA — the code that produced this
- Certificate chain — platform signing key → fingerprint
Auditor verification is done with `swarm audit verify`.
Output:
Verifying audit_report.pdf...
Manifest found: audit_report.sig
PDF SHA-256: ok (3f8a2e...)
model.joblib: ok (a12b...)
model_card.md: ok (c45d...)
fairness_audit.json: ok (e67f...)
shap_explanation.json: ok (9e0f...)
run_events.jsonl: ok (b12c...)
conversations: 6 files, all ok
Verdict: UNTAMPERED.
Audited at 2026-04-15T14:35:02Z by run_id=7f8e9a2b, swarm=v0.11.0.
If any file has been modified since the audit, verification fails and names the specific mismatch. That's what regulators look for.
## Profile-specific sections
### RBI FREE-AI
Adds Section 3.5 — Sutra compliance matrix — a 7-row table mapping each Sutra to the evidence artefact for this pipeline run.
### HIPAA
Adds Section 4.5 — De-identification audit — § 164.514 Safe Harbor or Expert Determination evidence.
### EU AI Act
Replaces Section 3 with the full Annex IV technical documentation (Article 11):
1. General description
2. Detailed description
3. Monitoring + testing results
4. Risk management system
5. Post-market monitoring plan
## Customizing the template
The audit PDF template is a YAML file + a ReportLab renderer. To customize for internal governance:
cp ml_team/tools/audit_templates/rbi_free_ai.yaml \
ml_team/tools/audit_templates/acme_bank_internal.yaml
# Edit — add branding, internal sections, different section ordering
swarm audit generate \
--run-id 7f8e9a2b \
--template acme_bank_internal \
--output acme_internal_audit.pdf
Templates use a simple DSL — sections, fields, Markdown + Jinja2. See ml_team/tools/audit_pdf.py for the schema.
## When something's off
If the auditor flags an issue:
- Tampering suspected — `swarm audit verify` will flag it; preserve the file; contact us at security@theaisingularity.org
- Missing artefact — the pipeline didn't produce a required output. Re-run with the correct profile. See How-to: Generate an audit PDF.
- Content question — e.g. "why did `algorithm_selector` pick logistic regression?" is answered in the agent trail (Section 8), which points to the reasoning in `conversations/algorithm_selector.jsonl`
## Evidence bundle vs PDF
The PDF is the narrative. The evidence bundle (swarm audit bundle --run-id ...) is the raw artefacts:
evidence_bundle.tar.gz
├── audit_report.pdf # the narrative
├── audit_report.sig # the manifest
├── MANIFEST.txt # human-readable summary
├── README.md # how to use this bundle
├── model.joblib
├── reports/
│ ├── model_card.md
│ ├── fairness_audit.json
│ └── shap_explanation.json
├── run_events.jsonl
└── conversations/
└── (all agent journals)
Regulators who want everything get the bundle. Those who want a readable summary get the PDF.
## Retention
Audit PDFs are retained according to per-jurisdiction defaults:
- RBI: 7 years
- HIPAA: 6 years (per § 164.316)
- EU AI Act: 10 years (Art. 12)
- SOC 2: 1 year minimum; 3 years recommended
Override via `SWARM_RETENTION_AUDIT_PDF_DAYS`.
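Resolution of the override follows the usual env-var-with-default pattern. A sketch (the defaults come from the table above; the profile keys and function are illustrative, only the variable name is from the docs):

```python
import os

JURISDICTION_DEFAULT_DAYS = {
    "rbi": 7 * 365,        # 7 years
    "hipaa": 6 * 365,      # 6 years, per § 164.316
    "eu_ai_act": 10 * 365, # 10 years, Art. 12
    "soc2": 365,           # 1 year minimum
}

def retention_days(profile: str) -> int:
    """Env override wins; otherwise fall back to the jurisdiction default."""
    override = os.environ.get("SWARM_RETENTION_AUDIT_PDF_DAYS")
    if override is not None:
        return int(override)
    return JURISDICTION_DEFAULT_DAYS[profile]
```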
## Next
- How-to: Generate an audit PDF
- RBI FREE-AI — India BFSI context
- HIPAA — US healthcare context
- EU AI Act — EU high-risk context