Features Catalogue#

Canonical source: docs/FEATURES.md in the repo. This page is rendered via include-markdown and stays in sync automatically on every push to master.

Every feature, organized by point of view: product / technical / security / compliance / operator / developer / ML practitioner / auditor / CISO / supply chain / performance / UX. 573 numbered features, each with a file-path or test-file citation for verifiability.

Version: v0.12.0 · Last updated: 2026-04-22 · Scope: every feature, organized by point of view (technical / security / compliance / operator / developer / ML practitioner / auditor / CISO / supply chain). Every feature has a file-path or test-file citation for verifiability.

For module-organized reference, see docs/REFERENCE.md. For architecture narrative, see MASTER_README.md.

Table of contents#

Product / Customer POV — what you can do with it
Technical / Engineering POV — what's under the hood
Security POV — guardrails + controls
Compliance / Regulatory POV — framework mappings
Operator POV — day-to-day running
Developer / Integrator POV — how to extend
ML Practitioner POV — ML-specific capabilities
Auditor POV — evidence artifacts
CISO / Procurement POV — evaluation artifacts
Supply Chain POV — build + release integrity
Performance + scale POV
UX / Dashboard POV

1. Product / Customer POV#

1.1 Autonomous end-to-end ML pipeline#

Accept a problem statement + a dataset path; return a trained model, fairness audit, drift report, SHAP explanations, model card, audit PDF, Docker image, and K8s manifests — end-to-end autonomous (POST /api/v1/pipelines)
3 pre-built pipeline configs (fast_prototype · default_ml_pipeline · parallel_research) + customer-authored YAML workflows (lib/workflows/)
Human-in-the-loop gates at 6 points: deploy · data request · manual · security · cost · custom (ml_team/core/approval.py)
Checkpoint-and-resume on every HITL gate — pipeline state persists through pauses (ml_team/core/approval.py::ApprovalGate)
Run cancellation via DELETE /api/v1/pipelines/{run_id} — graceful shutdown with partial artefact preservation
Real-time run status + WebSocket live-streaming (/api/v1/pipelines/{id}/ws)

1.2 Multi-agent architecture#

40 specialized ML agents across 7 teams (Data · Algorithm · Training · Evaluation · Deployment · Quality · Management) (lib/agents/base/)
3-tier hierarchy: director → coordinator → worker with transfer_to_* delegation (ml_team/core/orchestrator.py)
ReAct loop per agent — LLM decides → calls tool → consumes result → iterates (ml_team/core/agent_runner.py)
Evaluator-generator separation — optional per-agent rubric grading (ml_team/core/evaluator.py)
Context compaction at 80% of window to keep long runs alive (ml_team/core/context_compaction.py)
Customer-specific agents via YAML extends: overlays (ml_team/core/agent_composer.py)

1.3 Multi-provider LLM support#

Anthropic Claude (first-class)
OpenAI GPT-4o / o1 (first-class)
vLLM — local GPU serving for air-gapped deployments
Ollama — quantised local models
Single-model override for all agents (dev/testing)
Per-agent provider selection (mix small local + frontier in one pipeline) (ml_team/core/llm_client.py)

1.4 Customer-composable deployment model#

Three-layer architecture: core/ (never forked) + lib/ (versioned shelf) + deployments/<customer>/ (composition) (v0.12.0)
SWARM_DEPLOYMENT=deployments/hdfc_bank at boot selects which customer's full stack runs (ml_team/core/deployment_loader.py)
Three deployment templates: generic_ml · bfsi_baseline (RBI FREE-AI) · hipaa_baseline (stub) (lib/templates/)
Per-deployment branding (product_name · logo · colors · compliance badges) surfaced at /api/v1/config/branding (ml_team/api/routers/config.py)
Per-deployment knowledge bases (RAG corpus) in deployments/<customer>/knowledge/
Per-deployment permission policy + retention overrides (lib/templates/<>/retention.yaml)

1.5 Dashboard#

Next.js 15 + React 19 + TypeScript operator dashboard
Live pipeline view with conversation tree · trace · cost breakdown · timeline
Real-time updates via WebSocket (no polling)
OIDC login (Okta · Azure AD · Google Workspace) + username/password fallback
Champion / challenger deployment management view
40-agent roster with per-agent tool sets + conversation trails

1.6 CLI (`swarm`)#

Auth: login / logout / whoami / health
Feature flags: features list|get|set|reset
Pipelines: pipelines list|run|status|cancel
Plugins: plugins list|inspect|install
Cron: cron list|create|run|delete|runs
Batch: batch list|submit|status|results|resume
Deployments (runtime): deployments list|promote|retire
Ship pipeline (v0.12.0): deploy new|validate|ship|whitepaper

1.7 Pricing + deployment posture (product positioning)#

VPC-installable — data never leaves customer network (on-prem / AWS ap-south-1 / Azure Pune / Yotta)
Single-tenant per customer (multi-tenancy deferred)
$25K pilot → $75K deployment → $10K/mo retainer framework (see MASTER_README.md § Pricing)
First production model target: 8 weeks from kickoff

2. Technical / Engineering POV#

2.1 Runtime#

Python 3.11+ (tested on 3.11 · 3.12)
FastAPI 0.115+ async backend, Starlette ASGI
Pydantic v2.9+ with extra="forbid" on every manifest schema
Next.js 15 SSR + server components dashboard
Thread-local SQLite connection pool with WAL mode + FK enforcement (ml_team/api/database.py)
HTTP connection pool + prompt cache + schema cache (W1 optimizations)
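
The thread-local SQLite pattern named above can be sketched with the standard library alone. This is a minimal illustration, not the contents of ml_team/api/database.py: the function name `get_connection` and the single-database assumption are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

_local = threading.local()  # one connection slot per thread


def get_connection(db_path: str) -> sqlite3.Connection:
    """Return this thread's connection, creating it on first use.

    Assumes one database per process (hypothetical simplification).
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)
        # WAL lets readers proceed while a writer is active.
        conn.execute("PRAGMA journal_mode=WAL")
        # SQLite only enforces foreign keys when enabled per connection.
        conn.execute("PRAGMA foreign_keys=ON")
        _local.conn = conn
    return conn


# Usage: WAL needs a file-backed database, so use a temp file here.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = get_connection(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]
```

Because foreign-key enforcement is a per-connection setting in SQLite, applying the pragma inside the connection factory is what keeps every thread's connection consistent.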

2.2 Data persistence tiers#

SQLite (primary) — runs · users · permission_denials · lineage (G14) · champion-challenger · approvals · plugin installs (12 tables)
JSONL per agent — conversation store with buffered flushes (10 msgs or 1s threshold)
Postgres (optional) — cross-project memory for multi-project orgs
ChromaDB + keyword fallback — RAG
Per-run work_dir — scratch for metrics · model cards · fairness JSON · SHAP · drift · audit PDFs

2.3 Agent runtime internals#

Supervisor-worker orchestration, custom implementation (not LangChain) (ml_team/core/orchestrator.py)
Two swappable backends: native + LangGraph sharing one agent config
CrewAI adapter (legacy migration path)
Intra-agent parallel tool calls (3-5× speedup, experiment-flagged)
Per-agent memory (ephemeral during run) + cross-run learning via save_agent_learning
Delegation tool-call synthesis (transfer_to_<agent>) injected at supervisor level

2.4 Permission engine (8 rule sources)#

RBAC source — translates require_role(min) into DENY rules
Agent allowlist source — per-agent tool_set enforcement
Feature flag source — auto-skip tools when their flag is off
HITL source — approval-required tools
Policy source — operator-authored permission_policies.yaml
Profile source — template-baked rules (BFSI etc.); emits at priority 45 (ask) or 60 (deny)
Compliance-gate source — runtime gate verdicts become DENY rules (priority 55)
Egress allowlist source (G1, v0.12.0) — URL walker + host classifier
Tier-aware resolution: ALLOW > DENY > ASK > default with priority tiebreaks (ml_team/core/permissions.py)
Invariant-DENY floor at priority 60 — profile rules beat operator POLICY ALLOW at 50
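
One way to read the resolution rules above is sketched below. It assumes (this is an interpretation, not confirmed by the catalogue) that the highest priority wins outright and that the ALLOW > DENY > ASK order only breaks ties between rules of equal priority. `Rule`, `resolve`, and the `default` parameter are hypothetical names.

```python
from dataclasses import dataclass

# Tie-break order among rules of equal priority, as listed in the catalogue.
_BEHAVIOR_RANK = {"allow": 3, "deny": 2, "ask": 1}


@dataclass(frozen=True)
class Rule:
    source: str
    behavior: str  # "allow" | "deny" | "ask"
    priority: int


def resolve(rules: list[Rule], default: str) -> str:
    """Pick the winning behavior; fall back to `default` when no rule matches."""
    if not rules:
        return default
    # Assumption: priority is compared first, behavior rank second.
    winner = max(rules, key=lambda r: (r.priority, _BEHAVIOR_RANK[r.behavior]))
    return winner.behavior


# A profile DENY at 60 beats an operator POLICY ALLOW at 50.
verdict = resolve(
    [Rule("policy", "allow", 50), Rule("profile", "deny", 60)],
    default="ask",
)
```

Under this reading, the invariant-DENY floor falls out of the ordering itself: no operator rule at priority 50 can outrank a profile rule at 60.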

2.5 Hook lifecycle (10 integration points)#

SESSION_START · PRE_TOOL · POST_TOOL · PRE_COMPACTION · POST_COMPACTION (pre-Track 2)
PRE_LLM · POST_LLM · STORAGE_WRITE · LLM_CALL_WRAPPER · AGENT_DELEGATE (v0.12.0)
Plugin-loaded hooks compose with core hooks (same pipeline)
Shell-command hooks ({"type": "command", ...}) executed behind feature flag with rlimits (ml_team/core/shell_hook_runner.py)
Per-execution audit rows in plugin_shell_executions SQLite table

2.6 Guardrails runtime (v0.12.0)#

GuardrailRegistry (thread-safe) with @register(point, id, priority) decorator
Priority-sorted handler evaluation per integration point
Outcome model: ALLOW chains · REDACT threads payload · DENY short-circuits · ERROR fails open unless invariant
15 guardrails split across 6 categories (detail in § 3)
Auto-configuration from deployment guardrail_configs block via guardrail_bootstrap.bootstrap_from_deployment()
Prometheus counters (guardrail_triggered_total, guardrail_bypass_attempts_total) + duration histogram
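
The registry and outcome model described above might look roughly like the following. This is a sketch under stated assumptions: higher priority is evaluated first, and the `invariant` keyword is a hypothetical addition used to show the "fails open unless invariant" rule. Only the `@register(point, id, priority)` shape and the `(payload: dict) -> GuardrailResult` handler signature come from the catalogue.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardrailResult:
    outcome: str                    # "allow" | "redact" | "deny"
    payload: Optional[dict] = None  # replacement payload for "redact"


Handler = Callable[[dict], GuardrailResult]


class GuardrailRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers: dict = {}  # point -> [(priority, id, handler, invariant)]

    def register(self, point: str, guardrail_id: str, priority: int,
                 invariant: bool = False):
        def decorator(fn: Handler) -> Handler:
            with self._lock:
                entries = self._handlers.setdefault(point, [])
                entries.append((priority, guardrail_id, fn, invariant))
                entries.sort(key=lambda e: -e[0])  # assumed: highest first
            return fn
        return decorator

    def evaluate(self, point: str, payload: dict) -> GuardrailResult:
        with self._lock:
            entries = list(self._handlers.get(point, []))
        for _priority, _gid, fn, invariant in entries:
            try:
                result = fn(payload)
            except Exception:
                if invariant:
                    return GuardrailResult("deny")  # invariant errors fail closed
                continue                             # other errors fail open
            if result.outcome == "deny":
                return result                        # DENY short-circuits
            if result.outcome == "redact" and result.payload is not None:
                payload = result.payload             # REDACT threads the payload
        return GuardrailResult("allow", payload)     # ALLOW chains to the end


registry = GuardrailRegistry()


@registry.register("POST_LLM", "mask_email", priority=50)
def mask_email(payload: dict) -> GuardrailResult:
    text = payload["text"].replace("a@b.com", "[EMAIL]")
    return GuardrailResult("redact", {**payload, "text": text})


@registry.register("POST_LLM", "block_secret", priority=10)
def block_secret(payload: dict) -> GuardrailResult:
    if "SECRET" in payload["text"]:
        return GuardrailResult("deny")
    return GuardrailResult("allow")


ok = registry.evaluate("POST_LLM", {"text": "mail a@b.com"})
blocked = registry.evaluate("POST_LLM", {"text": "SECRET"})
```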

2.7 Authentication + authorization#

JWT Bearer (HS256, 24h TTL, rotatable secret via Doppler)
Legacy X-API-Key header (admin-equivalent, backcompat)
OIDC SSO — Okta / Azure AD / Google Workspace via Authlib (PKCE + state cookie)
IdP group → swarm role mapping via ML_TEAM_OIDC_ROLE_MAP
3 roles: viewer / operator / admin
require_role(min_role) FastAPI dependency routes denials through permission engine
bcrypt password hashing (12 rounds)
Admin bootstrap from env vars at first boot

2.8 Observability#

Prometheus metrics on every subsystem (28+ counters/histograms) at /api/v1/metrics
OpenTelemetry spans with parent/child, tokens, costs
Real-time WebSocket streaming of pipeline events
Structured JSON logs (ml_team/core/logging_config.py)
G6 credential filter at root logger (all logs scrubbed at source)
Per-agent conversation JSONL (durable, grep-friendly)

2.9 Plugin ecosystem (Claude Code marketplace format)#

Skills · MCPs · hooks · commands · agents — all installable via one .mcp.json + manifest
MCP client over JSON-RPC (stdio + SSE) (ml_team/core/mcp_client.py)
Install-time manifest validation + install-drop tracking (scan_install_drops)
Namespace isolation — plugin agents forced to plugin-<name>::<agent> so no shadowing
Feature-flag gated plugin shell hooks with invoke-time validation
CC marketplace compat — tested against superpowers v5.0.7 (100% surface retention)

2.10 Ship pipeline (v0.12.0)#

swarm deploy new <customer> --template=<> scaffolds deployments/<customer>/ (ml_team/deploy/scaffold.py)
swarm deploy validate — lint config + lib refs + customer-name match (ml_team/deploy/validator.py)
swarm deploy ship — build-time positive-list filter · MANIFEST.yaml · whitepaper (ml_team/deploy/ship.py)
swarm deploy whitepaper — 5-section markdown whitepaper (ml_team/deploy/whitepaper.py)
Per-customer build-time isolation — other customers' deployments/<other>/ excluded at tarball build time (tested at tar-member level in test_ship_excludes_other_customer_dirs)
Deterministic MANIFEST.yaml with pinned lib versions + config SHA-256 + build commit + host + timestamp (sort-keyed YAML)

2.11 Ops primitives (W7)#

Cron scheduler — 60s tick, file-backed store, 4 task kinds (retrain · drift_check · audit_pdf · custom) (ml_team/core/cron.py)
Batch runner — JSONL → processor → results.jsonl; checkpoints every 10 records; resume-on-restart (ml_team/core/batch.py)
Retention daemon — 24h sweep, per-artefact TTLs (2555d BFSI default) (ml_team/core/retention.py)
Feature-flag registry with 3 tiers: INVARIANT / FLAG / USER_OVERRIDE (ml_team/core/feature_flags.py)

2.12 Tests (1258 total, 2 skipped)#

Unit tests per guardrail (17-30 each)
Integration tests via FastAPI TestClient (23 routers)
End-to-end BFSI baseline test — biased-model → deployment blocked
Tamper-evident bundle hash regression test
Snapshot parity tests (pre- vs post-refactor byte-identical dicts)
Full regression gate in CI (matrix py3.11 + py3.12)
Nightly real-LLM golden-path run (.github/workflows/nightly-e2e.yml)
Performance bench baselines (ml_team/tests/bench/) with nightly diff

2.13 CI/CD#

GitHub Actions on every push + PR to master (5 workflows)
Lint (ruff + mypy) · pytest (matrix) · bandit HIGH gate · semgrep ERROR gate (ci.yml)
Release workflow on v*.*.* tags with signed-commit gate + SBOM + Cosign (release-supply-chain.yml)
SARIF upload to GitHub Security tab for semgrep findings
Coverage XML artifact upload
Advisory doc-drift check (doc-drift.yml)

3. Security POV#

3.1 Network controls#

G1 Egress allowlist (in-process) — URL walker + RFC1918/loopback/link-local/ULA block (ml_team/core/egress_allowlist.py)
Suffix patterns (*.example.com) + fnmatch allow-patterns
Scheme-level block list (file, gopher, dict, etc.)
Literal-string host matching (no DNS resolution); a mitmproxy sidecar for DNS-rebinding protection is planned as a follow-up
Audit trail via permission_denials with source=egress_allowlist
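
A host classifier along the lines described above can be sketched with the standard library. The function names and the exact blocked-scheme set are illustrative assumptions, and the sketch covers only the per-URL decision, not the walker that finds URLs inside tool arguments.

```python
import fnmatch
import ipaddress
from urllib.parse import urlsplit

# Illustrative subset; the catalogue lists "file, gopher, dict, etc."
BLOCKED_SCHEMES = {"file", "gopher", "dict"}


def is_internal_literal(host: str) -> bool:
    """True for IP literals in private, loopback, or link-local ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; no DNS lookup is done
    # is_private covers RFC1918 and IPv6 ULA (fc00::/7).
    return ip.is_private or ip.is_loopback or ip.is_link_local


def egress_allowed(url: str, allow_patterns: list[str]) -> bool:
    parts = urlsplit(url)
    if parts.scheme.lower() in BLOCKED_SCHEMES:
        return False
    host = (parts.hostname or "").lower()
    if not host or is_internal_literal(host):
        return False
    # Literal-string matching against patterns such as "*.example.com".
    return any(fnmatch.fnmatchcase(host, pat) for pat in allow_patterns)


allowed = egress_allowed("https://api.example.com/v1", ["*.example.com"])
internal = egress_allowed("http://10.0.0.5/", ["*"])
```

Internal ranges are rejected before the allow-patterns are consulted, so even a permissive `*` pattern cannot open a path to RFC1918 addresses.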

3.2 Execution sandbox#

G2 Python sandbox driver abstraction — SandboxDriver Protocol (ml_team/core/python_sandbox.py)
nsjail driver (Linux production) — seccomp + user namespace + RO rootfs + net namespace
Docker driver (macOS dev) — throwaway python:3.11-slim with --network=none --read-only --rm
Subprocess driver (portable fallback) with loud WARN on each invocation
strict=True + unavailable driver → API boot aborts (no silent degrade)
Per-call resource limits: memory_mb / cpu_time_sec / allow_network / allow_paths

3.3 Input safety#

G3 Prompt-injection heuristic — 25 patterns (12 high + 13 medium severity) at PRE_LLM priority 70
Occurrence counting via re.findall (multi-hit aware)
OpenAI multi-part + bare-string + single-dict payload shapes handled
System-role messages skipped by default (an operator authoring a system prompt is not an attacker)
Honest ceiling disclosed in manifest: ~55-65% recall vs Lakera ~80-87%

3.4 Output + persistence safety#

G4 PII detection — 12 regex detectors with structural validation (ml_team/core/pii/regex_detectors.py)
Luhn check on credit cards (drops false-positive digit runs)
Verhoeff check on Aadhaar (kills false-positive 12-digit strings)
Indian BFSI recognisers: Aadhaar · PAN · IFSC · IN_PHONE
International: EMAIL · CREDIT_CARD · IBAN · IPV4 · IPV6 · PRIVATE_KEY_BLOCK · US_SSN · US_PHONE
3 action modes: redact / mask (keep first 2 + last 2) / hash (SHA-256 prefix)
Overlap resolver: higher-confidence wins → longer span → registration order
Registered at POST_LLM + POST_TOOL + STORAGE_WRITE (three integration points)
Optional Microsoft Presidio shim (lazy import; 300MB spaCy model opt-in) (ml_team/core/pii/presidio_shim.py)
G5 Conversation JSONL scrubber — wraps ConversationStore._flush_locked (ml_team/core/conversation_scrubber.py)
Recursive value walk (scans strings in dicts + lists)
Byte-identical output on zero-mutation lines (no disk churn)
_redacted: true tag on mutated lines (audit grep)
Non-JSON lines pass through untouched (defensive)
G6 Logs credential filter — logging.Filter subclass at root logger
13 known-secret regex patterns (sk-…, ghp_…, AKIA…, BEGIN PRIVATE KEY, JWT, etc.)
Anthropic keys (sk-ant-…) are matched before OpenAI keys; the OpenAI pattern uses the negative lookahead sk-(?!ant-) so it does not claim Anthropic keys
Shannon-entropy fallback ≥4.2 bits/char on ≥32-char tokens
Scrubs record.__dict__ fields too, so values passed as structured-logging extras are also caught
Replaces with [REDACTED_SECRET_<sha256[:8]>] (stable placeholder allows correlation without leaking value)
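
Several of the checks named in this section are small enough to sketch directly. The helpers below are illustrative stand-ins, not the code in ml_team/core/pii/: the function names are hypothetical, and the character class and minimum length in `OPENAI_KEY` are assumptions (only the `sk-(?!ant-)` lookahead comes from the catalogue).

```python
import hashlib
import math
import re
from collections import Counter


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to drop digit runs that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask(value: str) -> str:
    """Mask mode: keep the first 2 and last 2 characters."""
    if len(value) <= 4:
        return "*" * len(value)
    return value[:2] + "*" * (len(value) - 4) + value[-2:]


def shannon_entropy(token: str) -> float:
    """Entropy in bits per character, for the high-entropy fallback."""
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in Counter(token).values())


def secret_placeholder(secret: str) -> str:
    """Stable placeholder: same secret, same tag, value never logged."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()[:8]
    return f"[REDACTED_SECRET_{digest}]"


# Negative lookahead keeps this pattern from matching "sk-ant-..." keys.
OPENAI_KEY = re.compile(r"sk-(?!ant-)[A-Za-z0-9_-]{20,}")
```

The hashed placeholder lets an operator confirm that two log lines leaked the same credential without the credential itself appearing anywhere.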

3.5 Rate limiting + cost controls#

G7 Per-user rate limit — composite (caller_identity, endpoint_class) key (ml_team/api/rate_limit.py)
Identity precedence: X-API-Key (SHA-256[:12]) → JWT sub → client IP
Per-role limits per minute (reads/writes): viewer 100/0 · operator 600/20 · admin 2000/100 (env-configurable)
Response headers: X-RateLimit-{Limit,Remaining,Role}
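
The identity precedence and composite key above can be sketched as follows. The function names, the `key:`/`sub:`/`ip:` prefixes, and the mapping of HTTP methods to a read or write class are assumptions made for illustration, as is reading the "r" and "w" limits as read and write requests.

```python
import hashlib
from typing import Optional

# Per-minute limits from the catalogue, read as (read, write) requests.
LIMITS = {
    "viewer": {"read": 100, "write": 0},
    "operator": {"read": 600, "write": 20},
    "admin": {"read": 2000, "write": 100},
}


def caller_identity(api_key: Optional[str], jwt_sub: Optional[str],
                    client_ip: str) -> str:
    """Precedence: X-API-Key (hashed) -> JWT sub -> client IP."""
    if api_key:
        # Only a 12-character SHA-256 prefix is kept, never the key itself.
        return "key:" + hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:12]
    if jwt_sub:
        return "sub:" + jwt_sub
    return "ip:" + client_ip


def rate_limit_key(identity: str, method: str) -> tuple:
    """Composite (caller_identity, endpoint_class) bucket key."""
    read_methods = {"GET", "HEAD", "OPTIONS"}  # assumed classification
    endpoint_class = "read" if method.upper() in read_methods else "write"
    return (identity, endpoint_class)


key = rate_limit_key(caller_identity(None, "alice", "203.0.113.7"), "POST")
```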

3.6 Agent safety#

G10 Delegation loop detector — stack depth cap + fan-out cap + same-args dedup (lib/guardrails/platform_integrity/delegation_loop_detector/)
Default caps: max_depth=5, max_delegations_per_run=50
Two scopes: strict (agent + args_hash) or name_only
Per-run state dict (isolated across runs)
G16 HITL TTL + escalation — ApprovalGate fields + cron sweep (ml_team/core/hitl_sweep.py)
Pure function sweep(store, notifier, now) → SweepReport
Escalation fires before expiry in the same sweep (documented ordering)
Monotonic timer for expiry + wall-clock for display (clock-skew safe)
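
The three delegation checks (depth cap, fan-out cap, same-args dedup) could be combined as in the sketch below. The class and method names are hypothetical, and treating "dedup" as "the same key is already on the active stack" is an assumption; the default caps and the two scopes come from the catalogue.

```python
import hashlib
import json
from dataclasses import dataclass, field


class DelegationDenied(Exception):
    pass


@dataclass
class DelegationGuard:
    """Per-run state; create one instance per pipeline run."""
    max_depth: int = 5
    max_delegations_per_run: int = 50
    scope: str = "strict"  # "strict" (agent + args_hash) or "name_only"
    stack: list = field(default_factory=list)
    count: int = 0

    def _key(self, agent: str, args: dict) -> str:
        if self.scope == "name_only":
            return agent
        canonical = json.dumps(args, sort_keys=True).encode("utf-8")
        return f"{agent}:{hashlib.sha256(canonical).hexdigest()}"

    def enter(self, agent: str, args: dict) -> None:
        key = self._key(agent, args)
        if len(self.stack) >= self.max_depth:
            raise DelegationDenied("max_depth exceeded")
        if self.count >= self.max_delegations_per_run:
            raise DelegationDenied("max_delegations_per_run exceeded")
        if key in self.stack:
            raise DelegationDenied(f"delegation loop on {agent}")
        self.stack.append(key)
        self.count += 1

    def exit(self) -> None:
        self.stack.pop()


guard = DelegationGuard()
guard.enter("trainer", {"x": 1})
guard.enter("evaluator", {"x": 1})
try:
    guard.enter("trainer", {"x": 1})  # same agent, same args: a loop
    loop_detected = False
except DelegationDenied:
    loop_detected = True
```

In strict scope the same agent may be re-entered with different arguments; `name_only` scope would reject that as well.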

3.7 Data protection at rest#

G12 Encryption at rest — AES-GCM-256 envelope (ml_team/core/encryption.py)
3 at-rest driver options: sqlite_host_fs_only (default) · sqlcipher · postgres_pgcrypto
Per-call DEK + customer-wrapped KEK
AAD = sorted context dict; decrypt with mismatched context fails AEAD integrity
WrappedDek + Ciphertext dataclasses, JSON-serialisable end-to-end
G13 BYOK KeyProvider Protocol — 5 implementations
StubProvider (tests, deterministic base64)
EnvKeyProvider (dev via SWARM_KEK env var)
AwsKmsProvider (production — shells to aws kms encrypt/decrypt)
GcpKmsProvider (fail-fast stub pending customer demand)
VaultTransitProvider (fail-fast stub pending customer demand)
Threat model explicitly scoped: "cold DB file / disk image" ≠ "attacker with RCE on running API"

3.8 Audit trail + erasure#

G14 Data lineage — 3 SQLite tables with enforced FKs (ml_team/api/database.py)
datasets (dataset_id PK + consent_doc_ref G15 index)
lineage_models (model_id PK + dataset_id FK ON DELETE SET NULL)
lineage_deployments (deployment_id PK + model_id FK ON DELETE CASCADE)
chain_for_deployment(deployment_id) returns full joined dict for audit PDF
Helper APIs: record_dataset/model/deployment (idempotent upserts) (ml_team/core/lineage.py)
G15 Right-to-be-forgotten — admin-only endpoint + signed receipt (ml_team/api/routers/subjects.py)
GET /api/v1/subjects/{id}/preview (dry-run)
DELETE /api/v1/subjects/{id} (execute)
Cascade delete of dataset rows · FK sets model dataset_id to NULL · conversation JSONL rewritten to tombstones
ErasureReceipt dataclass with SHA-256 signature over sorted JSON (ml_team/core/rtbf.py)
Regex metacharacters in subject_id escaped (no ReDoS)
_subject_pattern uses re.escape (tested in test_regex_metacharacters_in_subject_id_are_escaped)
Tombstone preserves line order + count (downstream JSONL parsers still work)
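
The receipt and tombstone behaviour described above can be illustrated with a short sketch. Field names, the tombstone shape, and the helper names are hypothetical; only the ideas (SHA-256 over sorted JSON, `re.escape` on the subject ID, preserved line order and count) come from the catalogue. Note that an unkeyed SHA-256 digest detects modification of the receipt but does not by itself authenticate who produced it.

```python
import hashlib
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureReceipt:
    subject_id: str
    datasets_deleted: int
    lines_tombstoned: int
    signature: str = ""


def sign_receipt(subject_id: str, datasets_deleted: int,
                 lines_tombstoned: int) -> ErasureReceipt:
    body = {
        "subject_id": subject_id,
        "datasets_deleted": datasets_deleted,
        "lines_tombstoned": lines_tombstoned,
    }
    # Sorted keys make the serialisation, and so the digest, deterministic.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return ErasureReceipt(**body, signature=digest)


def subject_pattern(subject_id: str) -> "re.Pattern[str]":
    # re.escape treats metacharacters such as "." literally.
    return re.compile(re.escape(subject_id))


def tombstone_lines(lines: list, subject_id: str) -> tuple:
    """Replace matching lines in place so order and count are unchanged."""
    pattern = subject_pattern(subject_id)
    out, hits = [], 0
    for line in lines:
        if pattern.search(line):
            out.append(json.dumps({"_tombstone": True}))
            hits += 1
        else:
            out.append(line)
    return out, hits


lines = ['{"user": "a.b"}', '{"user": "axb"}']
scrubbed, hits = tombstone_lines(lines, "a.b")
receipt = sign_receipt("a.b", datasets_deleted=1, lines_tombstoned=hits)
```

The second line survives because the escaped pattern matches the literal string `a.b` only; an unescaped `.` would also have matched `axb`.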

3.9 Audit PDF signing#

G11 Audit PDF signing — 4-driver abstraction (ml_team/core/audit_signer.py)
Stub driver (deterministic test signature)
Offline Ed25519 driver (air-gap, key on build host)
Cosign KMS driver (AWS/GCP/Vault — BFSI default)
Cosign keyless driver (Sigstore + GitHub OIDC — SaaS default)
Sidecar receipt JSON written alongside .sig + .pem files (canonical source for downstream verifiers)
Rekor log index captured for keyless signatures

3.10 Supply-chain integrity#

G17 SBOM + signed commits — CI gate (.github/workflows/release-supply-chain.yml)
Every commit in the release range verified via git log --format=%G? (any status other than G or U fails the workflow)
CycloneDX 1.5 JSON SBOM via cyclonedx-bom (scripts/gen_sbom.py)
Cosign keyless tarball + SBOM signing via GitHub OIDC
Base-image cosign verify on python:3.12-slim (WARN-not-block on upstream policy changes)
6 signed release assets per v*.*.* tag
Rekor transparency log entry for every signed artefact

3.11 Cryptography inventory#

AES-GCM-256 (envelope encryption)
Ed25519 (offline audit signing)
ECDSA-P256 (Cosign)
bcrypt 12 rounds (password at rest)
HS256 (JWT)
SHA-256 (hashing, fingerprints)
No deprecated algorithms — no MD5 for auth, no SHA-1 for signing, no RC4/DES/3DES anywhere

3.12 RBAC + access control#

3-role enum: viewer < operator < admin
Per-endpoint role guards on all sensitive routes
Admin-only endpoints: /subjects/* (G15 RTBF), /permissions/denials (audit), /features/* (flag mutation)
JWT revocation: grace-period + secret rotation (documented; no revocation list yet)
Session state in JWT only (no server-side session store to compromise)

3.13 Static security analysis#

bandit (HIGH severity gate in CI) (ml_team/pyproject.toml::[tool.bandit])
semgrep p/python + p/security-audit (ERROR severity gate)
MEDIUM findings reviewed and annotated inline with # nosec BXXX + rationale (4 suppressions)
SARIF upload to GitHub Security tab for every scan
0 HIGH bandit findings · 0 ERROR semgrep findings at v0.12.0

3.14 Threat model#

STRIDE analysis across 9 critical assets (.project/security/threat_model.md)
DREAD scores on every row
Top-10 residual risk register with owners + status
Cross-cutting attack scenarios documented (prompt-injection exfil chain, compromised dev laptop, backup-exfiltration + RTBF, supply-chain via transitive dep)
Quarterly review cadence documented

4. Compliance / Regulatory POV#

4.1 Framework coverage (8 frameworks)#

RBI FREE-AI (India) — Pillars 2, 3, 5, 6 full; 1, 4, 7 partial
DPDP Act 2023 (India) — §§ 8, 10(8), 12 full
EU AI Act (high-risk) — Arts. 10, 12, 14, 15 full; Art. 13 partial
HIPAA Security Rule — 164.308, 164.312, 164.514, 164.528 (controls present; BAA template pending)
GDPR — Arts. 5, 17, 22, 30, 32 (controls present)
SOC 2 — CC6.x, CC7.x, CC8.1 (design ready; Type I readiness Q2 2026)
OWASP LLM Top 10 — direct controls for LLM01, 02, 04, 05, 06, 08, 10
NIST AI RMF 1.0 — Govern 1.4/1.6/1.7, Map 2.3/4.1, Manage 2.2/2.3

4.2 BFSI / Indian-specific controls#

RBI FREE-AI Pillar 2 (Consent) — G4 PII detection + G15 erasure
Pillar 3 (Robustness) — G3 prompt-injection + G10 delegation-loop + drift/fairness gates
Pillar 5 (Accountability) — G14 data lineage + G11 audit-PDF signing
Pillar 6 (Security) — G1 + G2 + G6 + G12 + G13 + G17
DPDP Act § 8 (data fiduciary) — G14 + G15
DPDP Act § 10(8) (named DPO) — self-designated 2026-04-22 (SECURITY.md)
DPDP Act § 12 (erasure) — G15 with signed receipt
CERT-In 6-hour breach notification — runbook pending; technical capability (audit logs + metrics) in place
RBI 7-year retention (2555 days) — BFSI baseline default

4.3 Compliance artefact generation#

Model card (Markdown, RBI-aligned structure) via tools/model_card.py
Fairness audit JSON (fairlearn MetricFrame, per-group metrics) via tools/fairness.py
Drift report (PSI + KS + chi², BFSI thresholds 0.10/0.25) via tools/drift.py
SHAP explanations JSON via tools/explainability.py
Audit PDF with tamper-evident source-bundle SHA-256 on cover
Signed conversation JSONL per agent (retention-policy governed)
Retention log (retention_log.json) documenting every artefact deletion
permission_denials SQLite table — every denial source-attributed

4.4 Invariant-DENY guarantees#

Profile DENY rules emit at priority 60 → CANNOT be overridden by operator POLICY ALLOW at 50
Invariant tier feature flags CANNOT be toggled at runtime
strict=True guardrails fail API boot if driver unavailable (no silent degrade)
Tested end-to-end in test_bfsi_baseline_e2e.py

4.5 Procurement artefact pack#

1-page security architecture diagram (.project/security/architecture.md)
STRIDE threat model with DREAD scoring (.project/security/threat_model.md)
Pre-filled CAIQ v4.0.3 questionnaire (60 Qs, ~75% Y/Y+P) (.project/security/caiq_lite.md)
Commit-signing setup guide (.project/security/signing_setup.md)
README pack index with audience-routing guide (.project/security/README.md)

5. Operator POV#

5.1 Installation#

pip install -e "ml_team/.[ml]" — one-step install on a clean Python 3.12 venv
.env template for OPENAI_API_KEY + JWT secret + admin bootstrap
uvicorn ml_team.api.app:app — zero-config local start
SQLite default (no DB service to run)
Optional Docker image (planned)

5.2 Configuration#

25+ environment variables (documented in README.md § Configuration reference)
YAML deployment config (deployments/<customer>/config.yaml) — declarative, Pydantic-validated
Per-template profile defaults (lib/templates/<>/)
Per-customer overrides (branding · retention · guardrail configs)
Operator-authored custom policy rules (ml_team/config/permission_policies.yaml)

5.3 Dashboards#

/ — pipelines list + live feed
/pipelines/[id] — drill-down
/deployments — champion/challenger
/agents — 40-agent roster
/transparency — denial log · retention · cron · batch (one operator-facing "everything" page)
/cron — scheduler
/plugins — marketplace install + inspect
/knowledge — RAG corpus management
/settings — feature flags (admin only)
/docs — in-app documentation browser

5.4 Monitoring + alerting#

Prometheus /api/v1/metrics endpoint (unauthenticated, as is conventional for Prometheus scrape targets)
28+ counters + histograms covering every subsystem
Key counters: pipelines_started_total · llm_calls_total{agent,model} · tool_calls_total · tool_denied_total · permission_denials_total · guardrail_triggered_total{name,outcome} · guardrail_bypass_attempts_total · active_pipelines
Key histograms: pipeline_duration_seconds · llm_call_duration_seconds · guardrail_evaluation_duration_seconds
OpenTelemetry traces (parent/child) with token/cost metadata
Structured JSON logs (Loki/Splunk/CloudWatch-ready)
WebSocket event streaming for real-time dashboard updates

5.5 Ops primitives#

Cron scheduler with 4 task kinds (retrain, drift_check, audit_pdf, custom)
Batch runner (JSONL → processor → results.jsonl, 10-record checkpoints)
Retention daemon (24h sweep, per-artefact TTLs)
Feature flag admin UI (/settings)
Runtime feature-flag overrides via POST /api/v1/features/{name} (admin)
HITL approval UI at /pipelines/[id] (gate type + rationale surfaced)

5.6 Secrets management#

Doppler per-customer projects (swarm-<customer>-{dev,staging,prod})
.env file fallback for dev
SOPS + age migration path documented (no vendor lock-in)
swarm deploy rotate-secret planned (not yet implemented)

5.7 Backups + disaster recovery#

SQLite dumpable via sqlite3 .dump (trivial backup)
JSONL files rsync-friendly (append-only)
Host-FS encryption recommended (LUKS · EBS-KMS · GCP PD · Azure Disk encryption)
Customer-controlled backup policy (documented in deployment runbook)
Postgres migration path for multi-node HA (not yet executed)

5.8 Governance contacts#

Named DPO (DPDP Act § 10(8)) — security@theaisingularity.org
Named Security Officer — same contact
24h acknowledgement SLA + 5-business-day substantive SLA
CERT-In 6h mandate when applicable

6. Developer / Integrator POV#

6.1 Tool authoring#

Plain Python functions with type hints become tools automatically
Docstring doubles as LLM-facing tool description
JSON schema auto-generated from type hints (Pydantic under the hood)
Lib asset manifest for versioned distribution (lib/tools/<id>/tool.yaml)
Per-tool tests in lib/tools/<id>/tests/
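
To make the "type hints become a schema" idea concrete, here is a standard-library sketch. The catalogue says the real implementation uses Pydantic; this version deliberately does not, handles only a few primitive types, and uses hypothetical names (`tool_schema`, `add_numbers`).

```python
import inspect
from typing import get_type_hints

# Only a handful of primitives are mapped in this sketch.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn) -> dict:
    """Derive a tool description from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    signature = inspect.signature(fn)
    properties = {
        # Unmapped types fall back to "string" here; a simplification.
        name: {"type": _JSON_TYPES.get(tp, "string")}
        for name, tp in hints.items()
    }
    required = [
        name for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",  # docstring is LLM-facing
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def add_numbers(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b


schema = tool_schema(add_numbers)
```

Parameters without a default end up in `required`, which is how the calling model learns which arguments it must supply.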

6.2 Agent authoring#

Author YAML manifest under lib/agents/base/<id>/agent.yaml — no Python required
extends: lib/agents/base/<base>@vX.Y.Z composition
Overlay actions: system_prompt_append · system_prompt_prepend · system_prompt_replace · field replace
knowledge_bases: list pointing at Markdown/PDF corpus files
tool_set: referencing lib tools + custom per-deployment tools
tier: (director / coordinator / worker) with routing implications
model: per-agent override (frontier for directors, small for workers)
evaluator_rubric: optional grading criteria

6.3 Guardrail authoring#

Create lib/guardrails/<category>/<id>/guardrail.yaml + enforce.py
Register at one or more integration points via @register(point, "id", priority=N)
Handler signature: (payload: dict) -> GuardrailResult
configure() function at module level → add to guardrail_bootstrap._CONFIGURATORS dispatch table
Tests alongside the guardrail (recommended: one red-team test minimum)

6.4 Plugin authoring (CC marketplace format)#

Standard Claude Code plugin layout: .mcp.json + skills/ + hooks/ + commands/ + agents/
Install via swarm plugins install <name>
Hooks can be Python or shell (shell gated by plugin_shell_hooks_enabled flag)
Commands with $ARGUMENTS substitution
Agents namespace-isolated (plugin-<name>::<agent>)
Install-drop telemetry shows silently-skipped surfaces

6.5 Workflow authoring#

YAML manifest at lib/workflows/<id>/workflow.yaml
Graph model: nodes (stage/router/checkpoint) + edges (source/target, priority, metadata)
Node types: stage (tool call), router (branching), checkpoint (HITL pause)
Reload by restarting with SWARM_DEPLOYMENT set; workflows are not reloaded mid-run

6.6 Permission rule authoring#

YAML in permission_baseline.yaml for profile templates OR ml_team/config/permission_policies.yaml for operator rules
Fields: id · tool_name (glob) · behavior (allow/deny/ask) · pattern (regex on args JSON) · priority · reason
Rules loaded at boot; restart to reload

6.7 Compliance gate authoring#

YAML in compliance_gates.yaml per profile template
Fields: id · triggers_on_tool · computes_via_tool · deny_if (restricted-eval expression) · blocks_tool
Restricted eval uses _SAFE_BUILTINS (no open/exec/import)

6.8 Deployment template authoring#

Create lib/templates/<name>/ with template.yaml · permission_baseline.yaml · compliance_gates.yaml · retention.yaml · branding.json · README.md
Referenced from deployments via based_on: lib/templates/<name>@vX.Y.Z
Auto-inherited by every deployment using the template

6.9 REST API#

OpenAPI 3.0 auto-generated at /docs (dev mode)
23 routers covering pipelines · agents · models · evaluations · MCP · knowledge · chat · datasets · inference · deployments · features · plugins · permissions · cron · batch · subjects · auth · config · docs
Consistent auth: Bearer JWT or X-API-Key
Structured error responses via FastAPI/Pydantic

6.10 CLI library#

stdlib argparse + httpx (no new deps)
Persistent JWT at ~/.swarm/token
JSON output by default; --table for compact view on list commands
Exit non-zero on API errors

7. ML Practitioner POV#

7.1 Classification#

train_classifier tool — LightGBM / XGBoost / RandomForest / Logistic (ml_team/tools/training.py)
Stratified split (sklearn) with customer-configurable ratio
Metrics sidecar (JSON): accuracy · precision · recall · F1 · AUC · per-class
Post-save verification (model must load + predict correctly)
Optional MLflow logging

7.2 Drift detection#

detect_drift tool — 3 statistical tests (ml_team/tools/drift.py)
Population Stability Index (PSI) with BFSI thresholds 0.10 / 0.25
Kolmogorov-Smirnov test (continuous features)
Chi-squared test (categorical features)
Per-feature JSON output + aggregate drift score
bfsi_drift_baseline_gate — deploy refused when max PSI > 0.25
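
For readers unfamiliar with PSI, the following sketch computes it in plain Python using equal-width bins taken from the baseline sample. The binning strategy, the `eps` floor for empty bins, and the verdict labels are assumptions for illustration and may differ from ml_team/tools/drift.py; the 0.10 / 0.25 thresholds and the "refuse above 0.25" rule come from the catalogue.

```python
import math


def psi(expected: list, actual: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum((a - e) * ln(a / e)) over bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def fractions(values: list) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def psi_verdict(value: float) -> str:
    if value < 0.10:
        return "stable"
    if value <= 0.25:
        return "warn"
    return "block"  # mirrors the gate: deploy refused when PSI > 0.25


baseline = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in baseline]
same_score = psi(baseline, baseline)
shift_score = psi(baseline, shifted)
```

An unchanged distribution scores exactly zero, while the shifted sample lands well above the 0.25 threshold.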

7.3 Fairness audit#

audit_fairness tool — fairlearn MetricFrame (ml_team/tools/fairness.py)
Per-protected-attribute group metrics: accuracy · precision · recall · F1 · selection rate
Binary disparate-impact scalars: demographic parity · equal opportunity · equalized odds
bfsi_fairness_gate — deploy refused when demographic parity > 0.1
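
The gate condition can be illustrated in plain Python. This sketch assumes that "demographic parity > 0.1" refers to the demographic parity difference, which fairlearn defines as the gap between the highest and lowest group selection rates; the function names are hypothetical and the catalogue's tool uses fairlearn rather than this hand-rolled version.

```python
def selection_rates(y_pred: list, groups: list) -> dict:
    """Fraction of positive predictions per protected-attribute group."""
    totals: dict = {}
    positives: dict = {}
    for pred, group in zip(y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred: list, groups: list) -> float:
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


def fairness_gate_blocks(y_pred: list, groups: list,
                         threshold: float = 0.1) -> bool:
    """True when deployment should be refused under the assumed reading."""
    return demographic_parity_difference(y_pred, groups) > threshold


y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(y_pred, groups)
blocked = fairness_gate_blocks(y_pred, groups)
```

Here group `a` is selected 75% of the time and group `b` 25%, so the gap of 0.5 exceeds the threshold and the gate blocks.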

7.4 Explainability#

explain_model tool — SHAP (ml_team/tools/explainability.py)
Tree-native TreeExplainer fast path (100× faster than KernelExplainer)
Generic KernelExplainer fallback for non-tree models
Top-N feature importance + per-sample attribution
JSON output consumed by model card generator

7.5 Model card#

generate_model_card tool (ml_team/tools/model_card.py)
Markdown document following RBI FREE-AI structure
Sections: metadata · training data · metrics · fairness · drift · SHAP · intended use · limitations · contact
Embedded in audit PDF

7.6 Deployment artefacts#

package_model — Dockerfile + serve.py (FastAPI /predict + /health) + requirements.txt (ml_team/tools/deploy.py)
generate_k8s_manifests — Deployment + Service + HPA (Horizontal Pod Autoscaler)
BFSI-hardened defaults: runAsNonRoot · readOnlyRootFilesystem · allowPrivilegeEscalation: false · capabilities.drop: ['ALL']
Optional docker build step
Post-build verification (image must run + respond to /health)

7.7 Champion-challenger#

Model registry with deployment_id · traffic_pct · environment (ml_team/tools/champion_challenger.py)
Atomic champion promotion (old champion → retired; new → active)
Shadow prediction log for challenger A/B analysis
Configurable agreement thresholds before promotion
Full history via GET /api/v1/deployments/{model} + UI at /deployments

7.8 Batch inference#

Batch runner: JSONL input → processor → results.jsonl output (ml_team/core/batch.py)
3 processor kinds: inference (calls a registered model) · echo (debug) · custom (operator-authored)
Checkpoints every 10 records
Resume-on-restart from last checkpoint
Streams results via GET /api/v1/batch/{id}/results

7.9 Experiment tracking#

Optional MLflow integration (ml_team/tools/mlflow_tools.py)
Auto-log training run if MLflow URI set

7.10 Memory#

Per-run JSON memory in work_dir
Cross-run SQLite memory (recall_past_runs, save_agent_learning)
Cross-project Postgres memory (optional, multi-project orgs)
RAG retrieval via ChromaDB (keyword fallback when embeddings unavailable)

8. Auditor POV#

8.1 On-disk evidence artefacts (per run)#

pipeline_runs/<run_id>/metrics.json — training metrics sidecar
pipeline_runs/<run_id>/*_card.md — model card
pipeline_runs/<run_id>/fairness.json — fairness audit output
pipeline_runs/<run_id>/drift.json — drift report
pipeline_runs/<run_id>/shap.json — SHAP explanations
pipeline_runs/<run_id>/approvals.json — HITL gate state (who approved, when, rationale)
pipeline_runs/<run_id>/audit_report_<run_id>.pdf — signed audit PDF
pipeline_runs/<run_id>/conversations/<agent>.jsonl — per-agent full turn log
pipeline_runs/<run_id>/conversations/_index.json — agent hierarchy + timing

8.2 SQLite compliance ledger#

permission_denials table — every denial with source · reason · run_id · agent · ts
runs table — pipeline run state + profile_at_creation
run_events table — per-run event stream
model_deployments table — champion/challenger history
shadow_predictions table — challenger agreement log
approvals table — HITL gate state
datasets table — dataset lineage with consent_doc_ref
lineage_models table — training run lineage
lineage_deployments table — deployment lineage
plugin_installations + plugin_shell_executions — plugin audit
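
Since the ledger is plain SQLite, audit questions reduce to short queries. The sketch below runs two of them against `permission_denials` using the columns listed above. It assumes `ts` is stored as Unix epoch seconds, which the catalogue does not state; the helper names are hypothetical.

```python
import sqlite3
import time
from typing import Optional


def denials_last_24h(conn: sqlite3.Connection, now: Optional[float] = None) -> list:
    """Who was denied what in the last 24 hours, newest first."""
    now = time.time() if now is None else now
    return conn.execute(
        "SELECT agent, source, reason, run_id, ts FROM permission_denials "
        "WHERE ts >= ? ORDER BY ts DESC",
        (now - 86400,),  # assumption: ts is Unix epoch seconds
    ).fetchall()


def denials_by_reason(conn: sqlite3.Connection) -> list:
    """Denial counts grouped by reason, most frequent first."""
    return conn.execute(
        "SELECT reason, COUNT(*) AS n FROM permission_denials "
        "GROUP BY reason ORDER BY n DESC, reason"
    ).fetchall()
```

The same pattern extends to the other ledger tables, for example a timestamp range filter for per-quarter reporting.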

8.3 Audit PDF content#

Cover page with: run ID · profile · build timestamp · bundle SHA-256 · signature + cert fingerprint · Rekor index
Metrics section (training + per-class)
Model card (Markdown embedded)
Fairness section (per-group + scalar disparate-impact)
Drift section (per-feature + aggregate)
SHAP section (top feature importance)
Approval gates section (who approved each HITL gate)
Conversation summaries (per-agent message + tool-denial counts, full JSONL not embedded for size)
Data lineage section (G14: dataset → model → deployment chain)

8.4 Retention#

2555-day (7-year) retention for BFSI — configurable per artefact class
Daemon sweep every 24h (ml_team/core/retention.py)
Deletion logged to retention_log.json (immutable append log)
Classes: conversations_days · run_events_days · audit_pdfs_days · shadow_predictions_days
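
A retention sweep of this kind might look like the sketch below: delete files older than the class TTL and append one log entry per deletion. The `sweep` function, the JSON-lines log layout, and the use of file mtime as the age signal are assumptions for illustration, not the behaviour of ml_team/core/retention.py.

```python
import json
import time
from pathlib import Path
from typing import Optional

# Class names come from the list above; the 2555-day values are the BFSI default.
RETENTION_DAYS = {
    "conversations_days": 2555,
    "run_events_days": 2555,
    "audit_pdfs_days": 2555,
    "shadow_predictions_days": 2555,
}


def sweep(root: Path, pattern: str, ttl_days: int, log_path: Path,
          now: Optional[float] = None) -> list:
    """Delete files under root matching pattern that are older than ttl_days."""
    now = time.time() if now is None else now
    cutoff = now - ttl_days * 86400
    deleted = []
    # Append-only: earlier log entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as log:
        for path in sorted(root.rglob(pattern)):
            if path.is_file() and path.stat().st_mtime < cutoff:
                rel = path.relative_to(root).as_posix()
                path.unlink()
                log.write(json.dumps({"path": rel, "deleted_at": now}) + "\n")
                deleted.append(rel)
    return deleted
```

A daemon would call `sweep` once per class on its 24h cycle, for instance with `RETENTION_DAYS["conversations_days"]` and a `*.jsonl` pattern.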

8.5 Audit queries (SQL-greppable)#

"Who denied what in the last 24h?" — one SQL query on permission_denials
"Which dataset trained which model?" — chain_for_deployment(deployment_id)
"Every G1 egress denial by target host" — GROUP BY on permission_denials.reason
"Every model that trained on subject X" — subjects_in_dataset(consent_doc_ref) + models_for_dataset
"Every model card produced in Q3" — SQLite timestamp range

8.6 RTBF verification#

Signed erasure receipt with SHA-256 over sorted JSON (excluding signature field)
ErasureReceipt.verify() — rehash + compare
Tombstone marker [SUBJECT_DELETED] with deleted_subject + deleted_at fields
Audit row preserved (tombstone, not true delete — regulator's evidence survives)
Line-for-line replacement (preserves ordering + line count)
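
The receipt and tombstone mechanics above can be illustrated with the standard library alone. In this sketch the "signature" is simply the SHA-256 of the sorted-key JSON body, matching the rehash-and-compare description; the function names, the `subject_id` field, and the exact tombstone record layout are hypothetical and may not match `ErasureReceipt`.

```python
import hashlib
import json

TOMBSTONE = "[SUBJECT_DELETED]"


def receipt_digest(receipt: dict) -> str:
    """SHA-256 over sorted-key JSON, excluding the signature field itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign_receipt(receipt: dict) -> dict:
    return {**receipt, "signature": receipt_digest(receipt)}


def verify_receipt(receipt: dict) -> bool:
    """Rehash the body and compare with the stored signature."""
    return receipt.get("signature") == receipt_digest(receipt)


def tombstone_lines(lines: list, subject_id: str, deleted_at: str) -> list:
    """Replace matching JSONL lines one-for-one; ordering and line count are kept."""
    out = []
    for line in lines:
        record = json.loads(line)
        if record.get("subject_id") == subject_id:  # hypothetical field name
            line = json.dumps({
                "marker": TOMBSTONE,
                "deleted_subject": subject_id,
                "deleted_at": deleted_at,
            })
        out.append(line)
    return out
```

Excluding the signature field from the hashed body is what lets a verifier recompute the digest from the receipt it was handed.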

8.7 Conversation tamper-evidence#

Per-line sequence numbers (monotonic per agent)
JSONL flush lock (thread-safe buffered writes)
Audit PDF bundle hash includes conversation filename + size manifest
Bundle hash stable under identical source artefacts
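
Two of these checks are easy to sketch: detecting a gap in per-agent sequence numbers, and hashing a filename + size manifest so that identical artefacts give an identical digest. The `seq` field name and both helper names are assumptions for illustration; the actual bundle hash may cover more than this manifest.

```python
import hashlib
import json


def sequence_gaps(lines: list) -> list:
    """Return 0-based line indices whose seq is not previous seq + 1."""
    gaps = []
    previous = None
    for index, line in enumerate(lines):
        seq = json.loads(line)["seq"]  # hypothetical field name
        if previous is not None and seq != previous + 1:
            gaps.append(index)
        previous = seq
    return gaps


def manifest_hash(files: dict) -> str:
    """SHA-256 over a sorted (filename, size) manifest; input order is irrelevant."""
    manifest = sorted(files.items())
    payload = json.dumps(manifest, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

A deleted or inserted conversation line shows up as a sequence gap, while a truncated or replaced file changes the manifest digest.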

9. CISO / Procurement POV#

9.1 One-command verification#

cosign verify-blob on the release tarball (reproducible offline via Rekor)
cosign verify-blob on the SBOM
SHA-256 reproduction of deployment_config_sha256 from MANIFEST
jq '.components | length' on SBOM for component count
SQL audit queries on permission_denials
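
For reviewers without jq, the SHA-256 and component-count checks can be approximated in Python. This sketch assumes MANIFEST is a JSON document with a `deployment_config_sha256` key computed over the raw bytes of the config file; neither detail is confirmed by this catalogue, and the helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def config_matches_manifest(config_path: Path, manifest_path: Path) -> bool:
    """Compare the recomputed digest with the one recorded in MANIFEST."""
    manifest = json.loads(manifest_path.read_text())  # assumption: JSON manifest
    return manifest["deployment_config_sha256"] == sha256_file(config_path)


def sbom_component_count(sbom_path: Path) -> int:
    """Rough equivalent of: jq '.components | length'."""
    sbom = json.loads(sbom_path.read_text())
    return len(sbom.get("components") or [])
```

These helpers cover only the hashing and counting steps; signature verification itself still goes through cosign.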

9.2 Independent verifiability#

No network access required beyond Rekor transparency log
No Sigstore-corp dependency (Rekor is public, offline-queryable)
GitHub Release assets are immutable
Rekor entries are append-only + censorship-resistant
Every release is reproducible from signed source commits

9.3 Procurement artefacts#

1-page architecture + verification commands (.project/security/architecture.md)
STRIDE threat model with 9 assets · DREAD scoring · top-10 residual risks (.project/security/threat_model.md)
CAIQ v4.0.3 pre-filled (60 Qs, ~75% Y/Y+P) (.project/security/caiq_lite.md)
Commit-signing setup guide (.project/security/signing_setup.md)
Comprehensive reference (docs/REFERENCE.md)
This features catalogue (docs/FEATURES.md)

9.4 Certification track (in-flight)#

SOC 2 Type I readiness — Q2 2026 via Drata or Vanta
SOC 2 Type II — Q4 2026 (requires a 6+ month operating period after Type I)
Pen test — Q2 2026 (Lucideus / Cobalt quotes in flight)
ISO 27001 — post-SOC 2 Type II
Formal CSA STAR Level 1 — post-SOC 2 Type I

9.5 Legal artefacts (track, not yet landed)#

MSA template — BFSI-savvy Indian tech lawyer engagement pending
DPA template — same
DPIA template — same
Cyber liability insurance — pending
E&O insurance — pending
HIPAA BAA template — pending

9.6 Honest disclosures (residual risks, documented)#

Prompt-injection heuristic recall ~55-65% (defense-in-depth via G1 + G2 + G4)
nsjail blocks syscall-level escape, not application-logic abuse via allowed imports
Paraphrased PII ("number ends in 4729") not caught
Encryption at rest protects a cold disk, not a running API compromised via RCE
G15 RTBF has no reach into customer backups (runbook must document)
Solo-dev bus factor — documentation system + ADR log mitigate
Single-node SQLite writer (Postgres migration documented when needed)

9.7 Status reporting#

CAIQ Lite scorecard — ~75% Y/Y+P aggregate
Top-10 gaps enumerated with rupee-effort estimates (.project/security/caiq_lite.md § Top-10 gaps)
Quarterly review cadence documented
Update triggers documented (per release tag, per ADR, per pen-test finding)

10. Supply Chain POV#

10.1 Source integrity#

All commits on master since v0.12.0 are SSH-signed
GitHub "Verified" badge on all post-v0.12.0 commits (signing key registered)
Branch protection: require signed commits (recommended config documented)
G17 CI gate rejects unsigned commits in release range

10.2 Build integrity#

CI runs on GitHub Actions ubuntu-latest (ephemeral, reproducible)
Minimum permissions per job (permissions: contents: read default; id-token: write only for Sigstore steps)
Deterministic tarball build (positive-list filter + sorted file list)
Docker image base verified via cosign verify docker.io/library/python:3.12-slim
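
A deterministic tarball along these lines can be built with the standard library by fixing everything that normally varies between builds: file order, timestamps, ownership, and the gzip header. The sketch below is one possible approach; `build_tarball` and its glob-based positive list are assumptions for illustration, not the project's release script.

```python
import gzip
import tarfile
from pathlib import Path


def _normalise(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata that differs between machines and build times."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mode = 0o755 if info.mode & 0o100 else 0o644
    return info


def build_tarball(root: Path, include: list, out: Path) -> None:
    """Archive only files matching the positive-list globs, in sorted order."""
    names = sorted({
        p.relative_to(root).as_posix()
        for pattern in include
        for p in root.glob(pattern)
        if p.is_file()
    })
    # mtime=0 and an empty filename keep the gzip header byte-stable.
    with out.open("wb") as raw, \
            gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=0) as gz, \
            tarfile.open(fileobj=gz, mode="w") as tar:
        for name in names:
            tar.add(root / name, arcname=name, recursive=False, filter=_normalise)
```

Building twice from the same source tree should then yield byte-identical archives, which is what makes the published SHA-256 and signature reproducible.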

10.3 Artefact signing#

Cosign keyless tarball signing via GitHub OIDC
Cosign keyless SBOM signing
Ephemeral certificates (10-min TTL from Fulcio)
Rekor transparency log entry per artefact
Certificate identity pinned to GitHub Actions run

10.4 SBOM#

CycloneDX 1.5 JSON format
Generated via cyclonedx-bom from the Python environment
Published as GitHub Release asset (signed)
34 components at v0.12.0

10.5 Dependency monitoring#

GitHub Dependabot enabled (security + version updates)
SBOM feeds into customer vendor-risk tools (Snyk · Dependabot · Trivy · JFrog Xray)
License compatibility discoverable via SBOM
Dependency hash-pinning planned (pip-tools migration in backlog)

10.6 Release flow#

Single tag push triggers release workflow
Dispatch alternative for re-runs on existing tags
Signed-commit gate → SBOM → base image verify → tarball → sign tarball → sign SBOM → GitHub Release upload
6 assets per release (artefact + signature + certificate for each of the tarball and the SBOM)
Immutable GitHub Release URL per tag

10.7 Verification surface#

Customer runs cosign verify-blob with 2 flags + 3 filenames (one command per artefact)
No private-key material to exchange
No shared-secret ceremony
Reproducible by any engineer with cosign + jq installed

11. Performance + scale POV#

11.1 Latency optimizations#

HTTP connection pool for LLM calls (shared across agents)
Prompt cache (first-prompt Claude pricing)
Schema cache (JSON schemas computed once per tool)
Intra-agent parallel tool calls (3-5× speedup, experiment-flagged)
Context compaction at 80% window (avoids context-full crash)
In-memory rate-limit (sliding window, no Redis hop)
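
An in-memory sliding-window limiter of the kind mentioned above fits in a few lines. This is a generic sketch of the technique, not the project's implementation; the class name, the per-key deque layout, and the injectable clock are choices made for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the trailing `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_s
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Evict timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

Keeping the state in process memory avoids a network hop per request, at the cost of each worker process enforcing its own independent limit.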

11.2 Throughput#

Async FastAPI + uvicorn (single-process async by default)
Multi-worker uvicorn supported (uvicorn --workers N) for horizontal scale
Stateless per-request design (state in SQLite, not in-memory)
SQLite WAL for read-during-write concurrency
Postgres migration path for high-concurrency multi-node
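
To show what read-during-write concurrency means in practice, the sketch below opens a database in WAL mode and reads from a second connection while a write transaction is still open. The `open_wal` helper and the simplified `run_events` columns are hypothetical; only the `PRAGMA journal_mode=WAL` statement is standard SQLite.

```python
import sqlite3


def open_wal(path: str) -> sqlite3.Connection:
    """Open a connection in autocommit mode and switch the database to WAL."""
    conn = sqlite3.connect(path, isolation_level=None)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
    if mode.lower() != "wal":
        # In-memory databases, for example, cannot use WAL.
        raise RuntimeError(f"WAL not available, journal_mode={mode}")
    return conn


def count_events(conn: sqlite3.Connection) -> int:
    """Number of committed rows visible to this connection."""
    return conn.execute("SELECT COUNT(*) FROM run_events").fetchall()[0][0]
```

With WAL enabled, a reader calling `count_events` while a writer holds an open transaction sees the last committed snapshot instead of blocking.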

11.3 Storage efficiency#

Conversation JSONL per agent (append-only, grep-friendly, compressible)
Audit PDF target <500 KB per run (full JSONL not embedded; summaries only)
Retention daemon prunes past-TTL artefacts
Build-time tarball excludes noise (dist went from 624 MB to 2.7 MB)

11.4 Test runtime#

Full suite: 1258 tests in ~58 seconds
Parallelisable via pytest -n auto (requires pytest-xdist)
Unit tests isolated by monkeypatch on external binaries (nsjail/docker/cosign/aws not required)

11.5 Benchmarks#

Permission engine: ~30 µs per tool dispatch (~17× baseline subprocess call; negligible vs ~500ms LLM latency)
Conversation flush: batched 10-msg buffer, 1s interval
Regression baseline stored in ml_team/tests/bench/ (nightly diff)
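
The batched conversation flush can be pictured as a lock-protected buffer that writes when it holds 10 messages or when 1 second has passed since the last write. This is a simplified sketch: the class name and `seq` field are hypothetical, and the age check runs only when a message is appended, where a real implementation would more likely use a background timer.

```python
import json
import threading
import time
from pathlib import Path


class BufferedJsonlWriter:
    """Append-only JSONL writer with size- and age-triggered flushes."""

    def __init__(self, path, max_buffer=10, max_age_s=1.0, clock=time.monotonic):
        self._path = Path(path)
        self._max_buffer = max_buffer
        self._max_age_s = max_age_s
        self._clock = clock
        self._lock = threading.Lock()
        self._buffer = []
        self._seq = 0
        self._last_flush = clock()

    def append(self, message: dict) -> None:
        with self._lock:
            self._seq += 1  # monotonic per-writer sequence number
            self._buffer.append(json.dumps({"seq": self._seq, **message}))
            overdue = self._clock() - self._last_flush >= self._max_age_s
            if len(self._buffer) >= self._max_buffer or overdue:
                self._flush_locked()

    def flush(self) -> None:
        with self._lock:
            self._flush_locked()

    def _flush_locked(self) -> None:
        # Caller must hold self._lock.
        if self._buffer:
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write("\n".join(self._buffer) + "\n")
            self._buffer.clear()
        self._last_flush = self._clock()
```

Callers should invoke `flush()` at shutdown so that a partially filled buffer is not lost.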

12. UX / Dashboard POV#

12.1 Operator surfaces#

Next.js 15 + React 19 (SSR-first + server components)
Tailwind CSS + shadcn/ui component library
Real-time updates via WebSocket (no polling)
Typed fetch client (useConfig() hook, etc.)
Branding driven by GET /api/v1/config/branding (SSR-fetched, no flash)

12.2 Navigation#

Sidebar with 10+ sections (pipelines, deployments, agents, transparency, cron, batch, plugins, knowledge, settings, docs)
Breadcrumb trails on deep pages
Role-based section visibility

12.3 Live feedback#

Pipeline run live-stream (every tool call, every LLM response, every HITL gate)
Conversation tree drill-down per agent
Token + cost breakdown per agent
Timeline view with span overlap visualization
Real-time denial surfacing on the /transparency page

12.4 Admin UX#

Feature flag toggle UI (all 3 tiers surfaced)
Permission denial filter + search
HITL gate approve/reject inline
Cron job create + manual-run buttons
Plugin install + inspect wizard

12.5 Auditor UX#

Per-run audit PDF download button
Per-run full conversation JSONL export
Permission denial CSV export (planned)
Retention log view (/transparency)

Counts summary#

Area	Count
Agents (ML team)	40
Teams	7
Native tools	25
Guardrails	15
Permission rule sources	8
Hook events	10
LLM providers	5
Deployment templates	3
Pipeline configs (pre-built)	3
Backends	3
HITL gate types	6
Roles (RBAC)	3
API routers	23
Dashboard pages	10+
CI workflows	5
Prometheus metrics	28+
Tests	1258
Compliance frameworks mapped	8
Release assets per tag	6
CAIQ Lite questions answered	60
Residual risks registered	10 (top)
Critical assets in STRIDE	9
Training modules	8 (of 10 planned)
Implementation phase docs	27
PII detectors (G4)	12
Sandbox drivers (G2)	3
Audit signing drivers (G11)	4
Key providers (G13)	5

Maintained by: TheAiSingularity · security@theaisingularity.org
Update trigger: every release, every architectural ADR, every new feature. Drift is a bug.