Features Catalogue#

Canonical source: docs/FEATURES.md in the repo. This page is rendered via include-markdown and stays in sync automatically on every push to master.

Every feature organized by point of view — product / technical / security / compliance / operator / developer / ML practitioner / auditor / CISO / supply chain / performance / UX. 573 numbered features. Each has a file-path or test-file citation for verifiability.

Version: v0.12.0 · Last updated: 2026-04-22 · Scope: every feature, organized by point of view (technical / security / compliance / operator / developer / ML practitioner / auditor / CISO / supply chain). Every feature has a file-path or test-file citation for verifiability.

For module-organized reference, see docs/REFERENCE.md. For architecture narrative, see MASTER_README.md.


Table of contents#

  1. Product / Customer POV — what you can do with it
  2. Technical / Engineering POV — what's under the hood
  3. Security POV — guardrails + controls
  4. Compliance / Regulatory POV — framework mappings
  5. Operator POV — day-to-day running
  6. Developer / Integrator POV — how to extend
  7. ML Practitioner POV — ML-specific capabilities
  8. Auditor POV — evidence artifacts
  9. CISO / Procurement POV — evaluation artifacts
  10. Supply Chain POV — build + release integrity
  11. Performance + scale POV
  12. UX / Dashboard POV

1. Product / Customer POV#

1.1 Autonomous end-to-end ML pipeline#

  1. Accept a problem statement + a dataset path; return a trained model, fairness audit, drift report, SHAP explanations, model card, audit PDF, Docker image, and K8s manifests — end-to-end autonomous (POST /api/v1/pipelines)
  2. 3 pre-built pipeline configs (fast_prototype · default_ml_pipeline · parallel_research) + customer-authored YAML workflows (lib/workflows/)
  3. Human-in-the-loop gates at 6 points: deploy · data request · manual · security · cost · custom (ml_team/core/approval.py)
  4. Checkpoint-and-resume on every HITL gate — pipeline state persists through pauses (ml_team/core/approval.py::ApprovalGate)
  5. Run cancellation via DELETE /api/v1/pipelines/{run_id} — graceful shutdown with partial artefact preservation
  6. Real-time run status + WebSocket live-streaming (/api/v1/pipelines/{id}/ws)

1.2 Multi-agent architecture#

  1. 40 specialized ML agents across 7 teams (Data · Algorithm · Training · Evaluation · Deployment · Quality · Management) (lib/agents/base/)
  2. 3-tier hierarchy: director → coordinator → worker with transfer_to_* delegation (ml_team/core/orchestrator.py)
  3. ReAct loop per agent — LLM decides → calls tool → consumes result → iterates (ml_team/core/agent_runner.py)
  4. Evaluator-generator separation — optional per-agent rubric grading (ml_team/core/evaluator.py)
  5. Context compaction at 80% of window to keep long runs alive (ml_team/core/context_compaction.py)
  6. Customer-specific agents via YAML extends: overlays (ml_team/core/agent_composer.py)

1.3 Multi-provider LLM support#

  1. Anthropic Claude (first-class)
  2. OpenAI GPT-4o / o1 (first-class)
  3. vLLM — local GPU serving for air-gapped deployments
  4. Ollama — quantised local models
  5. Single-model override for all agents (dev/testing)
  6. Per-agent provider selection (mix small local + frontier in one pipeline) (ml_team/core/llm_client.py)

1.4 Customer-composable deployment model#

  1. Three-layer architecture: core/ (never forked) + lib/ (versioned shelf) + deployments/<customer>/ (composition) (v0.12.0)
  2. SWARM_DEPLOYMENT=deployments/hdfc_bank at boot selects which customer's full stack runs (ml_team/core/deployment_loader.py)
  3. Three deployment templates: generic_ml · bfsi_baseline (RBI FREE-AI) · hipaa_baseline (stub) (lib/templates/)
  4. Per-deployment branding (product_name · logo · colors · compliance badges) surfaced at /api/v1/config/branding (ml_team/api/routers/config.py)
  5. Per-deployment knowledge bases (RAG corpus) in deployments/<customer>/knowledge/
  6. Per-deployment permission policy + retention overrides (lib/templates/<>/retention.yaml)

1.5 Dashboard#

  1. Next.js 15 + React 19 + TypeScript operator dashboard
  2. Live pipeline view with conversation tree · trace · cost breakdown · timeline
  3. Real-time updates via WebSocket (no polling)
  4. OIDC login (Okta · Azure AD · Google Workspace) + username/password fallback
  5. Champion / challenger deployment management view
  6. 40-agent roster with per-agent tool sets + conversation trails

1.6 CLI (swarm)#

  1. Auth: login / logout / whoami / health
  2. Feature flags: features list|get|set|reset
  3. Pipelines: pipelines list|run|status|cancel
  4. Plugins: plugins list|inspect|install
  5. Cron: cron list|create|run|delete|runs
  6. Batch: batch list|submit|status|results|resume
  7. Deployments (runtime): deployments list|promote|retire
  8. Ship pipeline (v0.12.0): deploy new|validate|ship|whitepaper

1.7 Pricing + deployment posture (product positioning)#

  1. VPC-installable — data never leaves customer network (on-prem / AWS ap-south-1 / Azure Pune / Yotta)
  2. Single-tenant per customer (multi-tenancy deferred)
  3. $25K pilot → $75K deployment → $10K/mo retainer framework (see MASTER_README.md § Pricing)
  4. First production model target: 8 weeks from kickoff

2. Technical / Engineering POV#

2.1 Runtime#

  1. Python 3.11+ (tested on 3.11 · 3.12)
  2. FastAPI 0.115+ async backend, Starlette ASGI
  3. Pydantic v2.9+ with extra="forbid" on every manifest schema
  4. Next.js 15 SSR + server components dashboard
  5. Thread-local SQLite connection pool with WAL mode + FK enforcement (ml_team/api/database.py)
  6. HTTP connection pool + prompt cache + schema cache (W1 optimizations)

2.2 Data persistence tiers#

  1. SQLite (primary) — runs · users · permission_denials · lineage (G14) · champion-challenger · approvals · plugin installs (12 tables)
  2. JSONL per agent — conversation store with buffered flushes (10 msgs or 1s threshold)
  3. Postgres (optional) — cross-project memory for multi-project orgs
  4. ChromaDB + keyword fallback — RAG
  5. Per-run work_dir — scratch for metrics · model cards · fairness JSON · SHAP · drift · audit PDFs

2.3 Agent runtime internals#

  1. Supervisor-worker orchestration, custom implementation (not LangChain) (ml_team/core/orchestrator.py)
  2. Two swappable backends: native + LangGraph sharing one agent config
  3. CrewAI adapter (legacy migration path)
  4. Intra-agent parallel tool calls (3-5× speedup, experiment-flagged)
  5. Per-agent memory (ephemeral during run) + cross-run learning via save_agent_learning
  6. Delegation tool-call synthesis (transfer_to_<agent>) injected at supervisor level

2.4 Permission engine (8 rule sources)#

  1. RBAC source — translates require_role(min) into DENY rules
  2. Agent allowlist source — per-agent tool_set enforcement
  3. Feature flag source — auto-skip tools when their flag is off
  4. HITL source — approval-required tools
  5. Policy source — operator-authored permission_policies.yaml
  6. Profile source — template-baked rules (BFSI etc.); emits at priority 45 (ask) or 60 (deny)
  7. Compliance-gate source — runtime gate verdicts become DENY rules (priority 55)
  8. Egress allowlist source (G1, v0.12.0) — URL walker + host classifier
  9. Tier-aware resolution: ALLOW > DENY > ASK > default with priority tiebreaks (ml_team/core/permissions.py)
  10. Invariant-DENY floor at priority 60 — profile rules beat operator POLICY ALLOW at 50
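
One way to read items 9 and 10 together is sketched below. This is an illustration only, not the code in ml_team/core/permissions.py: the `Rule` dataclass, the `resolve` function, and the `"ask"` default are hypothetical, and the sketch assumes the highest numeric priority wins, with the ALLOW > DENY > ASK order used only to break ties at equal priority.

```python
from dataclasses import dataclass

# Tie-break order from the list above; a lower index wins at equal priority.
TIER_ORDER = {"allow": 0, "deny": 1, "ask": 2}


@dataclass(frozen=True)
class Rule:
    source: str     # e.g. "policy", "profile" (labels are illustrative)
    behavior: str   # "allow" | "deny" | "ask"
    priority: int


def resolve(rules, default="ask"):
    """Return the winning behavior: highest priority first, tier order on ties."""
    if not rules:
        return default  # assumed default; the real default is not stated here
    best = min(rules, key=lambda r: (-r.priority, TIER_ORDER[r.behavior]))
    return best.behavior


rules = [
    Rule("policy", "allow", 50),   # operator-authored ALLOW
    Rule("profile", "deny", 60),   # template-baked invariant DENY
]
print(resolve(rules))  # the priority-60 DENY outranks the priority-50 ALLOW
```

Under these assumptions an operator cannot lift a profile DENY by adding an ALLOW rule, because operator policy rules sit below the invariant floor.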

2.5 Hook lifecycle (10 integration points)#

  1. SESSION_START · PRE_TOOL · POST_TOOL · PRE_COMPACTION · POST_COMPACTION (pre-Track 2)
  2. PRE_LLM · POST_LLM · STORAGE_WRITE · LLM_CALL_WRAPPER · AGENT_DELEGATE (v0.12.0)
  3. Plugin-loaded hooks compose with core hooks (same pipeline)
  4. Shell-command hooks ({"type": "command", ...}) executed behind feature flag with rlimits (ml_team/core/shell_hook_runner.py)
  5. Per-execution audit rows in plugin_shell_executions SQLite table

2.6 Guardrails runtime (v0.12.0)#

  1. GuardrailRegistry (thread-safe) with @register(point, id, priority) decorator
  2. Priority-sorted handler evaluation per integration point
  3. Outcome model: ALLOW chains · REDACT threads payload · DENY short-circuits · ERROR fails open unless invariant
  4. 15 guardrails split across 6 categories (detail in § 3)
  5. Auto-configuration from deployment guardrail_configs block via guardrail_bootstrap.bootstrap_from_deployment()
  6. Prometheus counters (guardrail_triggered_total, guardrail_bypass_attempts_total) + duration histogram
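
The outcome model in item 3 can be sketched roughly as follows. This is not the actual GuardrailRegistry: the `invariant` flag, the `evaluate` function, the sample handlers, and the choice to run higher priorities first are assumptions made for illustration, and locking for thread safety is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    DENY = "deny"
    ERROR = "error"


@dataclass
class GuardrailResult:
    outcome: Outcome
    payload: Optional[dict] = None  # replacement payload when outcome is REDACT


_HANDLERS = {}  # point -> list of (priority, guardrail_id, handler, invariant)


def register(point, guardrail_id, priority=50, invariant=False):
    """Decorator that attaches a handler to an integration point."""
    def decorator(fn):
        _HANDLERS.setdefault(point, []).append((priority, guardrail_id, fn, invariant))
        return fn
    return decorator


def evaluate(point, payload):
    """Run handlers in priority order (assumed: highest first)."""
    for _, _, handler, invariant in sorted(_HANDLERS.get(point, []), key=lambda h: -h[0]):
        try:
            result = handler(payload)
        except Exception:
            result = GuardrailResult(Outcome.ERROR)
        if result.outcome is Outcome.DENY:
            return Outcome.DENY, payload          # DENY short-circuits
        if result.outcome is Outcome.ERROR:
            if invariant:
                return Outcome.DENY, payload      # invariant handlers fail closed
            continue                              # other handlers fail open
        if result.outcome is Outcome.REDACT:
            payload = result.payload              # REDACT threads the new payload
    return Outcome.ALLOW, payload                 # ALLOW chains to the end


@register("PRE_LLM", "strip_secret", priority=70)
def strip_secret(payload):
    text = payload.get("text", "")
    if "secret" in text:
        return GuardrailResult(Outcome.REDACT, {**payload, "text": text.replace("secret", "[REDACTED]")})
    return GuardrailResult(Outcome.ALLOW)


@register("PRE_LLM", "block_drop", priority=60)
def block_drop(payload):
    blocked = "DROP TABLE" in payload.get("text", "")
    return GuardrailResult(Outcome.DENY if blocked else Outcome.ALLOW)


@register("POST_TOOL", "buggy", priority=10)
def buggy(payload):
    raise RuntimeError("handler bug")             # non-invariant: fails open


@register("STORAGE_WRITE", "buggy_invariant", priority=10, invariant=True)
def buggy_invariant(payload):
    raise RuntimeError("handler bug")             # invariant: fails closed
```

Calling `evaluate("PRE_LLM", {"text": "my secret"})` would return the ALLOW outcome together with the redacted payload, since the later handler sees the already-redacted text.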

2.7 Authentication + authorization#

  1. JWT Bearer (HS256, 24h TTL, rotatable secret via Doppler)
  2. Legacy X-API-Key header (admin-equivalent, backcompat)
  3. OIDC SSO — Okta / Azure AD / Google Workspace via Authlib (PKCE + state cookie)
  4. IdP group → swarm role mapping via ML_TEAM_OIDC_ROLE_MAP
  5. 3 roles: viewer / operator / admin
  6. require_role(min_role) FastAPI dependency routes denials through permission engine
  7. bcrypt password hashing (12 rounds)
  8. Admin bootstrap from env vars at first boot

2.8 Observability#

  1. Prometheus metrics on every subsystem (28+ counters/histograms) at /api/v1/metrics
  2. OpenTelemetry spans with parent/child, tokens, costs
  3. Real-time WebSocket streaming of pipeline events
  4. Structured JSON logs (ml_team/core/logging_config.py)
  5. G6 credential filter at root logger (all logs scrubbed at source)
  6. Per-agent conversation JSONL (durable, grep-friendly)

2.9 Plugin ecosystem (Claude Code marketplace format)#

  1. Skills · MCPs · hooks · commands · agents — all installable via one .mcp.json + manifest
  2. MCP client over JSON-RPC (stdio + SSE) (ml_team/core/mcp_client.py)
  3. Install-time manifest validation + install-drop tracking (scan_install_drops)
  4. Namespace isolation — plugin agents forced to plugin-<name>::<agent> so no shadowing
  5. Feature-flag gated plugin shell hooks with invoke-time validation
  6. CC marketplace compat — tested against superpowers v5.0.7 (100% surface retention)

2.10 Ship pipeline (v0.12.0)#

  1. swarm deploy new <customer> --template=<> scaffolds deployments/<customer>/ (ml_team/deploy/scaffold.py)
  2. swarm deploy validate — lint config + lib refs + customer-name match (ml_team/deploy/validator.py)
  3. swarm deploy ship — build-time positive-list filter · MANIFEST.yaml · whitepaper (ml_team/deploy/ship.py)
  4. swarm deploy whitepaper — 5-section markdown whitepaper (ml_team/deploy/whitepaper.py)
  5. Per-customer build-time isolation — other customers' deployments/<other>/ excluded at tarball build time (tested at tar-member level in test_ship_excludes_other_customer_dirs)
  6. Deterministic MANIFEST.yaml with pinned lib versions + config SHA-256 + build commit + host + timestamp (sort-keyed YAML)

2.11 Ops primitives (W7)#

  1. Cron scheduler — 60s tick, file-backed store, 4 task kinds (retrain · drift_check · audit_pdf · custom) (ml_team/core/cron.py)
  2. Batch runner — JSONL → processor → results.jsonl; checkpoints every 10 records; resume-on-restart (ml_team/core/batch.py)
  3. Retention daemon — 24h sweep, per-artefact TTLs (2555d BFSI default) (ml_team/core/retention.py)
  4. Feature-flag registry with 3 tiers: INVARIANT / FLAG / USER_OVERRIDE (ml_team/core/feature_flags.py)

2.12 Tests (1258 total, 2 skipped)#

  1. Unit tests per guardrail (17-30 each)
  2. Integration tests via FastAPI TestClient (23 routers)
  3. End-to-end BFSI baseline test — biased-model → deployment blocked
  4. Tamper-evident bundle hash regression test
  5. Snapshot parity tests (pre- vs post-refactor byte-identical dicts)
  6. Full regression gate in CI (matrix py3.11 + py3.12)
  7. Nightly real-LLM golden-path run (.github/workflows/nightly-e2e.yml)
  8. Performance bench baselines (ml_team/tests/bench/) with nightly diff

2.13 CI/CD#

  1. GitHub Actions on every push + PR to master (5 workflows)
  2. Lint (ruff + mypy) · pytest (matrix) · bandit HIGH gate · semgrep ERROR gate (ci.yml)
  3. Release workflow on v*.*.* tags with signed-commit gate + SBOM + Cosign (release-supply-chain.yml)
  4. SARIF upload to GitHub Security tab for semgrep findings
  5. Coverage XML artifact upload
  6. Advisory doc-drift check (doc-drift.yml)

3. Security POV#

3.1 Network controls#

  1. G1 Egress allowlist (in-process) — URL walker + RFC1918/loopback/link-local/ULA block (ml_team/core/egress_allowlist.py)
  2. Suffix patterns (*.example.com) + fnmatch allow-patterns
  3. Scheme-level block list (file, gopher, dict, etc.)
  4. Literal-string host matching (no DNS resolution); a mitmproxy sidecar for DNS-rebind protection is planned as a follow-up
  5. Audit trail via permission_denials with source=egress_allowlist
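
A minimal sketch of the checks listed above, using only the standard library. It is not the code in ml_team/core/egress_allowlist.py: the function names, the sample allow-patterns, and the order of the checks are assumptions. Like the real control, it matches host strings literally and never resolves DNS.

```python
import fnmatch
import ipaddress
from urllib.parse import urlsplit

BLOCKED_SCHEMES = {"file", "gopher", "dict"}
ALLOW_PATTERNS = ["api.openai.com", "*.example.com"]  # illustrative allowlist


def walk_urls(value):
    """Yield every URL-looking string found in nested dicts, lists, and tuples."""
    if isinstance(value, str):
        if "://" in value:
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from walk_urls(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from walk_urls(item)


def is_internal_literal(host):
    """True for IP literals in private, loopback, link-local, or ULA ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; no DNS lookup is attempted
    return ip.is_private or ip.is_loopback or ip.is_link_local


def check_url(url):
    """Return (allowed, reason) for a single URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if scheme in BLOCKED_SCHEMES:
        return False, f"scheme {scheme!r} blocked"
    if not host:
        return False, "no host"
    if is_internal_literal(host):
        return False, "internal address"
    if any(fnmatch.fnmatchcase(host, pattern) for pattern in ALLOW_PATTERNS):
        return True, "allowlisted"
    return False, "host not in allowlist"


tool_args = {"urls": ["https://x.example.com/data", "plain text"], "cb": {"hook": "http://10.0.0.5/"}}
for found in walk_urls(tool_args):
    print(found, check_url(found))
```

Because matching is on the literal host string, a public hostname that resolves to an internal address would pass this sketch, which is the gap the DNS-rebind item above refers to.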

3.2 Execution sandbox#

  1. G2 Python sandbox driver abstraction — SandboxDriver Protocol (ml_team/core/python_sandbox.py)
  2. nsjail driver (Linux production) — seccomp + user namespace + RO rootfs + net namespace
  3. Docker driver (macOS dev) — throwaway python:3.11-slim with --network=none --read-only --rm
  4. Subprocess driver (portable fallback) with loud WARN on each invocation
  5. strict=True + unavailable driver → API boot aborts (no silent degrade)
  6. Per-call resource limits: memory_mb / cpu_time_sec / allow_network / allow_paths

3.3 Input safety#

  1. G3 Prompt-injection heuristic — 25 patterns (12 high + 13 medium severity) at PRE_LLM priority 70
  2. Occurrence counting via re.findall (multi-hit aware)
  3. OpenAI multi-part + bare-string + single-dict payload shapes handled
  4. System role skipped by default (operator authoring system prompts isn't attacking)
  5. Honest ceiling disclosed in manifest: ~55-65% recall vs Lakera ~80-87%

3.4 Output + persistence safety#

  1. G4 PII detection — 12 regex detectors with structural validation (ml_team/core/pii/regex_detectors.py)
  2. Luhn check on credit cards (drops false-positive digit runs)
  3. Verhoeff check on Aadhaar (kills false-positive 12-digit strings)
  4. Indian BFSI recognisers: Aadhaar · PAN · IFSC · IN_PHONE
  5. International: EMAIL · CREDIT_CARD · IBAN · IPV4 · IPV6 · PRIVATE_KEY_BLOCK · US_SSN · US_PHONE
  6. 3 action modes: redact / mask (keep first 2 + last 2) / hash (SHA-256 prefix)
  7. Overlap resolver: higher-confidence wins → longer span → registration order
  8. Registered at POST_LLM + POST_TOOL + STORAGE_WRITE (three integration points)
  9. Optional Microsoft Presidio shim (lazy import; 300MB spaCy model opt-in) (ml_team/core/pii/presidio_shim.py)
  10. G5 Conversation JSONL scrubber — wraps ConversationStore._flush_locked (ml_team/core/conversation_scrubber.py)
  11. Recursive value walk (scans strings in dicts + lists)
  12. Byte-identical output on zero-mutation lines (no disk churn)
  13. _redacted: true tag on mutated lines (audit grep)
  14. Non-JSON lines pass through untouched (defensive)
  15. G6 Logs credential filter: logging.Filter subclass at root logger
  16. 13 known-secret regex patterns (sk-…, ghp_…, AKIA…, BEGIN PRIVATE KEY, JWT, etc.)
  17. Anthropic before OpenAI via negative lookahead sk-(?!ant-)
  18. Shannon-entropy fallback ≥4.2 bits/char on ≥32-char tokens
  19. Scrubs record.__dict__ fields too (structured logging caught)
  20. Replaces with [REDACTED_SECRET_<sha256[:8]>] (stable placeholder allows correlation without leaking value)
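
Several of the checks above are small enough to sketch directly. The helpers below are illustrative stand-ins, not the functions in ml_team/core/pii/ or the G6 filter: all names are hypothetical, and the handling of values shorter than five characters in `mask` is an assumption.

```python
import hashlib
import math
from collections import Counter


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask(value: str) -> str:
    """Keep the first 2 and last 2 characters, star out the rest."""
    if len(value) <= 4:
        return "*" * len(value)  # assumed behavior for very short values
    return value[:2] + "*" * (len(value) - 4) + value[-2:]


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the token in bits per character."""
    counts = Counter(token)
    length = len(token)
    return -sum(c / length * math.log2(c / length) for c in counts.values())


def looks_like_secret(token: str, min_len: int = 32, min_bits: float = 4.2) -> bool:
    """Entropy fallback: long, high-entropy tokens are treated as secrets."""
    return len(token) >= min_len and shannon_entropy(token) >= min_bits


def secret_placeholder(secret: str) -> str:
    """Stable placeholder: equal secrets map to equal tags without exposing the value."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()[:8]
    return f"[REDACTED_SECRET_{digest}]"


card = "4111111111111111"  # widely used test card number
print(luhn_ok(card), mask(card))
print(secret_placeholder("example-token"))
```

The placeholder keeps only eight hex characters of the digest, which is enough to correlate repeated leaks of the same value across log lines.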

3.5 Rate limiting + cost controls#

  1. G7 Per-user rate limit — composite (caller_identity, endpoint_class) key (ml_team/api/rate_limit.py)
  2. Identity precedence: X-API-Key (SHA-256[:12]) → JWT sub → client IP
  3. Per-role limits: viewer 100r/0w · operator 600r/20w · admin 2000r/100w per minute (env-configurable)
  4. Response headers: X-RateLimit-{Limit,Remaining,Role}
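
The composite key and identity precedence can be sketched as below. This is not the code in ml_team/api/rate_limit.py: the function names, the `key:`/`sub:`/`ip:` prefixes, and the classification of endpoints into read and write by HTTP method are assumptions.

```python
import hashlib

# (reads, writes) per minute, taken from the per-role limits listed above.
LIMITS = {"viewer": (100, 0), "operator": (600, 20), "admin": (2000, 100)}


def caller_identity(headers, jwt_sub, client_ip):
    """Precedence: X-API-Key fingerprint, then JWT subject, then client IP."""
    api_key = headers.get("X-API-Key")
    if api_key:
        # Only a 12-character SHA-256 prefix is kept, never the key itself.
        return "key:" + hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:12]
    if jwt_sub:
        return "sub:" + jwt_sub
    return "ip:" + client_ip


def rate_key(headers, jwt_sub, client_ip, method):
    """Build the composite (caller_identity, endpoint_class) key."""
    endpoint_class = "read" if method.upper() in {"GET", "HEAD", "OPTIONS"} else "write"
    return caller_identity(headers, jwt_sub, client_ip), endpoint_class


def limit_for(role, endpoint_class):
    reads, writes = LIMITS[role]
    return reads if endpoint_class == "read" else writes


key = rate_key({}, "alice", "203.0.113.7", "POST")
print(key, limit_for("operator", key[1]))
```

With these numbers a viewer has a write budget of zero, so any write attempt is rejected by the limiter before role checks even matter.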

3.6 Agent safety#

  1. G10 Delegation loop detector — stack depth cap + fan-out cap + same-args dedup (lib/guardrails/platform_integrity/delegation_loop_detector/)
  2. Default caps: max_depth=5, max_delegations_per_run=50
  3. Two scopes: strict (agent + args_hash) or name_only
  4. Per-run state dict (isolated across runs)
  5. G16 HITL TTL + escalation: ApprovalGate fields + cron sweep (ml_team/core/hitl_sweep.py)
  6. Pure function sweep(store, notifier, now) → SweepReport
  7. Escalation fires before expiry in the same sweep (documented ordering)
  8. Monotonic timer for expiry + wall-clock for display (clock-skew safe)
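
The three G10 checks (depth cap, fan-out cap, same-args dedup) can be sketched as a small per-run object. This is not the shipped detector: the class and method names are hypothetical, and the sketch assumes dedup is checked against the currently active delegation chain rather than the full run history.

```python
import hashlib
import json


class DelegationLoopDetector:
    """One instance per run, so state is isolated across runs."""

    def __init__(self, max_depth=5, max_delegations_per_run=50, scope="strict"):
        self.max_depth = max_depth
        self.max_delegations_per_run = max_delegations_per_run
        self.scope = scope          # "strict" (agent + args_hash) or "name_only"
        self.stack = []             # keys of the active delegation chain
        self.total = 0              # delegations accepted so far in this run

    def _key(self, agent, args):
        if self.scope == "name_only":
            return agent
        canonical = json.dumps(args, sort_keys=True).encode("utf-8")
        return f"{agent}:{hashlib.sha256(canonical).hexdigest()}"

    def enter(self, agent, args):
        """Return None when the delegation is allowed, else a denial reason."""
        if len(self.stack) >= self.max_depth:
            return "max_depth exceeded"
        if self.total >= self.max_delegations_per_run:
            return "max_delegations_per_run exceeded"
        key = self._key(agent, args)
        if key in self.stack:
            return "repeated delegation"
        self.stack.append(key)
        self.total += 1
        return None

    def leave(self):
        """Pop the innermost delegation when the delegated agent returns."""
        self.stack.pop()


detector = DelegationLoopDetector(max_depth=2)
print(detector.enter("data_cleaner", {"path": "a.csv"}))   # allowed
print(detector.enter("data_cleaner", {"path": "a.csv"}))   # same agent and args
```

In `strict` scope the same agent may be re-entered with different arguments; in `name_only` scope any re-entry of an agent already on the chain is refused.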

3.7 Data protection at rest#

  1. G12 Encryption at rest — AES-GCM-256 envelope (ml_team/core/encryption.py)
  2. 3 at-rest driver options: sqlite_host_fs_only (default) · sqlcipher · postgres_pgcrypto
  3. Per-call DEK + customer-wrapped KEK
  4. AAD = sorted context dict; decrypt with mismatched context fails AEAD integrity
  5. WrappedDek + Ciphertext dataclasses, JSON-serialisable end-to-end
  6. G13 BYOK KeyProvider Protocol — 5 implementations
  7. StubProvider (tests, deterministic base64)
  8. EnvKeyProvider (dev via SWARM_KEK env var)
  9. AwsKmsProvider (production — shells to aws kms encrypt/decrypt)
  10. GcpKmsProvider (fail-fast stub pending customer demand)
  11. VaultTransitProvider (fail-fast stub pending customer demand)
  12. Threat model explicitly scoped: "cold DB file / disk image" ≠ "attacker with RCE on running API"

3.8 Audit trail + erasure#

  1. G14 Data lineage — 3 SQLite tables with enforced FKs (ml_team/api/database.py)
  2. datasets (dataset_id PK + consent_doc_ref G15 index)
  3. lineage_models (model_id PK + dataset_id FK ON DELETE SET NULL)
  4. lineage_deployments (deployment_id PK + model_id FK ON DELETE CASCADE)
  5. chain_for_deployment(deployment_id) returns full joined dict for audit PDF
  6. Helper APIs: record_dataset/model/deployment (idempotent upserts) (ml_team/core/lineage.py)
  7. G15 Right-to-be-forgotten — admin-only endpoint + signed receipt (ml_team/api/routers/subjects.py)
  8. GET /api/v1/subjects/{id}/preview (dry-run)
  9. DELETE /api/v1/subjects/{id} (execute)
  10. Cascade delete of dataset rows · FK sets model dataset_id to NULL · conversation JSONL rewritten to tombstones
  11. ErasureReceipt dataclass with SHA-256 signature over sorted JSON (ml_team/core/rtbf.py)
  12. Regex metacharacters in subject_id escaped (no ReDoS)
  13. _subject_pattern uses re.escape (tested in test_regex_metacharacters_in_subject_id_are_escaped)
  14. Tombstone preserves line order + count (downstream JSONL parsers still work)
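
The foreign-key behavior described for the three lineage tables can be reproduced with the standard `sqlite3` module. The table and key names follow the list above; the column types, the sample ids, and the use of `consent_doc_ref` to select rows for erasure are assumptions, and the real schema in ml_team/api/database.py has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE datasets (
    dataset_id      TEXT PRIMARY KEY,
    consent_doc_ref TEXT
);
CREATE INDEX idx_datasets_consent ON datasets(consent_doc_ref);
CREATE TABLE lineage_models (
    model_id   TEXT PRIMARY KEY,
    dataset_id TEXT REFERENCES datasets(dataset_id) ON DELETE SET NULL
);
CREATE TABLE lineage_deployments (
    deployment_id TEXT PRIMARY KEY,
    model_id      TEXT REFERENCES lineage_models(model_id) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO datasets VALUES ('ds1', 'consent-42')")
conn.execute("INSERT INTO lineage_models VALUES ('m1', 'ds1')")
conn.execute("INSERT INTO lineage_deployments VALUES ('dep1', 'm1')")

# Erasing the dataset keeps the model row but nulls its dataset reference.
conn.execute("DELETE FROM datasets WHERE consent_doc_ref = 'consent-42'")
model_row = conn.execute("SELECT model_id, dataset_id FROM lineage_models").fetchone()
deployments_after_erasure = conn.execute("SELECT COUNT(*) FROM lineage_deployments").fetchone()[0]

# Deleting the model cascades to its deployments.
conn.execute("DELETE FROM lineage_models WHERE model_id = 'm1'")
deployments_after_model_delete = conn.execute("SELECT COUNT(*) FROM lineage_deployments").fetchone()[0]

print(model_row, deployments_after_erasure, deployments_after_model_delete)
```

The asymmetry is deliberate in this reading: an erasure request removes the data subject's dataset row without destroying the record that a model and its deployments existed.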

3.9 Audit PDF signing#

  1. G11 Audit PDF signing — 4-driver abstraction (ml_team/core/audit_signer.py)
  2. Stub driver (deterministic test signature)
  3. Offline Ed25519 driver (air-gap, key on build host)
  4. Cosign KMS driver (AWS/GCP/Vault — BFSI default)
  5. Cosign keyless driver (Sigstore + GitHub OIDC — SaaS default)
  6. Sidecar receipt JSON written alongside .sig + .pem files (canonical source for downstream verifiers)
  7. Rekor log index captured for keyless signatures

3.10 Supply-chain integrity#

  1. G17 SBOM + signed commits — CI gate (.github/workflows/release-supply-chain.yml)
  2. Every commit in the release range verified via the git log %G? signature status (any status other than G or U fails the workflow)
  3. CycloneDX 1.5 JSON SBOM via cyclonedx-bom (scripts/gen_sbom.py)
  4. Cosign keyless tarball + SBOM signing via GitHub OIDC
  5. Base-image cosign verify on python:3.12-slim (WARN-not-block on upstream policy changes)
  6. 6 signed release assets per v*.*.* tag
  7. Rekor transparency log entry for every signed artefact

3.11 Cryptography inventory#

  1. AES-GCM-256 (envelope encryption)
  2. Ed25519 (offline audit signing)
  3. ECDSA-P256 (Cosign)
  4. bcrypt 12 rounds (password at rest)
  5. HS256 (JWT)
  6. SHA-256 (hashing, fingerprints)
  7. No deprecated algorithms — no MD5 for auth, no SHA-1 for signing, no RC4/DES/3DES anywhere

3.12 RBAC + access control#

  1. 3-role enum: viewer < operator < admin
  2. Per-endpoint role guards on all sensitive routes
  3. Admin-only endpoints: /subjects/* (G15 RTBF), /permissions/denials (audit), /features/* (flag mutation)
  4. JWT revocation: grace-period + secret rotation (documented; no revocation list yet)
  5. Session state in JWT only (no server-side session store to compromise)

3.13 Static security analysis#

  1. bandit (HIGH severity gate in CI) (ml_team/pyproject.toml::[tool.bandit])
  2. semgrep p/python + p/security-audit (ERROR severity gate)
  3. MEDIUM findings reviewed and annotated inline with # nosec BXXX + rationale (4 suppressions)
  4. SARIF upload to GitHub Security tab for every scan
  5. 0 HIGH bandit findings · 0 ERROR semgrep findings at v0.12.0

3.14 Threat model#

  1. STRIDE analysis across 9 critical assets (.project/security/threat_model.md)
  2. DREAD scores on every row
  3. Top-10 residual risk register with owners + status
  4. Cross-cutting attack scenarios documented (prompt-injection exfil chain, compromised dev laptop, backup-exfiltration + RTBF, supply-chain via transitive dep)
  5. Quarterly review cadence documented

4. Compliance / Regulatory POV#

4.1 Framework coverage (8 frameworks)#

  1. RBI FREE-AI (India) — Pillars 2, 3, 5, 6 full; 1, 4, 7 partial
  2. DPDP Act 2023 (India) — §§ 8, 10(8), 12 full
  3. EU AI Act (high-risk) — Arts. 10, 12, 14, 15 full; Art. 13 partial
  4. HIPAA Security Rule — 164.308, 164.312, 164.514, 164.528 (controls present; BAA template pending)
  5. GDPR — Arts. 5, 17, 22, 30, 32 (controls present)
  6. SOC 2 — CC6.x, CC7.x, CC8.1 (design ready; Type I readiness Q2 2026)
  7. OWASP LLM Top 10 — direct controls for LLM01, 02, 04, 05, 06, 08, 10
  8. NIST AI RMF 1.0 — Govern 1.4/1.6/1.7, Map 2.3/4.1, Manage 2.2/2.3

4.2 BFSI / Indian-specific controls#

  1. RBI FREE-AI Pillar 2 (Consent) — G4 PII detection + G15 erasure
  2. Pillar 3 (Robustness) — G3 prompt-injection + G10 delegation-loop + drift/fairness gates
  3. Pillar 5 (Accountability) — G14 data lineage + G11 audit-PDF signing
  4. Pillar 6 (Security) — G1 + G2 + G6 + G12 + G13 + G17
  5. DPDP Act § 8 (data fiduciary) — G14 + G15
  6. DPDP Act § 10(8) (named DPO) — self-designated 2026-04-22 (SECURITY.md)
  7. DPDP Act § 12 (erasure) — G15 with signed receipt
  8. CERT-In 6-hour breach notification — runbook pending; technical capability (audit logs + metrics) in place
  9. RBI 7-year retention (2555 days) — BFSI baseline default

4.3 Compliance artefact generation#

  1. Model card (Markdown, RBI-aligned structure) via tools/model_card.py
  2. Fairness audit JSON (fairlearn MetricFrame, per-group metrics) via tools/fairness.py
  3. Drift report (PSI + KS + chi², BFSI thresholds 0.10/0.25) via tools/drift.py
  4. SHAP explanations JSON via tools/explainability.py
  5. Audit PDF with tamper-evident source-bundle SHA-256 on cover
  6. Signed conversation JSONL per agent (retention-policy governed)
  7. Retention log (retention_log.json) documenting every artefact deletion
  8. permission_denials SQLite table — every denial source-attributed

4.4 Invariant-DENY guarantees#

  1. Profile DENY rules emit at priority 60 → CANNOT be overridden by operator POLICY ALLOW at 50
  2. Invariant tier feature flags CANNOT be toggled at runtime
  3. strict=True guardrails fail API boot if driver unavailable (no silent degrade)
  4. Tested end-to-end in test_bfsi_baseline_e2e.py

4.5 Procurement artefact pack#

  1. 1-page security architecture diagram (.project/security/architecture.md)
  2. STRIDE threat model with DREAD scoring (.project/security/threat_model.md)
  3. Pre-filled CAIQ v4.0.3 questionnaire (60 Qs, ~75% Y/Y+P) (.project/security/caiq_lite.md)
  4. Commit-signing setup guide (.project/security/signing_setup.md)
  5. README pack index with audience-routing guide (.project/security/README.md)

5. Operator POV#

5.1 Installation#

  1. pip install -e "ml_team/.[ml]" — one-step install on a clean Python 3.12 venv
  2. .env template for OPENAI_API_KEY + JWT secret + admin bootstrap
  3. uvicorn ml_team.api.app:app — zero-config local start
  4. SQLite default (no DB service to run)
  5. Optional Docker image (planned)

5.2 Configuration#

  1. 25+ environment variables (documented in README.md § Configuration reference)
  2. YAML deployment config (deployments/<customer>/config.yaml) — declarative, Pydantic-validated
  3. Per-template profile defaults (lib/templates/<>/)
  4. Per-customer overrides (branding · retention · guardrail configs)
  5. Operator-authored custom policy rules (ml_team/config/permission_policies.yaml)

5.3 Dashboards#

  1. / — pipelines list + live feed
  2. /pipelines/[id] — drill-down
  3. /deployments — champion/challenger
  4. /agents — 40-agent roster
  5. /transparency — denial log · retention · cron · batch (one operator-facing "everything" page)
  6. /cron — scheduler
  7. /plugins — marketplace install + inspect
  8. /knowledge — RAG corpus management
  9. /settings — feature flags (admin only)
  10. /docs — in-app documentation browser

5.4 Monitoring + alerting#

  1. Prometheus /api/v1/metrics endpoint (unauthenticated, industry standard)
  2. 28+ counters + histograms covering every subsystem
  3. Key counters: pipelines_started_total · llm_calls_total{agent,model} · tool_calls_total · tool_denied_total · permission_denials_total · guardrail_triggered_total{name,outcome} · guardrail_bypass_attempts_total · active_pipelines
  4. Key histograms: pipeline_duration_seconds · llm_call_duration_seconds · guardrail_evaluation_duration_seconds
  5. OpenTelemetry traces (parent/child) with token/cost metadata
  6. Structured JSON logs (Loki/Splunk/CloudWatch-ready)
  7. WebSocket event streaming for real-time dashboard updates

5.5 Ops primitives#

  1. Cron scheduler with 4 task kinds (retrain, drift_check, audit_pdf, custom)
  2. Batch runner (JSONL → processor → results.jsonl, 10-record checkpoints)
  3. Retention daemon (24h sweep, per-artefact TTLs)
  4. Feature flag admin UI (/settings)
  5. Runtime feature-flag overrides via POST /api/v1/features/{name} (admin)
  6. HITL approval UI at /pipelines/[id] (gate type + rationale surfaced)

5.6 Secrets management#

  1. Doppler per-customer projects (swarm-<customer>-{dev,staging,prod})
  2. .env file fallback for dev
  3. SOPS + age migration path documented (no vendor lock-in)
  4. swarm deploy rotate-secret planned (not yet implemented)

5.7 Backups + disaster recovery#

  1. SQLite dumpable via sqlite3 .dump (trivial backup)
  2. JSONL files rsync-friendly (append-only)
  3. Host-FS encryption recommended (LUKS · EBS-KMS · GCP PD · Azure Disk encryption)
  4. Customer-controlled backup policy (documented in deployment runbook)
  5. Postgres migration path for multi-node HA (not yet executed)

5.8 Governance contacts#

  1. Named DPO (DPDP Act § 10(8)) — security@theaisingularity.org
  2. Named Security Officer — same contact
  3. 24h acknowledgement SLA + 5-business-day substantive SLA
  4. CERT-In 6h mandate when applicable

6. Developer / Integrator POV#

6.1 Tool authoring#

  1. Plain Python functions with type hints become tools automatically
  2. Docstring doubles as LLM-facing tool description
  3. JSON schema auto-generated from type hints (Pydantic under the hood)
  4. Lib asset manifest for versioned distribution (lib/tools/<id>/tool.yaml)
  5. Per-tool tests in lib/tools/<id>/tests/
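
To illustrate how a typed function plus its docstring can become an LLM-facing tool definition, here is a standard-library stand-in. The catalogue says the real schema generation uses Pydantic; this sketch uses `inspect` and `typing` instead, and `tool_schema`, `detect_outliers`, and the output layout are hypothetical.

```python
import inspect
from typing import get_type_hints

# Minimal Python-to-JSON-Schema type mapping; unknown types fall back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Derive a tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are mandatory
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def detect_outliers(dataset_path: str, z_threshold: float = 3.0) -> dict:
    """Flag rows whose z-score exceeds the threshold."""
    return {}


schema = tool_schema(detect_outliers)
print(schema)
```

The docstring ends up in `description`, so whatever the author writes there is what the model reads when deciding whether to call the tool.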

6.2 Agent authoring#

  1. Author YAML manifest under lib/agents/base/<id>/agent.yaml — no Python required
  2. extends: lib/agents/base/<base>@vX.Y.Z composition
  3. Overlay actions: system_prompt_append · system_prompt_prepend · system_prompt_replace · field replace
  4. knowledge_bases: list pointing at Markdown/PDF corpus files
  5. tool_set: referencing lib tools + custom per-deployment tools
  6. tier: (director / coordinator / worker) with routing implications
  7. model: per-agent override (frontier for directors, small for workers)
  8. evaluator_rubric: optional grading criteria

6.3 Guardrail authoring#

  1. Create lib/guardrails/<category>/<id>/guardrail.yaml + enforce.py
  2. Register at one or more integration points via @register(point, "id", priority=N)
  3. Handler signature: (payload: dict) -> GuardrailResult
  4. configure() function at module level → add to guardrail_bootstrap._CONFIGURATORS dispatch table
  5. Tests alongside the guardrail (recommended: one red-team test minimum)

6.4 Plugin authoring (CC marketplace format)#

  1. Standard Claude Code plugin layout: .mcp.json + skills/ + hooks/ + commands/ + agents/
  2. Install via swarm plugins install <name>
  3. Hooks can be Python or shell (shell gated by plugin_shell_hooks_enabled flag)
  4. Commands with $ARGUMENTS substitution
  5. Agents namespace-isolated (plugin-<name>::<agent>)
  6. Install-drop telemetry shows silently-skipped surfaces

6.5 Workflow authoring#

  1. YAML manifest at lib/workflows/<id>/workflow.yaml
  2. Graph model: nodes (stage/router/checkpoint) + edges (source/target, priority, metadata)
  3. Node types: stage (tool call), router (branching), checkpoint (HITL pause)
  4. Workflows reload on restart with SWARM_DEPLOYMENT set (no mid-run reload)

6.6 Permission rule authoring#

  1. YAML in permission_baseline.yaml for profile templates OR ml_team/config/permission_policies.yaml for operator rules
  2. Fields: id · tool_name (glob) · behavior (allow/deny/ask) · pattern (regex on args JSON) · priority · reason
  3. Rules loaded at boot; restart to reload

6.7 Compliance gate authoring#

  1. YAML in compliance_gates.yaml per profile template
  2. Fields: id · triggers_on_tool · computes_via_tool · deny_if (restricted-eval expression) · blocks_tool
  3. Restricted eval uses _SAFE_BUILTINS (no open/exec/import)
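
A restricted-eval `deny_if` check might look like the sketch below. The contents of `_SAFE_BUILTINS` and the `deny_if` function signature are assumptions; only the idea of evaluating the expression without `open`, `exec`, or import comes from the list above.

```python
# Assumed whitelist; the real _SAFE_BUILTINS contents are not listed in this page.
_SAFE_BUILTINS = {"abs": abs, "min": min, "max": max, "len": len, "any": any, "all": all}


def deny_if(expression, context):
    """Evaluate a gate expression against computed metrics with a reduced builtin set."""
    # Replacing __builtins__ removes open, exec, __import__ and friends from name lookup.
    return bool(eval(expression, {"__builtins__": _SAFE_BUILTINS}, dict(context)))


metrics = {"max_psi": 0.31, "demographic_parity": 0.04}
print(deny_if("max_psi > 0.25", metrics))
print(deny_if("demographic_parity > 0.1", metrics))
```

Trimming the builtins blocks the obvious calls such as `open(...)`, which now raise `NameError`, but it is not a complete sandbox on its own, so gate expressions should still come only from trusted template authors.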

6.8 Deployment template authoring#

  1. Create lib/templates/<name>/ with template.yaml · permission_baseline.yaml · compliance_gates.yaml · retention.yaml · branding.json · README.md
  2. Referenced from deployments via based_on: lib/templates/<name>@vX.Y.Z
  3. Auto-inherited by every deployment using the template

6.9 REST API#

  1. OpenAPI 3.0 auto-generated at /docs (dev mode)
  2. 23 routers covering pipelines · agents · models · evaluations · MCP · knowledge · chat · datasets · inference · deployments · features · plugins · permissions · cron · batch · subjects · auth · config · docs
  3. Consistent auth: Bearer JWT or X-API-Key
  4. Structured error responses via FastAPI/Pydantic

6.10 CLI library#

  1. stdlib argparse + httpx (no new deps)
  2. Persistent JWT at ~/.swarm/token
  3. JSON output by default; --table for compact view on list commands
  4. Exit non-zero on API errors

7. ML Practitioner POV#

7.1 Classification#

  1. train_classifier tool — LightGBM / XGBoost / RandomForest / Logistic (ml_team/tools/training.py)
  2. Stratified split (sklearn) with customer-configurable ratio
  3. Metrics sidecar (JSON): accuracy · precision · recall · F1 · AUC · per-class
  4. Post-save verification (model must load + predict correctly)
  5. Optional MLflow logging

7.2 Drift detection#

  1. detect_drift tool — 3 statistical tests (ml_team/tools/drift.py)
  2. Population Stability Index (PSI) with BFSI thresholds 0.10 / 0.25
  3. Kolmogorov-Smirnov test (continuous features)
  4. Chi-squared test (categorical features)
  5. Per-feature JSON output + aggregate drift score
  6. bfsi_drift_baseline_gate — deploy refused when max PSI > 0.25
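
For reference, PSI over pre-binned counts can be computed as in the sketch below. It is not the detect_drift implementation: the function names, the epsilon floor for empty bins, and the reading of 0.10 as a warning level are assumptions. Only the 0.25 deploy-refusal threshold is stated above.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    score = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_pct = max(expected / expected_total, eps)  # floor avoids log(0) on empty bins
        a_pct = max(actual / actual_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score


def classify(score, warn=0.10, block=0.25):
    """Map a PSI score onto the two thresholds."""
    if score > block:
        return "block"
    if score > warn:
        return "warn"
    return "stable"


baseline = [50, 50]
mild = [55, 45]
severe = [90, 10]
print(classify(psi(baseline, mild)), classify(psi(baseline, severe)))
```

A gate such as bfsi_drift_baseline_gate would then take the maximum score across features and refuse deployment when it lands in the "block" band.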

7.3 Fairness audit#

  1. audit_fairness tool — fairlearn MetricFrame (ml_team/tools/fairness.py)
  2. Per-protected-attribute group metrics: accuracy · precision · recall · F1 · selection rate
  3. Binary disparate-impact scalars: demographic parity · equal opportunity · equalized odds
  4. bfsi_fairness_gate — deploy refused when demographic parity difference > 0.1
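
The demographic parity figure used by the gate is, in the usual fairlearn definition, the gap between the highest and lowest group selection rates. A dependency-free sketch of that calculation follows; the function names and sample data are illustrative, and the real tool uses fairlearn's MetricFrame.

```python
def selection_rates(y_pred, groups):
    """Fraction of positive predictions per protected-attribute group."""
    totals, positives = {}, {}
    for prediction, group in zip(y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if prediction == 1 else 0)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(y_pred, groups)
print(selection_rates(y_pred, groups), gap, "refuse" if gap > 0.1 else "allow")
```

Here group "a" is selected three times out of four and group "b" once out of four, so the gap is well above the 0.1 threshold.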

7.4 Explainability#

  1. explain_model tool — SHAP (ml_team/tools/explainability.py)
  2. Tree-native TreeExplainer fast path (100× faster than KernelExplainer)
  3. Generic KernelExplainer fallback for non-tree models
  4. Top-N feature importance + per-sample attribution
  5. JSON output consumed by model card generator

7.5 Model card#

  1. generate_model_card tool (ml_team/tools/model_card.py)
  2. Markdown document following RBI FREE-AI structure
  3. Sections: metadata · training data · metrics · fairness · drift · SHAP · intended use · limitations · contact
  4. Embedded in audit PDF

7.6 Deployment artefacts#

  1. package_model — Dockerfile + serve.py (FastAPI /predict + /health) + requirements.txt (ml_team/tools/deploy.py)
  2. generate_k8s_manifests — Deployment + Service + HPA (Horizontal Pod Autoscaler)
  3. BFSI-hardened defaults: runAsNonRoot · readOnlyRootFilesystem · allowPrivilegeEscalation: false · capabilities.drop: ['ALL']
  4. Optional docker build step
  5. Post-build verification (image must run + respond to /health)
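
The hardened defaults listed above map onto a Kubernetes container `securityContext`. The sketch below builds a Deployment manifest as a Python dict to show where each setting lands. The function names, the single replica, and port 8000 are assumptions for illustration; the manifests actually emitted by generate_k8s_manifests may differ.

```python
def hardened_security_context() -> dict:
    """Container-level securityContext carrying the four defaults listed above."""
    return {
        "runAsNonRoot": True,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    }


def deployment_manifest(name: str, image: str, port: int = 8000) -> dict:
    """Minimal apps/v1 Deployment for a model server exposing /health."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        "securityContext": hardened_security_context(),
        # The probe path mirrors the /health endpoint of serve.py.
        "readinessProbe": {"httpGet": {"path": "/health", "port": port}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }
```

With `readOnlyRootFilesystem` enabled, a server that writes temporary files generally needs a writable volume (for example an `emptyDir`) mounted at its scratch path.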

7.7 Champion-challenger#

  1. Model registry with deployment_id · traffic_pct · environment (ml_team/tools/champion_challenger.py)
  2. Atomic champion promotion (old champion → retired; new → active)
  3. Shadow prediction log for challenger A/B analysis
  4. Configurable agreement thresholds before promotion
  5. Full history via GET /api/v1/deployments/{model} + UI at /deployments
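
One way to get an atomic promotion with an agreement threshold is a single SQLite transaction, sketched below. The `deployments` table layout, the `shadow` status, the 0.95 default threshold, and the function names are hypothetical; only the "old champion → retired; new → active" transition comes from the list above.

```python
import sqlite3
from typing import Sequence


def make_registry() -> sqlite3.Connection:
    """In-memory registry with one champion and one shadow challenger (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deployments ("
        "deployment_id TEXT PRIMARY KEY, model TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO deployments VALUES (?, ?, ?)",
        [("d1", "credit", "active"), ("d2", "credit", "shadow")],
    )
    conn.commit()
    return conn


def agreement_rate(champion: Sequence, challenger: Sequence) -> float:
    """Fraction of shadow predictions where the challenger matched the champion."""
    pairs = list(zip(champion, challenger))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


def promote(conn: sqlite3.Connection, model: str, challenger_id: str,
            agreement: float, min_agreement: float = 0.95) -> bool:
    """Retire the current champion and activate the challenger in one transaction."""
    if agreement < min_agreement:
        return False
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE deployments SET status = 'retired' "
            "WHERE model = ? AND status = 'active'", (model,))
        cur = conn.execute(
            "UPDATE deployments SET status = 'active' "
            "WHERE model = ? AND deployment_id = ?", (model, challenger_id))
        if cur.rowcount != 1:
            # Raising here undoes the retirement above, so a champion always remains.
            raise LookupError(f"unknown deployment {challenger_id!r}")
    return True
```

Because both updates share one transaction, a reader never observes a state with zero or two active champions for the same model.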

7.8 Batch inference#

  1. Batch runner: JSONL input → processor → results.jsonl output (ml_team/core/batch.py)
  2. 3 processor kinds: inference (calls a registered model) · echo (debug) · custom (operator-authored)
  3. Checkpoints every 10 records
  4. Resume-on-restart from last checkpoint
  5. Streams results via GET /api/v1/batch/{id}/results
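
The checkpoint-and-resume behaviour can be illustrated with a short sketch. It assumes a JSON checkpoint file holding a `done` counter and trims any results written after the last checkpoint before resuming; the real layout in ml_team/core/batch.py is not documented here. The `stop_after` parameter exists only to simulate a crash.

```python
import json
from pathlib import Path
from typing import Callable, Optional

CHECKPOINT_EVERY = 10  # matches "checkpoints every 10 records" above


def run_batch(input_path: Path, output_path: Path, checkpoint_path: Path,
              processor: Callable[[dict], dict],
              stop_after: Optional[int] = None) -> int:
    """Process JSONL records, resuming from the last checkpoint if one exists."""
    done = 0
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())["done"]

    # Drop results written after the last checkpoint so nothing is duplicated.
    kept = []
    if output_path.exists():
        kept = output_path.read_text().splitlines(keepends=True)[:done]
    output_path.write_text("".join(kept))

    processed = done
    with input_path.open() as src, output_path.open("a") as out:
        for index, line in enumerate(src):
            if index < done:
                continue  # already covered by the checkpoint
            if stop_after is not None and processed >= stop_after:
                break  # simulated crash: no final checkpoint is written
            out.write(json.dumps(processor(json.loads(line))) + "\n")
            processed += 1
            if processed % CHECKPOINT_EVERY == 0:
                out.flush()
                checkpoint_path.write_text(json.dumps({"done": processed}))
        else:
            # Loop finished without a break: record completion.
            out.flush()
            checkpoint_path.write_text(json.dumps({"done": processed}))
    return processed
```

With an `echo`-style processor (`lambda record: record`), a run interrupted at record 25 restarts from record 20 and produces the same results.jsonl as an uninterrupted run.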

7.9 Experiment tracking#

  1. Optional MLflow integration (ml_team/tools/mlflow_tools.py)
  2. Auto-log training run if MLflow URI set

7.10 Memory#

  1. Per-run JSON memory in work_dir
  2. Cross-run SQLite memory (recall_past_runs, save_agent_learning)
  3. Cross-project Postgres memory (optional, multi-project orgs)
  4. RAG retrieval via ChromaDB (keyword fallback when embeddings unavailable)

8. Auditor POV#

8.1 On-disk evidence artefacts (per run)#

  1. pipeline_runs/<run_id>/metrics.json — training metrics sidecar
  2. pipeline_runs/<run_id>/*_card.md — model card
  3. pipeline_runs/<run_id>/fairness.json — fairness audit output
  4. pipeline_runs/<run_id>/drift.json — drift report
  5. pipeline_runs/<run_id>/shap.json — SHAP explanations
  6. pipeline_runs/<run_id>/approvals.json — HITL gate state (who approved, when, rationale)
  7. pipeline_runs/<run_id>/audit_report_<run_id>.pdf — signed audit PDF
  8. pipeline_runs/<run_id>/conversations/<agent>.jsonl — per-agent full turn log
  9. pipeline_runs/<run_id>/conversations/_index.json — agent hierarchy + timing

8.2 SQLite compliance ledger#

  1. permission_denials table — every denial with source · reason · run_id · agent · ts
  2. runs table — pipeline run state + profile_at_creation
  3. run_events table — per-run event stream
  4. model_deployments table — champion/challenger history
  5. shadow_predictions table — challenger agreement log
  6. approvals table — HITL gate state
  7. datasets table — dataset lineage with consent_doc_ref
  8. lineage_models table — training run lineage
  9. lineage_deployments table — deployment lineage
  10. plugin_installations + plugin_shell_executions — plugin audit

8.3 Audit PDF content#

  1. Cover page with: run ID · profile · build timestamp · bundle SHA-256 · signature + cert fingerprint · Rekor index
  2. Metrics section (training + per-class)
  3. Model card (Markdown embedded)
  4. Fairness section (per-group + scalar disparate-impact)
  5. Drift section (per-feature + aggregate)
  6. SHAP section (top feature importance)
  7. Approval gates section (who approved each HITL gate)
  8. Conversation summaries (per-agent message + tool-denial counts; the full JSONL is left out to keep the PDF small)

  9. Data lineage section (G14: dataset → model → deployment chain)

8.4 Retention#

  1. 2555-day (7 year) retention for BFSI — configurable per artefact class
  2. Daemon sweep every 24h (ml_team/core/retention.py)
  3. Deletion logged to retention_log.json (immutable append log)
  4. Classes: conversations_days · run_events_days · audit_pdfs_days · shadow_predictions_days

8.5 Audit queries (SQL-greppable)#

  1. "Who denied what in the last 24h?" — one SQL query on permission_denials
  2. "Which dataset trained which model?" — chain_for_deployment(deployment_id)
  3. "Every G1 egress denial by target host" — GROUP BY on permission_denials.reason
  4. "Every model that trained on subject X" — subjects_in_dataset(consent_doc_ref) + models_for_dataset
  5. "Every model card produced in Q3" — SQLite timestamp range
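
Two of the queries above, sketched with the standard-library sqlite3 module. The column names (source, reason, run_id, agent, ts) come from section 8.2; storing `ts` as an ISO-8601 UTC string is an assumption, as are the function names.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def denials_last_24h(conn: sqlite3.Connection, now: datetime) -> List[Tuple]:
    """'Who denied what in the last 24h?' as a single query."""
    cutoff = (now - timedelta(hours=24)).isoformat()
    return conn.execute(
        "SELECT agent, source, reason, run_id, ts "
        "FROM permission_denials WHERE ts >= ? ORDER BY ts",
        (cutoff,),
    ).fetchall()


def denials_by_reason(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Denial counts grouped by reason, most frequent first."""
    return conn.execute(
        "SELECT reason, COUNT(*) AS n FROM permission_denials "
        "GROUP BY reason ORDER BY n DESC, reason"
    ).fetchall()
```

Comparing ISO-8601 strings lexicographically only works when every row uses the same format and offset, so a ledger with mixed timestamp styles would need normalising first.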

8.6 RTBF verification#

  1. Signed erasure receipt with SHA-256 over sorted JSON (excluding signature field)
  2. ErasureReceipt.verify() — rehash + compare
  3. Tombstone marker [SUBJECT_DELETED] with deleted_subject + deleted_at fields
  4. Audit row preserved (tombstone, not true delete — regulator's evidence survives)
  5. Line-for-line replacement (preserves ordering + line count)
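
The receipt and tombstone mechanics can be sketched as follows. `ErasureReceipt` and `verify()` are named in the list above; the `lines_replaced` field, the `sign()` helper, the `tombstone` function, and matching by substring are assumptions. As described, the "signature" in this sketch is a content hash; whether the real implementation adds a keyed signature on top is not stated here.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ErasureReceipt:
    deleted_subject: str
    deleted_at: str
    lines_replaced: int
    signature: str = ""

    def _digest(self) -> str:
        # Hash every field except the signature itself, with sorted keys.
        payload = {k: v for k, v in asdict(self).items() if k != "signature"}
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def sign(self) -> "ErasureReceipt":
        self.signature = self._digest()
        return self

    def verify(self) -> bool:
        """Rehash the payload and compare it with the stored signature."""
        return hmac.compare_digest(self.signature, self._digest())


def tombstone(lines: List[str], subject: str, deleted_at: str) -> List[str]:
    """Replace each line mentioning the subject; ordering and count are preserved."""
    marker = json.dumps(
        {"text": "[SUBJECT_DELETED]", "deleted_subject": subject,
         "deleted_at": deleted_at},
        sort_keys=True,
    )
    return [marker if subject in line else line for line in lines]
```

Sorting keys before hashing makes the digest independent of field order, so a receipt that is re-serialised by another tool still verifies.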

8.7 Conversation tamper-evidence#

  1. Per-line sequence numbers (monotonic per agent)
  2. JSONL flush lock (thread-safe buffered writes)
  3. Audit PDF bundle hash includes conversation filename + size manifest
  4. Bundle hash stable under identical source artefacts
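
A sketch of the two checks an auditor might run: detecting breaks in the per-line sequence numbers, and recomputing a bundle hash over a filename + size manifest. The `seq` field name, the assumption that numbers increase by exactly one (stricter than "monotonic"), and the manifest encoding are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Sequence


def sequence_breaks(jsonl_lines: Sequence[str]) -> List[int]:
    """Positions where a line's seq is not exactly the previous seq + 1."""
    breaks = []
    previous = None
    for position, line in enumerate(jsonl_lines):
        seq = json.loads(line)["seq"]
        if previous is not None and seq != previous + 1:
            breaks.append(position)
        previous = seq
    return breaks


def bundle_hash(manifest: Dict[str, int]) -> str:
    """SHA-256 over a sorted (filename, size) manifest."""
    canonical = json.dumps(sorted(manifest.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting the manifest is what makes the hash stable under identical source artefacts: the same files with the same sizes always produce the same digest regardless of directory listing order. A size-only manifest detects truncation or appended lines but not same-length edits.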

9. CISO / Procurement POV#

9.1 One-command verification#

  1. cosign verify-blob on the release tarball (reproducible offline via Rekor)
  2. cosign verify-blob on the SBOM
  3. SHA-256 reproduction of deployment_config_sha256 from MANIFEST
  4. jq '.components | length' on SBOM for component count
  5. SQL audit queries on permission_denials

9.2 Independent verifiability#

  1. No network access required beyond Rekor transparency log
  2. No Sigstore-corp dependency (Rekor is public, offline-queryable)
  3. GitHub Release assets are immutable
  4. Rekor entries are append-only + censorship-resistant
  5. Every release is reproducible from signed source commits

9.3 Procurement artefacts#

  1. 1-page architecture + verification commands (.project/security/architecture.md)
  2. STRIDE threat model with 9 assets · DREAD scoring · top-10 residual risks (.project/security/threat_model.md)
  3. CAIQ v4.0.3 pre-filled (60 Qs, ~75% Y/Y+P) (.project/security/caiq_lite.md)
  4. Commit-signing setup guide (.project/security/signing_setup.md)
  5. Comprehensive reference (docs/REFERENCE.md)
  6. This features catalogue (docs/FEATURES.md)

9.4 Certification track (in-flight)#

  1. SOC 2 Type I readiness — Q2 2026 via Drata or Vanta
  2. SOC 2 Type II — Q4 2026 (requires 6+ months of Type I operating)
  3. Pen test — Q2 2026 (Lucideus / Cobalt quotes in flight)
  4. ISO 27001 — post-SOC 2 Type II
  5. Formal CSA STAR Level 1 — post-SOC 2 Type I

9.5 Legal + insurance (pending)#

  1. MSA template — BFSI-savvy Indian tech lawyer engagement pending
  2. DPA template — same
  3. DPIA template — same
  4. Cyber liability insurance — pending
  5. E&O insurance — pending
  6. HIPAA BAA template — pending

9.6 Honest disclosures (residual risks, documented)#

  1. Prompt-injection heuristic recall ~55-65% (defense-in-depth via G1 + G2 + G4)
  2. nsjail blocks syscall-level escape, not application-logic abuse through allowed imports
  3. Paraphrased PII ("number ends in 4729") not caught
  4. Encryption at rest protects a cold disk; it does not defend against RCE on the running API
  5. G15 RTBF has no reach into customer backups (runbook must document)
  6. Solo-dev bus factor — documentation system + ADR log mitigate
  7. Single-node SQLite writer (Postgres migration documented when needed)

9.7 Status reporting#

  1. CAIQ Lite scorecard — ~75% Y/Y+P aggregate
  2. Top-10 gaps enumerated with rupee-effort estimates (.project/security/caiq_lite.md § Top-10 gaps)
  3. Quarterly review cadence documented
  4. Update triggers documented (per release tag, per ADR, per pen-test finding)

10. Supply Chain POV#

10.1 Source integrity#

  1. All commits on master since v0.12.0 are SSH-signed
  2. GitHub "Verified" badge on all post-v0.12.0 commits (signing key registered)
  3. Branch protection: require signed commits (recommended config documented)
  4. G17 CI gate rejects unsigned commits in release range

10.2 Build integrity#

  1. CI runs on GitHub Actions ubuntu-latest (ephemeral, reproducible)
  2. Minimum permissions per job (permissions: contents: read default; id-token: write only for Sigstore steps)
  3. Deterministic tarball build (positive-list filter + sorted file list)
  4. Docker image base verified via cosign verify docker.io/library/python:3.12-slim
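
The "positive-list filter + sorted file list" idea can be sketched with the standard-library tarfile module. This is an illustration under assumptions, not the project's build script: the allowed prefixes are hypothetical, files are passed in as an in-memory dict for brevity, and the archive is left uncompressed because a gzip wrapper adds its own timestamp unless that is also pinned.

```python
import io
import tarfile
from typing import Dict, Tuple

ALLOWED_PREFIXES: Tuple[str, ...] = ("ml_team/", "docs/")  # hypothetical positive list


def build_tarball(files: Dict[str, bytes],
                  allowed: Tuple[str, ...] = ALLOWED_PREFIXES) -> bytes:
    """Build a byte-for-byte reproducible tar archive from name -> content."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # sorted order removes filesystem-order variance
            if not name.startswith(allowed):
                continue  # positive list: anything not explicitly allowed is dropped
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            # Pin every metadata field that would otherwise vary between builds.
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

Two builds from the same inputs then hash to the same SHA-256, which is what lets a customer reproduce the signed tarball digest independently.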

10.3 Artefact signing#

  1. Cosign keyless tarball signing via GitHub OIDC
  2. Cosign keyless SBOM signing
  3. Ephemeral certificates (10-min TTL from Fulcio)
  4. Rekor transparency log entry per artefact
  5. Certificate identity pinned to GitHub Actions run

10.4 SBOM#

  1. CycloneDX 1.5 JSON format
  2. Generated via cyclonedx-bom from the Python environment
  3. Published as GitHub Release asset (signed)
  4. 34 components at v0.12.0

10.5 Dependency monitoring#

  1. GitHub Dependabot enabled (security + version updates)
  2. SBOM feeds into customer vendor-risk tools (Snyk · Dependabot · Trivy · JFrog Xray)
  3. License compatibility discoverable via SBOM
  4. Dependency hash-pinning planned (pip-tools migration in backlog)

10.6 Release flow#

  1. Single tag push triggers release workflow
  2. Dispatch alternative for re-runs on existing tags
  3. Signed-commit gate → SBOM → base image verify → tarball → sign tarball → sign SBOM → GitHub Release upload
  4. 6 assets per release (artefact + signature + certificate for each of tarball and SBOM)
  5. Immutable GitHub Release URL per tag

10.7 Verification surface#

  1. Customer runs cosign verify-blob with 2 flags + 3 filenames (one command per artefact)
  2. No private-key material to exchange
  3. No shared-secret ceremony
  4. Reproducible by any engineer with cosign + jq installed

11. Performance + scale POV#

11.1 Latency optimizations#

  1. HTTP connection pool for LLM calls (shared across agents)
  2. Prompt cache (first-prompt Claude pricing)
  3. Schema cache (JSON schemas computed once per tool)
  4. Intra-agent parallel tool calls (3-5× speedup, experiment-flagged)
  5. Context compaction at 80% window (avoids context-full crash)
  6. In-memory rate-limit (sliding window, no Redis hop)
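
An in-memory sliding-window limiter of the kind described in item 6 can be sketched as below. The class name, the per-key deque layout, and the injectable clock are assumptions made for illustration and testability.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within any `window_s`-second window."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_s
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, key: str) -> bool:
        now = self._clock()
        with self._lock:
            hits = self._hits[key]
            # Evict timestamps that have slid out of the window.
            while hits and now - hits[0] >= self._window:
                hits.popleft()
            if len(hits) >= self._limit:
                return False
            hits.append(now)
            return True
```

Because the state lives in process memory, each uvicorn worker would keep its own window; under `--workers N` the effective limit is up to N times the configured value unless requests are pinned to a worker.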

11.2 Throughput#

  1. Async FastAPI + uvicorn (single-process async by default)
  2. Multi-worker uvicorn supported (uvicorn --workers N) for horizontal scale
  3. Stateless per-request design (state in SQLite, not in-memory)
  4. SQLite WAL for read-during-write concurrency
  5. Postgres migration path for high-concurrency multi-node

11.3 Storage efficiency#

  1. Conversation JSONL per agent (append-only, grep-friendly, compressible)
  2. Audit PDF target <500 KB per run (full JSONL not embedded; summaries only)
  3. Retention daemon prunes past-TTL artefacts
  4. Build-time tarball excludes noise (dist went from 624 MB to 2.7 MB)

11.4 Test runtime#

  1. Full suite: 1258 tests in ~58 seconds
  2. Parallelisable via pytest -n auto (available when pytest-xdist is installed)
  3. Unit tests isolated by monkeypatch on external binaries (nsjail/docker/cosign/aws not required)

11.5 Benchmarks#

  1. Permission engine: ~30 µs per tool dispatch (~17× baseline subprocess call; negligible vs ~500ms LLM latency)
  2. Conversation flush: batched 10-msg buffer, 1s interval
  3. Regression baseline stored in ml_team/tests/bench/ (nightly diff)

12. UX / Dashboard POV#

12.1 Operator surfaces#

  1. Next.js 15 + React 19 (SSR-first + server components)
  2. Tailwind CSS + shadcn/ui component library
  3. Real-time updates via WebSocket (no polling)
  4. Typed fetch client (useConfig() hook, etc.)
  5. Branding driven by GET /api/v1/config/branding (SSR-fetched, no flash)

12.2 Navigation#

  1. Sidebar with 10+ sections (pipelines, deployments, agents, transparency, cron, batch, plugins, knowledge, settings, docs)
  2. Breadcrumb trails on deep pages
  3. Role-based section visibility

12.3 Live feedback#

  1. Pipeline run live-stream (every tool call, every LLM response, every HITL gate)
  2. Conversation tree drill-down per agent
  3. Token + cost breakdown per agent
  4. Timeline view with span overlap visualization
  5. Real-time denial surfacing on the /transparency page

12.4 Admin UX#

  1. Feature flag toggle UI (all 3 tiers surfaced)
  2. Permission denial filter + search
  3. HITL gate approve/reject inline
  4. Cron job create + manual-run buttons
  5. Plugin install + inspect wizard

12.5 Auditor UX#

  1. Per-run audit PDF download button
  2. Per-run full conversation JSONL export
  3. Permission denial CSV export (planned)
  4. Retention log view (/transparency)

Counts summary#

| Area | Count |
| --- | --- |
| Agents (ML team) | 40 |
| Teams | 7 |
| Native tools | 25 |
| Guardrails | 15 |
| Permission rule sources | 8 |
| Hook events | 10 |
| LLM providers | 5 |
| Deployment templates | 3 |
| Pipeline configs (pre-built) | 3 |
| Backends | 3 |
| HITL gate types | 6 |
| Roles (RBAC) | 3 |
| API routers | 23 |
| Dashboard pages | 10+ |
| CI workflows | 5 |
| Prometheus metrics | 28+ |
| Tests | 1258 |
| Compliance frameworks mapped | 8 |
| Release assets per tag | 6 |
| CAIQ Lite questions answered | 60 |
| Residual risks registered | 10 (top) |
| Critical assets in STRIDE | 9 |
| Training modules | 8 (of 10 planned) |
| Implementation phase docs | 27 |
| PII detectors (G4) | 12 |
| Sandbox drivers (G2) | 3 |
| Audit signing drivers (G11) | 4 |
| Key providers (G13) | 5 |

Maintained by: TheAiSingularity · security@theaisingularity.org

Update trigger: every release, every architectural ADR, every new feature. Drift is a bug.