Features Catalogue
Canonical source: docs/FEATURES.md in the repo. This page is rendered via include-markdown and stays in sync automatically on every push to master.
Every feature organized by point of view — product / technical / security / compliance / operator / developer / ML practitioner / auditor / CISO / supply chain / performance / UX. 573 numbered features. Each has a file-path or test-file citation for verifiability.
Version: v0.12.0 · Last updated: 2026-04-22
For module-organized reference, see `docs/REFERENCE.md`. For architecture narrative, see `MASTER_README.md`.
Table of contents
- Product / Customer POV — what you can do with it
- Technical / Engineering POV — what's under the hood
- Security POV — guardrails + controls
- Compliance / Regulatory POV — framework mappings
- Operator POV — day-to-day running
- Developer / Integrator POV — how to extend
- ML Practitioner POV — ML-specific capabilities
- Auditor POV — evidence artifacts
- CISO / Procurement POV — evaluation artifacts
- Supply Chain POV — build + release integrity
- Performance + scale POV
- UX / Dashboard POV
1. Product / Customer POV
1.1 Autonomous end-to-end ML pipeline
- Accept a problem statement + a dataset path; return a trained model, fairness audit, drift report, SHAP explanations, model card, audit PDF, Docker image, and K8s manifests — end-to-end autonomous (`POST /api/v1/pipelines`)
- 3 pre-built pipeline configs (`fast_prototype` · `default_ml_pipeline` · `parallel_research`) + customer-authored YAML workflows (`lib/workflows/`)
- Human-in-the-loop gates at 6 points: deploy · data request · manual · security · cost · custom (`ml_team/core/approval.py`)
- Checkpoint-and-resume on every HITL gate — pipeline state persists through pauses (`ml_team/core/approval.py::ApprovalGate`)
- Run cancellation via `DELETE /api/v1/pipelines/{run_id}` — graceful shutdown with partial artefact preservation
- Real-time run status + WebSocket live-streaming (`/api/v1/pipelines/{id}/ws`)
1.2 Multi-agent architecture
- 40 specialized ML agents across 7 teams (Data · Algorithm · Training · Evaluation · Deployment · Quality · Management) (`lib/agents/base/`)
- 3-tier hierarchy: director → coordinator → worker with `transfer_to_*` delegation (`ml_team/core/orchestrator.py`)
- ReAct loop per agent — LLM decides → calls tool → consumes result → iterates (`ml_team/core/agent_runner.py`)
- Evaluator-generator separation — optional per-agent rubric grading (`ml_team/core/evaluator.py`)
- Context compaction at 80% of window to keep long runs alive (`ml_team/core/context_compaction.py`)
- Customer-specific agents via YAML `extends:` overlays (`ml_team/core/agent_composer.py`)
1.3 Multi-provider LLM support
- Anthropic Claude (first-class)
- OpenAI GPT-4o / o1 (first-class)
- vLLM — local GPU serving for air-gapped deployments
- Ollama — quantised local models
- Single-model override for all agents (dev/testing)
- Per-agent provider selection (mix small local + frontier in one pipeline) (`ml_team/core/llm_client.py`)
1.4 Customer-composable deployment model
- Three-layer architecture: `core/` (never forked) + `lib/` (versioned shelf) + `deployments/<customer>/` (composition) (v0.12.0)
- `SWARM_DEPLOYMENT=deployments/hdfc_bank` at boot selects which customer's full stack runs (`ml_team/core/deployment_loader.py`)
- Three deployment templates: `generic_ml` · `bfsi_baseline` (RBI FREE-AI) · `hipaa_baseline` (stub) (`lib/templates/`)
- Per-deployment branding (product_name · logo · colors · compliance badges) surfaced at `/api/v1/config/branding` (`ml_team/api/routers/config.py`)
- Per-deployment knowledge bases (RAG corpus) in `deployments/<customer>/knowledge/`
- Per-deployment permission policy + retention overrides (`lib/templates/<>/retention.yaml`)
1.5 Dashboard
- Next.js 15 + React 19 + TypeScript operator dashboard
- Live pipeline view with conversation tree · trace · cost breakdown · timeline
- Real-time updates via WebSocket (no polling)
- OIDC login (Okta · Azure AD · Google Workspace) + username/password fallback
- Champion / challenger deployment management view
- 40-agent roster with per-agent tool sets + conversation trails
1.6 CLI (swarm)
- Auth: `login` / `logout` / `whoami` / `health`
- Feature flags: `features list|get|set|reset`
- Pipelines: `pipelines list|run|status|cancel`
- Plugins: `plugins list|inspect|install`
- Cron: `cron list|create|run|delete|runs`
- Batch: `batch list|submit|status|results|resume`
- Deployments (runtime): `deployments list|promote|retire`
- Ship pipeline (v0.12.0): `deploy new|validate|ship|whitepaper`
1.7 Pricing + deployment posture (product positioning)
- VPC-installable — data never leaves customer network (on-prem / AWS ap-south-1 / Azure Pune / Yotta)
- Single-tenant per customer (multi-tenancy deferred)
- $25K pilot → $75K deployment → $10K/mo retainer framework (see MASTER_README.md § Pricing)
- First production model target: 8 weeks from kickoff
2. Technical / Engineering POV
2.1 Runtime
- Python 3.11+ (tested on 3.11 · 3.12)
- FastAPI 0.115+ async backend, Starlette ASGI
- Pydantic v2.9+ with `extra="forbid"` on every manifest schema
- Next.js 15 SSR + server components dashboard
- Thread-local SQLite connection pool with WAL mode + FK enforcement (`ml_team/api/database.py`)
- HTTP connection pool + prompt cache + schema cache (W1 optimizations)
2.2 Data persistence tiers
- SQLite (primary) — runs · users · permission_denials · lineage (G14) · champion-challenger · approvals · plugin installs (12 tables)
- JSONL per agent — conversation store with buffered flushes (10 msgs or 1s threshold)
- Postgres (optional) — cross-project memory for multi-project orgs
- ChromaDB + keyword fallback — RAG
- Per-run work_dir — scratch for metrics · model cards · fairness JSON · SHAP · drift · audit PDFs
2.3 Agent runtime internals
- Supervisor-worker orchestration, custom implementation (not LangChain) (`ml_team/core/orchestrator.py`)
- Two swappable backends: native + LangGraph sharing one agent config
- CrewAI adapter (legacy migration path)
- Intra-agent parallel tool calls (3-5× speedup, experiment-flagged)
- Per-agent memory (ephemeral during run) + cross-run learning via `save_agent_learning`
- Delegation tool-call synthesis (`transfer_to_<agent>`) injected at supervisor level
2.4 Permission engine (8 rule sources)
- RBAC source — translates `require_role(min)` into DENY rules
- Agent allowlist source — per-agent tool_set enforcement
- Feature flag source — auto-skip tools when their flag is off
- HITL source — approval-required tools
- Policy source — operator-authored `permission_policies.yaml`
- Profile source — template-baked rules (BFSI etc.); emits at priority 45 (ask) or 60 (deny)
- Compliance-gate source — runtime gate verdicts become DENY rules (priority 55)
- Egress allowlist source (G1, v0.12.0) — URL walker + host classifier
- Tier-aware resolution: ALLOW > DENY > ASK > default with priority tiebreaks (`ml_team/core/permissions.py`)
- Invariant-DENY floor at priority 60 — profile rules beat operator POLICY ALLOW at 50
2.5 Hook lifecycle (10 integration points)
- SESSION_START · PRE_TOOL · POST_TOOL · PRE_COMPACTION · POST_COMPACTION (pre-Track 2)
- PRE_LLM · POST_LLM · STORAGE_WRITE · LLM_CALL_WRAPPER · AGENT_DELEGATE (v0.12.0)
- Plugin-loaded hooks compose with core hooks (same pipeline)
- Shell-command hooks (`{"type": "command", ...}`) executed behind feature flag with rlimits (`ml_team/core/shell_hook_runner.py`)
- Per-execution audit rows in `plugin_shell_executions` SQLite table
2.6 Guardrails runtime (v0.12.0)
- `GuardrailRegistry` (thread-safe) with `@register(point, id, priority)` decorator
- Priority-sorted handler evaluation per integration point
- Outcome model: ALLOW chains · REDACT threads payload · DENY short-circuits · ERROR fails open unless invariant
- 15 guardrails split across 6 categories (detail in § 3)
- Auto-configuration from deployment `guardrail_configs` block via `guardrail_bootstrap.bootstrap_from_deployment()`
- Prometheus counters (`guardrail_triggered_total`, `guardrail_bypass_attempts_total`) + duration histogram
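The outcome model listed for the guardrails runtime can be illustrated with a small stand-in registry. This is a sketch only, not the project's `GuardrailRegistry`: the `Registry` class, the `Outcome` enum, the fields of `GuardrailResult`, the `invariant` flag, and the choice to run higher priorities first are all assumptions made for illustration.

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    DENY = "deny"


@dataclass
class GuardrailResult:
    outcome: Outcome
    payload: Optional[dict] = None  # replacement payload when outcome is REDACT
    reason: str = ""


Handler = Callable[[dict], GuardrailResult]


class Registry:
    """Hypothetical stand-in for a thread-safe guardrail registry."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._handlers = {}  # point -> list of (priority, id, handler, invariant)

    def register(self, point: str, id: str, priority: int, invariant: bool = False):
        def decorator(fn: Handler) -> Handler:
            with self._lock:
                bucket = self._handlers.setdefault(point, [])
                bucket.append((priority, id, fn, invariant))
                # Assumption: higher priority runs first; ties keep registration order.
                bucket.sort(key=lambda entry: -entry[0])
            return fn
        return decorator

    def evaluate(self, point: str, payload: dict) -> GuardrailResult:
        with self._lock:
            handlers = list(self._handlers.get(point, []))
        for _priority, handler_id, fn, invariant in handlers:
            try:
                result = fn(payload)
            except Exception as exc:
                if invariant:
                    # An invariant guardrail that errors fails closed.
                    return GuardrailResult(Outcome.DENY, reason=f"{handler_id} errored: {exc}")
                continue  # a non-invariant guardrail that errors fails open
            if result.outcome is Outcome.DENY:
                return result  # DENY short-circuits the chain
            if result.outcome is Outcome.REDACT and result.payload is not None:
                payload = result.payload  # REDACT threads the new payload onward
        return GuardrailResult(Outcome.ALLOW, payload=payload)
```

A handler registered with `@registry.register("PRE_LLM", "my_guardrail", priority=70)` then takes a payload dict and returns a `GuardrailResult`; later handlers see whatever payload earlier REDACT outcomes produced.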
2.7 Authentication + authorization
- JWT Bearer (HS256, 24h TTL, rotatable secret via Doppler)
- Legacy `X-API-Key` header (admin-equivalent, backcompat)
- OIDC SSO — Okta / Azure AD / Google Workspace via Authlib (PKCE + state cookie)
- IdP group → swarm role mapping via `ML_TEAM_OIDC_ROLE_MAP`
- 3 roles: viewer / operator / admin
- `require_role(min_role)` FastAPI dependency routes denials through permission engine
- bcrypt password hashing (12 rounds)
- Admin bootstrap from env vars at first boot
2.8 Observability
- Prometheus metrics on every subsystem (28+ counters/histograms) at `/api/v1/metrics`
- OpenTelemetry spans with parent/child, tokens, costs
- Real-time WebSocket streaming of pipeline events
- Structured JSON logs (`ml_team/core/logging_config.py`)
- G6 credential filter at root logger (all logs scrubbed at source)
- Per-agent conversation JSONL (durable, grep-friendly)
2.9 Plugin ecosystem (Claude Code marketplace format)
- Skills · MCPs · hooks · commands · agents — all installable via one `.mcp.json` + manifest
- MCP client over JSON-RPC (stdio + SSE) (`ml_team/core/mcp_client.py`)
- Install-time manifest validation + install-drop tracking (`scan_install_drops`)
- Namespace isolation — plugin agents forced to `plugin-<name>::<agent>` so no shadowing
- Feature-flag gated plugin shell hooks with invoke-time validation
- CC marketplace compat — tested against `superpowers` v5.0.7 (100% surface retention)
2.10 Ship pipeline (v0.12.0)
- `swarm deploy new <customer> --template=<>` scaffolds `deployments/<customer>/` (`ml_team/deploy/scaffold.py`)
- `swarm deploy validate` — lint config + lib refs + customer-name match (`ml_team/deploy/validator.py`)
- `swarm deploy ship` — build-time positive-list filter · MANIFEST.yaml · whitepaper (`ml_team/deploy/ship.py`)
- `swarm deploy whitepaper` — 5-section markdown whitepaper (`ml_team/deploy/whitepaper.py`)
- Per-customer build-time isolation — other customers' `deployments/<other>/` excluded at tarball build time (tested at tar-member level in `test_ship_excludes_other_customer_dirs`)
- Deterministic `MANIFEST.yaml` with pinned lib versions + config SHA-256 + build commit + host + timestamp (sort-keyed YAML)
2.11 Ops primitives (W7)
- Cron scheduler — 60s tick, file-backed store, 4 task kinds (retrain · drift_check · audit_pdf · custom) (`ml_team/core/cron.py`)
- Batch runner — JSONL → processor → results.jsonl; checkpoints every 10 records; resume-on-restart (`ml_team/core/batch.py`)
- Retention daemon — 24h sweep, per-artefact TTLs (2555d BFSI default) (`ml_team/core/retention.py`)
- Feature-flag registry with 3 tiers: INVARIANT / FLAG / USER_OVERRIDE (`ml_team/core/feature_flags.py`)
2.12 Tests (1258 total, 2 skipped)
- Unit tests per guardrail (17-30 each)
- Integration tests via FastAPI `TestClient` (23 routers)
- End-to-end BFSI baseline test — biased-model → deployment blocked
- Tamper-evident bundle hash regression test
- Snapshot parity tests (pre- vs post-refactor byte-identical dicts)
- Full regression gate in CI (matrix py3.11 + py3.12)
- Nightly real-LLM golden-path run (`.github/workflows/nightly-e2e.yml`)
- Performance bench baselines (`ml_team/tests/bench/`) with nightly diff
2.13 CI/CD
- GitHub Actions on every push + PR to master (5 workflows)
- Lint (ruff + mypy) · pytest (matrix) · bandit HIGH gate · semgrep ERROR gate (`ci.yml`)
- Release workflow on `v*.*.*` tags with signed-commit gate + SBOM + Cosign (`release-supply-chain.yml`)
- SARIF upload to GitHub Security tab for semgrep findings
- Coverage XML artifact upload
- Advisory doc-drift check (`doc-drift.yml`)
3. Security POV
3.1 Network controls
- G1 Egress allowlist (in-process) — URL walker + RFC1918/loopback/link-local/ULA block (`ml_team/core/egress_allowlist.py`)
- Suffix patterns (`*.example.com`) + fnmatch allow-patterns
- Scheme-level block list (file, gopher, dict, etc.)
- Literal-string host matching (no DNS); mitmproxy sidecar for DNS-rebind coming in follow-up
- Audit trail via `permission_denials` with source=egress_allowlist
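The host classification described for G1 can be approximated with the standard library alone. The sketch below is an illustration under assumptions, not the code in `ml_team/core/egress_allowlist.py`: the function names and the exact set of blocked schemes are hypothetical (the list above ends in "etc."), and hosts are matched as literal strings with no DNS lookup, as the section states.

```python
import ipaddress
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Assumed subset of the scheme block list; the document's list is open-ended.
BLOCKED_SCHEMES = {"file", "gopher", "dict"}


def is_private_literal(host: str) -> bool:
    """True for IP literals in private, loopback, or link-local ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; no DNS resolution is attempted
    # is_private covers RFC1918 ranges and IPv6 unique local addresses (fc00::/7).
    return ip.is_private or ip.is_loopback or ip.is_link_local


def egress_allowed(url: str, allow_patterns: list) -> bool:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme.lower() in BLOCKED_SCHEMES or not host:
        return False
    if is_private_literal(host):
        return False
    # Suffix patterns such as "*.example.com" are plain fnmatch globs on the host.
    return any(fnmatch(host, pattern) for pattern in allow_patterns)
```

Because matching is on the literal host string, a public hostname that resolves to a private address would still pass this check, which is the DNS-rebind gap the section assigns to a follow-up sidecar.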
3.2 Execution sandbox
- G2 Python sandbox driver abstraction — `SandboxDriver` Protocol (`ml_team/core/python_sandbox.py`)
- nsjail driver (Linux production) — seccomp + user namespace + RO rootfs + net namespace
- Docker driver (macOS dev) — throwaway `python:3.11-slim` with `--network=none --read-only --rm`
- Subprocess driver (portable fallback) with loud WARN on each invocation
- `strict=True` + unavailable driver → API boot aborts (no silent degrade)
- Per-call resource limits: `memory_mb` / `cpu_time_sec` / `allow_network` / `allow_paths`
3.3 Input safety
- G3 Prompt-injection heuristic — 25 patterns (12 high + 13 medium severity) at PRE_LLM priority 70
- Occurrence counting via `re.findall` (multi-hit aware)
- OpenAI multi-part + bare-string + single-dict payload shapes handled
- System role skipped by default (operator authoring system prompts isn't attacking)
- Honest ceiling disclosed in manifest: ~55-65% recall vs Lakera ~80-87%
3.4 Output + persistence safety
- G4 PII detection — 12 regex detectors with structural validation (`ml_team/core/pii/regex_detectors.py`)
- Luhn check on credit cards (drops false-positive digit runs)
- Verhoeff check on Aadhaar (kills false-positive 12-digit strings)
- Indian BFSI recognisers: Aadhaar · PAN · IFSC · IN_PHONE
- International: EMAIL · CREDIT_CARD · IBAN · IPV4 · IPV6 · PRIVATE_KEY_BLOCK · US_SSN · US_PHONE
- 3 action modes: redact / mask (keep first 2 + last 2) / hash (SHA-256 prefix)
- Overlap resolver: higher-confidence wins → longer span → registration order
- Registered at POST_LLM + POST_TOOL + STORAGE_WRITE (three integration points)
- Optional Microsoft Presidio shim (lazy import; 300MB spaCy model opt-in) (`ml_team/core/pii/presidio_shim.py`)
- G5 Conversation JSONL scrubber — wraps `ConversationStore._flush_locked` (`ml_team/core/conversation_scrubber.py`)
- Recursive value walk (scans strings in dicts + lists)
- Byte-identical output on zero-mutation lines (no disk churn)
- `_redacted: true` tag on mutated lines (audit grep)
- Non-JSON lines pass through untouched (defensive)
- G6 Logs credential filter — `logging.Filter` subclass at root logger
- 13 known-secret regex patterns (sk-…, ghp_…, AKIA…, BEGIN PRIVATE KEY, JWT, etc.)
- Anthropic before OpenAI via negative lookahead `sk-(?!ant-)`
- Shannon-entropy fallback ≥4.2 bits/char on ≥32-char tokens
- Scrubs `record.__dict__` fields too (structured logging caught)
- Replaces with `[REDACTED_SECRET_<sha256[:8]>]` (stable placeholder allows correlation without leaking value)
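Two of the structural checks named in this section are short enough to show directly: the Luhn checksum used to discard digit runs that are not card numbers, and the Shannon-entropy fallback with its stable placeholder. This is a minimal sketch using the thresholds quoted above (4.2 bits/char, 32 characters); the function names are hypothetical and are not taken from the project's modules.

```python
import hashlib
import math
from collections import Counter


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def shannon_entropy(token: str) -> float:
    """Entropy in bits per character of the token's character distribution."""
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_secret(token: str, min_len: int = 32, min_bits: float = 4.2) -> bool:
    # Fallback heuristic: long tokens with high per-character entropy.
    return len(token) >= min_len and shannon_entropy(token) >= min_bits


def placeholder(token: str) -> str:
    # The same secret always maps to the same placeholder, so log lines can be
    # correlated without the value itself appearing.
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()[:8]
    return f"[REDACTED_SECRET_{digest}]"
```

The entropy check is a fallback for tokens that no known-prefix pattern catches, so it is expected to run only after the regex patterns have had their turn.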
3.5 Rate limiting + cost controls
- G7 Per-user rate limit — composite `(caller_identity, endpoint_class)` key (`ml_team/api/rate_limit.py`)
- Identity precedence: X-API-Key (SHA-256[:12]) → JWT sub → client IP
- Per-role limits: viewer 100r/0w · operator 600r/20w · admin 2000r/100w per minute (env-configurable)
- Response headers: `X-RateLimit-{Limit,Remaining,Role}`
3.6 Agent safety
- G10 Delegation loop detector — stack depth cap + fan-out cap + same-args dedup (`lib/guardrails/platform_integrity/delegation_loop_detector/`)
- Default caps: max_depth=5, max_delegations_per_run=50
- Two scopes: strict (agent + args_hash) or name_only
- Per-run state dict (isolated across runs)
- G16 HITL TTL + escalation — `ApprovalGate` fields + cron sweep (`ml_team/core/hitl_sweep.py`)
- Pure function `sweep(store, notifier, now) → SweepReport`
- Escalation fires before expiry in the same sweep (documented ordering)
- Monotonic timer for expiry + wall-clock for display (clock-skew safe)
3.7 Data protection at rest
- G12 Encryption at rest — AES-GCM-256 envelope (`ml_team/core/encryption.py`)
- 3 at-rest driver options: `sqlite_host_fs_only` (default) · `sqlcipher` · `postgres_pgcrypto`
- Per-call DEK + customer-wrapped KEK
- AAD = sorted context dict; decrypt with mismatched context fails AEAD integrity
- `WrappedDek` + `Ciphertext` dataclasses, JSON-serialisable end-to-end
- G13 BYOK `KeyProvider` Protocol — 5 implementations
- `StubProvider` (tests, deterministic base64)
- `EnvKeyProvider` (dev via `SWARM_KEK` env var)
- `AwsKmsProvider` (production — shells to `aws kms encrypt/decrypt`)
- `GcpKmsProvider` (fail-fast stub pending customer demand)
- `VaultTransitProvider` (fail-fast stub pending customer demand)
- Threat model explicitly scoped: "cold DB file / disk image" ≠ "attacker with RCE on running API"
3.8 Audit trail + erasure
- G14 Data lineage — 3 SQLite tables with enforced FKs (`ml_team/api/database.py`)
- `datasets` (dataset_id PK + `consent_doc_ref` G15 index)
- `lineage_models` (model_id PK + dataset_id FK ON DELETE SET NULL)
- `lineage_deployments` (deployment_id PK + model_id FK ON DELETE CASCADE)
- `chain_for_deployment(deployment_id)` returns full joined dict for audit PDF
- Helper APIs: `record_dataset/model/deployment` (idempotent upserts) (`ml_team/core/lineage.py`)
- G15 Right-to-be-forgotten — admin-only endpoint + signed receipt (`ml_team/api/routers/subjects.py`)
- `GET /api/v1/subjects/{id}/preview` (dry-run)
- `DELETE /api/v1/subjects/{id}` (execute)
- Cascade delete of dataset rows · FK sets model dataset_id to NULL · conversation JSONL rewritten to tombstones
- `ErasureReceipt` dataclass with SHA-256 signature over sorted JSON (`ml_team/core/rtbf.py`)
- Regex metacharacters in subject_id escaped (no ReDoS)
- `_subject_pattern` uses `re.escape` (tested in `test_regex_metacharacters_in_subject_id_are_escaped`)
- Tombstone preserves line order + count (downstream JSONL parsers still work)
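The foreign-key behaviour listed for the three lineage tables can be reproduced in an in-memory SQLite database. The sketch below is an assumption-laden illustration: only the table names, key columns, and ON DELETE actions come from this section; the column types, the index name, and the helper function names are hypothetical, and the real schema in `ml_team/api/database.py` may carry more columns.

```python
import sqlite3


def open_lineage_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE datasets (
            dataset_id TEXT PRIMARY KEY,
            consent_doc_ref TEXT
        );
        CREATE INDEX idx_datasets_consent ON datasets (consent_doc_ref);
        CREATE TABLE lineage_models (
            model_id TEXT PRIMARY KEY,
            dataset_id TEXT REFERENCES datasets (dataset_id) ON DELETE SET NULL
        );
        CREATE TABLE lineage_deployments (
            deployment_id TEXT PRIMARY KEY,
            model_id TEXT REFERENCES lineage_models (model_id) ON DELETE CASCADE
        );
        """
    )
    return conn


def erase_datasets_for_consent(conn: sqlite3.Connection, consent_doc_ref: str) -> int:
    """Delete dataset rows for one consent reference; returns rows removed."""
    cursor = conn.execute(
        "DELETE FROM datasets WHERE consent_doc_ref = ?", (consent_doc_ref,)
    )
    return cursor.rowcount
```

Deleting a dataset row leaves the model row in place with `dataset_id` set to NULL, so the model and its deployments stay auditable, while deleting a model row removes its deployments through the cascade.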
3.9 Audit PDF signing
- G11 Audit PDF signing — 4-driver abstraction (`ml_team/core/audit_signer.py`)
- Stub driver (deterministic test signature)
- Offline Ed25519 driver (air-gap, key on build host)
- Cosign KMS driver (AWS/GCP/Vault — BFSI default)
- Cosign keyless driver (Sigstore + GitHub OIDC — SaaS default)
- Sidecar receipt JSON written alongside `.sig` + `.pem` files (canonical source for downstream verifiers)
- Rekor log index captured for keyless signatures
3.10 Supply-chain integrity
- G17 SBOM + signed commits — CI gate (`.github/workflows/release-supply-chain.yml`)
- Every commit in release range verified via `git log %G?` (non-G/U fails workflow)
- CycloneDX 1.5 JSON SBOM via `cyclonedx-bom` (`scripts/gen_sbom.py`)
- Cosign keyless tarball + SBOM signing via GitHub OIDC
- Base-image `cosign verify` on `python:3.12-slim` (WARN-not-block on upstream policy changes)
- 6 signed release assets per `v*.*.*` tag
- Rekor transparency log entry for every signed artefact
3.11 Cryptography inventory
- AES-GCM-256 (envelope encryption)
- Ed25519 (offline audit signing)
- ECDSA-P256 (Cosign)
- bcrypt 12 rounds (password at rest)
- HS256 (JWT)
- SHA-256 (hashing, fingerprints)
- No deprecated algorithms — no MD5 for auth, no SHA-1 for signing, no RC4/DES/3DES anywhere
3.12 RBAC + access control
- 3-role enum: viewer < operator < admin
- Per-endpoint role guards on all sensitive routes
- Admin-only endpoints: `/subjects/*` (G15 RTBF), `/permissions/denials` (audit), `/features/*` (flag mutation)
- JWT revocation: grace-period + secret rotation (documented; no revocation list yet)
- Session state in JWT only (no server-side session store to compromise)
3.13 Static security analysis
- bandit (HIGH severity gate in CI) (`ml_team/pyproject.toml::[tool.bandit]`)
- semgrep `p/python` + `p/security-audit` (ERROR severity gate)
- MEDIUM findings reviewed and annotated inline with `# nosec BXXX` + rationale (4 suppressions)
- SARIF upload to GitHub Security tab for every scan
- 0 HIGH bandit findings · 0 ERROR semgrep findings at v0.12.0
3.14 Threat model
- STRIDE analysis across 9 critical assets (`.project/security/threat_model.md`)
- DREAD scores on every row
- Top-10 residual risk register with owners + status
- Cross-cutting attack scenarios documented (prompt-injection exfil chain, compromised dev laptop, backup-exfiltration + RTBF, supply-chain via transitive dep)
- Quarterly review cadence documented
4. Compliance / Regulatory POV
4.1 Framework coverage (8 frameworks)
- RBI FREE-AI (India) — Pillars 2, 3, 5, 6 full; 1, 4, 7 partial
- DPDP Act 2023 (India) — §§ 8, 10(8), 12 full
- EU AI Act (high-risk) — Arts. 10, 12, 14, 15 full; Art. 13 partial
- HIPAA Security Rule — 164.308, 164.312, 164.514, 164.528 (controls present; BAA template pending)
- GDPR — Arts. 5, 17, 22, 30, 32 (controls present)
- SOC 2 — CC6.x, CC7.x, CC8.1 (design ready; Type I readiness Q2 2026)
- OWASP LLM Top 10 — direct controls for LLM01, 02, 04, 05, 06, 08, 10
- NIST AI RMF 1.0 — Govern 1.4/1.6/1.7, Map 2.3/4.1, Manage 2.2/2.3
4.2 BFSI / Indian-specific controls
- RBI FREE-AI Pillar 2 (Consent) — G4 PII detection + G15 erasure
- Pillar 3 (Robustness) — G3 prompt-injection + G10 delegation-loop + drift/fairness gates
- Pillar 5 (Accountability) — G14 data lineage + G11 audit-PDF signing
- Pillar 6 (Security) — G1 + G2 + G6 + G12 + G13 + G17
- DPDP Act § 8 (data fiduciary) — G14 + G15
- DPDP Act § 10(8) (named DPO) — self-designated 2026-04-22 (`SECURITY.md`)
- DPDP Act § 12 (erasure) — G15 with signed receipt
- CERT-In 6-hour breach notification — runbook pending; technical capability (audit logs + metrics) in place
- RBI 7-year retention (2555 days) — BFSI baseline default
4.3 Compliance artefact generation
- Model card (Markdown, RBI-aligned structure) via `tools/model_card.py`
- Fairness audit JSON (fairlearn MetricFrame, per-group metrics) via `tools/fairness.py`
- Drift report (PSI + KS + chi², BFSI thresholds 0.10/0.25) via `tools/drift.py`
- SHAP explanations JSON via `tools/explainability.py`
- Audit PDF with tamper-evident source-bundle SHA-256 on cover
- Signed conversation JSONL per agent (retention-policy governed)
- Retention log (`retention_log.json`) documenting every artefact deletion
- `permission_denials` SQLite table — every denial source-attributed
4.4 Invariant-DENY guarantees
- Profile DENY rules emit at priority 60 → CANNOT be overridden by operator POLICY ALLOW at 50
- Invariant tier feature flags CANNOT be toggled at runtime
- `strict=True` guardrails fail API boot if driver unavailable (no silent degrade)
- Tested end-to-end in `test_bfsi_baseline_e2e.py`
4.5 Procurement artefact pack
- 1-page security architecture diagram (`.project/security/architecture.md`)
- STRIDE threat model with DREAD scoring (`.project/security/threat_model.md`)
- Pre-filled CAIQ v4.0.3 questionnaire (60 Qs, ~75% Y/Y+P) (`.project/security/caiq_lite.md`)
- Commit-signing setup guide (`.project/security/signing_setup.md`)
- README pack index with audience-routing guide (`.project/security/README.md`)
5. Operator POV
5.1 Installation
- `pip install -e "ml_team/.[ml]"` — one-step install on a clean Python 3.12 venv
- `.env` template for OPENAI_API_KEY + JWT secret + admin bootstrap
- `uvicorn ml_team.api.app:app` — zero-config local start
- SQLite default (no DB service to run)
- Optional Docker image (planned)
5.2 Configuration
- 25+ environment variables (documented in README.md § Configuration reference)
- YAML deployment config (`deployments/<customer>/config.yaml`) — declarative, Pydantic-validated
- Per-template profile defaults (`lib/templates/<>/`)
- Per-customer overrides (branding · retention · guardrail configs)
- Operator-authored custom policy rules (`ml_team/config/permission_policies.yaml`)
5.3 Dashboards
- `/` — pipelines list + live feed
- `/pipelines/[id]` — drill-down
- `/deployments` — champion/challenger
- `/agents` — 40-agent roster
- `/transparency` — denial log · retention · cron · batch (one operator-facing "everything" page)
- `/cron` — scheduler
- `/plugins` — marketplace install + inspect
- `/knowledge` — RAG corpus management
- `/settings` — feature flags (admin only)
- `/docs` — in-app documentation browser
5.4 Monitoring + alerting
- Prometheus `/api/v1/metrics` endpoint (unauthenticated, industry standard)
- 28+ counters + histograms covering every subsystem
- Key counters: `pipelines_started_total` · `llm_calls_total{agent,model}` · `tool_calls_total` · `tool_denied_total` · `permission_denials_total` · `guardrail_triggered_total{name,outcome}` · `guardrail_bypass_attempts_total` · `active_pipelines`
- Key histograms: `pipeline_duration_seconds` · `llm_call_duration_seconds` · `guardrail_evaluation_duration_seconds`
- OpenTelemetry traces (parent/child) with token/cost metadata
- Structured JSON logs (Loki/Splunk/CloudWatch-ready)
- WebSocket event streaming for real-time dashboard updates
5.5 Ops primitives
- Cron scheduler with 4 task kinds (retrain, drift_check, audit_pdf, custom)
- Batch runner (JSONL → processor → results.jsonl, 10-record checkpoints)
- Retention daemon (24h sweep, per-artefact TTLs)
- Feature flag admin UI (`/settings`)
- Runtime feature-flag overrides via `POST /api/v1/features/{name}` (admin)
- HITL approval UI at `/pipelines/[id]` (gate type + rationale surfaced)
5.6 Secrets management
- Doppler per-customer projects (`swarm-<customer>-{dev,staging,prod}`)
- `.env` file fallback for dev
- SOPS + age migration path documented (no vendor lock-in)
- `swarm deploy rotate-secret` planned (not yet implemented)
5.7 Backups + disaster recovery
- SQLite dumpable via `sqlite3 .dump` (trivial backup)
- JSONL files rsync-friendly (append-only)
- Host-FS encryption recommended (LUKS · EBS-KMS · GCP PD · Azure Disk encryption)
- Customer-controlled backup policy (documented in deployment runbook)
- Postgres migration path for multi-node HA (not yet executed)
5.8 Governance contacts
- Named DPO (DPDP Act § 10(8)) — security@theaisingularity.org
- Named Security Officer — same contact
- 24h acknowledgement SLA + 5-business-day substantive SLA
- CERT-In 6h mandate when applicable
6. Developer / Integrator POV
6.1 Tool authoring
- Plain Python functions with type hints become tools automatically
- Docstring doubles as LLM-facing tool description
- JSON schema auto-generated from type hints (Pydantic under the hood)
- Lib asset manifest for versioned distribution (`lib/tools/<id>/tool.yaml`)
- Per-tool tests in `lib/tools/<id>/tests/`
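The idea of turning a typed function into a tool description can be sketched with the standard library. The section says the project generates schemas through Pydantic; the version below deliberately swaps that for `inspect` and `typing` so it runs without dependencies, and the `tool_spec` helper, the `detect_outliers` example tool, and the output layout are all hypothetical.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_spec(fn) -> dict:
    """Build a tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unmapped or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def detect_outliers(column: str, z_threshold: float = 3.0) -> dict:
    """Flag rows whose z-score in the given column exceeds the threshold."""
    return {"column": column, "z_threshold": z_threshold}
```

Calling `tool_spec(detect_outliers)` yields a dict whose description is the docstring and whose only required parameter is `column`, since `z_threshold` has a default.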
6.2 Agent authoring
- Author YAML manifest under `lib/agents/base/<id>/agent.yaml` — no Python required
- `extends: lib/agents/base/<base>@vX.Y.Z` composition
- Overlay actions: `system_prompt_append` · `system_prompt_prepend` · `system_prompt_replace` · field replace
- `knowledge_bases:` list pointing at Markdown/PDF corpus files
- `tool_set:` referencing lib tools + custom per-deployment tools
- `tier:` (director / coordinator / worker) with routing implications
- `model:` per-agent override (frontier for directors, small for workers)
- `evaluator_rubric:` optional grading criteria
6.3 Guardrail authoring
- Create `lib/guardrails/<category>/<id>/guardrail.yaml` + `enforce.py`
- Register at one or more integration points via `@register(point, "id", priority=N)`
- Handler signature: `(payload: dict) -> GuardrailResult`
- `configure()` function at module level → add to `guardrail_bootstrap._CONFIGURATORS` dispatch table
- Tests alongside the guardrail (recommended: one red-team test minimum)
6.4 Plugin authoring (CC marketplace format)
- Standard Claude Code plugin layout: `.mcp.json` + `skills/` + `hooks/` + `commands/` + `agents/`
- Install via `swarm plugins install <name>`
- Hooks can be Python or shell (shell gated by `plugin_shell_hooks_enabled` flag)
- Commands with `$ARGUMENTS` substitution
- Agents namespace-isolated (`plugin-<name>::<agent>`)
- Install-drop telemetry shows silently-skipped surfaces
6.5 Workflow authoring
- YAML manifest at `lib/workflows/<id>/workflow.yaml`
- Graph model: nodes (stage/router/checkpoint) + edges (source/target, priority, metadata)
- Node types: stage (tool call), router (branching), checkpoint (HITL pause)
- Hot-reload via `SWARM_DEPLOYMENT` restart (no mid-run reload)
6.6 Permission rule authoring
- YAML in `permission_baseline.yaml` for profile templates OR `ml_team/config/permission_policies.yaml` for operator rules
- Fields: `id` · `tool_name` (glob) · `behavior` (allow/deny/ask) · `pattern` (regex on args JSON) · `priority` · `reason`
- Rules loaded at boot; restart to reload
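How rules with these fields might be matched and resolved can be sketched as follows. This is one possible reading, not the engine in `ml_team/core/permissions.py`: it assumes the numerically highest priority wins and that the ALLOW > DENY > ASK ordering only breaks ties at equal priority, which is the interpretation under which a profile DENY at 60 beats an operator ALLOW at 50. The `Rule` dataclass, the `resolve` function, and the example rule ids are hypothetical.

```python
import json
import re
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional

# Assumed tiebreak order when two matching rules share a priority.
_BEHAVIOR_RANK = {"allow": 3, "deny": 2, "ask": 1}


@dataclass(frozen=True)
class Rule:
    id: str
    tool_name: str                  # glob matched against the tool name
    behavior: str                   # "allow", "deny", or "ask"
    priority: int
    pattern: Optional[str] = None   # regex searched in the JSON-encoded args
    reason: str = ""


def resolve(rules, tool: str, args: dict):
    """Return (behavior, winning_rule); behavior is "default" when nothing matches."""
    args_json = json.dumps(args, sort_keys=True)
    matches = [
        rule for rule in rules
        if fnmatch(tool, rule.tool_name)
        and (rule.pattern is None or re.search(rule.pattern, args_json))
    ]
    if not matches:
        return "default", None
    winner = max(matches, key=lambda r: (r.priority, _BEHAVIOR_RANK[r.behavior]))
    return winner.behavior, winner
```

With an operator rule `Rule("op-allow-deploy", "deploy_*", "allow", 50)` and a profile rule `Rule("profile-deny-prod", "deploy_*", "deny", 60, pattern='"environment": "prod"')`, a production deploy resolves to deny while a staging deploy resolves to allow.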
6.7 Compliance gate authoring
- YAML in `compliance_gates.yaml` per profile template
- Fields: `id` · `triggers_on_tool` · `computes_via_tool` · `deny_if` (restricted-eval expression) · `blocks_tool`
- Restricted eval uses `_SAFE_BUILTINS` (no open/exec/import)
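A `deny_if` expression evaluated against a restricted set of builtins can be sketched as below. The name `_SAFE_BUILTINS` appears in this section, but its contents here are an assumption, as are the `deny_if` helper and the metric names used in the examples.

```python
# Assumed allowlist; the project's actual _SAFE_BUILTINS may differ.
_SAFE_BUILTINS = {"abs": abs, "max": max, "min": min, "len": len, "round": round}


def deny_if(expression: str, metrics: dict) -> bool:
    """Evaluate a gate expression with only allowlisted builtins in scope."""
    # Replacing __builtins__ means names such as open, exec, or __import__
    # are simply undefined inside the expression and raise NameError.
    return bool(eval(expression, {"__builtins__": _SAFE_BUILTINS}, dict(metrics)))
```

For example, `deny_if("demographic_parity > 0.1", {"demographic_parity": 0.18})` evaluates to `True`. Removing builtins narrows what an expression can name but is not a complete sandbox on its own, so expressions of this kind are best limited to trusted, operator-authored configuration.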
6.8 Deployment template authoring
- Create `lib/templates/<name>/` with `template.yaml` · `permission_baseline.yaml` · `compliance_gates.yaml` · `retention.yaml` · `branding.json` · `README.md`
- Referenced from deployments via `based_on: lib/templates/<name>@vX.Y.Z`
- Auto-inherited by every deployment using the template
6.9 REST API
- OpenAPI 3.0 auto-generated at `/docs` (dev mode)
- 23 routers covering pipelines · agents · models · evaluations · MCP · knowledge · chat · datasets · inference · deployments · features · plugins · permissions · cron · batch · subjects · auth · config · docs
- Consistent auth: Bearer JWT or X-API-Key
- Structured error responses via FastAPI/Pydantic
6.10 CLI library
- stdlib `argparse` + `httpx` (no new deps)
- Persistent JWT at `~/.swarm/token`
- JSON output by default; `--table` for compact view on list commands
- Exit non-zero on API errors
7. ML Practitioner POV
7.1 Classification
- `train_classifier` tool — LightGBM / XGBoost / RandomForest / Logistic (`ml_team/tools/training.py`)
- Stratified split (sklearn) with customer-configurable ratio
- Metrics sidecar (JSON): accuracy · precision · recall · F1 · AUC · per-class
- Post-save verification (model must load + predict correctly)
- Optional MLflow logging
7.2 Drift detection
- `detect_drift` tool — 3 statistical tests (`ml_team/tools/drift.py`)
- Population Stability Index (PSI) with BFSI thresholds 0.10 / 0.25
- Kolmogorov-Smirnov test (continuous features)
- Chi-squared test (categorical features)
- Per-feature JSON output + aggregate drift score
- `bfsi_drift_baseline_gate` — deploy refused when max PSI > 0.25
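For readers unfamiliar with PSI, the index compares the binned distribution of a feature at training time (expected, e) with its distribution in new data (actual, a): PSI = Σ (a_i − e_i) · ln(a_i / e_i). The sketch below is a dependency-free illustration, not the code in `ml_team/tools/drift.py`; the equal-width binning, the epsilon floor for empty bins, and the function names are assumptions. Only the 0.25 gate threshold comes from this section.

```python
import math


def psi(expected, actual, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins of the expected range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor each share at eps so the logarithm stays defined for empty bins.
        return [max(count / len(values), eps) for count in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_gate_blocks(per_feature_psi: dict, threshold: float = 0.25) -> bool:
    """Refuse deployment when the worst feature exceeds the threshold."""
    return max(per_feature_psi.values()) > threshold
```

Identical samples give a PSI of zero, and a sample shifted entirely into the upper half of the baseline range scores far above 0.25, so `drift_gate_blocks` would refuse that deployment.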
7.3 Fairness audit
- `audit_fairness` tool — fairlearn MetricFrame (`ml_team/tools/fairness.py`)
- Per-protected-attribute group metrics: accuracy · precision · recall · F1 · selection rate
- Binary disparate-impact scalars: demographic parity · equal opportunity · equalized odds
- `bfsi_fairness_gate` — deploy refused when demographic parity > 0.1
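The gate condition can be made concrete with a small calculation. The sketch below assumes that "demographic parity" here means the demographic parity difference, that is, the gap between the highest and lowest group selection rates, which is how fairlearn defines that scalar. It uses plain Python rather than fairlearn so it runs without dependencies, and the function names are hypothetical.

```python
from collections import defaultdict


def selection_rates(y_pred, groups) -> dict:
    """Share of positive predictions (label 1) within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(y_pred, groups) -> float:
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def fairness_gate_blocks(y_pred, groups, threshold: float = 0.1) -> bool:
    """Refuse deployment when the selection-rate gap exceeds the threshold."""
    return demographic_parity_difference(y_pred, groups) > threshold
```

With predictions `[1, 1, 1, 0, 1, 0, 0, 0]` for four members of group "a" followed by four of group "b", the selection rates are 0.75 and 0.25, the difference is 0.5, and the gate blocks.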
7.4 Explainability
- `explain_model` tool — SHAP (`ml_team/tools/explainability.py`)
- Tree-native TreeExplainer fast path (100× faster than KernelExplainer)
- Generic KernelExplainer fallback for non-tree models
- Top-N feature importance + per-sample attribution
- JSON output consumed by model card generator
7.5 Model card
- `generate_model_card` tool (`ml_team/tools/model_card.py`)
- Markdown document following RBI FREE-AI structure
- Sections: metadata · training data · metrics · fairness · drift · SHAP · intended use · limitations · contact
- Embedded in audit PDF
7.6 Deployment artefacts
- `package_model` — Dockerfile + `serve.py` (FastAPI `/predict` + `/health`) + requirements.txt (`ml_team/tools/deploy.py`)
- `generate_k8s_manifests` — Deployment + Service + HPA (Horizontal Pod Autoscaler)
- BFSI-hardened defaults: `runAsNonRoot` · `readOnlyRootFilesystem` · `allowPrivilegeEscalation: false` · `capabilities.drop: ['ALL']`
- Optional `docker build` step
- Post-build verification (image must run + respond to /health)
7.7 Champion-challenger
- Model registry with deployment_id · traffic_pct · environment (`ml_team/tools/champion_challenger.py`)
- Atomic champion promotion (old champion → retired; new → active)
- Shadow prediction log for challenger A/B analysis
- Configurable agreement thresholds before promotion
- Full history via `GET /api/v1/deployments/{model}` + UI at `/deployments`
7.8 Batch inference
- Batch runner: JSONL input → processor → results.jsonl output (`ml_team/core/batch.py`)
- 3 processor kinds: inference (calls a registered model) · echo (debug) · custom (operator-authored)
- Checkpoints every 10 records
- Resume-on-restart from last checkpoint
- Streams results via `GET /api/v1/batch/{id}/results`
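The checkpoint-and-resume behaviour can be sketched in a few lines. This is an illustration under assumptions rather than the logic in `ml_team/core/batch.py`: the `run_batch` function, the checkpoint file layout (`{"processed": N}`), and the choice to trim the results file back to the last checkpoint on resume are all hypothetical.

```python
import json
from pathlib import Path


def run_batch(input_path: Path, results_path: Path, checkpoint_path: Path,
              processor, every: int = 10) -> int:
    """Process JSONL records, checkpointing every `every` records; returns total processed."""
    done = 0
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())["processed"]
    if results_path.exists():
        # Drop any results written after the last checkpoint so a resumed run
        # does not emit duplicates.
        kept = results_path.read_text().splitlines(keepends=True)[:done]
        results_path.write_text("".join(kept))
    with input_path.open() as src, results_path.open("a") as out:
        for index, line in enumerate(src):
            if index < done:
                continue  # already processed before the restart
            out.write(json.dumps(processor(json.loads(line))) + "\n")
            done = index + 1
            if done % every == 0:
                out.flush()
                checkpoint_path.write_text(json.dumps({"processed": done}))
    checkpoint_path.write_text(json.dumps({"processed": done}))
    return done
```

If the processor raises partway through, the checkpoint still points at the last multiple of ten, and calling `run_batch` again with the same paths picks up from there.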
7.9 Experiment tracking
- Optional MLflow integration (`ml_team/tools/mlflow_tools.py`)
- Auto-log training run if MLflow URI set
7.10 Memory
- Per-run JSON memory in work_dir
- Cross-run SQLite memory (`recall_past_runs`, `save_agent_learning`)
- Cross-project Postgres memory (optional, multi-project orgs)
- RAG retrieval via ChromaDB (keyword fallback when embeddings unavailable)
8. Auditor POV
8.1 On-disk evidence artefacts (per run)
- `pipeline_runs/<run_id>/metrics.json` — training metrics sidecar
- `pipeline_runs/<run_id>/*_card.md` — model card
- `pipeline_runs/<run_id>/fairness.json` — fairness audit output
- `pipeline_runs/<run_id>/drift.json` — drift report
- `pipeline_runs/<run_id>/shap.json` — SHAP explanations
- `pipeline_runs/<run_id>/approvals.json` — HITL gate state (who approved, when, rationale)
- `pipeline_runs/<run_id>/audit_report_<run_id>.pdf` — signed audit PDF
- `pipeline_runs/<run_id>/conversations/<agent>.jsonl` — per-agent full turn log
- `pipeline_runs/<run_id>/conversations/_index.json` — agent hierarchy + timing
8.2 SQLite compliance ledger#
- permission_denials table: every denial with source · reason · run_id · agent · ts
- runs table: pipeline run state + profile_at_creation
- run_events table: per-run event stream
- model_deployments table: champion/challenger history
- shadow_predictions table: challenger agreement log
- approvals table: HITL gate state
- datasets table: dataset lineage with consent_doc_ref
- lineage_models table: training run lineage
- lineage_deployments table: deployment lineage
- plugin_installations + plugin_shell_executions: plugin audit
8.3 Audit PDF content#
- Cover page with: run ID · profile · build timestamp · bundle SHA-256 · signature + cert fingerprint · Rekor index
- Metrics section (training + per-class)
- Model card (Markdown embedded)
- Fairness section (per-group + scalar disparate-impact)
- Drift section (per-feature + aggregate)
- SHAP section (top feature importance)
- Approval gates section (who approved each HITL gate)
- Conversation summaries (per-agent message + tool-denial counts, full JSONL not embedded for size)
- Data lineage section (G14: dataset → model → deployment chain)
8.4 Retention#
- 2555-day (7 year) retention for BFSI — configurable per artefact class
- Daemon sweep every 24h (ml_team/core/retention.py)
- Deletion logged to retention_log.json (immutable append log)
- Classes: conversations_days · run_events_days · audit_pdfs_days · shadow_predictions_days
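
A retention sweep of this kind reduces to comparing each artefact's age with the TTL of its class. The sketch below is a simplified illustration, assuming every class uses the 2555-day figure; the function name `past_ttl` and the `(path, retention_class, created_at)` tuple shape are hypothetical and not taken from `ml_team/core/retention.py`.

```python
from datetime import datetime, timedelta, timezone

# Class names come from the list above. Applying 2555 days to every class
# is an assumption for this example; real deployments configure each one.
RETENTION_DAYS = {
    "conversations_days": 2555,
    "run_events_days": 2555,
    "audit_pdfs_days": 2555,
    "shadow_predictions_days": 2555,
}


def past_ttl(artefacts, now, retention_days=RETENTION_DAYS):
    """Return paths of artefacts whose age exceeds their class TTL.

    `artefacts` is an iterable of (path, retention_class, created_at).
    """
    expired = []
    for path, retention_class, created_at in artefacts:
        ttl = timedelta(days=retention_days[retention_class])
        if now - created_at > ttl:
            expired.append(path)
    return expired
```

Passing `now` explicitly keeps the function deterministic, which makes the sweep logic easy to test without waiting for real time to pass.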
8.5 Audit queries (SQL-greppable)#
- "Who denied what in the last 24h?": one SQL query on permission_denials
- "Which dataset trained which model?": chain_for_deployment(deployment_id)
- "Every G1 egress denial by target host": GROUP BY on permission_denials.reason
- "Every model that trained on subject X": subjects_in_dataset(consent_doc_ref) + models_for_dataset
- "Every model card produced in Q3": SQLite timestamp range
8.6 RTBF verification#
- Signed erasure receipt with SHA-256 over sorted JSON (excluding signature field)
- ErasureReceipt.verify(): rehash + compare
- Tombstone marker [SUBJECT_DELETED] with deleted_subject + deleted_at fields
- Audit row preserved (tombstone, not true delete; regulator's evidence survives)
- Line-for-line replacement (preserves ordering + line count)
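
The rehash-and-compare check and the line-for-line tombstone replacement can be sketched as below. This is an illustration under assumptions, not the project's `ErasureReceipt` class: here the `signature` field holds the bare SHA-256 digest, `deleted_subject` stores a hash of the subject ID, and matching a subject by substring is a deliberately crude stand-in for real record matching.

```python
import hashlib
import json


def receipt_digest(receipt):
    """SHA-256 over key-sorted JSON, excluding the signature field."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(receipt):
    """Rehash the receipt and compare with the stored signature."""
    return receipt.get("signature") == receipt_digest(receipt)


def tombstone_lines(lines, subject_id, deleted_at):
    """Replace each line mentioning the subject; keep order and line count."""
    marker = json.dumps(
        {
            "marker": "[SUBJECT_DELETED]",
            # Assumption: store a hash so the tombstone does not leak the ID.
            "deleted_subject": hashlib.sha256(subject_id.encode()).hexdigest(),
            "deleted_at": deleted_at,
        },
        sort_keys=True,
    )
    return [marker if subject_id in line else line for line in lines]
```

Replacing lines in place rather than removing them keeps sequence numbers and line offsets valid for any later audit of the same file.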
8.7 Conversation tamper-evidence#
- Per-line sequence numbers (monotonic per agent)
- JSONL flush lock (thread-safe buffered writes)
- Audit PDF bundle hash includes conversation filename + size manifest
- Bundle hash stable under identical source artefacts
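
Both properties, a stable bundle hash and monotonic sequence numbers, are straightforward to check. The sketch below assumes the manifest is a list of `(filename, size)` pairs and that each JSONL line carries a `seq` field; both are hypothetical shapes chosen for illustration.

```python
import hashlib
import json


def bundle_hash(manifest):
    """SHA-256 over a sorted (filename, size_bytes) manifest.

    Sorting first makes the digest independent of directory listing order.
    """
    canonical = json.dumps(sorted(manifest), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def first_sequence_gap(jsonl_lines):
    """Return the index of the first line whose seq is not previous + 1."""
    previous = None
    for index, line in enumerate(jsonl_lines):
        seq = json.loads(line)["seq"]  # "seq" is an assumed field name
        if previous is not None and seq != previous + 1:
            return index
        previous = seq
    return None
```

A manifest of names and sizes detects added, removed, or resized files; detecting same-size edits would additionally require hashing file contents.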
9. CISO / Procurement POV#
9.1 One-command verification#
- cosign verify-blob on the release tarball (reproducible offline via Rekor)
- cosign verify-blob on the SBOM
- SHA-256 reproduction of deployment_config_sha256 from MANIFEST
- jq '.components | length' on SBOM for component count
- SQL audit queries on permission_denials
9.2 Independent verifiability#
- No network access required beyond Rekor transparency log
- No Sigstore-corp dependency (Rekor is public, offline-queryable)
- GitHub Release assets are immutable
- Rekor entries are append-only + censorship-resistant
- Every release is reproducible from signed source commits
9.3 Procurement artefacts#
- 1-page architecture + verification commands (.project/security/architecture.md)
- STRIDE threat model with 9 assets · DREAD scoring · top-10 residual risks (.project/security/threat_model.md)
- CAIQ v4.0.3 pre-filled (60 Qs, ~75% Y/Y+P) (.project/security/caiq_lite.md)
- Commit-signing setup guide (.project/security/signing_setup.md)
- Comprehensive reference (docs/REFERENCE.md)
- This features catalogue (docs/FEATURES.md)
9.4 Certification track (in-flight)#
- SOC 2 Type I readiness — Q2 2026 via Drata or Vanta
- SOC 2 Type II — Q4 2026 (requires 6+ months of Type I operating)
- Pen test — Q2 2026 (Lucideus / Cobalt quotes in flight)
- ISO 27001 — post-SOC 2 Type II
- Formal CSA STAR Level 1 — post-SOC 2 Type I
9.5 Legal artefacts (track, not yet landed)#
- MSA template — BFSI-savvy Indian tech lawyer engagement pending
- DPA template — same
- DPIA template — same
- Cyber liability insurance — pending
- E&O insurance — pending
- HIPAA BAA template — pending
9.6 Honest disclosures (residual risks, documented)#
- Prompt-injection heuristic recall ~55-65% (defense-in-depth via G1 + G2 + G4)
- nsjail blocks syscall escape, not app-logic via allowed imports
- Paraphrased PII ("number ends in 4729") not caught
- Encryption at rest defends cold disk, not RCE on running API
- G15 RTBF has no reach into customer backups (runbook must document)
- Solo-dev bus factor — documentation system + ADR log mitigate
- Single-node SQLite writer (Postgres migration documented when needed)
9.7 Status reporting#
- CAIQ Lite scorecard — ~75% Y/Y+P aggregate
- Top-10 gaps enumerated with rupee-effort estimates (.project/security/caiq_lite.md § Top-10 gaps)
- Quarterly review cadence documented
- Update triggers documented (per release tag, per ADR, per pen-test finding)
10. Supply Chain POV#
10.1 Source integrity#
- All commits on master since v0.12.0 are SSH-signed
- GitHub "Verified" badge on all post-v0.12.0 commits (signing key registered)
- Branch protection: require signed commits (recommended config documented)
- G17 CI gate rejects unsigned commits in release range
10.2 Build integrity#
- CI runs on GitHub Actions ubuntu-latest (ephemeral, reproducible)
- Minimum permissions per job (permissions: contents: read default; id-token: write only for Sigstore steps)
- Deterministic tarball build (positive-list filter + sorted file list)
- Docker image base verified via cosign verify docker.io/library/python:3.12-slim
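
A deterministic tarball build combines an allow-list, a sorted file order, and normalised metadata. The following is a minimal sketch of that technique using Python's `tarfile`; the `ALLOWED_PREFIXES` values and the function name are hypothetical, and the project's real build script may differ.

```python
import io
import tarfile

# Hypothetical positive list: only paths under these prefixes are shipped.
ALLOWED_PREFIXES = ("ml_team/", "docs/")


def build_tarball(files):
    """Build an uncompressed tar from {path: bytes} with stable output.

    Sorted paths plus fixed mtime, owner, and mode mean identical inputs
    always produce byte-identical archives.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):
            if not path.startswith(ALLOWED_PREFIXES):
                continue  # anything not on the positive list is excluded
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

If the archive is gzip-compressed afterwards, the gzip header timestamp also has to be fixed, otherwise the compressed bytes differ between runs even when the tar payload is identical.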
10.3 Artefact signing#
- Cosign keyless tarball signing via GitHub OIDC
- Cosign keyless SBOM signing
- Ephemeral certificates (10-min TTL from Fulcio)
- Rekor transparency log entry per artefact
- Certificate identity pinned to GitHub Actions run
10.4 SBOM#
- CycloneDX 1.5 JSON format
- Generated via cyclonedx-bom from the Python environment
- Published as GitHub Release asset (signed)
- 34 components at v0.12.0
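
The component count can be reproduced without `jq`. This short sketch reads a CycloneDX JSON document and counts its top-level `components` array, which is the same value `jq '.components | length'` reports; the function name is hypothetical.

```python
import json


def component_count(sbom_text):
    """Count top-level components in a CycloneDX JSON SBOM."""
    sbom = json.loads(sbom_text)
    if sbom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    return len(sbom.get("components", []))
```

Run against the signed SBOM of a release, the result can be compared with the count published for that version.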
10.5 Dependency monitoring#
- GitHub Dependabot enabled (security + version updates)
- SBOM feeds into customer vendor-risk tools (Snyk · Dependabot · Trivy · JFrog Xray)
- License compatibility discoverable via SBOM
- Dependency hash-pinning planned (pip-tools migration in backlog)
10.6 Release flow#
- Single tag push triggers release workflow
- Dispatch alternative for re-runs on existing tags
- Signed-commit gate → SBOM → base image verify → tarball → sign tarball → sign SBOM → GitHub Release upload
- 6 assets per release (artefact + signature + certificate for each of the tarball and the SBOM)
- Immutable GitHub Release URL per tag
10.7 Verification surface#
- Customer runs cosign verify-blob with 2 flags + 3 filenames (one command per artefact)
- No private-key material to exchange
- No shared-secret ceremony
- Reproducible by any engineer with cosign + jq installed
11. Performance + scale POV#
11.1 Latency optimizations#
- HTTP connection pool for LLM calls (shared across agents)
- Prompt cache (first-prompt Claude pricing)
- Schema cache (JSON schemas computed once per tool)
- Intra-agent parallel tool calls (3-5× speedup, experiment-flagged)
- Context compaction at 80% window (avoids context-full crash)
- In-memory rate-limit (sliding window, no Redis hop)
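
An in-memory sliding-window limiter is small enough to sketch in full. The class below is an illustration of the technique, not the project's implementation; the class name, the per-key deque layout, and the explicit `now` argument are assumptions.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key in any `window_seconds` span."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._hits = {}  # key -> deque of request timestamps, oldest first

    def allow(self, key, now):
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In production `now` would typically come from `time.monotonic()`. Since the state lives in one process, each worker in a multi-worker deployment would keep its own counters unless they are shared some other way.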
11.2 Throughput#
- Async FastAPI + uvicorn (single-process async by default)
- Multi-worker uvicorn supported (uvicorn --workers N) for horizontal scale
- Stateless per-request design (state in SQLite, not in-memory)
- SQLite WAL for read-during-write concurrency
- Postgres migration path for high-concurrency multi-node
11.3 Storage efficiency#
- Conversation JSONL per agent (append-only, grep-friendly, compressible)
- Audit PDF target <500 KB per run (full JSONL not embedded; summaries only)
- Retention daemon prunes past-TTL artefacts
- Build-time tarball excludes noise (dist went from 624 MB to 2.7 MB)
11.4 Test runtime#
- Full suite: 1258 tests in ~58 seconds
- Parallelisable via pytest -n auto (requires pytest-xdist)
- Unit tests isolated by monkeypatch on external binaries (nsjail/docker/cosign/aws not required)
11.5 Benchmarks#
- Permission engine: ~30 µs per tool dispatch (~17× baseline subprocess call; negligible vs ~500ms LLM latency)
- Conversation flush: batched 10-msg buffer, 1s interval
- Regression baseline stored in ml_team/tests/bench/ (nightly diff)
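
The batched conversation flush (10 messages or 1 second, whichever comes first) can be sketched as a small buffered writer. This is a simplified illustration: the class name and the injectable `clock` are hypothetical, and the interval is only checked when a message arrives, whereas a real implementation would likely also flush from a background timer.

```python
import threading
import time


class BufferedWriter:
    """Buffer messages; flush when the buffer fills or the interval elapses."""

    def __init__(self, sink, max_items=10, interval=1.0, clock=time.monotonic):
        self._sink = sink  # callable that receives a list of messages
        self._max_items = max_items
        self._interval = interval
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()
        self._lock = threading.Lock()  # makes write() safe across threads

    def write(self, message):
        with self._lock:
            self._buffer.append(message)
            full = len(self._buffer) >= self._max_items
            stale = self._clock() - self._last_flush >= self._interval
            if full or stale:
                self._flush_locked()

    def flush(self):
        with self._lock:
            self._flush_locked()

    def _flush_locked(self):
        # Caller must hold the lock.
        if self._buffer:
            self._sink(list(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()
```

Batching trades a bounded window of unflushed messages for far fewer disk writes, so a final `flush()` on shutdown matters for completeness of the log.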
12. UX / Dashboard POV#
12.1 Operator surfaces#
- Next.js 15 + React 19 (SSR-first + server components)
- Tailwind CSS + shadcn/ui component library
- Real-time updates via WebSocket (no polling)
- Typed fetch client (useConfig() hook, etc.)
- Branding driven by GET /api/v1/config/branding (SSR-fetched, no flash)
12.2 Navigation#
- Sidebar with 10+ sections (pipelines, deployments, agents, transparency, cron, batch, plugins, knowledge, settings, docs)
- Breadcrumb trails on deep pages
- Role-based section visibility
12.3 Live feedback#
- Pipeline run live-stream (every tool call, every LLM response, every HITL gate)
- Conversation tree drill-down per agent
- Token + cost breakdown per agent
- Timeline view with span overlap visualization
- Real-time denial surfacing on the /transparency page
12.4 Admin UX#
- Feature flag toggle UI (all 3 tiers surfaced)
- Permission denial filter + search
- HITL gate approve/reject inline
- Cron job create + manual-run buttons
- Plugin install + inspect wizard
12.5 Auditor UX#
- Per-run audit PDF download button
- Per-run full conversation JSONL export
- Permission denial CSV export (planned)
- Retention log view (/transparency)
Counts summary#
| Area | Count |
|---|---|
| Agents (ML team) | 40 |
| Teams | 7 |
| Native tools | 25 |
| Guardrails | 15 |
| Permission rule sources | 8 |
| Hook events | 10 |
| LLM providers | 5 |
| Deployment templates | 3 |
| Pipeline configs (pre-built) | 3 |
| Backends | 3 |
| HITL gate types | 6 |
| Roles (RBAC) | 3 |
| API routers | 23 |
| Dashboard pages | 10+ |
| CI workflows | 5 |
| Prometheus metrics | 28+ |
| Tests | 1258 |
| Compliance frameworks mapped | 8 |
| Release assets per tag | 6 |
| CAIQ Lite questions answered | 60 |
| Residual risks registered | 10 (top) |
| Critical assets in STRIDE | 9 |
| Training modules | 8 (of 10 planned) |
| Implementation phase docs | 27 |
| PII detectors (G4) | 12 |
| Sandbox drivers (G2) | 3 |
| Audit signing drivers (G11) | 4 |
| Key providers (G13) | 5 |
Maintained by: TheAiSingularity · security@theaisingularity.org Update trigger: every release, every architectural ADR, every new feature. Drift is a bug.