Skip to content

Changelog#

Full release history. Follows Keep a Changelog / Semantic Versioning where possible — swarm is pre-1.0, so minors may include breaking changes but each is flagged below.

All notable changes to swarm (the ML team agent platform).

Format follows Keep a Changelog. Version bumps are retro-fitted to commit history — no git tags exist yet (see "Next steps" at the bottom).

[Unreleased]#

Major in-flight workstream (started 2026-04-21): platform refactor from ML-specific product → composable platform (three-layer architecture: core engine + library shelf + per-customer deployments) plus a comprehensive guardrails subsystem (17 guardrails across 6 categories, INVARIANT tier for regulator-facing controls). 12-week solo-dev plan tracked in /Users/rvk/.claude/plans/resume-idempotent-wreath.md. Per-phase implementation notes under .project/implementation/. ADR: .project/decisions.md § 2026-04-21.

Completed — Track 1 / P0 — library shelf + agent_defs.py shim#

Branch feat/p0-platform-refactor. Landed 2026-04-21 across 6 commits.

Added: - Library-shelf root (/lib/) with nine category subdirectories and a generated-JSON-Schema sub-folder (lib/_schema/). - ml_team/core/lib_schemas.py — seven Pydantic v2 manifest models (Agent, Teams, Tool, Workflow, Guardrail, Profile, Deployment) with extra='forbid', SemVer-pinned AssetRef grammar, enum vocabularies for tier / integration-point / guardrail category. - scripts/gen_lib_schemas.py — regenerates JSON Schemas from Pydantic models. --check drift guard wired into the test suite. - ml_team/core/lib_loader.py — runtime asset resolver with mtime-cached LibLoader class + module-level singleton, WORKER_SUFFIX auto-application to worker agents, pinned-ref version enforcement, CLI (python -m ml_team.core.lib_loader {validate,list}). - lib/agents/base/<40 agents>/agent.yaml + lib/agents/teams/ml_pack_teams/teams.yaml — the entire ML agent set migrated out of Python dict literals. - ml_team/tests/fixtures/agent_defs_snapshot.json — immutable 50 KB ground truth snapshot of pre-flip dict shapes. - .project/implementation/01..05-*.md — per-phase implementation notebook. - .project/training/modules/01-platform-architecture.md — first platform onboarding module. - +173 new tests (39 schemas + 36 loader + 98 shim parity).

Changed: - ml_team/config/agent_defs.py (906 → 147 lines): now a PEP-562 __getattr__ shim resolving AGENT_DEFS / TEAMS / AGENT_TIER_MAP / ALL_AGENTS / SUPERVISOR_AGENTS / WORKER_AGENTS lazily from the library shelf. WORKER_SUFFIX stays a literal to avoid circular imports.

Regression: 628 baseline → 800 passed, 2 skipped (environmental: Docker daemon, kubectl) with zero behaviour change.

Deferred to P1: pipeline/workflow YAML extraction + TOOL_SETS extraction — the pipeline YAML shape doesn't yet match WorkflowManifest (uses source/target edges, node_type, etc.). Will unify when composition work lands.

Completed — Track 1 / P1 — pipeline unification + agent composition#

Added: - ml_team/core/agent_composer.py — customer agent composition primitive. compose_agent(spec) resolves extends: lib/agents/base/<id>@vX.Y.Z references, applies overlay semantics (field replacement, system_prompt_append/prepend/full-replace), correctly handles WORKER_SUFFIX across worker/supervisor boundaries, validates the composed result against AgentManifest. Includes ComposedAgent dataclass, exception hierarchy (AgentComposerError, ExtendsChainCycleError, ExtendsChainTooDeepError, InvalidComposedSpecError), defensive cycle + depth guards for future multi-tier templates. - lib/workflows/{default_ml_pipeline,fast_prototype,parallel_research}/workflow.yaml — migrated pipeline manifests with lib-shelf metadata header (schema_version, kind, id, full SemVer version). - scripts/migrate_pipelines.py — idempotent pipeline migration + drift-guard (--verify used in CI). - ml_team/tests/test_agent_composer.py — 15 tests covering standalone + extends composition, three overlay modes, worker-suffix correctness, validation errors, module singleton. - ml_team/tests/test_workflow_migration_parity.py — 8 tests proving the two path variants (legacy vs lib) produce structurally-equivalent PipelineStateGraph objects. - .project/training/modules/03-agent-composition.md — training module on customer agent authoring.

Changed: - ml_team/core/lib_schemas.py::WorkflowNode/WorkflowEdge refactored to match real pipeline shape (source/target edges — no more legacy from/to; node_type enum with stage/router/checkpoint; label, priority, metadata fields). lib/_schema/workflow.schema.json regenerated. - ml_team/core/state_graph.py::PipelineStateGraph.from_dict now prefers data["id"] over data["name"] so lib-shelf manifests (where name is human-readable) load cleanly; legacy YAMLs continue to work. - ml_team/backends/native_backend.py pipeline-config resolution order: explicit path → lib/workflows/<id>/workflow.yaml (preferred) → ml_team/config/pipelines/<id>.yaml (legacy fallback, one release) → hardcoded default.

Regression: 800 → 826 passing tests, 2 skipped. Zero failures.

Completed — Track 1 / P2 — Deployment loader + shim wiring#

Added: - lib/templates/generic_ml/profile.yaml — minimal profile baseline (no pre-activations — deployments are the source of truth). - deployments/_dev_scaffold/config.yaml — default canary deployment. Activates all 40 ML agents + 3 workflows → reproduces today's behaviour exactly. - deployments/README.md — customer deployment layout + shipping flow docs. - ml_team/core/deployment_loader.pyDeploymentLoader class + LoadedDeployment dataclass + module singleton + SWARM_DEPLOYMENT env-var resolver + error hierarchy. - ml_team/tests/test_deployment_loader.py — 13 tests (resolution priority, _dev_scaffold integration load, fixture-library isolation, custom-agent override, cache semantics, error surfaces). - .project/training/modules/04-deployments-and-activation.md — training module on deployment authoring.

Changed: - ml_team/config/agent_defs.py shim: AGENT_DEFS / AGENT_TIER_MAP now source from the active deployment (via deployment_loader) instead of the full library shelf. Fall back to the library only if deployment resolution fails (missing config, CI bootstrap). - ml_team/core/lib_schemas.py::DeploymentAgentOverride — expanded to match compose_agent overlay vocabulary (system_prompt / system_prompt_append / system_prompt_prepend) with extra='allow'; composed manifest still revalidated strictly against AgentManifest. - lib/_schema/deployment.schema.json regenerated.

End-to-end demo:

SWARM_DEPLOYMENT=/path/to/3-agent-deployment python -c \
  "from ml_team.config.agent_defs import AGENT_DEFS; print(sorted(AGENT_DEFS))"
# → 3 agents (customer composability live)

# No env var → default _dev_scaffold → 40 agents (today's behaviour)

Regression: 826 → 839 passing, 2 skipped. Zero failures.

Completed — Track 1 / P3 — Dashboard API-driven config#

Added: - ml_team/api/routers/config.pyGET /api/v1/config/branding (unauth); BrandingResponse Pydantic model; merges profile defaults + deployment overrides into nav / demo presets / pipeline config / compliance badges. - ml_team/tests/test_config_router.py — 8 tests (response shape, nav dedupe, pipeline enumeration, custom branding override, default fallback, unauth OK). - ml_team/dashboard/src/lib/config.ts — typed branding client + BRANDING_FALLBACK for SSR resilience. - ml_team/dashboard/src/components/config-provider.tsx — React context carrying SSR-fetched config.

Changed: - ml_team/dashboard/src/app/layout.tsxgenerateMetadata() + async RootLayout fetch branding server-side; wrap children in ConfigProvider. - ml_team/dashboard/src/components/sidebar.tsx — dynamic lucide-react icon lookup; product name + subtitle + compliance badges sourced from useConfig(). - ml_team/dashboard/src/app/page.tsx — overview tagline from useConfig(). - ml_team/dashboard/src/app/pipelines/page.tsx — demo presets + pipeline config dropdown sourced from useConfig() (no more hardcoded preset arrays). - ml_team/api/app.py — mount config_router unauth under /api/v1.

End-to-end proof: same Next.js binary renders different branding per SWARM_DEPLOYMENT. HDFC deployment with compliance badges + custom product name + BFSI pipeline options on the same dashboard code that renders "ML Team Agent" for _dev_scaffold.

Regression: 839 → 847 passing, 2 skipped. TypeScript type-check silent-success.

Completed — Track 1 / P4 — Profile loader + compliance gates + invariant-DENY floor#

Added: - ml_team/core/profile_loader.pyProfileLoader + LoadedProfile reading permission_baseline.yaml, compliance_gates.yaml, retention.yaml siblings of a profile manifest. - ml_team/core/compliance_gates.py — run-scoped gate verdict cache + record_tool_result / check_blocked hooks + safe-builtins deny_if evaluator + placeholder resolver for ${last_tool_result.X} / ${run_context.Y}. - ml_team/core/lib_schemas.pyPermissionRuleSpec, PermissionBaselineManifest, ComplianceGateSpec, ComplianceGatesManifest, RetentionOverridesManifest. - ml_team/core/permission_sources.py: profile_source (profile DENY → priority 60, ASK → 45) + compliance_gate_source (gate verdicts → priority 55 DENY). - ml_team/tests/test_profile_loader.py (8 tests), test_compliance_gates.py (13 tests), test_profile_permission_sources.py (13 tests).

Changed: - ml_team/core/permissions.py: INVARIANT_DENY_PRIORITY_FLOOR = 60 — DENY rules at/above this priority beat ALLOW rules. Profile DENYs are now regulator-facing controls that operator POLICY ALLOW cannot override. - ml_team/core/deployment_loader.py: LoadedDeployment.profile carries the resolved based_on profile; compliance gates activated at deployment-load time.

Engine behaviour change: pre-P4 resolution was strictly ALLOW > DENY > ASK. Post-P4 an invariant-DENY first pass runs before ALLOW — only fires for priority ≥ 60. No rule before P4 had priority > 50 so the behaviour is backward compatible.

Regression: 847 → 876 passing, 2 skipped (+29 new tests). One test-isolation fix applied (teardown uses mark_uninitialized() instead of leaving engine blank).

Completed — Track 1 / P5 — BFSI baseline template#

Added: - lib/templates/bfsi_baseline/profile.yaml — inherits generic_ml, sets branding (Swarm BFSI Edition) + compliance badges (RBI FREE-AI). - lib/templates/bfsi_baseline/permission_baseline.yaml — 5 rules: DENY execute_shell, DENY execute_python with subprocess|os.system|eval|__import__ regex, ASK register_model_deployment, DENY write_file with PAN regex, DENY write_file with Aadhaar regex. - lib/templates/bfsi_baseline/compliance_gates.yaml — 2 runtime gates: fairness (demographic-parity > 10% blocks deploy) + drift-baseline (PSI > 0.25 blocks deploy). - lib/templates/bfsi_baseline/retention.yaml — 2555-day (7-year) retention on all four artefact classes (RBI default). - lib/templates/bfsi_baseline/README.md — compliance-citation matrix mapping every control to RBI FREE-AI / HIPAA / GDPR / EU AI Act / SOC 2 / OWASP LLM clauses. - ml_team/tests/test_bfsi_baseline_e2e.py — 11 end-to-end tests: profile load, each DENY/ASK rule at engine level, invariant-DENY vs operator POLICY ALLOW, biased classifier blocked by fairness gate, unbiased classifier unblocked, branding surfacing.

Regression: 876 → 887 passing, 2 skipped.

Track 1 complete — 628 → 887 passing tests, zero regressions across 6 phases.#

Completed — Track 2 / G-runtime — Guardrails runtime scaffold#

Added: - ml_team/core/hooks.py — 5 new HookEvent values (PRE_LLM, POST_LLM, STORAGE_WRITE, LLM_CALL_WRAPPER, AGENT_DELEGATE) preserving the original 5. - ml_team/core/guardrails/ — runtime package with: - types.py: IntegrationPoint enum (7 values), GuardrailOutcome (ALLOW/REDACT/DENY/ERROR), Severity, GuardrailResult dataclass. - registry.py: thread-safe GuardrailRegistry with module singleton + @register(integration_point, id, priority) decorator. - evaluator.py: priority-ordered multi-guardrail pass per integration point. ALLOW chain / REDACT thread / DENY short-circuit / ERROR fails open unless invariant. Returns EvaluationReport. - metrics.py: Prometheus counters (guardrail_triggered_total, guardrail_bypass_attempts_total) + duration histogram. Null-stub when prometheus_client absent.

Completed — Track 2 / G3 — Prompt-injection heuristic (PRE_LLM)#

Added: - lib/guardrails/input_safety/prompt_injection_heuristic/{guardrail.yaml,patterns.yaml,enforce.py} — 12 high-severity + 13 medium-severity patterns covering chat-template tokens, "ignore previous instructions" family, developer-mode jailbreaks, role-play preambles, base64 blobs >200 chars. Block on 2+ medium or 1+ high. - Registered at PRE_LLM priority 70. Uses re.findall to count occurrences, not distinct pattern hits — "bypass X and ignore Y" correctly counts as 2 mediums. - ml_team/tests/test_guardrail_prompt_injection.py — 30+ tests.

Completed — Track 2 / G6 — Logs credential filter#

Added: - lib/guardrails/persistence/logs_credential_filter/{guardrail.yaml,enforce.py}CredentialScrubFilter(logging.Filter) with 13 known-secret patterns (anthropic before openai via negative lookahead sk-(?!ant-)), Shannon-entropy fallback for unknown high-entropy tokens ≥32 chars. install()/uninstall() attach at root logger. - ml_team/core/logging_config.py — attaches the filter at startup via importlib.util.spec_from_file_location with unique sys.modules name (avoids collisions with other guardrails' enforce.py).

Completed — Track 2 / G10 — Delegation-loop detector (AGENT_DELEGATE)#

Added: - lib/guardrails/platform_integrity/delegation_loop_detector/{guardrail.yaml,enforce.py} — per-run state _state[run_id]._RunState(stack, total). Detects (a) same (agent, args_hash) already on stack, (b) depth > max_depth (default 5), (c) fan-out > max_delegations_per_run (default 50). Two modes: strict (agent+args_hash) or name_only. - Exposes pop_delegation(run_id), reset_run, reset_all.

Completed — Track 2 / G7 — Per-user rate limit#

Changed: - ml_team/api/rate_limit.py — full rewrite. Composite key (caller_identity, endpoint_class). Identity precedence: X-API-Key (SHA-256[:12]) → JWT sub claim → client IP. Per-role limits via env vars (RATE_LIMIT_ROLE_<ROLE>_READS/WRITES). Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Role.

Completed — Track 2 / G16 — HITL timeout + escalation#

Added: - ml_team/core/approval.py — extended ApprovalGate with 4 optional fields: ttl_seconds, escalate_after_seconds, escalated_at, notify_channels. Tolerant from_dict for older on-disk gates. - ml_team/core/hitl_sweep.py — pure sweep(store, notifier, now) function. Returns SweepReport(expired_gate_ids, escalated_gate_ids, swept_at). Escalation fires before expiry in the same sweep (documented ordering). _safe_notify swallows callback exceptions.

Completed — Track 2 / G14 — Data lineage#

Added: - ml_team/api/database.py — three tables (datasets, lineage_models, lineage_deployments) with enforced foreign keys. ON DELETE CASCADE on model→deployments; ON DELETE SET NULL on dataset→models. - ml_team/core/lineage.pymake_dataset_id(uri, sha256) deterministic; record_dataset/model/deployment idempotent upserts. Queries: chain_for_deployment, subjects_in_dataset, models_for_dataset, deployments_for_model, retire_deployment. - lib/guardrails/platform_integrity/data_lineage/guardrail.yaml — manifest.

Completed — Track 2 / G15 — Right-to-be-forgotten#

Added: - lib/guardrails/persistence/right_to_be_forgotten/guardrail.yaml — manifest documenting tombstone semantics. - ml_team/core/rtbf.pyErasureReceipt dataclass with SHA-256 signature over sorted JSON payload (sign/verify round-trip). erase_subject(subject_id, requested_by, conversation_roots, dry_run): walks subjects_in_dataset → collects downstream models+deployments → deletes datasets (FK cascades) → rewrites JSONL lines matching re.escape(subject_id) to a structured [SUBJECT_DELETED] tombstone preserving line order + count. - ml_team/api/routers/subjects.pyGET /api/v1/subjects/{id}/preview (dry-run) + DELETE /api/v1/subjects/{id}. Admin-only via Depends(require_role(Role.ADMIN)). Returns {dry_run, receipt} with signed receipt. - ml_team/tests/test_guardrail_rtbf.py — 15 tests covering receipt crypto + tamper detection, lineage walk + empty-subject, dry-run no-mutation, dataset FK cascade, JSONL tombstone preservation, regex-metachar escaping, multi-file/multi-root walk, HTTP preview + delete + blank-id rejection.

Changed: - ml_team/api/app.py — registers subjects.router with the standard _auth dependency (verify_api_key); per-endpoint RBAC inside the router.

Completed — Track 2 / G1 — Egress allowlist (in-process)#

Added: - lib/guardrails/network/egress_allowlist/guardrail.yaml — manifest (tier=flag) documenting allow_hosts + allow_patterns + block_private_networks + block_schemes surface. - ml_team/core/egress_allowlist.pyEgressConfig + configure()/reset() + URL extractor (recursive walk of ctx.arguments) + host classifier (exact match, *.suffix, fnmatch glob, RFC1918/loopback/link-local/ULA check via ipaddress). egress_source(ctx) is the permission-engine rule source. - Registered in ml_team/core/permission_sources.py::init_default_sources alongside the existing 7 sources. Emits DENY rules with source="egress_allowlist"; denials land in permission_denials + permission_denials_total{source="egress_allowlist"} metric automatically. - ml_team/tests/test_guardrail_egress_allowlist.py — 28 tests: config lifecycle, private-network block (8 parametrized IPs incl. AWS IMDS 169.254.169.254, IPv6 ::1, ULA fd00::1), suffix + fnmatch patterns, exotic-scheme block, deep-nested URL extraction, dedupe, prose-false-positive guard, engine integration.

Design notes: - Module unconfigured = no-op (default ML-team deployment keeps working). - Literal-string hostname matching — no DNS resolution here (SSRF via DNS rebind is the mitmproxy sidecar's job, landing in a follow-up phase). - block_private_networks=True overrides allow-list hits (catches operator misconfig + classic SSRF targets).

Completed — Track 2 / G2 — Python execution sandbox (driver abstraction)#

Added: - lib/guardrails/execution/python_sandbox/guardrail.yaml — manifest (tier=flag, category=execution). Documents nsjail / docker / subprocess driver surface + config schema + the honest ceiling (nsjail blocks syscall escape, not app-logic bugs in allowed imports). - ml_team/core/python_sandbox.pySandboxDriver Protocol + three concrete drivers: - SubprocessDriver — portable fallback, same behaviour as today's execute_python. - NsjailDriver — builds nsjail CLI with seccomp deny-list + user ns + RO rootfs + net ns. Fails closed on non-Linux or missing binary. - DockerDriver — throwaway python:3.11-slim container with --network=none, --read-only, --rm. Probes daemon at check_available. - configure(driver, strict, memory_mb, cpu_time_sec, allow_network, read_only_rootfs, allow_paths) — module-level config. strict=True + missing driver → DriverUnavailable propagates out of configure, failing API boot (BFSI guarantee: no silent degrade). - run_python(code, timeout, cwd, env) — dispatch entrypoint. Stripped env by default (PATH/HOME/LANG/LC_ALL/PYTHONPATH + PYTHONDONTWRITEBYTECODE). - ml_team/tests/test_guardrail_python_sandbox.py — 17 tests: config lifecycle, strict vs non-strict fallback, driver availability checks (nsjail-non-linux, missing binaries), actual subprocess execution + stderr capture + timeout handling, execute_python integration preserving pre-G2 output shape when unconfigured.

Changed: - ml_team/tools/execution.py::execute_python — +40 lines of opt-in pre-block. When python_sandbox.is_configured(), routes through run_python() and emits a driver field on the response; otherwise falls through to the pre-G2 direct-subprocess path byte-for-byte unchanged.

Design notes: - Default deployment unaffected — is_configured() is False, legacy path runs. Refactor-first discipline: zero existing tests touched. - Protocol-based driver contract — adding a Firecracker / gVisor driver later is two methods + one registration line. - CLI-based nsjail/docker invocation (not Python bindings) — greppable in ops logs, portable, matches how teams already run the tools.

Completed — Track 2 / G4 — PII detection (regex + Presidio interface)#

Added: - ml_team/core/pii/ — shared core package: - types.py: PiiEntity enum (12 entity types incl. Indian BFSI set), PiiFinding dataclass, AnonymizeAction enum (redact/mask/hash). - regex_detectors.py: 12 detectors with structural validation — Luhn checksum for credit cards, Verhoeff for Aadhaar. Overlap resolver prefers higher confidence, then longer span. - anonymizer.py: right-to-left applier that substitutes findings via the three action modes. - presidio_shim.py: lazy-import wrapper for Microsoft Presidio; raises PresidioUnavailable(ImportError) when the package isn't installed. Entity-name bridge maps Presidio's output to our taxonomy. - lib/guardrails/pii/regex_pii/{guardrail.yaml,enforce.py} — always-on baseline. Registers at POST_LLM, POST_TOOL, STORAGE_WRITE at priority 60. Supports bare-string + dict payload shapes. configure(entities=[...], action=...) is the profile hook. - lib/guardrails/pii/presidio_pii/guardrail.yaml — manifest (tier=flag, unregistered by default — opt-in via profile when the customer has Presidio installed). - ml_team/tests/test_guardrail_pii.py — 29 tests: all 12 detectors positive, Luhn/Verhoeff drop invalid matches, overlap resolution, action-mode round-trips, enforce module registration + payload shapes + configure.

Design notes: - Default deployment unaffected — legacy core/guardrails.py::check_output remains in place; regex_pii is additive. Removal planned in a follow-up phase after customer sign-off. - Presidio's 300 MB weight keeps it opt-in. BFSI profile will activate; generic won't. - Verhoeff on Aadhaar dramatically reduces false positives — raw 12-digit timestamps don't validate.

Completed — Track 2 / G5 — Conversation JSONL scrubber#

Added: - lib/guardrails/persistence/conversation_scrubber/guardrail.yaml — manifest (tier=flag) documenting scrub-at-flush semantics + the on-disk-only-redaction tradeoff (no unredacted original is kept anywhere). - ml_team/core/conversation_scrubber.py — module-level ScrubberConfig + configure(active, entities, action, use_presidio, tag_redacted_lines) + scrub_line(raw). Recursive walk over dict + list values; leaf strings go through scan_regex + apply_findings from the G4 core. Presidio runs too when use_presidio=True AND the package imports successfully; falls back to regex-only with a single WARN otherwise. Non-JSON lines pass through untouched; zero-mutation lines return the original raw byte-identical. - ml_team/tests/test_guardrail_conversation_scrubber.py — 16 tests: lifecycle, field-level scrubbing (content + nested metadata + list values), clean-line byte-preservation, entity filter, action modes, _redacted tag toggle, Presidio-fallback WARN, ConversationStore integration (active → scrubbed on disk, inactive → byte-identical).

Changed: - ml_team/core/conversation_store.py::_flush_locked — +11 lines passing each buffered line through conversation_scrubber.scrub_line before disk write. Defensive try/except ensures a scrubber bug never breaks flush. Fixed a subtle shadow-vs-held-reference bug: the flush now writes from the local scrubbed list while clearing self._buffers[agent_name] explicitly.

Design notes: - Scrub at flush, not at record — live WebSocket streams see unredacted content (operator's live view); JSONL on disk sees redacted (regulator's audit view). Intentional split. - Default deployment unchanged — scrubber inactive unless the profile opts in. BFSI/HIPAA profiles will activate at bootstrap. - _redacted: true tag on mutated lines so audit dashboards can highlight them without diffing against a pre-redacted version (which doesn't exist).

Completed — Track 2 / G11 — Audit PDF signing (interface + 4 drivers)#

Added: - lib/guardrails/integrity/audit_pdf_signing/guardrail.yaml — manifest (tier=flag) documenting the 4 driver options + verification surface (the cosign verify-blob --certificate-identity=... command the customer runbook will reference). - ml_team/core/audit_signer.pySignerDriver Protocol + 4 drivers: - StubDriver — deterministic STUB_SIG_<sha256[:16]> for tests + early dev; no crypto. - OfflineEd25519Driver — pure Python via cryptography (lazy import). For air-gapped deployments; key stays on the build host. - CosignKmsDriver — shells out to cosign sign-blob --key=<kms_key_uri>. Default for BFSI (AWS/GCP/Vault Transit). - CosignKeylessDriver — Cosign + Sigstore + Rekor. Default for SaaS tenants. Parses Rekor log index from cosign stderr. - SignatureReceipt dataclass — persisted as {pdf}.signature.json sidecar so downstream verifiers have one canonical source. - configure(driver, strict, ...) matches G2's pattern — strict=True + unavailable driver → SignerUnavailable propagates, aborting API boot (BFSI no-silent-degrade guarantee). strict=False falls back to stub with a WARN. - ml_team/tests/test_guardrail_audit_signer.py — 18 tests: config lifecycle, strict vs fallback, all 4 driver availability probes, stub round-trip + determinism, receipt serialisation, export_audit_report integration.

Changed: - ml_team/tools/audit_pdf.py::export_audit_report — +21 lines post-render. When audit_signer.is_configured(), calls sign_pdf(output_path) and adds signature to the response JSON. Signing failures WARN but don't break PDF export (default PDF still writes).

Design notes: - Four drivers share one Protocol — adding a 5th (HSM smart card, KMIP) is two methods + one _build_driver branch. - Sidecar receipt pattern (.sig + .pem + .signature.json) keeps verification one-command-per-artifact. Customer runbook reads the JSON and runs the command it specifies. - Stub ships in production tree (not tests/) so operators can wire it during integration testing without test-only code. Manifest clearly tags it test-only.

Completed — Track 2 / G12 + G13 — Encryption at rest + BYOK (KeyProvider)#

Added: - lib/guardrails/persistence/encryption_at_rest/guardrail.yaml — manifest (tier=flag) covering both BYOK surface + at-rest driver selection, with the whitepaper-grade threat-model callout ("cold DB file or disk image, NOT RCE on running API"). - ml_team/core/encryption.py — one module for both G12 and G13: - KeyProvider Protocol — name, kek_uri, check_available, wrap(plaintext_dek, context), unwrap(wrapped_dek, context). - Four provider implementations: StubProvider (deterministic base64, tests only), EnvKeyProvider (AES-GCM-256 via SWARM_KEK env var, dev), AwsKmsProvider (shells out to aws kms encrypt/decrypt with encryption-context), plus fail-fast gcp_kms / vault_transit branches that surface a clear "not implemented" message. - WrappedDek + Ciphertext dataclasses with to_dict/from_dict — JSON-serialisable end-to-end. - envelope_encrypt(plaintext, context) / envelope_decrypt(ct, context) — AES-GCM-256 with per-call DEK. AAD = sorted context dict. Decrypting with mismatched context fails AEAD integrity. - configure(provider, kek_uri, at_rest_driver, context, strict) — same strict=True fail-closed pattern as G2/G11. at_rest_driver ∈ {sqlite_host_fs_only, sqlcipher, postgres_pgcrypto}; actual DB migration is customer-scheduled (not automatic — irreversible). - ml_team/tests/test_guardrail_encryption.py — 21 tests: config lifecycle, provider availability probes (stub, env, aws_kms, gcp_kms, vault_transit), strict vs fallback, envelope round-trip, context AAD mismatch, serialisation, JSON transport.

Design notes: - aws CLI rather than boto3 — keeps default deployment dep footprint clean, matches how ops teams invoke AWS KMS manually. - Env provider is explicitly dev-only and documented as such. - Stub wraps via base64 (not identity) so tests can't mask production encryption bugs. - Module is unconfigured by default — no integration into ConversationStore / DB writes yet. Profile bootstrap wiring lands with G1/G2/G4/G5/G11 plumbing in a follow-up phase.

Completed — Track 2 / G17 — SBOM + signed commits (CI gate)#

Added: - .github/workflows/release-supply-chain.yml — release-tag workflow that enforces: (1) signed commits in the release range via git log --pretty=format:%G? filtering (anything not G/U fails), (2) CycloneDX SBOM generation, (3) Cosign-keyless signing of tarball + SBOM with GitHub OIDC, (4) base-image signature verification for python:3.12-slim (WARN on failure — upstream policy out of our control). - scripts/gen_sbom.py — stdlib-only wrapper around cyclonedx-py CLI. Probes both the console-script + python -m cyclonedx_py fallback. Exits with install hint when the dep is missing. - lib/guardrails/platform_integrity/sbom_signed_commits/guardrail.yaml — manifest documenting the CI gate + customer verification surface. - SECURITY.md — new "Supply-chain integrity (G17)" section with the three cosign verify-blob / jq / cosign verify commands a CISO runs to reproduce the chain offline via Sigstore Rekor. - ml_team/tests/test_guardrail_sbom.py — 11 tests: script importability + --help, missing-cyclonedx-py exit path, workflow YAML validity, step-presence assertions (signed-commit check, SBOM generation, Cosign sign of tarball + SBOM, base-image verify).

Design notes: - CI-only — no runtime Python code. Regression surface is entirely test-level assertions against the YAML + helper script. - Keyless Sigstore over GPG-keyed signatures — Rekor transparency log gives customer CISOs non-repudiation without us managing long-lived keys. BFSI customers who want offline verification outside Sigstore re-sign with G11's cosign_kms driver using their own KMS. - CycloneDX 1.5 over SPDX — better Python tooling (cyclonedx-bom actively maintained). SPDX via conversion if a customer asks.

Track 2 COMPLETE — 887 → 1217 passing tests (+330), all 15 in-scope guardrails landed.#

Track 2 summary#

# Guardrail Phase Tests added Commit
G-runtime Runtime scaffold (registry + evaluator + hook events) 12 (wave 1)
G3 Prompt-injection heuristic 13 (wave 1)
G6 Logs credential filter 14 (wave 1)
G10 Delegation loop detector 15 (wave 1)
G7 Per-user rate limit 16 (wave 1)
G16 HITL timeout + escalation 17 (wave 1)
G14 Data lineage 18 +22 9c6535c
G15 Right-to-be-forgotten 19 +15 33fd65d
G1 Egress allowlist (in-process) 20 +28 e0da426
G2 Python execution sandbox 21 +17 4377577
G4 PII detection (regex + Presidio) 22 +29 bf71daa
G5 Conversation scrubber 23 +16 cd26d55
G11 Audit PDF signing 24 +18 b4cd0b5
G12+G13 Encryption at rest + BYOK 25 +21 c483f09
G17 SBOM + signed commits 26 +11 (this commit)

Completed — Ship tooling — swarm deploy CLI#

Added: - ml_team/deploy/__init__.py — package re-exports (scaffold, validator, manifest, ship, whitepaper). - ml_team/deploy/scaffold.pynew_deployment(customer, template) writes deployments/<customer>/config.yaml + README + branding/knowledge placeholders. Customer name validation (snake_case, 2-40 chars), three templates (generic_ml / bfsi_baseline / hipaa_baseline), overwrite protection. - ml_team/deploy/validator.pyvalidate(customer) lints: missing / unparseable config, customer-name mismatch, bad lib ref format, missing lib assets on disk. Returns ValidationReport with error/warning findings. - ml_team/deploy/manifest.pybuild_manifest(customer) produces a MANIFEST.yaml with swarm_core_version, swarm_lib_manifest (every activated lib ref + pinned version), deployment_config_sha256, build_timestamp, build_commit, build_host. Sort-keyed YAML for determinism. - ml_team/deploy/ship.pyship(customer, version) produces dist/<customer>-vX.Y.Z/<customer>-vX.Y.Z.tar.gz + MANIFEST.yaml. Build-time positive-list filter includes ml_team/, lib/, scripts/, top-level files + deployments/<customer>/ + deployments/_dev_scaffold/; excludes every other deployments/<other>/, .git/, .venv/, __pycache__/, ml_team/tests/, pipeline_runs/, node_modules/. Validation runs by default; --skip-validate for emergencies. - ml_team/deploy/whitepaper.pygenerate(customer) produces a 5-section security whitepaper (Deployment Summary · Active Guardrails table · Compliance Mapping · Verification commands · Residual Risks honesty callout). Markdown output. BFSI / HIPAA / generic templates drive framework selection. - ml_team/cli.py — +84 lines: new deploy subparser with four subcommands (new, validate, ship, whitepaper) + handler functions that emit JSON. - ml_team/tests/test_deploy_cli.py — 24 tests: scaffold happy path + name validation + overwrite guard, validator with 5 failure modes + happy path, manifest lib-ref extraction + deterministic YAML + missing-git fallback, ship build-time filter verified at tar-member level (other customers excluded, _dev_scaffold + customer included), whitepaper 5-section generation + BFSI framework selection, CLI argparse tree + dispatch to handlers.

End-to-end smoke test:

python -m ml_team.cli deploy new alice_corp --template=bfsi_baseline
python -m ml_team.cli deploy validate alice_corp
python -m ml_team.cli deploy ship alice_corp --output=./dist
# → dist/alice_corp-v0.1.0/{alice_corp-v0.1.0.tar.gz, MANIFEST.yaml, alice_corp_whitepaper.md}

Design notes: - Positive-list filter — every new customer is isolated by default. Adding customers doesn't require exclude-list maintenance. - No Cosign shell-out from the CLI — signing is the CI pipeline's job (G17's release workflow or G11's audit_signer called manually). Keeps the local dev loop painless. - Whitepaper is Markdown, not PDF — grep-able, diffable, versionable. PDF rendering via tools/audit_pdf.py's ReportLab layer is a follow-up when a customer asks for formal branding.

Ship tooling status — 1217 → 1241 passing tests (+24), swarm deploy CLI operational.#

12-week plan — END STATE#

  • Track 1 (P0-P5): 6 phases · zero regressions · customer-composable codebase (628 → 887 tests)
  • Track 2 (G1-G17): 9 waves · 15 guardrails · BFSI-ready security posture (887 → 1217 tests)
  • Ship tooling: 1 phase · per-customer tarball + MANIFEST + whitepaper (1217 → 1241 tests)
  • 628 → 1241 tests (+613), zero regressions across the entire arc
  • Every commit signed TheAiSingularity <singularitytheai@gmail.com>, no Co-Authored-By trailers

Planned — post-plan follow-ups#

  • profile_loader bootstrap wiring for the 7 configure() hooks (G1, G2, G4, G5, G11, G12+G13, conversation_scrubber) — one consolidated commit, ~40 lines
  • First real release tag v0.12.0 — will shake out 1-2 CI release-workflow adjustments (OIDC permissions, tag parsing)
  • swarm deploy diff + swarm deploy rotate-secret subcommands (Doppler-wired)
  • mitmproxy sidecar for G1's second ring (when a customer's infra team requires network-level egress enforcement)
  • GCP / Vault Transit KeyProvider implementations (when a customer demands)
  • PDF whitepaper rendering via tools/audit_pdf.py ReportLab layer (when a customer requests formal branding)

Planned — Ship tooling#

  • swarm deploy CLI (new, validate, dev, ship, diff, rotate-secret)
  • Doppler secrets per customer · Cosign signing · SBOM generation · MANIFEST.yaml · per-customer security whitepaper PDF

Prior roadmap (superseded where overlapping)#

  • Pilot polish for BFSI: screencast demo, single "compliance architecture" doc for CISO review, scripted auditor-reply flow → rolled into P5 (BFSI baseline template) + auto-generated security whitepaper
  • Plugin compat Phase 2 (if marketplace adoption grows): prompt/agent/http hook command types; 10+ additional CC hook events; team_factory integration of plugin-contributed agents → still relevant; slotted after Track 2
  • Async orchestration core (only with a ≥4-concurrent-pipelines customer signal) → deferred, customer-signal-gated
  • Postgres migration path for multi-node deployments → absorbed into G12 (encryption at rest) for BFSI path

[0.11.0] — 2026-04-20 — Claude Code marketplace plugin compatibility#

Commits: dd44c5365438bc (5 commits — one per phase)

Added#

  • Phase A — Drop telemetry (dd44c53): PluginInstallDrops dataclass + install_drops_json SQLite column record every plugin surface seen on disk but not registered. Silent skips in hooks.load_from_plugin upgraded from DEBUG to WARNING. Drops payload surfaces in GET /api/v1/plugins/{name}.
  • Phase B — Shell-command hooks (34b2fde): ml_team/core/shell_hook_runner.py executes CC's {"type": "command", ...} hooks behind the new plugin_shell_hooks_enabled feature flag (EXPERIMENT tier, default OFF). Security model: invoke-time validation reusing run_bash allowlist, ${CLAUDE_PLUGIN_ROOT} substitution, scrubbed env, rlimits on Linux, hard timeout (10s default, 60s max), per-execution audit row in new plugin_shell_executions SQLite table. Exit 2 blocks; JSON stdout {"mutation": {...}} lifted into HookResult.
  • Phase C — commands/ directory (f405eab): ml_team/core/commands_registry.py scans commands/*.md, registers each with optional $ARGUMENTS substitution. REST at GET /plugins/commands + POST /plugins/commands/{qname}/invoke. Feature flag plugin_commands_enabled (FLAG tier, default ON).
  • Phase D — agents/ directory (65438bc): ml_team/core/agents_registry.py scans agents/*.md, forces plugin-{name}::{agent} namespacing so no plugin can shadow a built-in AGENT_DEFS. REST at GET /plugins/agents[?plugin=] + GET /plugins/agents/{qname}. Feature flag plugin_agents_enabled (FLAG tier, default ON).
  • Phase E — Smoke + docs: test_plugin_compat_smoke.py installs real superpowers v5.0.7 from the CC cache and asserts 100% surface registration (14 skills + 1 shell hook + 3 commands + 1 agent, zero silent drops). Automatically skipped in CI when the cache isn't present.

Changed#

  • hooks.load_from_plugin now parses CC's nested {matcher, hooks: [{type: command|python, ...}]} shape (Phase B).
  • scan_install_drops updated phase by phase: command no longer counts as an unsupported type (Phase B); commands/ + agents/ dirs no longer count as drops (Phases C + D). unsupported_hook_types is now strictly prompt / agent / http.
  • PluginInstallation dataclass + _row_to_installation + _save_installation all carry the drops payload.

Security#

  • All shell-hook execution is feature-flag gated, default OFF. Install-time whitelist still applies + runtime command validation + BFSI-grade audit trail.

Tests#

  • +59 new: test_plugin_install_drops.py (13), test_plugin_shell_hooks.py (14), test_plugin_commands.py (16), test_plugin_agents.py (14), test_plugin_compat_smoke.py (2).
  • Regression: 603 passing (was 545), 1 skipped, 0 failing.

Empirical result#

Installing superpowers v5.0.7 before this cycle: 14/14 skills + 0/1 hooks + 0/3 commands + 0/1 agents = ~25% surface retention. After: 14/14 + 1/1 + 3/3 + 1/1 = 100% surface retention, 0 silent drops.


[0.10.2] — 2026-04-20 — Documentation Phase 2#

Commit: 4ad7a9b

Added#

  • ml_team/tools/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/backends/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/config/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/dashboard/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/tests/IMPLEMENTATION_README.md (IMPL-only by design — tests are their own doc)

Changed#

  • .github/workflows/doc-drift.yml — advisory CI guard now covers 7 subsystems (Phase 1 + Phase 2)

Rationale#

Closes the two-layer doc rollout. hello-swarm deliberately stays on the Phase-1 plugin-README shape (plugins aren't subsystems).


[0.10.1] — 2026-04-20 — Documentation Phase 1#

Commit: f65c27c

Added#

  • MASTER_README.md at repo root — product + system source of truth
  • ml_team/core/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/api/{IMPLEMENTATION,LEARNING}_README.md
  • ml_team/api/routers/{IMPLEMENTATION,LEARNING}_README.md
  • .github/workflows/doc-drift.yml — advisory CI guard for documented subsystems

Rationale#

New-engineer ramp-up was measured in days. Two-layer model — IMPL (engineering contract) + LEARNING (conceptual) — cuts it to hours while giving regulators a stable doc surface to quote from.


[0.10.0] — 2026-04-20 — Week 7: Compliance + Ops pack#

Commits: 89562631a1a0c1 (7 commits, +113 tests; 432 → 545 green)

Added#

  • W7-1 Unified permission engine (8956263, 72b6c3c)
  • ml_team/core/permissions.py — ALLOW > DENY > ASK > default pipeline with glob tool matching, optional arg regex, priority tiebreak, lazy init
  • ml_team/core/permission_sources.py — 5 default sources: RBAC, agent allowlist, feature flag, HITL, YAML policy
  • ml_team/core/permission_audit.py — SQLite permission_denials persistence + ml_team_permission_decisions_total metric
  • ml_team/api/routers/permissions.pyGET /api/v1/permissions/denials?since=&tool=&agent=
  • ml_team/config/permission_policies.yaml — operator-authored rules (empty default)
  • W7-2 Hook lifecycle (e6e6268)
  • ml_team/core/hooks.py — 5 events: SESSION_START, PRE/POST_TOOL, PRE/POST_COMPACTION
  • AgentRunner integration; plugin-loader ingestion of hooks/hooks.json
  • Reference PII-mask handler in examples/plugins/hello-swarm/
  • W7-3 Cron scheduler (6d1f17e) — vendored from Hermes
  • ml_team/core/cron.py + cron_tasks.py — 4 task kinds (retrain / drift_check / audit_pdf / custom)
  • File-backed store at ~/.swarm/cron/jobs.json, 60s daemon tick
  • REST at /api/v1/cron/*, /cron dashboard page, swarm cron CLI subcommand
  • W7-4 Batch runner (aa4e87a) — vendored from Hermes
  • ml_team/core/batch.py + batch_processors.py — JSONL → inference / echo / custom processors
  • Checkpoints every 10 records, streams results.jsonl, resume-on-restart
  • REST at /api/v1/pipelines/{run_id}/batch

Changed#

  • ToolExecutor.execute + CompositeToolExecutor.execute + require_role + require_approval all route through the permission engine
  • feature_flags.py — added hooks_enabled, cron_scheduler, batch_runner
  • api/database.py::init_db() — adds permission_denials table

Fixed#

  • Cron first-run sentinel flake: interval schedules now fire immediately on boot (previously drifted past the 60s tick)
  • Cron output filename collisions under sub-second job runs (microsecond precision in filename)

Docs#

  • ADRs for all four W7 items in .project/decisions.md (3b0b18c)
  • /transparency dashboard refresh with denial panel

[0.9.0] — 2026-04-20 — Context compaction + evaluator separation#

Commits: fc6c188, d65e305

Added#

  • ml_team/core/context_compaction.py — summarise oldest middle-messages at 80% context window; mechanical fallback on summariser failure
  • ml_team/core/evaluator.py — clean-context grade on agent terminal response with 0–5 score + verdict-override

[0.8.0] — 2026-04-20 — Plugin ecosystem + MCP streamable-HTTP + CLI#

Commits: 54007f3, 1c1a42f, 3fa5aa5, ac7c931, 00db188, f3e018c, a6d1bd7

Added#

  • ml_team/cli.pyswarm CLI: auth, features, pipelines, deployments
  • ml_team/core/plugin_loader.py (Phase A) — Claude Code plugin install/uninstall/reload with SHA-256 manifest pinning; .mcp.json ingestion
  • Plugin skill ingestion (Phase B) — SKILL.md parsing, keyword match, system-prompt injection
  • Streamable-HTTP (SSE) MCP transport — spec 2025-11-25
  • /plugins dashboard page + expanded /transparency
  • Intra-agent parallel tool dispatch (2.9× speedup on multi-tool turns)

Fixed#

  • Critical: ApprovalRequired now propagates through ToolExecutor (was being swallowed by a broad except Exception). HITL gates now fire reliably.

[0.7.0] — 2026-04-20 — Perf + feature flags + retention + transparency#

Commits: b7c82a68f972bb, plus bd330bc, d5fb344, 335660a, 3da2a4a, 96a0fd1, 54007f3

Added#

  • ml_team/core/feature_flags.py — central registry with 3 tiers: INVARIANT / FLAG / EXPERIMENT; resolution order runtime → env → alias → default
  • ml_team/core/retention.py — daemon that prunes conversation JSONL, run_events rows, audit PDFs, shadow predictions past TTL
  • /settings admin controls + /transparency read-only flags + metrics catalogue
  • Micro-benchmark harness + frozen baseline + nightly bench workflow

Changed#

  • Week-1 perf: shared HTTP pool + shared schema cache + Anthropic prompt caching
  • Batched SQLite event writes (6.5× speedup) + per-agent JSONL buffers (3.9× speedup)

[0.6.0] — 2026-04-20 — Week 2: Dashboard UI + CI + docs browser#

Commits: 7142f248152239, fb263eb, 55eeec4, 84cac56, 9e5799f, de3709b, 32dc01a

Added#

  • Login page + deployments view + auth context + OIDC helpers in the dashboard
  • In-app docs browser + sidebar login/logout footer
  • PR workflow + nightly real-LLM golden path
  • System-design brief + extending guide wired into in-app browser

Fixed#

  • CI blockers — broken build-backend, ruff import-sort cleanup
  • Dashboard: missing highlight.js dep for docs page

[0.5.0] — 2026-04-20 — Week 1: BFSI compliance MVP#

Commits: 1e699ef0e92e03 (security, RBAC, tiers, SSO, audit)

Added#

  • Security: per-agent tool allowlists (1e699ef), tool_denied_total metric surfaced at /metrics
  • RBAC MVP: 3 roles (admin/operator/viewer) + JWT + backward-compat API key (817b1b8)
  • Deploy pipeline: real model packaging + Kubernetes manifest generation (0037d9c)
  • Tier-1: train_classifier tool + observability wiring (2e5925a)
  • Tier-2 — RBI FREE-AI compliance bundle: drift detector + fairness audit + SHAP explainability + model cards (712d243)
  • Tier-3 — Champion-challenger MVP: model registry + shadow-traffic log + promotion gate (307ea1c)
  • Audit: single-document PDF export for regulatory sign-off (2a7956e)
  • SSO: OIDC authorization-code flow for Okta / Azure AD / Google (9c6f23f)

Housekeeping#

  • CrewAI / plain-LangGraph prototypes archived under deprecated/ (cb05248)
  • Project scaffolding, architecture docs, AI tool configs (a9a44df)

[0.4.0] — 2026-04-13 — Phase 8: Production readiness#

Commits: 94624bd, 2b090cf, 16e0e94, 410f1c9, 1ab2524, 106c7bb, c44d4b8, 487dc3e, 389d561, 18c1d41, eb2f642, 735d2c7

Added#

  • P0 security: all P0 vulnerabilities fixed
  • P1 infrastructure: bounded thread pool, WebSocket bridge, rate limiting, structured logging
  • P2 reliability: OpenTelemetry tracing, Prometheus metrics, cost budgeting
  • P3 observability: error boundaries, guardrails, Makefile, pre-commit, loading states
  • Dashboard features: chat, HITL UI, controls, persistence, training logs, 3-dot menu, artifact downloads
  • Circuit breaker + 3 specialist agents (LLM, vision, repo_researcher) + dataset explorer + model playground + 8 algorithm repos
  • Comprehensive README (architecture, setup, API, dashboard, tools) + service start/stop/status/logs commands
  • Production readiness test suite: 115/115 pass

Fixed#

  • 6 dashboard bugs: feedback crash, graph status, quality page, polling, error recovery

[0.3.0] — 2026-04-13 — Phases 5–7.5: Hardening + HITL + MCP + StateGraph#

Commits: 532a7bb, ae3781c, 1ca811a, 97a702a

Added#

  • Phase 5 — Agent hardening: 37/37 operational rules, span-based observability, evaluation framework
  • Phase 6 — HITL + persistence: approval gates, project memory (SQLite), org memory (PostgreSQL)
  • Phase 7 — Integration: MCP client, RAG knowledge store, parallel team execution
  • Phase 7.5 — Polish: StateGraph execution, conversation transparency, demo presets

[0.2.0] — 2026-04-12–13 — Phases 0–4: Framework-agnostic core + REST + dashboard#

Commits: 5cba253, 04066e7, 75d8672, 1591d5a, 5a95c60

Added#

  • Phase 1: framework-agnostic core with native / LangGraph / CrewAI backends
  • Phase 2: agent memory, per-agent rules, post-run feedback loop
  • Phase 3: REST API (FastAPI), pipeline execution, agent inspection, WebSocket streaming
  • Phase 4: algorithm repos + customer dashboard

Fixed#

  • Phase 0: INVALID_CHAT_HISTORY crash + path-resolution bugs

[0.1.0] — 2026-04-12 — Initial swarm#

Commits: 8207424, 2c48f38

Added#

  • Multi-agent swarm with 3-model vLLM Docker deployment
  • ML Team Agent baseline: 32 agents, 7 teams, 23 tools

Cross-cutting state (current HEAD 4ad7a9b)#

Dimension Count
Agents 40 across 7 teams (algorithm 9, data 6, deployment 5, evaluation 5, management 4, quality 5, training 6)
Tools 38 callable primitives, 33 tool sets
Algorithm repos 18 (tabular, vision, NLP, fine-tuning)
REST routers 18+ (auth, pipelines, deployments, permissions, cron, batch, plugins, features, …)
Feature flags 20+ registered, 3 tiers
Tests 545 passing, 1 skipped (Docker), 0 failing
Commits 73 total (Apr 12 → Apr 20)
Documented subsystems 7 (two-layer READMEs) + MASTER_README + advisory CI

Next steps for versioning hygiene#

  1. Bump ml_team/pyproject.toml + ml_team/dashboard/package.json to 0.10.2 — both are still 0.1.0.
  2. Annotate git tags retroactivelygit tag -a v0.5.0 0e92e03 -m "Week 1: BFSI compliance MVP" through v0.10.2 4ad7a9b. Signed tags if you maintain a signing key.
  3. Adopt semver going forward. Customer-facing API changes bump minor; bugfixes patch. 1.0.0 when the first BFSI pilot signs off.
  4. Add a PR checklist item — "Did you update CHANGELOG.md under [Unreleased]?" — so this file stops being my job to reconstruct.