Changelog#
All notable changes to swarm (the ML team agent platform), as a full release history.
The format follows Keep a Changelog and Semantic Versioning where possible. swarm is pre-1.0, so minor releases may include breaking changes, but each is flagged below. Version bumps are retro-fitted to commit history; no git tags exist yet (see "Next steps" at the bottom).
[Unreleased]#
Major in-flight workstream (started 2026-04-21): platform refactor from ML-specific product → composable platform (three-layer architecture: core engine + library shelf + per-customer deployments) plus a comprehensive guardrails subsystem (17 guardrails across 6 categories, INVARIANT tier for regulator-facing controls). 12-week solo-dev plan tracked in /Users/rvk/.claude/plans/resume-idempotent-wreath.md. Per-phase implementation notes under .project/implementation/. ADR: .project/decisions.md § 2026-04-21.
Completed — Track 1 / P0 — library shelf + agent_defs.py shim#
Branch feat/p0-platform-refactor. Landed 2026-04-21 across 6 commits.
Added:
- Library-shelf root (/lib/) with nine category subdirectories and a generated-JSON-Schema sub-folder (lib/_schema/).
- ml_team/core/lib_schemas.py — seven Pydantic v2 manifest models (Agent, Teams, Tool, Workflow, Guardrail, Profile, Deployment) with extra='forbid', SemVer-pinned AssetRef grammar, enum vocabularies for tier / integration-point / guardrail category.
- scripts/gen_lib_schemas.py — regenerates JSON Schemas from Pydantic models. --check drift guard wired into the test suite.
- ml_team/core/lib_loader.py — runtime asset resolver with mtime-cached LibLoader class + module-level singleton, WORKER_SUFFIX auto-application to worker agents, pinned-ref version enforcement, CLI (python -m ml_team.core.lib_loader {validate,list}).
- lib/agents/base/<40 agents>/agent.yaml + lib/agents/teams/ml_pack_teams/teams.yaml — the entire ML agent set migrated out of Python dict literals.
- ml_team/tests/fixtures/agent_defs_snapshot.json — immutable 50 KB ground truth snapshot of pre-flip dict shapes.
- .project/implementation/01..05-*.md — per-phase implementation notebook.
- .project/training/modules/01-platform-architecture.md — first platform onboarding module.
- +173 new tests (39 schemas + 36 loader + 98 shim parity).
Changed:
- ml_team/config/agent_defs.py (906 → 147 lines): now a PEP-562 __getattr__ shim resolving AGENT_DEFS / TEAMS / AGENT_TIER_MAP / ALL_AGENTS / SUPERVISOR_AGENTS / WORKER_AGENTS lazily from the library shelf. WORKER_SUFFIX stays a literal to avoid circular imports.
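A minimal sketch of the PEP 562 lazy-resolution pattern the shim relies on. `_load_from_shelf`, the sample agent data, and the `WORKER_SUFFIX` value are hypothetical stand-ins, not the shipped code; the module is built with `types.ModuleType` only so the example is self-contained.

```python
import types

WORKER_SUFFIX = "_worker"  # assumed value; the changelog only says it stays a literal

_LAZY_NAMES = {"AGENT_DEFS", "TEAMS"}
_cache = {}
load_calls = []  # records loader invocations so laziness is observable


def _load_from_shelf(name):
    """Hypothetical stand-in for the library-shelf loader."""
    load_calls.append(name)
    data = {
        "AGENT_DEFS": {"data_engineer": {"tier": "worker"}},
        "TEAMS": {"ml_pack": ["data_engineer"]},
    }
    return data[name]


def _module_getattr(name):
    """PEP 562 hook: invoked only when normal module attribute lookup fails."""
    if name in _LAZY_NAMES:
        if name not in _cache:
            _cache[name] = _load_from_shelf(name)
        return _cache[name]
    raise AttributeError(f"module 'agent_defs' has no attribute {name!r}")


# In a real module this is simply `def __getattr__(name): ...` at top level.
agent_defs = types.ModuleType("agent_defs")
agent_defs.WORKER_SUFFIX = WORKER_SUFFIX
agent_defs.__getattr__ = _module_getattr
```

Nothing is loaded at import time; the first access to `agent_defs.AGENT_DEFS` triggers the loader and later accesses reuse the cached object.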
Regression: 628 baseline → 800 passed, 2 skipped (environmental: Docker daemon, kubectl) with zero behaviour change.
Deferred to P1: pipeline/workflow YAML extraction + TOOL_SETS extraction — the pipeline YAML shape doesn't yet match WorkflowManifest (uses source/target edges, node_type, etc.). Will unify when composition work lands.
Completed — Track 1 / P1 — pipeline unification + agent composition#
Added:
- ml_team/core/agent_composer.py — customer agent composition primitive. compose_agent(spec) resolves extends: lib/agents/base/<id>@vX.Y.Z references, applies overlay semantics (field replacement, system_prompt_append/prepend/full-replace), correctly handles WORKER_SUFFIX across worker/supervisor boundaries, validates the composed result against AgentManifest. Includes ComposedAgent dataclass, exception hierarchy (AgentComposerError, ExtendsChainCycleError, ExtendsChainTooDeepError, InvalidComposedSpecError), defensive cycle + depth guards for future multi-tier templates.
- lib/workflows/{default_ml_pipeline,fast_prototype,parallel_research}/workflow.yaml — migrated pipeline manifests with lib-shelf metadata header (schema_version, kind, id, full SemVer version).
- scripts/migrate_pipelines.py — idempotent pipeline migration + drift-guard (--verify used in CI).
- ml_team/tests/test_agent_composer.py — 15 tests covering standalone + extends composition, three overlay modes, worker-suffix correctness, validation errors, module singleton.
- ml_team/tests/test_workflow_migration_parity.py — 8 tests proving the two path variants (legacy vs lib) produce structurally-equivalent PipelineStateGraph objects.
- .project/training/modules/03-agent-composition.md — training module on customer agent authoring.
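The overlay semantics described for compose_agent can be sketched roughly as follows. This is an illustration under assumptions, not the real composer: `BASE_AGENTS`, `ComposedAgentSketch`, and `compose_agent_sketch` are hypothetical names, and schema validation, WORKER_SUFFIX handling, and cycle guards are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the library shelf, keyed by pinned reference.
BASE_AGENTS = {
    "lib/agents/base/data_engineer@v1.0.0": {
        "id": "data_engineer",
        "system_prompt": "You clean data.",
        "tools": ["read_file"],
    },
}

_OVERLAY_ONLY_KEYS = ("extends", "system_prompt_append", "system_prompt_prepend")


@dataclass
class ComposedAgentSketch:
    id: str
    system_prompt: str
    tools: list = field(default_factory=list)


def compose_agent_sketch(spec: dict) -> ComposedAgentSketch:
    # Start from the pinned base agent, or from nothing for a standalone spec.
    base = dict(BASE_AGENTS[spec["extends"]]) if "extends" in spec else {}
    # Plain fields replace base fields; a bare system_prompt is a full replace.
    overlay = {k: v for k, v in spec.items() if k not in _OVERLAY_ONLY_KEYS}
    merged = {**base, **overlay}
    prompt = merged.get("system_prompt", "")
    if "system_prompt_prepend" in spec:
        prompt = spec["system_prompt_prepend"] + "\n" + prompt
    if "system_prompt_append" in spec:
        prompt = prompt + "\n" + spec["system_prompt_append"]
    merged["system_prompt"] = prompt
    return ComposedAgentSketch(**merged)
```

A customer spec that only sets `extends`, a new `id`, and `system_prompt_append` inherits the base tools and gets the extra instruction added after the base prompt.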
Changed:
- ml_team/core/lib_schemas.py::WorkflowNode/WorkflowEdge refactored to match real pipeline shape (source/target edges — no more legacy from/to; node_type enum with stage/router/checkpoint; label, priority, metadata fields). lib/_schema/workflow.schema.json regenerated.
- ml_team/core/state_graph.py::PipelineStateGraph.from_dict now prefers data["id"] over data["name"] so lib-shelf manifests (where name is human-readable) load cleanly; legacy YAMLs continue to work.
- ml_team/backends/native_backend.py pipeline-config resolution order: explicit path → lib/workflows/<id>/workflow.yaml (preferred) → ml_team/config/pipelines/<id>.yaml (legacy fallback, one release) → hardcoded default.
Regression: 800 → 826 passing tests, 2 skipped. Zero failures.
Completed — Track 1 / P2 — Deployment loader + shim wiring#
Added:
- lib/templates/generic_ml/profile.yaml — minimal profile baseline (no pre-activations — deployments are the source of truth).
- deployments/_dev_scaffold/config.yaml — default canary deployment. Activates all 40 ML agents + 3 workflows → reproduces today's behaviour exactly.
- deployments/README.md — customer deployment layout + shipping flow docs.
- ml_team/core/deployment_loader.py — DeploymentLoader class + LoadedDeployment dataclass + module singleton + SWARM_DEPLOYMENT env-var resolver + error hierarchy.
- ml_team/tests/test_deployment_loader.py — 13 tests (resolution priority, _dev_scaffold integration load, fixture-library isolation, custom-agent override, cache semantics, error surfaces).
- .project/training/modules/04-deployments-and-activation.md — training module on deployment authoring.
Changed:
- ml_team/config/agent_defs.py shim: AGENT_DEFS / AGENT_TIER_MAP are now sourced from the active deployment (via deployment_loader) instead of the full library shelf. The shim falls back to the library only if deployment resolution fails (missing config, CI bootstrap).
- ml_team/core/lib_schemas.py::DeploymentAgentOverride — expanded to match compose_agent overlay vocabulary (system_prompt / system_prompt_append / system_prompt_prepend) with extra='allow'; composed manifest still revalidated strictly against AgentManifest.
- lib/_schema/deployment.schema.json regenerated.
End-to-end demo:
SWARM_DEPLOYMENT=/path/to/3-agent-deployment python -c \
"from ml_team.config.agent_defs import AGENT_DEFS; print(sorted(AGENT_DEFS))"
# → 3 agents (customer composability live)
# No env var → default _dev_scaffold → 40 agents (today's behaviour)
Regression: 826 → 839 passing, 2 skipped. Zero failures.
Completed — Track 1 / P3 — Dashboard API-driven config#
Added:
- ml_team/api/routers/config.py — GET /api/v1/config/branding (unauth); BrandingResponse Pydantic model; merges profile defaults + deployment overrides into nav / demo presets / pipeline config / compliance badges.
- ml_team/tests/test_config_router.py — 8 tests (response shape, nav dedupe, pipeline enumeration, custom branding override, default fallback, unauth OK).
- ml_team/dashboard/src/lib/config.ts — typed branding client + BRANDING_FALLBACK for SSR resilience.
- ml_team/dashboard/src/components/config-provider.tsx — React context carrying SSR-fetched config.
Changed:
- ml_team/dashboard/src/app/layout.tsx — generateMetadata() + async RootLayout fetch branding server-side; wrap children in ConfigProvider.
- ml_team/dashboard/src/components/sidebar.tsx — dynamic lucide-react icon lookup; product name + subtitle + compliance badges sourced from useConfig().
- ml_team/dashboard/src/app/page.tsx — overview tagline from useConfig().
- ml_team/dashboard/src/app/pipelines/page.tsx — demo presets + pipeline config dropdown sourced from useConfig() (no more hardcoded preset arrays).
- ml_team/api/app.py — mount config_router unauth under /api/v1.
End-to-end proof: the same Next.js binary renders different branding per SWARM_DEPLOYMENT. An HDFC deployment shows compliance badges, a custom product name, and BFSI pipeline options on the same dashboard code that renders "ML Team Agent" for _dev_scaffold.
Regression: 839 → 847 passing, 2 skipped. TypeScript type-check passes with no output.
Completed — Track 1 / P4 — Profile loader + compliance gates + invariant-DENY floor#
Added:
- ml_team/core/profile_loader.py — ProfileLoader + LoadedProfile reading permission_baseline.yaml, compliance_gates.yaml, retention.yaml siblings of a profile manifest.
- ml_team/core/compliance_gates.py — run-scoped gate verdict cache + record_tool_result / check_blocked hooks + safe-builtins deny_if evaluator + placeholder resolver for ${last_tool_result.X} / ${run_context.Y}.
- ml_team/core/lib_schemas.py — PermissionRuleSpec, PermissionBaselineManifest, ComplianceGateSpec, ComplianceGatesManifest, RetentionOverridesManifest.
- ml_team/core/permission_sources.py: profile_source (profile DENY → priority 60, ASK → 45) + compliance_gate_source (gate verdicts → priority 55 DENY).
- ml_team/tests/test_profile_loader.py (8 tests), test_compliance_gates.py (13 tests), test_profile_permission_sources.py (13 tests).
Changed:
- ml_team/core/permissions.py: INVARIANT_DENY_PRIORITY_FLOOR = 60 — DENY rules at/above this priority beat ALLOW rules. Profile DENYs are now regulator-facing controls that operator POLICY ALLOW cannot override.
- ml_team/core/deployment_loader.py: LoadedDeployment.profile carries the resolved based_on profile; compliance gates activated at deployment-load time.
Engine behaviour change: pre-P4 resolution was strictly ALLOW > DENY > ASK. Post-P4, an invariant-DENY first pass runs before ALLOW and fires only for DENY rules with priority ≥ 60. No rule before P4 had a priority above 50, so existing behaviour is unchanged.
Regression: 847 → 876 passing, 2 skipped (+29 new tests). One test-isolation fix applied (teardown uses mark_uninitialized() instead of leaving engine blank).
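The two-pass resolution described for P4 might look like the sketch below. Only `INVARIANT_DENY_PRIORITY_FLOOR = 60` comes from the changelog; `Rule`, `resolve`, and the `"ASK"` default for an empty rule set are assumptions made for illustration.

```python
from dataclasses import dataclass

INVARIANT_DENY_PRIORITY_FLOOR = 60


@dataclass(frozen=True)
class Rule:
    effect: str    # "ALLOW", "DENY" or "ASK"
    priority: int
    source: str


def resolve(matching_rules):
    # Pass 1 (new in P4): a DENY at or above the floor beats everything.
    for rule in matching_rules:
        if rule.effect == "DENY" and rule.priority >= INVARIANT_DENY_PRIORITY_FLOOR:
            return "DENY"
    # Pass 2 (pre-P4 ordering): ALLOW > DENY > ASK.
    for effect in ("ALLOW", "DENY", "ASK"):
        if any(rule.effect == effect for rule in matching_rules):
            return effect
    return "ASK"  # assumed default when no rule matches
```

With this shape, an operator POLICY ALLOW still overrides an ordinary DENY at priority 50, but cannot override a profile DENY registered at priority 60.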
Completed — Track 1 / P5 — BFSI baseline template#
Added:
- lib/templates/bfsi_baseline/profile.yaml — inherits generic_ml, sets branding (Swarm BFSI Edition) + compliance badges (RBI FREE-AI).
- lib/templates/bfsi_baseline/permission_baseline.yaml — 5 rules: DENY execute_shell, DENY execute_python with subprocess|os.system|eval|__import__ regex, ASK register_model_deployment, DENY write_file with PAN regex, DENY write_file with Aadhaar regex.
- lib/templates/bfsi_baseline/compliance_gates.yaml — 2 runtime gates: fairness (demographic-parity > 10% blocks deploy) + drift-baseline (PSI > 0.25 blocks deploy).
- lib/templates/bfsi_baseline/retention.yaml — 2555-day (7-year) retention on all four artefact classes (RBI default).
- lib/templates/bfsi_baseline/README.md — compliance-citation matrix mapping every control to RBI FREE-AI / HIPAA / GDPR / EU AI Act / SOC 2 / OWASP LLM clauses.
- ml_team/tests/test_bfsi_baseline_e2e.py — 11 end-to-end tests: profile load, each DENY/ASK rule at engine level, invariant-DENY vs operator POLICY ALLOW, biased classifier blocked by fairness gate, unbiased classifier unblocked, branding surfacing.
Regression: 876 → 887 passing, 2 skipped.
Track 1 complete — 628 → 887 passing tests, zero regressions across 6 phases.#
Completed — Track 2 / G-runtime — Guardrails runtime scaffold#
Added:
- ml_team/core/hooks.py — 5 new HookEvent values (PRE_LLM, POST_LLM, STORAGE_WRITE, LLM_CALL_WRAPPER, AGENT_DELEGATE) preserving the original 5.
- ml_team/core/guardrails/ — runtime package with:
  - types.py: IntegrationPoint enum (7 values), GuardrailOutcome (ALLOW/REDACT/DENY/ERROR), Severity, GuardrailResult dataclass.
  - registry.py: thread-safe GuardrailRegistry with module singleton + @register(integration_point, id, priority) decorator.
  - evaluator.py: priority-ordered multi-guardrail pass per integration point. ALLOW chain / REDACT thread / DENY short-circuit / ERROR fails open unless invariant. Returns EvaluationReport.
  - metrics.py: Prometheus counters (guardrail_triggered_total, guardrail_bypass_attempts_total) + duration histogram. Null-stub when prometheus_client absent.
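One way to read the evaluator's outcome rules is sketched below. It assumes that higher priorities run first and that an ERROR from an invariant guardrail is treated as DENY; neither detail is confirmed by the changelog, and `evaluate` here returns a plain tuple rather than an EvaluationReport.

```python
def evaluate(guardrails, payload):
    """Run guardrails in priority order over one payload.

    guardrails: iterable of (priority, is_invariant, check) tuples, where
    check(payload) returns (outcome, new_payload).
    """
    redacted = False
    ordered = sorted(guardrails, key=lambda g: g[0], reverse=True)
    for _priority, is_invariant, check in ordered:
        try:
            outcome, new_payload = check(payload)
        except Exception:
            outcome, new_payload = "ERROR", payload
        # DENY short-circuits; ERROR fails closed only for invariant guardrails.
        if outcome == "DENY" or (outcome == "ERROR" and is_invariant):
            return "DENY", payload
        if outcome == "REDACT":
            # Later guardrails see the redacted payload (REDACT threads through).
            payload, redacted = new_payload, True
        # ALLOW and non-invariant ERROR simply continue the chain.
    return ("REDACT" if redacted else "ALLOW"), payload
```

Threading the redacted payload forward means a lower-priority guardrail never sees content that a higher-priority one already removed.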
Completed — Track 2 / G3 — Prompt-injection heuristic (PRE_LLM)#
Added:
- lib/guardrails/input_safety/prompt_injection_heuristic/{guardrail.yaml,patterns.yaml,enforce.py} — 12 high-severity + 13 medium-severity patterns covering chat-template tokens, "ignore previous instructions" family, developer-mode jailbreaks, role-play preambles, base64 blobs >200 chars. Block on 2+ medium or 1+ high.
- Registered at PRE_LLM priority 70. Uses re.findall to count occurrences, not distinct pattern hits — "bypass X and ignore Y" correctly counts as 2 mediums.
- ml_team/tests/test_guardrail_prompt_injection.py — 30+ tests.
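The occurrence-counting rule can be sketched as below. The three patterns are illustrative placeholders, not the contents of patterns.yaml; only the thresholds (1+ high or 2+ medium) and the use of `re.findall` come from the entry above.

```python
import re

# Illustrative patterns only; the shipped patterns.yaml is not reproduced here.
HIGH = [re.compile(r"<\|im_start\|>", re.IGNORECASE)]
MEDIUM = [
    re.compile(r"\bignore\b", re.IGNORECASE),
    re.compile(r"\bbypass\b", re.IGNORECASE),
]


def count_hits(patterns, text):
    # findall counts every occurrence, so one pattern matching twice scores 2.
    return sum(len(p.findall(text)) for p in patterns)


def should_block(text):
    return count_hits(HIGH, text) >= 1 or count_hits(MEDIUM, text) >= 2
```

Counting occurrences rather than distinct patterns means a prompt that repeats one medium-severity phrase is blocked just like one that mixes two different phrases.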
Completed — Track 2 / G6 — Logs credential filter#
Added:
- lib/guardrails/persistence/logs_credential_filter/{guardrail.yaml,enforce.py} — CredentialScrubFilter(logging.Filter) with 13 known-secret patterns (anthropic before openai via negative lookahead sk-(?!ant-)), Shannon-entropy fallback for unknown high-entropy tokens ≥32 chars. install()/uninstall() attach at root logger.
- ml_team/core/logging_config.py — attaches the filter at startup via importlib.util.spec_from_file_location with unique sys.modules name (avoids collisions with other guardrails' enforce.py).
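A reduced sketch of the scrub logic follows, with two known-secret patterns and an entropy fallback. The exact regexes, the 4.0 bits-per-character threshold, the replacement tags, and the class name `CredentialScrubFilterSketch` are assumptions; only the `sk-(?!ant-)` lookahead and the 32-character minimum come from the entry above.

```python
import logging
import math
import re
from collections import Counter

# Order matters: the Anthropic pattern runs first, and the OpenAI pattern
# uses a negative lookahead so it can never claim an sk-ant- key.
KNOWN = [
    ("anthropic", re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}")),
    ("openai", re.compile(r"sk-(?!ant-)[A-Za-z0-9]{8,}")),
]
LONG_TOKEN = re.compile(r"[A-Za-z0-9+/_=-]{32,}")


def shannon_entropy(s):
    """Bits per character of the empirical character distribution."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())


def scrub(text, min_entropy=4.0):  # threshold is an assumed value
    for name, pattern in KNOWN:
        text = pattern.sub(f"[REDACTED:{name}]", text)

    def _maybe_redact(match):
        token = match.group()
        return "[REDACTED:high_entropy]" if shannon_entropy(token) >= min_entropy else token

    return LONG_TOKEN.sub(_maybe_redact, text)


class CredentialScrubFilterSketch(logging.Filter):
    def filter(self, record):
        # Format first, then scrub, so secrets passed as args are covered too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True
```

Attaching such a filter to the root logger's handlers would scrub messages before any handler formats them.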
Completed — Track 2 / G10 — Delegation-loop detector (AGENT_DELEGATE)#
Added:
- lib/guardrails/platform_integrity/delegation_loop_detector/{guardrail.yaml,enforce.py} — per-run state _state[run_id]._RunState(stack, total). Detects (a) same (agent, args_hash) already on stack, (b) depth > max_depth (default 5), (c) fan-out > max_delegations_per_run (default 50). Two modes: strict (agent+args_hash) or name_only.
- Exposes pop_delegation(run_id), reset_run, reset_all.
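The three checks can be sketched with a per-run stack as follows. `push_delegation`, the denial-reason strings, and the 16-character hash truncation are hypothetical; the defaults (depth 5, 50 delegations) and the two modes come from the entry above.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class _RunState:
    stack: list = field(default_factory=list)
    total: int = 0


_state = {}


def _args_hash(args):
    # Canonical JSON so equal arguments always hash the same way.
    return hashlib.sha256(json.dumps(args, sort_keys=True).encode()).hexdigest()[:16]


def push_delegation(run_id, agent, args, max_depth=5,
                    max_delegations_per_run=50, mode="strict"):
    """Return None when the delegation is allowed, else a denial reason."""
    st = _state.setdefault(run_id, _RunState())
    key = (agent, _args_hash(args)) if mode == "strict" else (agent,)
    if key in st.stack:
        return "loop"
    if len(st.stack) + 1 > max_depth:
        return "max_depth"
    if st.total + 1 > max_delegations_per_run:
        return "fan_out"
    st.stack.append(key)
    st.total += 1
    return None


def pop_delegation(run_id):
    st = _state.get(run_id)
    if st and st.stack:
        st.stack.pop()


def reset_run(run_id):
    _state.pop(run_id, None)
```

In strict mode the same agent may appear twice on the stack as long as its arguments differ; name_only mode rejects any re-entry of an agent that is still on the stack.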
Completed — Track 2 / G7 — Per-user rate limit#
Changed:
- ml_team/api/rate_limit.py — full rewrite. Composite key (caller_identity, endpoint_class). Identity precedence: X-API-Key (SHA-256[:12]) → JWT sub claim → client IP. Per-role limits via env vars (RATE_LIMIT_ROLE_<ROLE>_READS/WRITES). Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Role.
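The identity precedence and composite key could be derived along these lines. The `key:` / `sub:` / `ip:` prefixes and the method-to-class mapping are assumptions for illustration; the precedence order and the 12-character SHA-256 prefix come from the entry above.

```python
import hashlib

_READ_METHODS = ("GET", "HEAD", "OPTIONS")  # assumed split between reads and writes


def caller_identity(headers, jwt_claims, client_ip):
    """Precedence: X-API-Key, then JWT sub claim, then client IP."""
    api_key = headers.get("X-API-Key")
    if api_key:
        # Only a short hash prefix is kept, so the raw key never becomes a map key.
        return "key:" + hashlib.sha256(api_key.encode()).hexdigest()[:12]
    sub = (jwt_claims or {}).get("sub")
    if sub:
        return "sub:" + sub
    return "ip:" + client_ip


def rate_limit_key(headers, jwt_claims, client_ip, method):
    endpoint_class = "reads" if method.upper() in _READ_METHODS else "writes"
    return (caller_identity(headers, jwt_claims, client_ip), endpoint_class)
```

Because the key is composite, a caller that exhausts its write budget can still issue reads.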
Completed — Track 2 / G16 — HITL timeout + escalation#
Added:
- ml_team/core/approval.py — extended ApprovalGate with 4 optional fields: ttl_seconds, escalate_after_seconds, escalated_at, notify_channels. Tolerant from_dict for older on-disk gates.
- ml_team/core/hitl_sweep.py — pure sweep(store, notifier, now) function. Returns SweepReport(expired_gate_ids, escalated_gate_ids, swept_at). Escalation fires before expiry in the same sweep (documented ordering). _safe_notify swallows callback exceptions.
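A sketch of a pure sweep with the documented ordering (escalation before expiry within one pass). `Gate` is a simplified stand-in for ApprovalGate, the store is a plain list, and the `status` field and its values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gate:  # simplified stand-in for ApprovalGate
    gate_id: str
    created_at: float
    ttl_seconds: Optional[float] = None
    escalate_after_seconds: Optional[float] = None
    escalated_at: Optional[float] = None
    status: str = "pending"


@dataclass
class SweepReport:
    expired_gate_ids: list
    escalated_gate_ids: list
    swept_at: float


def _safe_notify(notifier, gate):
    try:
        notifier(gate)
    except Exception:
        pass  # a broken notification channel must not abort the sweep


def sweep(store, notifier, now):
    expired, escalated = [], []
    for gate in store:
        if gate.status != "pending":
            continue
        age = now - gate.created_at
        # Escalation is evaluated first, so a gate can escalate and expire
        # in the same sweep.
        if (gate.escalate_after_seconds is not None
                and gate.escalated_at is None
                and age >= gate.escalate_after_seconds):
            gate.escalated_at = now
            escalated.append(gate.gate_id)
            _safe_notify(notifier, gate)
        if gate.ttl_seconds is not None and age >= gate.ttl_seconds:
            gate.status = "expired"
            expired.append(gate.gate_id)
    return SweepReport(expired, escalated, now)
```

Passing `now` in explicitly keeps the function deterministic, which is what makes it easy to unit-test without patching the clock.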
Completed — Track 2 / G14 — Data lineage#
Added:
- ml_team/api/database.py — three tables (datasets, lineage_models, lineage_deployments) with enforced foreign keys. ON DELETE CASCADE on model→deployments; ON DELETE SET NULL on dataset→models.
- ml_team/core/lineage.py — make_dataset_id(uri, sha256) deterministic; record_dataset/model/deployment idempotent upserts. Queries: chain_for_deployment, subjects_in_dataset, models_for_dataset, deployments_for_model, retire_deployment.
- lib/guardrails/platform_integrity/data_lineage/guardrail.yaml — manifest.
Completed — Track 2 / G15 — Right-to-be-forgotten#
Added:
- lib/guardrails/persistence/right_to_be_forgotten/guardrail.yaml — manifest documenting tombstone semantics.
- ml_team/core/rtbf.py — ErasureReceipt dataclass with SHA-256 signature over sorted JSON payload (sign/verify round-trip). erase_subject(subject_id, requested_by, conversation_roots, dry_run): walks subjects_in_dataset → collects downstream models+deployments → deletes datasets (FK cascades) → rewrites JSONL lines matching re.escape(subject_id) to a structured [SUBJECT_DELETED] tombstone preserving line order + count.
- ml_team/api/routers/subjects.py — GET /api/v1/subjects/{id}/preview (dry-run) + DELETE /api/v1/subjects/{id}. Admin-only via Depends(require_role(Role.ADMIN)). Returns {dry_run, receipt} with signed receipt.
- ml_team/tests/test_guardrail_rtbf.py — 15 tests covering receipt crypto + tamper detection, lineage walk + empty-subject, dry-run no-mutation, dataset FK cascade, JSONL tombstone preservation, regex-metachar escaping, multi-file/multi-root walk, HTTP preview + delete + blank-id rejection.
Changed:
- ml_team/api/app.py — registers subjects.router with the standard _auth dependency (verify_api_key); per-endpoint RBAC inside the router.
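The receipt round-trip and the JSONL tombstone rewrite can be illustrated as below. The receipt fields, the tombstone JSON shape, and the names `ErasureReceiptSketch` and `tombstone_lines` are assumptions; as described above, the signature is a SHA-256 digest over the sorted JSON payload, which this sketch mirrors without adding a key.

```python
import hashlib
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class ErasureReceiptSketch:
    subject_id: str
    requested_by: str
    datasets_deleted: int
    signature: str = ""

    def _payload(self):
        data = asdict(self)
        data.pop("signature")  # the signature never covers itself
        return json.dumps(data, sort_keys=True).encode()

    def sign(self):
        self.signature = hashlib.sha256(self._payload()).hexdigest()
        return self

    def verify(self):
        return self.signature == hashlib.sha256(self._payload()).hexdigest()


def tombstone_lines(lines, subject_id):
    """Replace each line mentioning the subject, keeping order and count."""
    pattern = re.compile(re.escape(subject_id))  # escape regex metacharacters
    tombstone = json.dumps({"tombstone": "[SUBJECT_DELETED]"})
    return [tombstone if pattern.search(line) else line for line in lines]
```

Escaping the subject id matters for ids such as `user.1+x`, which would otherwise be interpreted as a regular expression and match unrelated lines.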
Completed — Track 2 / G1 — Egress allowlist (in-process)#
Added:
- lib/guardrails/network/egress_allowlist/guardrail.yaml — manifest (tier=flag) documenting allow_hosts + allow_patterns + block_private_networks + block_schemes surface.
- ml_team/core/egress_allowlist.py — EgressConfig + configure()/reset() + URL extractor (recursive walk of ctx.arguments) + host classifier (exact match, *.suffix, fnmatch glob, RFC1918/loopback/link-local/ULA check via ipaddress). egress_source(ctx) is the permission-engine rule source.
- Registered in ml_team/core/permission_sources.py::init_default_sources alongside the existing 7 sources. Emits DENY rules with source="egress_allowlist"; denials land in permission_denials + permission_denials_total{source="egress_allowlist"} metric automatically.
- ml_team/tests/test_guardrail_egress_allowlist.py — 28 tests: config lifecycle, private-network block (8 parametrized IPs incl. AWS IMDS 169.254.169.254, IPv6 ::1, ULA fd00::1), suffix + fnmatch patterns, exotic-scheme block, deep-nested URL extraction, dedupe, prose-false-positive guard, engine integration.
Design notes:
- Module unconfigured = no-op (default ML-team deployment keeps working).
- Literal-string hostname matching — no DNS resolution here (SSRF via DNS rebind is the mitmproxy sidecar's job, landing in a follow-up phase).
- block_private_networks=True overrides allow-list hits (catches operator misconfig + classic SSRF targets).
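A sketch of the host classification, assuming a flat function signature rather than the real EgressConfig object. The default `block_schemes` tuple is an assumption; the private-network override and literal-string matching follow the design notes above.

```python
import fnmatch
import ipaddress
from urllib.parse import urlsplit


def is_private_host(host):
    """True for literal IPs in private, loopback or link-local ranges."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname: matched literally, never resolved via DNS
    return ip.is_private or ip.is_loopback or ip.is_link_local


def host_allowed(url, allow_hosts=(), allow_patterns=(),
                 block_private_networks=True,
                 block_schemes=("file", "ftp", "gopher")):  # assumed defaults
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme.lower() in block_schemes:
        return False
    # The private-network block is checked before the allow-list on purpose,
    # so a misconfigured allow entry cannot re-open SSRF targets.
    if block_private_networks and is_private_host(host):
        return False
    if host in allow_hosts:
        return True
    return any(fnmatch.fnmatchcase(host, pattern) for pattern in allow_patterns)
```

For example, `169.254.169.254` stays blocked even when an operator lists it in `allow_hosts`.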
Completed — Track 2 / G2 — Python execution sandbox (driver abstraction)#
Added:
- lib/guardrails/execution/python_sandbox/guardrail.yaml — manifest (tier=flag, category=execution). Documents nsjail / docker / subprocess driver surface + config schema + the honest ceiling (nsjail blocks syscall escape, not app-logic bugs in allowed imports).
- ml_team/core/python_sandbox.py — SandboxDriver Protocol + three concrete drivers:
  - SubprocessDriver: portable fallback, same behaviour as today's execute_python.
  - NsjailDriver: builds nsjail CLI with seccomp deny-list + user ns + RO rootfs + net ns. Fails closed on non-Linux or missing binary.
  - DockerDriver: throwaway python:3.11-slim container with --network=none, --read-only, --rm. Probes daemon at check_available.
- configure(driver, strict, memory_mb, cpu_time_sec, allow_network, read_only_rootfs, allow_paths) — module-level config. strict=True + missing driver → DriverUnavailable propagates out of configure, failing API boot (BFSI guarantee: no silent degrade).
- run_python(code, timeout, cwd, env) — dispatch entrypoint. Stripped env by default (PATH/HOME/LANG/LC_ALL/PYTHONPATH + PYTHONDONTWRITEBYTECODE).
- ml_team/tests/test_guardrail_python_sandbox.py — 17 tests: config lifecycle, strict vs non-strict fallback, driver availability checks (nsjail-non-linux, missing binaries), actual subprocess execution + stderr capture + timeout handling, execute_python integration preserving pre-G2 output shape when unconfigured.
Changed:
- ml_team/tools/execution.py::execute_python — +40 lines of opt-in pre-block. When python_sandbox.is_configured(), routes through run_python() and emits a driver field on the response; otherwise falls through to the pre-G2 direct-subprocess path byte-for-byte unchanged.
Design notes:
- Default deployment unaffected — is_configured() is False, legacy path runs. Refactor-first discipline: zero existing tests touched.
- Protocol-based driver contract — adding a Firecracker / gVisor driver later is two methods + one registration line.
- CLI-based nsjail/docker invocation (not Python bindings) — greppable in ops logs, portable, matches how teams already run the tools.
Completed — Track 2 / G4 — PII detection (regex + Presidio interface)#
Added:
- ml_team/core/pii/ — shared core package:
  - types.py: PiiEntity enum (12 entity types incl. Indian BFSI set), PiiFinding dataclass, AnonymizeAction enum (redact/mask/hash).
  - regex_detectors.py: 12 detectors with structural validation (Luhn checksum for credit cards, Verhoeff for Aadhaar). Overlap resolver prefers higher confidence, then longer span.
  - anonymizer.py: right-to-left applier that substitutes findings via the three action modes.
  - presidio_shim.py: lazy-import wrapper for Microsoft Presidio; raises PresidioUnavailable(ImportError) when the package isn't installed. Entity-name bridge maps Presidio's output to our taxonomy.
- lib/guardrails/pii/regex_pii/{guardrail.yaml,enforce.py} — always-on baseline. Registers at POST_LLM, POST_TOOL, STORAGE_WRITE at priority 60. Supports bare-string + dict payload shapes. configure(entities=[...], action=...) is the profile hook.
- lib/guardrails/pii/presidio_pii/guardrail.yaml — manifest (tier=flag, unregistered by default — opt-in via profile when the customer has Presidio installed).
- ml_team/tests/test_guardrail_pii.py — 29 tests: all 12 detectors positive, Luhn/Verhoeff drop invalid matches, overlap resolution, action-mode round-trips, enforce module registration + payload shapes + configure.
Design notes:
- Default deployment unaffected — legacy core/guardrails.py::check_output remains in place; regex_pii is additive. Removal planned in a follow-up phase after customer sign-off.
- Presidio's 300 MB weight keeps it opt-in. BFSI profile will activate; generic won't.
- Verhoeff on Aadhaar dramatically reduces false positives — raw 12-digit timestamps don't validate.
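Structural validation and overlap resolution can be illustrated with the Luhn check and a greedy resolver. The 12-digit minimum and the `(start, end, confidence, entity)` tuple layout are assumptions; the real detectors use the PiiFinding dataclass.

```python
def luhn_valid(number):
    """Luhn checksum over the digits of a candidate card number."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 12:  # assumed minimum length for a card-like match
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def resolve_overlaps(findings):
    """Keep the best of any overlapping findings.

    findings: list of (start, end, confidence, entity) tuples. Preference is
    higher confidence first, then longer span.
    """
    chosen = []
    ranked = sorted(findings, key=lambda f: (-f[2], -(f[1] - f[0])))
    for f in ranked:
        if all(f[1] <= c[0] or f[0] >= c[1] for c in chosen):
            chosen.append(f)
    return sorted(chosen)
```

A regex alone would flag any 16-digit run; the checksum drops candidates that cannot be real card numbers before they reach the anonymizer.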
Completed — Track 2 / G5 — Conversation JSONL scrubber#
Added:
- lib/guardrails/persistence/conversation_scrubber/guardrail.yaml — manifest (tier=flag) documenting scrub-at-flush semantics + the on-disk-only-redaction tradeoff (no unredacted original is kept anywhere).
- ml_team/core/conversation_scrubber.py — module-level ScrubberConfig + configure(active, entities, action, use_presidio, tag_redacted_lines) + scrub_line(raw). Recursive walk over dict + list values; leaf strings go through scan_regex + apply_findings from the G4 core. Presidio runs too when use_presidio=True AND the package imports successfully; falls back to regex-only with a single WARN otherwise. Non-JSON lines pass through untouched; zero-mutation lines return the original raw byte-identical.
- ml_team/tests/test_guardrail_conversation_scrubber.py — 16 tests: lifecycle, field-level scrubbing (content + nested metadata + list values), clean-line byte-preservation, entity filter, action modes, _redacted tag toggle, Presidio-fallback WARN, ConversationStore integration (active → scrubbed on disk, inactive → byte-identical).
Changed:
- ml_team/core/conversation_store.py::_flush_locked — +11 lines passing each buffered line through conversation_scrubber.scrub_line before disk write. Defensive try/except ensures a scrubber bug never breaks flush. Fixed a subtle shadow-vs-held-reference bug: the flush now writes from the local scrubbed list while clearing self._buffers[agent_name] explicitly.
Design notes:
- Scrub at flush, not at record — live WebSocket streams see unredacted content (operator's live view); JSONL on disk sees redacted (regulator's audit view). Intentional split.
- Default deployment unchanged — scrubber inactive unless the profile opts in. BFSI/HIPAA profiles will activate at bootstrap.
- _redacted: true tag on mutated lines so audit dashboards can highlight them without diffing against a pre-redacted version (which doesn't exist).
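The recursive walk and the byte-identical pass-through can be sketched as follows. A single illustrative email regex stands in for the G4 detectors, and `scrub_line_sketch` is a hypothetical simplification of scrub_line (no Presidio path, no entity filter, redact action only).

```python
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # illustrative single detector


def _scrub_value(value):
    """Return (new_value, changed), walking dicts and lists recursively."""
    if isinstance(value, str):
        new = EMAIL.sub("[REDACTED:EMAIL]", value)
        return new, new != value
    if isinstance(value, dict):
        out, changed = {}, False
        for key, item in value.items():
            out[key], item_changed = _scrub_value(item)
            changed = changed or item_changed
        return out, changed
    if isinstance(value, list):
        pairs = [_scrub_value(item) for item in value]
        return [p[0] for p in pairs], any(p[1] for p in pairs)
    return value, False


def scrub_line_sketch(raw, tag_redacted_lines=True):
    try:
        obj = json.loads(raw)
    except ValueError:
        return raw  # non-JSON lines pass through untouched
    new, changed = _scrub_value(obj)
    if not changed:
        return raw  # return the original string, not a re-serialised copy
    if tag_redacted_lines and isinstance(new, dict):
        new["_redacted"] = True
    return json.dumps(new)
```

Returning `raw` itself for clean lines avoids whitespace or key-order drift from re-serialising JSON that did not need to change.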
Completed — Track 2 / G11 — Audit PDF signing (interface + 4 drivers)#
Added:
- lib/guardrails/integrity/audit_pdf_signing/guardrail.yaml — manifest (tier=flag) documenting the 4 driver options + verification surface (the cosign verify-blob --certificate-identity=... command the customer runbook will reference).
- ml_team/core/audit_signer.py — SignerDriver Protocol + 4 drivers:
  - StubDriver: deterministic STUB_SIG_<sha256[:16]> for tests + early dev; no crypto.
  - OfflineEd25519Driver: pure Python via cryptography (lazy import). For air-gapped deployments; key stays on the build host.
  - CosignKmsDriver: shells out to cosign sign-blob --key=<kms_key_uri>. Default for BFSI (AWS/GCP/Vault Transit).
  - CosignKeylessDriver: Cosign + Sigstore + Rekor. Default for SaaS tenants. Parses Rekor log index from cosign stderr.
- SignatureReceipt dataclass — persisted as {pdf}.signature.json sidecar so downstream verifiers have one canonical source.
- configure(driver, strict, ...) matches G2's pattern — strict=True + unavailable driver → SignerUnavailable propagates, aborting API boot (BFSI no-silent-degrade guarantee). strict=False falls back to stub with a WARN.
- ml_team/tests/test_guardrail_audit_signer.py — 18 tests: config lifecycle, strict vs fallback, all 4 driver availability probes, stub round-trip + determinism, receipt serialisation, export_audit_report integration.
Changed:
- ml_team/tools/audit_pdf.py::export_audit_report — +21 lines post-render. When audit_signer.is_configured(), calls sign_pdf(output_path) and adds signature to the response JSON. Signing failures WARN but don't break PDF export (default PDF still writes).
Design notes:
- Four drivers share one Protocol — adding a 5th (HSM smart card, KMIP) is two methods + one _build_driver branch.
- Sidecar receipt pattern (.sig + .pem + .signature.json) keeps verification one-command-per-artifact. Customer runbook reads the JSON and runs the command it specifies.
- Stub ships in production tree (not tests/) so operators can wire it during integration testing without test-only code. Manifest clearly tags it test-only.
Completed — Track 2 / G12 + G13 — Encryption at rest + BYOK (KeyProvider)#
Added:
- lib/guardrails/persistence/encryption_at_rest/guardrail.yaml — manifest (tier=flag) covering both BYOK surface + at-rest driver selection, with the whitepaper-grade threat-model callout ("cold DB file or disk image, NOT RCE on running API").
- ml_team/core/encryption.py — one module for both G12 and G13:
  - KeyProvider Protocol: name, kek_uri, check_available, wrap(plaintext_dek, context), unwrap(wrapped_dek, context).
  - Four provider implementations: StubProvider (deterministic base64, tests only), EnvKeyProvider (AES-GCM-256 via SWARM_KEK env var, dev), AwsKmsProvider (shells out to aws kms encrypt/decrypt with encryption-context), plus fail-fast gcp_kms / vault_transit branches that surface a clear "not implemented" message.
  - WrappedDek + Ciphertext dataclasses with to_dict/from_dict, JSON-serialisable end-to-end.
  - envelope_encrypt(plaintext, context) / envelope_decrypt(ct, context): AES-GCM-256 with per-call DEK. AAD = sorted context dict. Decrypting with mismatched context fails AEAD integrity.
  - configure(provider, kek_uri, at_rest_driver, context, strict): same strict=True fail-closed pattern as G2/G11. at_rest_driver ∈ {sqlite_host_fs_only, sqlcipher, postgres_pgcrypto}; actual DB migration is customer-scheduled (not automatic, because it is irreversible).
- ml_team/tests/test_guardrail_encryption.py — 21 tests: config lifecycle, provider availability probes (stub, env, aws_kms, gcp_kms, vault_transit), strict vs fallback, envelope round-trip, context AAD mismatch, serialisation, JSON transport.
Design notes:
- aws CLI rather than boto3 — keeps default deployment dep footprint clean, matches how ops teams invoke AWS KMS manually.
- Env provider is explicitly dev-only and documented as such.
- Stub wraps via base64 (not identity) so tests can't mask production encryption bugs.
- Module is unconfigured by default — no integration into ConversationStore / DB writes yet. Profile bootstrap wiring lands with G1/G2/G4/G5/G11 plumbing in a follow-up phase.
Completed — Track 2 / G17 — SBOM + signed commits (CI gate)#
Added:
- .github/workflows/release-supply-chain.yml — release-tag workflow that enforces: (1) signed commits in the release range via git log --pretty=format:%G? filtering (anything not G/U fails), (2) CycloneDX SBOM generation, (3) Cosign-keyless signing of tarball + SBOM with GitHub OIDC, (4) base-image signature verification for python:3.12-slim (WARN on failure — upstream policy out of our control).
- scripts/gen_sbom.py — stdlib-only wrapper around cyclonedx-py CLI. Probes both the console-script + python -m cyclonedx_py fallback. Exits with install hint when the dep is missing.
- lib/guardrails/platform_integrity/sbom_signed_commits/guardrail.yaml — manifest documenting the CI gate + customer verification surface.
- SECURITY.md — new "Supply-chain integrity (G17)" section with the three cosign verify-blob / jq / cosign verify commands a CISO runs to reproduce the chain offline via Sigstore Rekor.
- ml_team/tests/test_guardrail_sbom.py — 11 tests: script importability + --help, missing-cyclonedx-py exit path, workflow YAML validity, step-presence assertions (signed-commit check, SBOM generation, Cosign sign of tarball + SBOM, base-image verify).
Design notes:
- CI-only — no runtime Python code. Regression surface is entirely test-level assertions against the YAML + helper script.
- Keyless Sigstore over GPG-keyed signatures — Rekor transparency log gives customer CISOs non-repudiation without us managing long-lived keys. BFSI customers who want offline verification outside Sigstore re-sign with G11's cosign_kms driver using their own KMS.
- CycloneDX 1.5 over SPDX — better Python tooling (cyclonedx-bom actively maintained). SPDX via conversion if a customer asks.
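The signed-commit gate reduces to filtering git's signature status letters. The sketch below assumes the log is produced with a `%G? %H` format so that failures can be reported by hash; the workflow described above uses `%G?`, and `unsigned_commits` is a hypothetical helper name.

```python
ACCEPTED = {"G", "U"}  # good signature; good signature with unknown validity


def unsigned_commits(git_log_output):
    """Return (sha, status) for every commit whose status is not accepted.

    Expects one '<status> <sha>' pair per line, e.g. from:
        git log --pretty=format:'%G? %H' <range>
    """
    failures = []
    for line in git_log_output.splitlines():
        if not line.strip():
            continue
        status, _, sha = line.partition(" ")
        if status not in ACCEPTED:
            failures.append((sha, status))
    return failures
```

Git reports `N` for a commit with no signature and `B` for a bad one, so both are rejected by this filter.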
Track 2 COMPLETE — 887 → 1217 passing tests (+330), all 15 in-scope guardrails landed.#
Track 2 summary#
| # | Guardrail | Phase | Tests added | Commit |
|---|---|---|---|---|
| G-runtime | Runtime scaffold (registry + evaluator + hook events) | 12 | — | (wave 1) |
| G3 | Prompt-injection heuristic | 13 | — | (wave 1) |
| G6 | Logs credential filter | 14 | — | (wave 1) |
| G10 | Delegation loop detector | 15 | — | (wave 1) |
| G7 | Per-user rate limit | 16 | — | (wave 1) |
| G16 | HITL timeout + escalation | 17 | — | (wave 1) |
| G14 | Data lineage | 18 | +22 | 9c6535c |
| G15 | Right-to-be-forgotten | 19 | +15 | 33fd65d |
| G1 | Egress allowlist (in-process) | 20 | +28 | e0da426 |
| G2 | Python execution sandbox | 21 | +17 | 4377577 |
| G4 | PII detection (regex + Presidio) | 22 | +29 | bf71daa |
| G5 | Conversation scrubber | 23 | +16 | cd26d55 |
| G11 | Audit PDF signing | 24 | +18 | b4cd0b5 |
| G12+G13 | Encryption at rest + BYOK | 25 | +21 | c483f09 |
| G17 | SBOM + signed commits | 26 | +11 | (this commit) |
Completed — Ship tooling — swarm deploy CLI#
Added:
- ml_team/deploy/__init__.py — package re-exports (scaffold, validator, manifest, ship, whitepaper).
- ml_team/deploy/scaffold.py — new_deployment(customer, template) writes deployments/<customer>/config.yaml + README + branding/knowledge placeholders. Customer name validation (snake_case, 2-40 chars), three templates (generic_ml / bfsi_baseline / hipaa_baseline), overwrite protection.
- ml_team/deploy/validator.py — validate(customer) lints: missing / unparseable config, customer-name mismatch, bad lib ref format, missing lib assets on disk. Returns ValidationReport with error/warning findings.
- ml_team/deploy/manifest.py — build_manifest(customer) produces a MANIFEST.yaml with swarm_core_version, swarm_lib_manifest (every activated lib ref + pinned version), deployment_config_sha256, build_timestamp, build_commit, build_host. Sort-keyed YAML for determinism.
- ml_team/deploy/ship.py — ship(customer, version) produces dist/<customer>-vX.Y.Z/<customer>-vX.Y.Z.tar.gz + MANIFEST.yaml. Build-time positive-list filter includes ml_team/, lib/, scripts/, top-level files + deployments/<customer>/ + deployments/_dev_scaffold/; excludes every other deployments/<other>/, .git/, .venv/, __pycache__/, ml_team/tests/, pipeline_runs/, node_modules/. Validation runs by default; --skip-validate for emergencies.
- ml_team/deploy/whitepaper.py — generate(customer) produces a 5-section security whitepaper (Deployment Summary · Active Guardrails table · Compliance Mapping · Verification commands · Residual Risks honesty callout). Markdown output. BFSI / HIPAA / generic templates drive framework selection.
- ml_team/cli.py — +84 lines: new deploy subparser with four subcommands (new, validate, ship, whitepaper) + handler functions that emit JSON.
- ml_team/tests/test_deploy_cli.py — 24 tests: scaffold happy path + name validation + overwrite guard, validator with 5 failure modes + happy path, manifest lib-ref extraction + deterministic YAML + missing-git fallback, ship build-time filter verified at tar-member level (other customers excluded, _dev_scaffold + customer included), whitepaper 5-section generation + BFSI framework selection, CLI argparse tree + dispatch to handlers.
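The customer-name rule listed for scaffold.py (snake_case, 2-40 chars) can be pictured as a single regular expression. This is a minimal sketch, not the shipped validator: the helper name `is_valid_customer_name` and the exact pattern (a lowercase letter first, then lowercase letters, digits, or underscores) are assumptions made for illustration.

```python
import re

# Assumed reading of "snake_case, 2-40 chars": one lowercase letter followed
# by 1-39 lowercase letters, digits, or underscores.
_CUSTOMER_NAME = re.compile(r"[a-z][a-z0-9_]{1,39}")


def is_valid_customer_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed snake_case rule."""
    return _CUSTOMER_NAME.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` rejects names with trailing characters, which matters when the name later becomes a directory under deployments/.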
End-to-end smoke test:
python -m ml_team.cli deploy new alice_corp --template=bfsi_baseline
python -m ml_team.cli deploy validate alice_corp
python -m ml_team.cli deploy ship alice_corp --output=./dist
# → dist/alice_corp-v0.1.0/{alice_corp-v0.1.0.tar.gz, MANIFEST.yaml, alice_corp_whitepaper.md}
Design notes:
- Positive-list filter — every new customer is isolated by default. Adding customers doesn't require exclude-list maintenance.
- No Cosign shell-out from the CLI — signing is the CI pipeline's job (G17's release workflow or G11's audit_signer called manually). Keeps the local dev loop painless.
- Whitepaper is Markdown, not PDF — grep-able, diffable, versionable. PDF rendering via tools/audit_pdf.py's ReportLab layer is a follow-up when a customer asks for formal branding.
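The positive-list idea from the design notes can be illustrated with a small path predicate. This is a sketch under stated assumptions, not the code in ship.py: the function name `should_ship` is hypothetical, paths are assumed to be repo-relative POSIX file paths, and the include/exclude sets are copied from the ship.py entry above.

```python
from pathlib import PurePosixPath

# Top-level directories the ship.py entry names as always included.
INCLUDED_ROOTS = {"ml_team", "lib", "scripts"}
# Path components the same entry names as excluded wherever they appear.
EXCLUDED_PARTS = {".git", ".venv", "__pycache__", "node_modules", "pipeline_runs"}


def should_ship(rel_path: str, customer: str) -> bool:
    """Decide whether a repo-relative file path belongs in `customer`'s tarball."""
    parts = PurePosixPath(rel_path).parts
    if not parts or EXCLUDED_PARTS.intersection(parts):
        return False
    if parts[:2] == ("ml_team", "tests"):
        return False
    if len(parts) == 1:
        return True  # top-level files
    if parts[0] == "deployments":
        # Only the named customer and the shared dev scaffold get through.
        return parts[1] in {customer, "_dev_scaffold"}
    # Anything not explicitly listed is left out by default.
    return parts[0] in INCLUDED_ROOTS
```

Because the final branch only admits listed roots, a new deployments/<other>/ directory is excluded without any change to the filter.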
Ship tooling status — 1217 → 1241 passing tests (+24), swarm deploy CLI operational.#
12-week plan — END STATE#
- Track 1 (P0-P5): 6 phases · zero regressions · customer-composable codebase (628 → 887 tests)
- Track 2 (G1-G17): 9 waves · 15 guardrails · BFSI-ready security posture (887 → 1217 tests)
- Ship tooling: 1 phase · per-customer tarball + MANIFEST + whitepaper (1217 → 1241 tests)
- 628 → 1241 tests (+613), zero regressions across the entire arc
- Every commit signed TheAiSingularity <singularitytheai@gmail.com>, no Co-Authored-By trailers
Planned — post-plan follow-ups#
- profile_loader bootstrap wiring for the 7 configure() hooks (G1, G2, G4, G5, G11, G12+G13, conversation_scrubber) — one consolidated commit, ~40 lines
- First real release tag v0.12.0 — will shake out 1-2 CI release-workflow adjustments (OIDC permissions, tag parsing)
- swarm deploy diff + swarm deploy rotate-secret subcommands (Doppler-wired)
- mitmproxy sidecar for G1's second ring (when a customer's infra team requires network-level egress enforcement)
- GCP / Vault Transit KeyProvider implementations (when a customer demands)
- PDF whitepaper rendering via tools/audit_pdf.py ReportLab layer (when a customer requests formal branding)
Planned — Ship tooling#
- swarm deploy CLI (new, validate, dev, ship, diff, rotate-secret)
- Doppler secrets per customer · Cosign signing · SBOM generation · MANIFEST.yaml · per-customer security whitepaper PDF
Prior roadmap (superseded where overlapping)#
- Pilot polish for BFSI: screencast demo, single "compliance architecture" doc for CISO review, scripted auditor-reply flow → rolled into P5 (BFSI baseline template) + auto-generated security whitepaper
- Plugin compat Phase 2 (if marketplace adoption grows): prompt / agent / http hook command types; 10+ additional CC hook events; team_factory integration of plugin-contributed agents → still relevant; slotted after Track 2
- Async orchestration core (only with a ≥4-concurrent-pipelines customer signal) → deferred, customer-signal-gated
- Postgres migration path for multi-node deployments → absorbed into G12 (encryption at rest) for BFSI path
[0.11.0] — 2026-04-20 — Claude Code marketplace plugin compatibility#
Commits: dd44c53 → 65438bc (5 commits — one per phase)
Added#
- Phase A — Drop telemetry (dd44c53): PluginInstallDrops dataclass + install_drops_json SQLite column record every plugin surface seen on disk but not registered. Silent skips in hooks.load_from_plugin upgraded from DEBUG to WARNING. Drops payload surfaces in GET /api/v1/plugins/{name}.
- Phase B — Shell-command hooks (34b2fde): ml_team/core/shell_hook_runner.py executes CC's {"type": "command", ...} hooks behind the new plugin_shell_hooks_enabled feature flag (EXPERIMENT tier, default OFF). Security model: invoke-time validation reusing run_bash allowlist, ${CLAUDE_PLUGIN_ROOT} substitution, scrubbed env, rlimits on Linux, hard timeout (10s default, 60s max), per-execution audit row in new plugin_shell_executions SQLite table. Exit 2 blocks; JSON stdout {"mutation": {...}} lifted into HookResult.
- Phase C — commands/ directory (f405eab): ml_team/core/commands_registry.py scans commands/*.md, registers each with optional $ARGUMENTS substitution. REST at GET /plugins/commands + POST /plugins/commands/{qname}/invoke. Feature flag plugin_commands_enabled (FLAG tier, default ON).
- Phase D — agents/ directory (65438bc): ml_team/core/agents_registry.py scans agents/*.md, forces plugin-{name}::{agent} namespacing so no plugin can shadow a built-in AGENT_DEFS. REST at GET /plugins/agents[?plugin=] + GET /plugins/agents/{qname}. Feature flag plugin_agents_enabled (FLAG tier, default ON).
- Phase E — Smoke + docs: test_plugin_compat_smoke.py installs real superpowers v5.0.7 from the CC cache and asserts 100% surface registration (14 skills + 1 shell hook + 3 commands + 1 agent, zero silent drops). Automatically skipped in CI when the cache isn't present.
Changed#
- hooks.load_from_plugin now parses CC's nested {matcher, hooks: [{type: command|python, ...}]} shape (Phase B).
- scan_install_drops updated phase by phase: command no longer counts as an unsupported type (Phase B); commands/ + agents/ dirs no longer count as drops (Phases C + D). unsupported_hook_types is now strictly prompt / agent / http.
- PluginInstallation dataclass + _row_to_installation + _save_installation all carry the drops payload.
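To make the nested hook shape concrete, the following sketch flattens a config into supported and dropped hooks. It is illustrative only: the function name `split_hooks`, the assumption that the outer mapping is keyed by event name, and the record layout are not taken from the swarm codebase. The supported set (command, python) follows the entry above.

```python
# Hook types the loader accepts after Phase B, per the entry above.
SUPPORTED_HOOK_TYPES = {"command", "python"}


def split_hooks(hooks_config):
    """Flatten nested hook entries into (supported, dropped) lists.

    `hooks_config` is assumed to map an event name to a list of
    {"matcher": ..., "hooks": [{"type": ..., ...}]} entries.
    """
    supported, dropped = [], []
    for event, entries in hooks_config.items():
        for entry in entries:
            matcher = entry.get("matcher", "")
            for hook in entry.get("hooks", []):
                record = {"event": event, "matcher": matcher, **hook}
                if hook.get("type") in SUPPORTED_HOOK_TYPES:
                    supported.append(record)
                else:
                    # prompt / agent / http land here and would be reported as drops.
                    dropped.append(record)
    return supported, dropped
```

Returning the dropped hooks instead of skipping them is what lets an installer record them rather than lose them silently.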
Security#
- All shell-hook execution is feature-flag gated, default OFF. Install-time whitelist still applies + runtime command validation + BFSI-grade audit trail.
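The runtime half of that security model (hard timeout, scrubbed environment, exit code 2 blocks, JSON mutation on stdout) might look roughly like the sketch below. It is not shell_hook_runner.py: the helper name `run_command_hook`, the fail-closed behaviour on timeout, and the minimal PATH-only environment are assumptions, and allowlist validation, rlimits, and audit rows are omitted.

```python
import json
import os
import subprocess
import sys

DEFAULT_TIMEOUT_S = 10  # from the entry: 10s default
MAX_TIMEOUT_S = 60      # from the entry: 60s max


def run_command_hook(argv, timeout_s=DEFAULT_TIMEOUT_S, env=None):
    """Run one hook command and return (blocked, mutation)."""
    timeout_s = max(1, min(timeout_s, MAX_TIMEOUT_S))  # clamp to the hard cap
    if env is None:
        env = {"PATH": os.defpath}  # assumed scrubbed env: nothing inherited
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s, env=env
        )
    except subprocess.TimeoutExpired:
        return True, None  # assumption: a timed-out hook fails closed
    blocked = proc.returncode == 2
    mutation = None
    try:
        payload = json.loads(proc.stdout)
        if isinstance(payload, dict):
            mutation = payload.get("mutation")
    except json.JSONDecodeError:
        pass  # non-JSON stdout carries no mutation
    return blocked, mutation
```

Passing `argv` as a list avoids going through a shell, so the hook's arguments are never re-parsed.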
Tests#
- +59 new: test_plugin_install_drops.py (13), test_plugin_shell_hooks.py (14), test_plugin_commands.py (16), test_plugin_agents.py (14), test_plugin_compat_smoke.py (2).
- Regression: 603 passing (was 545), 1 skipped, 0 failing.
Empirical result#
Installing superpowers v5.0.7 before this cycle: 14/14 skills + 0/1 hooks + 0/3 commands + 0/1 agents = ~25% surface retention (1 of 4 surface types; 14 of 19 individual items).
After: 14/14 + 1/1 + 3/3 + 1/1 = 100% surface retention, 0 silent drops.
[0.10.2] — 2026-04-20 — Documentation Phase 2#
Commit: 4ad7a9b
Added#
- ml_team/tools/{IMPLEMENTATION,LEARNING}_README.md
- ml_team/backends/{IMPLEMENTATION,LEARNING}_README.md
- ml_team/config/{IMPLEMENTATION,LEARNING}_README.md
- ml_team/dashboard/{IMPLEMENTATION,LEARNING}_README.md
- ml_team/tests/IMPLEMENTATION_README.md (IMPL-only by design — tests are their own doc)
Changed#
- .github/workflows/doc-drift.yml — advisory CI guard now covers 7 subsystems (Phase 1 + Phase 2)
Rationale#
Closes the two-layer doc rollout. hello-swarm deliberately stays on the Phase-1 plugin-README shape (plugins aren't subsystems).
[0.10.1] — 2026-04-20 — Documentation Phase 1#
Commit: f65c27c
Added#
- MASTER_README.md at repo root — product + system source of truth
- ml_team/core/{IMPLEMENTATION,LEARNING}_README.md
- ml_team/api/{IMPLEMENTATION,LEARNING}_README.md
- ml_team/api/routers/{IMPLEMENTATION,LEARNING}_README.md
- .github/workflows/doc-drift.yml — advisory CI guard for documented subsystems
Rationale#
New-engineer ramp-up was measured in days. Two-layer model — IMPL (engineering contract) + LEARNING (conceptual) — cuts it to hours while giving regulators a stable doc surface to quote from.
[0.10.0] — 2026-04-20 — Week 7: Compliance + Ops pack#
Commits: 8956263 → 1a1a0c1 (7 commits, +113 tests; 432 → 545 green)
Added#
- W7-1 Unified permission engine (8956263, 72b6c3c)
  - ml_team/core/permissions.py — ALLOW > DENY > ASK > default pipeline with glob tool matching, optional arg regex, priority tiebreak, lazy init
  - ml_team/core/permission_sources.py — 5 default sources: RBAC, agent allowlist, feature flag, HITL, YAML policy
  - ml_team/core/permission_audit.py — SQLite permission_denials persistence + ml_team_permission_decisions_total metric
  - ml_team/api/routers/permissions.py — GET /api/v1/permissions/denials?since=&tool=&agent=
  - ml_team/config/permission_policies.yaml — operator-authored rules (empty default)
- W7-2 Hook lifecycle (e6e6268)
  - ml_team/core/hooks.py — 5 events: SESSION_START, PRE/POST_TOOL, PRE/POST_COMPACTION
  - AgentRunner integration; plugin-loader ingestion of hooks/hooks.json
  - Reference PII-mask handler in examples/plugins/hello-swarm/
- W7-3 Cron scheduler (6d1f17e) — vendored from Hermes
  - ml_team/core/cron.py + cron_tasks.py — 4 task kinds (retrain / drift_check / audit_pdf / custom)
  - File-backed store at ~/.swarm/cron/jobs.json, 60s daemon tick
  - REST at /api/v1/cron/*, /cron dashboard page, swarm cron CLI subcommand
- W7-4 Batch runner (aa4e87a) — vendored from Hermes
  - ml_team/core/batch.py + batch_processors.py — JSONL → inference / echo / custom processors
  - Checkpoints every 10 records, streams results.jsonl, resume-on-restart
  - REST at /api/v1/pipelines/{run_id}/batch
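The W7-1 decision pipeline can be sketched as a precedence walk over matching rules. This is a hedged illustration rather than permissions.py itself: the `Rule` fields, the `decide` function, the default of ASK, and the reading of "ALLOW > DENY > ASK" as strict precedence are all assumptions. The arg-regex filter and lazy init are left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Precedence as written in the entry: ALLOW > DENY > ASK, else the default.
PRECEDENCE = ("ALLOW", "DENY", "ASK")


@dataclass(frozen=True)
class Rule:
    tool_glob: str       # glob over tool names, e.g. "run_*"
    decision: str        # one of PRECEDENCE
    priority: int = 0    # tiebreak among rules with the same decision
    source: str = "policy"


def decide(tool, rules, default="ASK"):
    """Return (decision, winning_rule); the rule is None when the default applies."""
    matching = [r for r in rules if fnmatchcase(tool, r.tool_glob)]
    for decision in PRECEDENCE:
        candidates = [r for r in matching if r.decision == decision]
        if candidates:
            # Within one decision level, the highest-priority rule is reported.
            return decision, max(candidates, key=lambda r: r.priority)
    return default, None
```

Returning the winning rule alongside the decision gives an audit layer something to persist, for example which source produced a denial.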
Changed#
- ToolExecutor.execute + CompositeToolExecutor.execute + require_role + require_approval all route through the permission engine
- feature_flags.py — added hooks_enabled, cron_scheduler, batch_runner
- api/database.py::init_db() — adds permission_denials table
Fixed#
- Cron first-run sentinel flake: interval schedules now fire immediately on boot (previously drifted past the 60s tick)
- Cron output filename collisions under sub-second job runs (microsecond precision in filename)
Docs#
- ADRs for all four W7 items in .project/decisions.md (3b0b18c)
- /transparency dashboard refresh with denial panel
[0.9.0] — 2026-04-20 — Context compaction + evaluator separation#
Commits: fc6c188, d65e305
Added#
- ml_team/core/context_compaction.py — summarise oldest middle-messages at 80% context window; mechanical fallback on summariser failure
- ml_team/core/evaluator.py — clean-context grade on agent terminal response with 0–5 score + verdict-override
[0.8.0] — 2026-04-20 — Plugin ecosystem + MCP streamable-HTTP + CLI#
Commits: 54007f3, 1c1a42f, 3fa5aa5, ac7c931, 00db188, f3e018c, a6d1bd7
Added#
- ml_team/cli.py — swarm CLI: auth, features, pipelines, deployments
- ml_team/core/plugin_loader.py (Phase A) — Claude Code plugin install/uninstall/reload with SHA-256 manifest pinning; .mcp.json ingestion
- Plugin skill ingestion (Phase B) — SKILL.md parsing, keyword match, system-prompt injection
- Streamable-HTTP (SSE) MCP transport — spec 2025-11-25
- /plugins dashboard page + expanded /transparency
- Intra-agent parallel tool dispatch (2.9× speedup on multi-tool turns)
Fixed#
- Critical: ApprovalRequired now propagates through ToolExecutor (was being swallowed by a broad except Exception). HITL gates now fire reliably.
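The bug class behind this fix is easy to reproduce in miniature. The sketch below is not the ToolExecutor code: `ApprovalRequired` here is a stand-in class and `execute_tool` is a hypothetical wrapper, shown only to illustrate re-raising a control-flow exception ahead of a broad handler.

```python
class ApprovalRequired(Exception):
    """Stand-in for the signal raised when a tool call needs human approval."""


def execute_tool(tool, *args, **kwargs):
    """Run `tool`, turning ordinary failures into an error result."""
    try:
        return {"ok": True, "result": tool(*args, **kwargs)}
    except ApprovalRequired:
        raise  # must reach the HITL gate, so re-raise before the broad handler
    except Exception as exc:
        return {"ok": False, "error": str(exc)}
```

Without the dedicated `except ApprovalRequired` clause, the broad handler would convert the approval request into an ordinary error result and the gate would never fire.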
[0.7.0] — 2026-04-20 — Perf + feature flags + retention + transparency#
Commits: b7c82a6 → 8f972bb, plus bd330bc, d5fb344, 335660a, 3da2a4a, 96a0fd1, 54007f3
Added#
- ml_team/core/feature_flags.py — central registry with 3 tiers: INVARIANT / FLAG / EXPERIMENT; resolution order runtime → env → alias → default
- ml_team/core/retention.py — daemon that prunes conversation JSONL, run_events rows, audit PDFs, shadow predictions past TTL
- /settings admin controls + /transparency read-only flags + metrics catalogue
- Micro-benchmark harness + frozen baseline + nightly bench workflow
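The resolution order runtime → env → alias → default can be sketched as a chain of lookups. Everything specific here is an assumption for illustration: the function `resolve_flag`, the `SWARM_` environment-variable prefix, the truthy-string set, and the reading of "alias" as an older name whose environment variable is still honoured. Tier handling (for example, whether INVARIANT flags can be overridden at all) is not modelled.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def _env_key(name):
    return "SWARM_" + name.upper()  # hypothetical naming scheme


def resolve_flag(name, runtime, aliases, defaults, environ=None):
    """Resolve a boolean flag: runtime override, then env, then alias, then default."""
    environ = os.environ if environ is None else environ
    if name in runtime:
        return runtime[name]
    if _env_key(name) in environ:
        return environ[_env_key(name)].strip().lower() in _TRUTHY
    legacy = aliases.get(name)  # assumed: an older name for the same flag
    if legacy is not None and _env_key(legacy) in environ:
        return environ[_env_key(legacy)].strip().lower() in _TRUTHY
    return defaults[name]
```

Checking the runtime override first lets an operator flip a flag without restarting, while the default at the end keeps an unknown environment from changing behaviour.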
Changed#
- Week-1 perf: shared HTTP pool + shared schema cache + Anthropic prompt caching
- Batched SQLite event writes (6.5× speedup) + per-agent JSONL buffers (3.9× speedup)
[0.6.0] — 2026-04-20 — Week 2: Dashboard UI + CI + docs browser#
Commits: 7142f24 → 8152239, fb263eb, 55eeec4, 84cac56, 9e5799f, de3709b, 32dc01a
Added#
- Login page + deployments view + auth context + OIDC helpers in the dashboard
- In-app docs browser + sidebar login/logout footer
- PR workflow + nightly real-LLM golden path
- System-design brief + extending guide wired into in-app browser
Fixed#
- CI blockers — broken build-backend, ruff import-sort cleanup
- Dashboard: missing highlight.js dep for docs page
[0.5.0] — 2026-04-20 — Week 1: BFSI compliance MVP#
Commits: 1e699ef → 0e92e03 (security, RBAC, tiers, SSO, audit)
Added#
- Security: per-agent tool allowlists (1e699ef), tool_denied_total metric surfaced at /metrics
- RBAC MVP: 3 roles (admin/operator/viewer) + JWT + backward-compat API key (817b1b8)
- Deploy pipeline: real model packaging + Kubernetes manifest generation (0037d9c)
- Tier-1: train_classifier tool + observability wiring (2e5925a)
- Tier-2 — RBI FREE-AI compliance bundle: drift detector + fairness audit + SHAP explainability + model cards (712d243)
- Tier-3 — Champion-challenger MVP: model registry + shadow-traffic log + promotion gate (307ea1c)
- Audit: single-document PDF export for regulatory sign-off (2a7956e)
- SSO: OIDC authorization-code flow for Okta / Azure AD / Google (9c6f23f)
Housekeeping#
- CrewAI / plain-LangGraph prototypes archived under deprecated/ (cb05248)
- Project scaffolding, architecture docs, AI tool configs (a9a44df)
[0.4.0] — 2026-04-13 — Phase 8: Production readiness#
Commits: 94624bd, 2b090cf, 16e0e94, 410f1c9, 1ab2524, 106c7bb, c44d4b8, 487dc3e, 389d561, 18c1d41, eb2f642, 735d2c7
Added#
- P0 security: all P0 vulnerabilities fixed
- P1 infrastructure: bounded thread pool, WebSocket bridge, rate limiting, structured logging
- P2 reliability: OpenTelemetry tracing, Prometheus metrics, cost budgeting
- P3 observability: error boundaries, guardrails, Makefile, pre-commit, loading states
- Dashboard features: chat, HITL UI, controls, persistence, training logs, 3-dot menu, artifact downloads
- Circuit breaker + 3 specialist agents (LLM, vision, repo_researcher) + dataset explorer + model playground + 8 algorithm repos
- Comprehensive README (architecture, setup, API, dashboard, tools) + service start/stop/status/logs commands
- Production readiness test suite: 115/115 pass
Fixed#
- 6 dashboard bugs: feedback crash, graph status, quality page, polling, error recovery
[0.3.0] — 2026-04-13 — Phases 5–7.5: Hardening + HITL + MCP + StateGraph#
Commits: 532a7bb, ae3781c, 1ca811a, 97a702a
Added#
- Phase 5 — Agent hardening: 37/37 operational rules, span-based observability, evaluation framework
- Phase 6 — HITL + persistence: approval gates, project memory (SQLite), org memory (PostgreSQL)
- Phase 7 — Integration: MCP client, RAG knowledge store, parallel team execution
- Phase 7.5 — Polish: StateGraph execution, conversation transparency, demo presets
[0.2.0] — 2026-04-12–13 — Phases 0–4: Framework-agnostic core + REST + dashboard#
Commits: 5cba253, 04066e7, 75d8672, 1591d5a, 5a95c60
Added#
- Phase 1: framework-agnostic core with native / LangGraph / CrewAI backends
- Phase 2: agent memory, per-agent rules, post-run feedback loop
- Phase 3: REST API (FastAPI), pipeline execution, agent inspection, WebSocket streaming
- Phase 4: algorithm repos + customer dashboard
Fixed#
- Phase 0: INVALID_CHAT_HISTORY crash + path-resolution bugs
[0.1.0] — 2026-04-12 — Initial swarm#
Commits: 8207424, 2c48f38
Added#
- Multi-agent swarm with 3-model vLLM Docker deployment
- ML Team Agent baseline: 32 agents, 7 teams, 23 tools
Cross-cutting state (current HEAD 4ad7a9b)#
| Dimension | Count |
|---|---|
| Agents | 40 across 7 teams (algorithm 9, data 6, deployment 5, evaluation 5, management 4, quality 5, training 6) |
| Tools | 38 callable primitives, 33 tool sets |
| Algorithm repos | 18 (tabular, vision, NLP, fine-tuning) |
| REST routers | 18+ (auth, pipelines, deployments, permissions, cron, batch, plugins, features, …) |
| Feature flags | 20+ registered, 3 tiers |
| Tests | 545 passing, 1 skipped (Docker), 0 failing |
| Commits | 73 total (Apr 12 → Apr 20) |
| Documented subsystems | 7 (two-layer READMEs) + MASTER_README + advisory CI |
Next steps for versioning hygiene#
- Bump ml_team/pyproject.toml + ml_team/dashboard/package.json to 0.10.2 — both are still 0.1.0.
- Annotate git tags retroactively — git tag -a v0.5.0 0e92e03 -m "Week 1: BFSI compliance MVP" through v0.10.2 4ad7a9b. Signed tags if you maintain a signing key.
- Adopt semver going forward. Customer-facing API changes bump minor; bugfixes patch. 1.0.0 when the first BFSI pilot signs off.
- Add a PR checklist item — "Did you update CHANGELOG.md under [Unreleased]?" — so this file stops being my job to reconstruct.